* [PATCH v5 00/12] box: indexes by JSON path
@ 2018-10-29 6:56 Kirill Shcherbatov
2018-10-29 6:56 ` [PATCH v5 01/12] box: refactor key_def_find routine Kirill Shcherbatov
` (11 more replies)
0 siblings, 12 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-10-29 6:56 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
http://github.com/tarantool/tarantool/tree/kshch/gh-1012-json-indexes
https://github.com/tarantool/tarantool/issues/1012
Sometimes field data has a complex document structure. When this
structure is consistent across tuples, you can create an index by
JSON path.
A new JSON tree class is used to manage the tuple_field objects
defined for a format. This allows working with fields in a
unified way.
To speed up data access through a JSON index, the key_part
structure is extended with an offset_slot cache that points to
the field_map item holding the data offset for the current
tuple. The field map is initialized by traversing the tree to
detect vertices that are missing in the msgpack.
An offset_slot_cache is introduced in key_part to tune data
access for the typical scenario of processing a series of tuples
that share the same format.
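The field-map idea behind the speed-up can be sketched as follows.
This is a hypothetical, heavily simplified illustration, not the real
Tarantool layout: an indexed field, including one resolved through a
JSON path, owns a negative "offset slot" into the tuple's field map,
so reading it is a single array lookup instead of a msgpack parse,
and a zero offset marks a vertex that is missing in the msgpack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified sketch: the field map is an array of 32-bit data
 * offsets stored in front of the raw tuple data, addressed by
 * negative offset slots counted back from its end.
 */
static const char *
tuple_field_by_slot(const char *tuple_data, const uint32_t *field_map,
		    int32_t offset_slot)
{
	assert(offset_slot < 0);
	uint32_t offset = field_map[offset_slot];
	/* Offset 0: the field is absent from this tuple's msgpack. */
	if (offset == 0)
		return NULL;
	return tuple_data + offset;
}
```
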
Kirill Shcherbatov (12):
box: refactor key_def_find routine
box: introduce key_def_parts_are_sequential
box: introduce tuple_field_go_to_path
box: introduce tuple_format_add_key_part
lib: implement JSON tree class for json library
box: manage format fields with JSON tree class
lib: introduce json_path_normalize routine
box: introduce JSON indexes
box: introduce has_json_paths flag in templates
box: tune tuple_field_raw_by_path for indexed data
box: introduce offset slot cache in key_part
box: specify indexes in user-friendly form
src/box/alter.cc | 14 +-
src/box/blackhole.c | 5 +-
src/box/engine.h | 11 +-
src/box/errcode.h | 2 +-
src/box/index_def.c | 6 +-
src/box/key_def.c | 219 ++++++++++--
src/box/key_def.h | 48 ++-
src/box/lua/index.c | 59 ++++
src/box/lua/schema.lua | 22 +-
src/box/lua/space.cc | 5 +
src/box/memtx_engine.c | 6 +-
src/box/memtx_space.c | 5 +-
src/box/memtx_space.h | 2 +-
src/box/schema.cc | 4 +-
src/box/space.c | 6 +-
src/box/space.h | 8 +-
src/box/sql.c | 17 +-
src/box/sql/build.c | 8 +-
src/box/sql/pragma.c | 2 +-
src/box/sql/select.c | 6 +-
src/box/sql/where.c | 1 +
src/box/sysview.c | 3 +-
src/box/tuple.c | 40 +--
src/box/tuple_compare.cc | 125 +++++--
src/box/tuple_extract_key.cc | 131 ++++---
src/box/tuple_format.c | 730 +++++++++++++++++++++++++++++++---------
src/box/tuple_format.h | 76 ++++-
src/box/tuple_hash.cc | 47 ++-
src/box/vinyl.c | 9 +-
src/box/vy_log.c | 3 +-
src/box/vy_lsm.c | 5 +-
src/box/vy_point_lookup.c | 2 -
src/box/vy_stmt.c | 168 +++++++--
src/lib/json/CMakeLists.txt | 2 +
src/lib/json/path.c | 25 ++
src/lib/json/path.h | 18 +
src/lib/json/tree.c | 327 ++++++++++++++++++
src/lib/json/tree.h | 224 ++++++++++++
test-run | 2 +-
test/box/misc.result | 1 +
test/engine/tuple.result | 424 +++++++++++++++++++++++
test/engine/tuple.test.lua | 121 +++++++
test/unit/json_path.c | 250 +++++++++++++-
test/unit/json_path.result | 54 ++-
test/unit/vy_iterators_helper.c | 6 +-
test/unit/vy_mem.c | 2 +-
test/unit/vy_point_lookup.c | 2 +-
47 files changed, 2826 insertions(+), 427 deletions(-)
create mode 100644 src/lib/json/tree.c
create mode 100644 src/lib/json/tree.h
--
2.7.4
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v5 01/12] box: refactor key_def_find routine
2018-10-29 6:56 [PATCH v5 00/12] box: indexes by JSON path Kirill Shcherbatov
@ 2018-10-29 6:56 ` Kirill Shcherbatov
2018-11-19 17:48 ` Vladimir Davydov
2018-10-29 6:56 ` [PATCH v5 10/12] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
` (10 subsequent siblings)
11 siblings, 1 reply; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-10-29 6:56 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
The key_def_find routine is refactored to take a key_part as its
second argument. A key_def_find_by_fieldno helper is introduced
for scenarios where no key_part exists.
The new API is more convenient for the complex key_part that
will appear with the introduction of JSON paths.
Need for #1012
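The refactoring can be sketched like this, with hypothetical,
trimmed-down structures (the real key_part also carries type,
collation and nullability). Note that at this point only fieldno is
compared; matching by path becomes meaningful later in the series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Trimmed-down stand-ins for the real structures. */
struct key_part {
	uint32_t fieldno;
};

struct key_def {
	uint32_t part_count;
	struct key_part parts[8];
};

/* Search by a whole key_part; for now only fieldno is compared. */
static const struct key_part *
key_def_find(const struct key_def *key_def, const struct key_part *to_find)
{
	const struct key_part *part = key_def->parts;
	const struct key_part *end = part + key_def->part_count;
	for (; part != end; part++) {
		if (part->fieldno == to_find->fieldno)
			return part;
	}
	return NULL;
}

/* Helper for callers that only have a field number. */
static const struct key_part *
key_def_find_by_fieldno(const struct key_def *key_def, uint32_t fieldno)
{
	struct key_part part;
	memset(&part, 0, sizeof(part));
	part.fieldno = fieldno;
	return key_def_find(key_def, &part);
}
```
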
---
src/box/alter.cc | 7 ++++---
src/box/key_def.c | 19 ++++++++++++++-----
src/box/key_def.h | 9 ++++++++-
src/box/sql/build.c | 2 +-
src/box/sql/pragma.c | 2 +-
5 files changed, 28 insertions(+), 11 deletions(-)
diff --git a/src/box/alter.cc b/src/box/alter.cc
index a6bb5a0..fc4d44e 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -3952,9 +3952,10 @@ on_replace_dd_fk_constraint(struct trigger * /* trigger*/, void *event)
continue;
uint32_t j;
for (j = 0; j < fk_def->field_count; ++j) {
- if (key_def_find(idx->def->key_def,
- fk_def->links[j].parent_field)
- == NULL)
+ if (key_def_find_by_fieldno(idx->def->key_def,
+ fk_def->links[j].
+ parent_field) ==
+ NULL)
break;
}
if (j != fk_def->field_count)
diff --git a/src/box/key_def.c b/src/box/key_def.c
index 3a560bb..2119ca3 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -519,12 +519,21 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
}
const struct key_part *
-key_def_find(const struct key_def *key_def, uint32_t fieldno)
+key_def_find_by_fieldno(const struct key_def *key_def, uint32_t fieldno)
+{
+ struct key_part part;
+ memset(&part, 0, sizeof(struct key_part));
+ part.fieldno = fieldno;
+ return key_def_find(key_def, &part);
+}
+
+const struct key_part *
+key_def_find(const struct key_def *key_def, const struct key_part *to_find)
{
const struct key_part *part = key_def->parts;
const struct key_part *end = part + key_def->part_count;
for (; part != end; part++) {
- if (part->fieldno == fieldno)
+ if (part->fieldno == to_find->fieldno)
return part;
}
return NULL;
@@ -536,7 +545,7 @@ key_def_contains(const struct key_def *first, const struct key_def *second)
const struct key_part *part = second->parts;
const struct key_part *end = part + second->part_count;
for (; part != end; part++) {
- if (key_def_find(first, part->fieldno) == NULL)
+ if (key_def_find(first, part) == NULL)
return false;
}
return true;
@@ -553,7 +562,7 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
const struct key_part *part = second->parts;
const struct key_part *end = part + second->part_count;
for (; part != end; part++) {
- if (key_def_find(first, part->fieldno))
+ if (key_def_find(first, part) != NULL)
--new_part_count;
}
@@ -584,7 +593,7 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
part = second->parts;
end = part + second->part_count;
for (; part != end; part++) {
- if (key_def_find(first, part->fieldno))
+ if (key_def_find(first, part) != NULL)
continue;
key_def_set_part(new_def, pos++, part->fieldno, part->type,
part->nullable_action, part->coll,
diff --git a/src/box/key_def.h b/src/box/key_def.h
index 20e79f9..cfed3f1 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -320,7 +320,14 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
* If fieldno is not in index_def->parts returns NULL.
*/
const struct key_part *
-key_def_find(const struct key_def *key_def, uint32_t fieldno);
+key_def_find_by_fieldno(const struct key_def *key_def, uint32_t fieldno);
+
+/**
+ * Returns the part in index_def->parts for the specified key part.
+ * If to_find is not in index_def->parts returns NULL.
+ */
+const struct key_part *
+key_def_find(const struct key_def *key_def, const struct key_part *to_find);
/**
* Check if key definition @a first contains all parts of
diff --git a/src/box/sql/build.c b/src/box/sql/build.c
index fb5d61c..7556a78 100644
--- a/src/box/sql/build.c
+++ b/src/box/sql/build.c
@@ -145,7 +145,7 @@ sql_space_column_is_in_pk(struct space *space, uint32_t column)
if (column < 63)
return (pk_mask & (((uint64_t) 1) << column)) != 0;
else if ((pk_mask & (((uint64_t) 1) << 63)) != 0)
- return key_def_find(key_def, column) != NULL;
+ return key_def_find_by_fieldno(key_def, column) != NULL;
return false;
}
diff --git a/src/box/sql/pragma.c b/src/box/sql/pragma.c
index 9ece82a..cc99eb1 100644
--- a/src/box/sql/pragma.c
+++ b/src/box/sql/pragma.c
@@ -273,7 +273,7 @@ sql_pragma_table_info(struct Parse *parse, const char *tbl_name)
k = 1;
} else {
struct key_def *kdef = pk->def->key_def;
- k = key_def_find(kdef, i) - kdef->parts + 1;
+ k = key_def_find_by_fieldno(kdef, i) - kdef->parts + 1;
}
sqlite3VdbeMultiLoad(v, 1, "issisi", i, field->name,
field_type_strs[field->type],
--
2.7.4
* [PATCH v5 10/12] box: tune tuple_field_raw_by_path for indexed data
2018-10-29 6:56 [PATCH v5 00/12] box: indexes by JSON path Kirill Shcherbatov
2018-10-29 6:56 ` [PATCH v5 01/12] box: refactor key_def_find routine Kirill Shcherbatov
@ 2018-10-29 6:56 ` Kirill Shcherbatov
2018-10-29 6:56 ` [PATCH v5 11/12] box: introduce offset slot cache in key_part Kirill Shcherbatov
` (9 subsequent siblings)
11 siblings, 0 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-10-29 6:56 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
There is no need to parse the whole tuple in
tuple_field_raw_by_path when the required field is indexed.
Instead, we look the path up in the format's tree of JSON paths
and return the data by its offset from the field_map, avoiding a
full tuple parse.
Part of #1012
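The control flow added here can be sketched as follows, with
hypothetical, simplified signatures (the real code resolves the slot
by walking the format's field tree with tuple_format_field_by_path):
when the path maps to an indexed field with an offset slot, the
answer is one field_map read; otherwise the caller falls back to
parsing the JSON path over the raw msgpack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel for "field has no slot in the field map" (assumed value). */
#define TUPLE_OFFSET_SLOT_NIL INT32_MAX

/*
 * Hypothetical stand-in for the format-tree lookup: one indexed
 * (fieldno -> offset_slot) mapping instead of a real tree.
 */
struct format_stub {
	uint32_t indexed_fieldno;
	int32_t offset_slot;
};

/*
 * Try the indexed fast path first; the return value tells the
 * caller whether a full msgpack parse is still required.
 */
static bool
field_raw_by_path(const struct format_stub *format, const char *tuple,
		  const uint32_t *field_map, uint32_t fieldno,
		  const char **field)
{
	int32_t slot = fieldno == format->indexed_fieldno ?
		       format->offset_slot : TUPLE_OFFSET_SLOT_NIL;
	if (slot != TUPLE_OFFSET_SLOT_NIL) {
		*field = tuple + field_map[slot];
		return true;  /* fast path taken */
	}
	*field = NULL;
	return false; /* fall back to full JSON path parsing */
}
```
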
---
src/box/tuple_format.c | 32 +++++++++++++++++++++++---------
test/engine/tuple.result | 5 +++++
test/engine/tuple.test.lua | 2 ++
3 files changed, 30 insertions(+), 9 deletions(-)
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 151d9e5..920968c 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -928,15 +928,12 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
goto error;
switch(node.type) {
case JSON_PATH_NUM: {
- int index = node.num;
- if (index == 0) {
+ fieldno = node.num;
+ if (fieldno == 0) {
*field = NULL;
return 0;
}
- index -= TUPLE_INDEX_BASE;
- *field = tuple_field_raw(format, tuple, field_map, index);
- if (*field == NULL)
- return 0;
+ fieldno -= TUPLE_INDEX_BASE;
break;
}
case JSON_PATH_STR: {
@@ -953,9 +950,8 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
*/
name_hash = field_name_hash(node.str, node.len);
}
- *field = tuple_field_raw_by_name(format, tuple, field_map,
- node.str, node.len, name_hash);
- if (*field == NULL)
+ if (tuple_fieldno_by_name(format->dict, node.str, node.len,
+ name_hash, &fieldno) != 0)
return 0;
break;
}
@@ -964,6 +960,24 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
*field = NULL;
return 0;
}
+ /* Optimize indexed JSON field data access. */
+ assert(field != NULL);
+ struct tuple_field *indexed_field =
+ unlikely(fieldno >= tuple_format_field_count(format)) ? NULL :
+ tuple_format_field_by_path(format,
+ tuple_format_field(format, fieldno),
+ path + parser.offset,
+ path_len - parser.offset);
+ if (indexed_field != NULL &&
+ indexed_field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
+ *field = tuple + field_map[indexed_field->offset_slot];
+ return 0;
+ }
+
+ /* No such field in index. Continue parsing JSON path. */
+ *field = tuple_field_raw(format, tuple, field_map, fieldno);
+ if (*field == NULL)
+ return 0;
rc = tuple_field_go_to_path(field, path + parser.offset,
path_len - parser.offset);
if (rc == 0)
diff --git a/test/engine/tuple.result b/test/engine/tuple.result
index 9a1ceb8..92927a0 100644
--- a/test/engine/tuple.result
+++ b/test/engine/tuple.result
@@ -1148,6 +1148,11 @@ assert(idx2 ~= nil)
t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
---
...
+-- Test field_map in tuple speed-up access by indexed path.
+t["[3][\"FIO\"][\"fname\"]"]
+---
+- Agent
+...
idx:select()
---
- - [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5]
diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua
index f1000dd..9e6807e 100644
--- a/test/engine/tuple.test.lua
+++ b/test/engine/tuple.test.lua
@@ -367,6 +367,8 @@ s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"]["
idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}})
assert(idx2 ~= nil)
t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
+-- Test field_map in tuple speed-up access by indexed path.
+t["[3][\"FIO\"][\"fname\"]"]
idx:select()
idx:min()
idx:max()
--
2.7.4
* [PATCH v5 11/12] box: introduce offset slot cache in key_part
2018-10-29 6:56 [PATCH v5 00/12] box: indexes by JSON path Kirill Shcherbatov
2018-10-29 6:56 ` [PATCH v5 01/12] box: refactor key_def_find routine Kirill Shcherbatov
2018-10-29 6:56 ` [PATCH v5 10/12] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
@ 2018-10-29 6:56 ` Kirill Shcherbatov
2018-11-01 13:32 ` [tarantool-patches] " Konstantin Osipov
2018-10-29 6:56 ` [PATCH v5 12/12] box: specify indexes in user-friendly form Kirill Shcherbatov
` (8 subsequent siblings)
11 siblings, 1 reply; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-10-29 6:56 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
The same key_part may be used in multiple formats, so a
different field->offset_slot is allocated in each of them. In
most scenarios we work with a series of tuples of the same
format, and (in general) looking a field up in the format is an
expensive operation for JSON paths defined in a key_part.
A new offset_slot_cache field in the key_part structure,
together with an epoch-based mechanism to validate its
freshness, is an effective way to improve performance.
Part of #1012
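The cache check can be sketched as below, with hypothetical,
trimmed-down structures and the expensive format-tree lookup replaced
by a counting stub (the real code resolves the slot via
tuple_format_field_by_path): a hit skips the lookup entirely, and a
miss recomputes the slot and caches it when the new format is at
least as recent as the cached one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TUPLE_OFFSET_SLOT_NIL INT32_MAX

/* Hypothetical stand-ins for the real structures. */
struct tuple_format {
	uint64_t epoch;
};

struct key_part {
	/* Format whose offset slot was cached last. */
	struct tuple_format *format_cache;
	/* Cached field offset slot, valid only for format_cache. */
	int32_t offset_slot_cache;
};

static int resolve_calls = 0;

/* Stand-in for the real format-tree lookup; counts invocations. */
static int32_t
resolve_slot(struct tuple_format *format)
{
	(void)format;
	resolve_calls++;
	return -1;
}

static int32_t
part_offset_slot(struct key_part *part, struct tuple_format *format)
{
	if (part->format_cache == format)
		return part->offset_slot_cache; /* cache hit */
	int32_t slot = resolve_slot(format);
	/* Prefer the format with the newest epoch. */
	if (part->format_cache == NULL ||
	    part->format_cache->epoch < format->epoch) {
		part->offset_slot_cache = slot;
		part->format_cache = format;
	}
	return slot;
}
```
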
---
src/box/alter.cc | 7 +++--
src/box/blackhole.c | 5 ++--
src/box/engine.h | 11 ++++----
src/box/key_def.c | 12 ++++++--
src/box/key_def.h | 8 ++++++
src/box/memtx_engine.c | 4 +--
src/box/memtx_space.c | 5 ++--
src/box/memtx_space.h | 2 +-
src/box/schema.cc | 4 +--
src/box/space.c | 6 ++--
src/box/space.h | 8 ++++--
src/box/sysview.c | 3 +-
src/box/tuple.c | 4 +--
src/box/tuple_format.c | 62 ++++++++++++++++++++++++++---------------
src/box/tuple_format.h | 6 +++-
src/box/vinyl.c | 7 +++--
src/box/vy_lsm.c | 5 ++--
test/unit/vy_iterators_helper.c | 6 ++--
test/unit/vy_mem.c | 2 +-
test/unit/vy_point_lookup.c | 2 +-
20 files changed, 108 insertions(+), 61 deletions(-)
diff --git a/src/box/alter.cc b/src/box/alter.cc
index fc4d44e..261b4b2 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -869,7 +869,10 @@ alter_space_do(struct txn *txn, struct alter_space *alter)
* Create a new (empty) space for the new definition.
* Sic: the triggers are not moved over yet.
*/
- alter->new_space = space_new_xc(alter->space_def, &alter->key_list);
+ alter->new_space =
+ space_new_xc(alter->space_def, &alter->key_list,
+ alter->old_space->format != NULL ?
+ alter->old_space->format->epoch + 1 : 1);
/*
* Copy the replace function, the new space is at the same recovery
* phase as the old one. This hack is especially necessary for
@@ -1680,7 +1683,7 @@ on_replace_dd_space(struct trigger * /* trigger */, void *event)
access_check_ddl(def->name, def->id, def->uid, SC_SPACE,
PRIV_C, true);
RLIST_HEAD(empty_list);
- struct space *space = space_new_xc(def, &empty_list);
+ struct space *space = space_new_xc(def, &empty_list, 0);
/**
* The new space must be inserted in the space
* cache right away to achieve linearisable
diff --git a/src/box/blackhole.c b/src/box/blackhole.c
index 88dab3f..efd70f9 100644
--- a/src/box/blackhole.c
+++ b/src/box/blackhole.c
@@ -138,7 +138,7 @@ blackhole_engine_shutdown(struct engine *engine)
static struct space *
blackhole_engine_create_space(struct engine *engine, struct space_def *def,
- struct rlist *key_list)
+ struct rlist *key_list, uint64_t epoch)
{
if (!rlist_empty(key_list)) {
diag_set(ClientError, ER_UNSUPPORTED, "Blackhole", "indexes");
@@ -155,7 +155,8 @@ blackhole_engine_create_space(struct engine *engine, struct space_def *def,
/* Allocate tuples on runtime arena, but check space format. */
struct tuple_format *format;
format = tuple_format_new(&tuple_format_runtime->vtab, NULL, 0,
- def->fields, def->field_count, def->dict);
+ def->fields, def->field_count, def->dict,
+ epoch);
if (format == NULL) {
free(space);
return NULL;
diff --git a/src/box/engine.h b/src/box/engine.h
index 5b96c74..0e8c76c 100644
--- a/src/box/engine.h
+++ b/src/box/engine.h
@@ -72,7 +72,8 @@ struct engine_vtab {
void (*shutdown)(struct engine *);
/** Allocate a new space instance. */
struct space *(*create_space)(struct engine *engine,
- struct space_def *def, struct rlist *key_list);
+ struct space_def *def, struct rlist *key_list,
+ uint64_t epoch);
/**
* Write statements stored in checkpoint @vclock to @stream.
*/
@@ -237,9 +238,9 @@ engine_find(const char *name)
static inline struct space *
engine_create_space(struct engine *engine, struct space_def *def,
- struct rlist *key_list)
+ struct rlist *key_list, uint64_t epoch)
{
- return engine->vtab->create_space(engine, def, key_list);
+ return engine->vtab->create_space(engine, def, key_list, epoch);
}
static inline int
@@ -390,9 +391,9 @@ engine_find_xc(const char *name)
static inline struct space *
engine_create_space_xc(struct engine *engine, struct space_def *def,
- struct rlist *key_list)
+ struct rlist *key_list, uint64_t epoch)
{
- struct space *space = engine_create_space(engine, def, key_list);
+ struct space *space = engine_create_space(engine, def, key_list, epoch);
if (space == NULL)
diag_raise();
return space;
diff --git a/src/box/key_def.c b/src/box/key_def.c
index 0043f4e..17071fd 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -189,6 +189,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
def->parts[part_no].coll = coll;
def->parts[part_no].coll_id = coll_id;
def->parts[part_no].sort_order = sort_order;
+ def->parts[part_no].offset_slot_cache = TUPLE_OFFSET_SLOT_NIL;
+ def->parts[part_no].format_cache = NULL;
if (path != NULL) {
def->parts[part_no].path_len = path_len;
assert(def->parts[part_no].path != NULL);
@@ -722,10 +724,13 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
new_def->parts[pos].path = data;
data += part->path_len + 1;
}
- key_def_set_part(new_def, pos++, part->fieldno, part->type,
+ key_def_set_part(new_def, pos, part->fieldno, part->type,
part->nullable_action, part->coll,
part->coll_id, part->sort_order, part->path,
part->path_len);
+ new_def->parts[pos].offset_slot_cache = part->offset_slot_cache;
+ new_def->parts[pos].format_cache = part->format_cache;
+ pos++;
}
/* Set-append second key def's part to the new key def. */
@@ -738,10 +743,13 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
new_def->parts[pos].path = data;
data += part->path_len + 1;
}
- key_def_set_part(new_def, pos++, part->fieldno, part->type,
+ key_def_set_part(new_def, pos, part->fieldno, part->type,
part->nullable_action, part->coll,
part->coll_id, part->sort_order, part->path,
part->path_len);
+ new_def->parts[pos].offset_slot_cache = part->offset_slot_cache;
+ new_def->parts[pos].format_cache = part->format_cache;
+ pos++;
}
key_def_set_cmp(new_def);
return new_def;
diff --git a/src/box/key_def.h b/src/box/key_def.h
index 5c0dfe3..8699487 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -101,6 +101,14 @@ struct key_part {
char *path;
/** The length of JSON path. */
uint32_t path_len;
+ /**
+ * Source format for offset_slot_cache actuality
+ * validations. Cache is expected to use "the format with
+ * the newest epoch is most relevant" strategy.
+ */
+ struct tuple_format *format_cache;
+ /** Cache with format's field offset slot. */
+ int32_t offset_slot_cache;
};
struct key_def;
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index c572751..fc9f019 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -358,10 +358,10 @@ memtx_engine_end_recovery(struct engine *engine)
static struct space *
memtx_engine_create_space(struct engine *engine, struct space_def *def,
- struct rlist *key_list)
+ struct rlist *key_list, uint64_t epoch)
{
struct memtx_engine *memtx = (struct memtx_engine *)engine;
- return memtx_space_new(memtx, def, key_list);
+ return memtx_space_new(memtx, def, key_list, epoch);
}
static int
diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
index 1a248b7..d57dad3 100644
--- a/src/box/memtx_space.c
+++ b/src/box/memtx_space.c
@@ -955,7 +955,7 @@ static const struct space_vtab memtx_space_vtab = {
struct space *
memtx_space_new(struct memtx_engine *memtx,
- struct space_def *def, struct rlist *key_list)
+ struct space_def *def, struct rlist *key_list, uint64_t epoch)
{
struct memtx_space *memtx_space = malloc(sizeof(*memtx_space));
if (memtx_space == NULL) {
@@ -981,7 +981,8 @@ memtx_space_new(struct memtx_engine *memtx,
struct tuple_format *format =
tuple_format_new(&memtx_tuple_format_vtab, keys, key_count,
- def->fields, def->field_count, def->dict);
+ def->fields, def->field_count, def->dict,
+ epoch);
if (format == NULL) {
free(memtx_space);
return NULL;
diff --git a/src/box/memtx_space.h b/src/box/memtx_space.h
index 7dc3410..b5bec0c 100644
--- a/src/box/memtx_space.h
+++ b/src/box/memtx_space.h
@@ -79,7 +79,7 @@ memtx_space_replace_all_keys(struct space *, struct tuple *, struct tuple *,
struct space *
memtx_space_new(struct memtx_engine *memtx,
- struct space_def *def, struct rlist *key_list);
+ struct space_def *def, struct rlist *key_list, uint64_t epoch);
#if defined(__cplusplus)
} /* extern "C" */
diff --git a/src/box/schema.cc b/src/box/schema.cc
index 8625d92..865e07e 100644
--- a/src/box/schema.cc
+++ b/src/box/schema.cc
@@ -283,7 +283,7 @@ sc_space_new(uint32_t id, const char *name,
struct rlist key_list;
rlist_create(&key_list);
rlist_add_entry(&key_list, index_def, link);
- struct space *space = space_new_xc(def, &key_list);
+ struct space *space = space_new_xc(def, &key_list, 0);
space_cache_replace(NULL, space);
if (replace_trigger)
trigger_add(&space->on_replace, replace_trigger);
@@ -495,7 +495,7 @@ schema_init()
space_def_delete(def);
});
RLIST_HEAD(key_list);
- struct space *space = space_new_xc(def, &key_list);
+ struct space *space = space_new_xc(def, &key_list, 0);
space_cache_replace(NULL, space);
init_system_space(space);
trigger_run_xc(&on_alter_space, space);
diff --git a/src/box/space.c b/src/box/space.c
index 548f667..99badf6 100644
--- a/src/box/space.c
+++ b/src/box/space.c
@@ -183,18 +183,18 @@ fail:
}
struct space *
-space_new(struct space_def *def, struct rlist *key_list)
+space_new(struct space_def *def, struct rlist *key_list, uint64_t epoch)
{
struct engine *engine = engine_find(def->engine_name);
if (engine == NULL)
return NULL;
- return engine_create_space(engine, def, key_list);
+ return engine_create_space(engine, def, key_list, epoch);
}
struct space *
space_new_ephemeral(struct space_def *def, struct rlist *key_list)
{
- struct space *space = space_new(def, key_list);
+ struct space *space = space_new(def, key_list, 0);
if (space == NULL)
return NULL;
space->def->opts.is_temporary = true;
diff --git a/src/box/space.h b/src/box/space.h
index f3e9e1e..ad2bc64 100644
--- a/src/box/space.h
+++ b/src/box/space.h
@@ -417,10 +417,11 @@ struct field_def;
* Allocate and initialize a space.
* @param space_def Space definition.
* @param key_list List of index_defs.
+ * @param epoch Last epoch to initialize format.
* @retval Space object.
*/
struct space *
-space_new(struct space_def *space_def, struct rlist *key_list);
+space_new(struct space_def *space_def, struct rlist *key_list, uint64_t epoch);
/**
* Create an ephemeral space.
@@ -471,9 +472,10 @@ int generic_space_prepare_alter(struct space *, struct space *);
} /* extern "C" */
static inline struct space *
-space_new_xc(struct space_def *space_def, struct rlist *key_list)
+space_new_xc(struct space_def *space_def, struct rlist *key_list,
+ uint64_t epoch)
{
- struct space *space = space_new(space_def, key_list);
+ struct space *space = space_new(space_def, key_list, epoch);
if (space == NULL)
diag_raise();
return space;
diff --git a/src/box/sysview.c b/src/box/sysview.c
index ed5bca3..f18d9cd 100644
--- a/src/box/sysview.c
+++ b/src/box/sysview.c
@@ -507,8 +507,9 @@ sysview_engine_shutdown(struct engine *engine)
static struct space *
sysview_engine_create_space(struct engine *engine, struct space_def *def,
- struct rlist *key_list)
+ struct rlist *key_list, uint64_t epoch)
{
+ (void)epoch;
struct space *space = (struct space *)calloc(1, sizeof(*space));
if (space == NULL) {
diag_set(OutOfMemory, sizeof(*space),
diff --git a/src/box/tuple.c b/src/box/tuple.c
index 62e06e7..d8cf517 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -205,7 +205,7 @@ tuple_init(field_name_hash_f hash)
* Create a format for runtime tuples
*/
tuple_format_runtime = tuple_format_new(&tuple_format_runtime_vtab,
- NULL, 0, NULL, 0, NULL);
+ NULL, 0, NULL, 0, NULL, 0);
if (tuple_format_runtime == NULL)
return -1;
@@ -377,7 +377,7 @@ box_tuple_format_new(struct key_def **keys, uint16_t key_count)
{
box_tuple_format_t *format =
tuple_format_new(&tuple_format_runtime_vtab,
- keys, key_count, NULL, 0, NULL);
+ keys, key_count, NULL, 0, NULL, 0);
if (format != NULL)
tuple_format_ref(format);
return format;
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 920968c..3c939cb 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -451,7 +451,8 @@ tuple_format_delete(struct tuple_format *format)
struct tuple_format *
tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
uint16_t key_count, const struct field_def *space_fields,
- uint32_t space_field_count, struct tuple_dictionary *dict)
+ uint32_t space_field_count, struct tuple_dictionary *dict,
+ uint64_t epoch)
{
struct tuple_format *format =
tuple_format_alloc(keys, key_count, space_field_count, dict);
@@ -460,6 +461,7 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
format->vtab = *vtab;
format->engine = NULL;
format->is_temporary = false;
+ format->epoch = epoch;
if (tuple_format_register(format) < 0) {
tuple_format_destroy(format);
free(format);
@@ -999,29 +1001,43 @@ tuple_field_by_part_raw(struct tuple_format *format, const char *data,
if (likely(part->path == NULL))
return tuple_field_raw(format, data, field_map, part->fieldno);
- uint32_t field_count = tuple_format_field_count(format);
- struct tuple_field *root_field =
- likely(part->fieldno < field_count) ?
- tuple_format_field(format, part->fieldno) : NULL;
- struct tuple_field *field =
- unlikely(root_field == NULL) ? NULL:
- tuple_format_field_by_path(format, root_field, part->path,
- part->path_len);
- if (unlikely(field == NULL)) {
- /*
- * Legacy tuple having no field map for JSON
- * index require full path parse.
- */
- const char *field_raw =
- tuple_field_raw(format, data, field_map, part->fieldno);
- if (unlikely(field_raw == NULL))
- return NULL;
- if (tuple_field_go_to_path(&field_raw, part->path,
- part->path_len) != 0)
- return NULL;
- return field_raw;
+ int32_t offset_slot;
+ if (likely(part->format_cache == format)) {
+ assert(format->epoch != 0);
+ offset_slot = part->offset_slot_cache;
+ } else {
+ uint32_t field_count = tuple_format_field_count(format);
+ struct tuple_field *root_field =
+ likely(part->fieldno < field_count) ?
+ tuple_format_field(format, part->fieldno) : NULL;
+ struct tuple_field *field =
+ unlikely(root_field == NULL) ? NULL:
+ tuple_format_field_by_path(format, root_field, part->path,
+ part->path_len);
+ if (unlikely(field == NULL)) {
+ /*
+ * Legacy tuple having no field map for JSON
+ * index require full path parse.
+ */
+ const char *field_raw =
+ tuple_field_raw(format, data, field_map, part->fieldno);
+ if (unlikely(field_raw == NULL))
+ return NULL;
+ if (tuple_field_go_to_path(&field_raw, part->path,
+ part->path_len) != 0)
+ return NULL;
+ return field_raw;
+ }
+ offset_slot = field->offset_slot;
+ /* Cache offset_slot if required. */
+ if (part->format_cache != format &&
+ (part->format_cache == NULL ||
+ part->format_cache->epoch < format->epoch)) {
+ assert(format->epoch != 0);
+ part->offset_slot_cache = offset_slot;
+ part->format_cache = format;
+ }
}
- int32_t offset_slot = field->offset_slot;
assert(offset_slot < 0);
assert(-offset_slot * sizeof(uint32_t) <= format->field_map_size);
if (unlikely(field_map[offset_slot] == 0))
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index e82af67..adb4a7b 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -137,6 +137,8 @@ tuple_field_is_nullable(const struct tuple_field *tuple_field)
* Tuple format describes how tuple is stored and information about its fields
*/
struct tuple_format {
+ /** Counter that grows incrementally on space rebuild. */
+ uint64_t epoch;
/** Virtual function table */
struct tuple_format_vtab vtab;
/** Pointer to engine-specific data. */
@@ -250,6 +252,7 @@ tuple_format_unref(struct tuple_format *format)
* @param key_count The number of keys in @a keys array.
* @param space_fields Array of fields, defined in a space format.
* @param space_field_count Length of @a space_fields.
+ * @param epoch Epoch of new format.
*
* @retval not NULL Tuple format.
* @retval NULL Memory error.
@@ -257,7 +260,8 @@ tuple_format_unref(struct tuple_format *format)
struct tuple_format *
tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
uint16_t key_count, const struct field_def *space_fields,
- uint32_t space_field_count, struct tuple_dictionary *dict);
+ uint32_t space_field_count, struct tuple_dictionary *dict,
+ uint64_t epoch);
/**
* Check, if @a format1 can store any tuples of @a format2. For
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index eb40ef1..e4793d9 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -584,7 +584,7 @@ vinyl_engine_check_space_def(struct space_def *def)
static struct space *
vinyl_engine_create_space(struct engine *engine, struct space_def *def,
- struct rlist *key_list)
+ struct rlist *key_list, uint64_t epoch)
{
struct space *space = malloc(sizeof(*space));
if (space == NULL) {
@@ -610,7 +610,8 @@ vinyl_engine_create_space(struct engine *engine, struct space_def *def,
struct tuple_format *format =
tuple_format_new(&vy_tuple_format_vtab, keys, key_count,
- def->fields, def->field_count, def->dict);
+ def->fields, def->field_count, def->dict,
+ epoch);
if (format == NULL) {
free(space);
return NULL;
@@ -3016,7 +3017,7 @@ vy_send_lsm(struct vy_join_ctx *ctx, struct vy_lsm_recovery_info *lsm_info)
if (ctx->key_def == NULL)
goto out;
ctx->format = tuple_format_new(&vy_tuple_format_vtab, &ctx->key_def,
- 1, NULL, 0, NULL);
+ 1, NULL, 0, NULL, 0);
if (ctx->format == NULL)
goto out_free_key_def;
tuple_format_ref(ctx->format);
diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
index 681b165..e57f864 100644
--- a/src/box/vy_lsm.c
+++ b/src/box/vy_lsm.c
@@ -61,7 +61,7 @@ vy_lsm_env_create(struct vy_lsm_env *env, const char *path,
void *upsert_thresh_arg)
{
env->key_format = tuple_format_new(&vy_tuple_format_vtab,
- NULL, 0, NULL, 0, NULL);
+ NULL, 0, NULL, 0, NULL, 0);
if (env->key_format == NULL)
return -1;
tuple_format_ref(env->key_format);
@@ -154,7 +154,8 @@ vy_lsm_new(struct vy_lsm_env *lsm_env, struct vy_cache_env *cache_env,
lsm->disk_format = format;
} else {
lsm->disk_format = tuple_format_new(&vy_tuple_format_vtab,
- &cmp_def, 1, NULL, 0, NULL);
+ &cmp_def, 1, NULL, 0, NULL,
+ format->epoch);
if (lsm->disk_format == NULL)
goto fail_format;
}
diff --git a/test/unit/vy_iterators_helper.c b/test/unit/vy_iterators_helper.c
index 7fad560..bbb3149 100644
--- a/test/unit/vy_iterators_helper.c
+++ b/test/unit/vy_iterators_helper.c
@@ -22,7 +22,7 @@ vy_iterator_C_test_init(size_t cache_size)
vy_cache_env_create(&cache_env, cord_slab_cache());
vy_cache_env_set_quota(&cache_env, cache_size);
vy_key_format = tuple_format_new(&vy_tuple_format_vtab, NULL, 0,
- NULL, 0, NULL);
+ NULL, 0, NULL, 0);
tuple_format_ref(vy_key_format);
size_t mem_size = 64 * 1024 * 1024;
@@ -202,7 +202,7 @@ create_test_mem(struct key_def *def)
struct key_def * const defs[] = { def };
struct tuple_format *format =
tuple_format_new(&vy_tuple_format_vtab, defs, def->part_count,
- NULL, 0, NULL);
+ NULL, 0, NULL, 0);
fail_if(format == NULL);
/* Create mem */
@@ -220,7 +220,7 @@ create_test_cache(uint32_t *fields, uint32_t *types,
assert(*def != NULL);
vy_cache_create(cache, &cache_env, *def, true);
*format = tuple_format_new(&vy_tuple_format_vtab, def, 1, NULL, 0,
- NULL);
+ NULL, 0);
tuple_format_ref(*format);
}
diff --git a/test/unit/vy_mem.c b/test/unit/vy_mem.c
index ebf3fbc..325c7cc 100644
--- a/test/unit/vy_mem.c
+++ b/test/unit/vy_mem.c
@@ -78,7 +78,7 @@ test_iterator_restore_after_insertion()
/* Create format */
struct tuple_format *format = tuple_format_new(&vy_tuple_format_vtab,
&key_def, 1, NULL, 0,
- NULL);
+ NULL, 0);
assert(format != NULL);
tuple_format_ref(format);
diff --git a/test/unit/vy_point_lookup.c b/test/unit/vy_point_lookup.c
index 65dafcb..d8e3a9e 100644
--- a/test/unit/vy_point_lookup.c
+++ b/test/unit/vy_point_lookup.c
@@ -85,7 +85,7 @@ test_basic()
vy_cache_create(&cache, &cache_env, key_def, true);
struct tuple_format *format = tuple_format_new(&vy_tuple_format_vtab,
&key_def, 1, NULL, 0,
- NULL);
+ NULL, 0);
isnt(format, NULL, "tuple_format_new is not NULL");
tuple_format_ref(format);
--
2.7.4
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v5 12/12] box: specify indexes in user-friendly form
2018-10-29 6:56 [PATCH v5 00/12] box: indexes by JSON path Kirill Shcherbatov
` (2 preceding siblings ...)
2018-10-29 6:56 ` [PATCH v5 11/12] box: introduce offset slot cache in key_part Kirill Shcherbatov
@ 2018-10-29 6:56 ` Kirill Shcherbatov
2018-11-01 13:34 ` [tarantool-patches] " Konstantin Osipov
2018-11-01 14:18 ` Konstantin Osipov
2018-10-29 6:56 ` [PATCH v5 02/12] box: introduce key_def_parts_are_sequential Kirill Shcherbatov
` (7 subsequent siblings)
11 siblings, 2 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-10-29 6:56 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
It is now possible to create indexes by JSON path using field
names specified in the space format.
Closes #1012
@TarantoolBot document
Title: Indexes by JSON path
Sometimes field data has a complex document structure. When this
structure is consistent across the whole space, you can create an
index by JSON path.
Example:
s:create_index('json_index', {parts = {{'FIO["fname"]', 'str'}}})
---
src/box/lua/index.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++
src/box/lua/schema.lua | 22 ++++++++---------
test/engine/tuple.result | 42 +++++++++++++++++++++++++++++++++
test/engine/tuple.test.lua | 12 ++++++++++
4 files changed, 124 insertions(+), 11 deletions(-)
diff --git a/src/box/lua/index.c b/src/box/lua/index.c
index ef89c39..f3f1c96 100644
--- a/src/box/lua/index.c
+++ b/src/box/lua/index.c
@@ -35,6 +35,9 @@
#include "box/info.h"
#include "box/lua/info.h"
#include "box/lua/tuple.h"
+#include "box/schema.h"
+#include "box/tuple_format.h"
+#include "json/path.h"
#include "box/lua/misc.h" /* lbox_encode_tuple_on_gc() */
/** {{{ box.index Lua library: access to spaces and indexes
@@ -328,6 +331,61 @@ lbox_index_compact(lua_State *L)
return 0;
}
+static int
+lbox_index_resolve_path(struct lua_State *L)
+{
+ if (lua_gettop(L) != 3 ||
+ !lua_isnumber(L, 1) || !lua_isnumber(L, 2) || !lua_isstring(L, 3)) {
+ return luaL_error(L, "Usage box.internal."
+ "path_resolve(part_id, space_id, path)");
+ }
+ uint32_t part_id = lua_tonumber(L, 1);
+ uint32_t space_id = lua_tonumber(L, 2);
+ size_t path_len;
+ const char *path = lua_tolstring(L, 3, &path_len);
+ struct space *space = space_cache_find(space_id);
+ if (space == NULL)
+ return luaT_error(L);
+ struct json_path_parser parser;
+ struct json_path_node node;
+ json_path_parser_create(&parser, path, path_len);
+ int rc = json_path_next(&parser, &node);
+ if (rc != 0) {
+ const char *err_msg =
+ tt_sprintf("options.parts[%d]: error in path on "
+ "position %d", part_id, rc);
+ diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg);
+ return luaT_error(L);
+ }
+ assert(space->format != NULL && space->format->dict != NULL);
+ uint32_t fieldno;
+ uint32_t field_count = tuple_format_field_count(space->format);
+ if (node.type == JSON_PATH_NUM &&
+ (fieldno = node.num - TUPLE_INDEX_BASE) >= field_count) {
+ const char *err_msg =
+ tt_sprintf("options.parts[%d]: field '%d' referenced "
+ "in path is greater than format field "
+ "count %d", part_id,
+ fieldno + TUPLE_INDEX_BASE, field_count);
+ diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg);
+ return luaT_error(L);
+ } else if (node.type == JSON_PATH_STR &&
+ tuple_fieldno_by_name(space->format->dict, node.str, node.len,
+ field_name_hash(node.str, node.len),
+ &fieldno) != 0) {
+ const char *err_msg =
+ tt_sprintf("options.parts[%d]: field was not found by "
+ "name '%.*s'", part_id, node.len, node.str);
+ diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg);
+ return luaT_error(L);
+ }
+ fieldno += TUPLE_INDEX_BASE;
+ path += parser.offset;
+ lua_pushnumber(L, fieldno);
+ lua_pushstring(L, path);
+ return 2;
+}
+
/* }}} */
void
@@ -365,6 +423,7 @@ box_lua_index_init(struct lua_State *L)
{"truncate", lbox_truncate},
{"stat", lbox_index_stat},
{"compact", lbox_index_compact},
+ {"path_resolve", lbox_index_resolve_path},
{NULL, NULL}
};
diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua
index 8a804f0..7874ef9 100644
--- a/src/box/lua/schema.lua
+++ b/src/box/lua/schema.lua
@@ -575,7 +575,7 @@ local function update_index_parts_1_6_0(parts)
return result
end
-local function update_index_parts(format, parts)
+local function update_index_parts(format, parts, space_id)
if type(parts) ~= "table" then
box.error(box.error.ILLEGAL_PARAMS,
"options.parts parameter should be a table")
@@ -626,16 +626,16 @@ local function update_index_parts(format, parts)
box.error(box.error.ILLEGAL_PARAMS,
"options.parts[" .. i .. "]: field (name or number) is expected")
elseif type(part.field) == 'string' then
- for k,v in pairs(format) do
- if v.name == part.field then
- part.field = k
- break
- end
- end
- if type(part.field) == 'string' then
+ local idx, path = box.internal.path_resolve(i, space_id, part.field)
+ if part.path ~= nil and part.path ~= path then
box.error(box.error.ILLEGAL_PARAMS,
- "options.parts[" .. i .. "]: field was not found by name '" .. part.field .. "'")
+ "options.parts[" .. i .. "]: field path '"..
+                part.path.."' doesn't match path resolved by name '" ..
+ part.field .. "'")
end
+ parts_can_be_simplified = parts_can_be_simplified and path == nil
+ part.field = idx
+ part.path = path or part.path
elseif part.field == 0 then
box.error(box.error.ILLEGAL_PARAMS,
"options.parts[" .. i .. "]: field (number) must be one-based")
@@ -792,7 +792,7 @@ box.schema.index.create = function(space_id, name, options)
end
end
local parts, parts_can_be_simplified =
- update_index_parts(format, options.parts)
+ update_index_parts(format, options.parts, space_id)
-- create_index() options contains type, parts, etc,
-- stored separately. Remove these members from index_opts
local index_opts = {
@@ -959,7 +959,7 @@ box.schema.index.alter = function(space_id, index_id, options)
if options.parts then
local parts_can_be_simplified
parts, parts_can_be_simplified =
- update_index_parts(format, options.parts)
+ update_index_parts(format, options.parts, space_id)
-- save parts in old format if possible
if parts_can_be_simplified then
parts = simplify_index_parts(parts)
diff --git a/test/engine/tuple.result b/test/engine/tuple.result
index 92927a0..5d7632e 100644
--- a/test/engine/tuple.result
+++ b/test/engine/tuple.result
@@ -1008,6 +1008,48 @@ assert(idx.parts[2].path == "[\"FIO\"][\"fname\"]")
---
- true
...
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'array'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+---
+...
+s:format(format)
+---
+- error: Field [2]["FIO"]["fname"] has type 'array' in one index, but type 'map' in
+ another
+...
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'map'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+---
+...
+s:format(format)
+---
+...
+s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}})
+---
+- error: 'Illegal parameters, options.parts[2]: error in path on position 1'
+...
+s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}})
+---
+- error: 'Illegal parameters, options.parts[2]: field ''666'' referenced in path is
+ greater than format field count 5'
+...
+s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}})
+---
+- error: 'Illegal parameters, options.parts[2]: field was not found by name ''invalid'''
+...
+idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}})
+---
+...
+assert(idx3 ~= nil)
+---
+- true
+...
+assert(idx3.parts[2].path == "[\"FIO\"][\"fname\"]")
+---
+- true
+...
+-- Vinyl has optimizations that omit index checks, so errors could differ.
+idx3:drop()
+---
+...
s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
---
- error: 'Tuple field 3 type does not match one required by operation: expected map'
diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua
index 9e6807e..a24aa28 100644
--- a/test/engine/tuple.test.lua
+++ b/test/engine/tuple.test.lua
@@ -327,6 +327,18 @@ s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO....fname
idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO.fname'}, {3, 'str', path = '["FIO"]["sname"]'}}})
assert(idx ~= nil)
assert(idx.parts[2].path == "[\"FIO\"][\"fname\"]")
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'array'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+s:format(format)
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'map'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+s:format(format)
+s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}})
+s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}})
+s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}})
+idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}})
+assert(idx3 ~= nil)
+assert(idx3.parts[2].path == "[\"FIO\"][\"fname\"]")
+-- Vinyl has optimizations that omit index checks, so errors could differ.
+idx3:drop()
s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
--
2.7.4
* [PATCH v5 02/12] box: introduce key_def_parts_are_sequential
2018-10-29 6:56 [PATCH v5 00/12] box: indexes by JSON path Kirill Shcherbatov
` (3 preceding siblings ...)
2018-10-29 6:56 ` [PATCH v5 12/12] box: specify indexes in user-friendly form Kirill Shcherbatov
@ 2018-10-29 6:56 ` Kirill Shcherbatov
2018-11-01 14:23 ` [tarantool-patches] " Konstantin Osipov
2018-11-19 17:48 ` Vladimir Davydov
2018-10-29 6:56 ` [PATCH v5 03/12] box: introduce tuple_field_go_to_path Kirill Shcherbatov
` (6 subsequent siblings)
11 siblings, 2 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-10-29 6:56 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
Introduced a new key_def_parts_are_sequential routine that tests
whether the specified key_def has sequential fields. This will be
useful when JSON paths are introduced, since they add another
complexity criterion: fields with JSON paths can't be 'sequential'
in this sense.
Need for #1012
---
| 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
--git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
index e493c3b..e9d7cac 100644
--- a/src/box/tuple_extract_key.cc
+++ b/src/box/tuple_extract_key.cc
@@ -4,12 +4,21 @@
enum { MSGPACK_NULL = 0xc0 };
+/** True if key part i and i+1 are sequential. */
+static inline bool
+key_def_parts_are_sequential(const struct key_def *def, int i)
+{
+ uint32_t fieldno1 = def->parts[i].fieldno + 1;
+ uint32_t fieldno2 = def->parts[i + 1].fieldno;
+ return fieldno1 == fieldno2;
+}
+
/** True, if a key can contain two or more parts in sequence. */
static bool
key_def_contains_sequential_parts(const struct key_def *def)
{
for (uint32_t i = 0; i < def->part_count - 1; ++i) {
- if (def->parts[i].fieldno + 1 == def->parts[i + 1].fieldno)
+ if (key_def_parts_are_sequential(def, i))
return true;
}
return false;
@@ -123,8 +132,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
* minimize tuple_field_raw() calls.
*/
for (; i < part_count - 1; i++) {
- if (key_def->parts[i].fieldno + 1 !=
- key_def->parts[i + 1].fieldno) {
+ if (!key_def_parts_are_sequential(key_def, i)) {
/*
* End of sequential part.
*/
@@ -165,8 +173,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
* minimize tuple_field_raw() calls.
*/
for (; i < part_count - 1; i++) {
- if (key_def->parts[i].fieldno + 1 !=
- key_def->parts[i + 1].fieldno) {
+ if (!key_def_parts_are_sequential(key_def, i)) {
/*
* End of sequential part.
*/
@@ -231,8 +238,7 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
uint32_t fieldno = key_def->parts[i].fieldno;
uint32_t null_count = 0;
for (; i < key_def->part_count - 1; i++) {
- if (key_def->parts[i].fieldno + 1 !=
- key_def->parts[i + 1].fieldno)
+ if (!key_def_parts_are_sequential(key_def, i))
break;
}
uint32_t end_fieldno = key_def->parts[i].fieldno;
--
2.7.4
* [PATCH v5 03/12] box: introduce tuple_field_go_to_path
2018-10-29 6:56 [PATCH v5 00/12] box: indexes by JSON path Kirill Shcherbatov
` (4 preceding siblings ...)
2018-10-29 6:56 ` [PATCH v5 02/12] box: introduce key_def_parts_are_sequential Kirill Shcherbatov
@ 2018-10-29 6:56 ` Kirill Shcherbatov
2018-11-19 17:48 ` Vladimir Davydov
2018-10-29 6:56 ` [PATCH v5 04/12] box: introduce tuple_format_add_key_part Kirill Shcherbatov
` (5 subsequent siblings)
11 siblings, 1 reply; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-10-29 6:56 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
The new tuple_field_go_to_path routine is used in the
tuple_field_raw_by_path function to retrieve data from a field by
JSON path. We will need this routine exported in the future to
access data by the JSON path specified in a key_part.
Need for #1012
---
src/box/tuple_format.c | 59 +++++++++++++++++++++++++++++++++++---------------
1 file changed, 42 insertions(+), 17 deletions(-)
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index c19ff39..6f76158 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -552,6 +552,41 @@ tuple_field_go_to_key(const char **field, const char *key, int len)
return -1;
}
+/**
+ * Retrieve msgpack data by JSON path.
+ * @param data Pointer to msgpack with data.
+ * @param path The path to process.
+ * @param path_len The length of the @path.
+ * @retval 0 On success.
+ * @retval >0 On path parsing error, invalid character position.
+ */
+static int
+tuple_field_go_to_path(const char **data, const char *path, uint32_t path_len)
+{
+ int rc;
+ struct json_path_parser parser;
+ struct json_path_node node;
+ json_path_parser_create(&parser, path, path_len);
+ while ((rc = json_path_next(&parser, &node)) == 0) {
+ switch (node.type) {
+ case JSON_PATH_NUM:
+ rc = tuple_field_go_to_index(data, node.num);
+ break;
+ case JSON_PATH_STR:
+ rc = tuple_field_go_to_key(data, node.str, node.len);
+ break;
+ default:
+ assert(node.type == JSON_PATH_END);
+ return 0;
+ }
+ if (rc != 0) {
+ *data = NULL;
+ return 0;
+ }
+ }
+ return rc;
+}
+
int
tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
const uint32_t *field_map, const char *path,
@@ -615,23 +650,13 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
*field = NULL;
return 0;
}
- while ((rc = json_path_next(&parser, &node)) == 0) {
- switch(node.type) {
- case JSON_PATH_NUM:
- rc = tuple_field_go_to_index(field, node.num);
- break;
- case JSON_PATH_STR:
- rc = tuple_field_go_to_key(field, node.str, node.len);
- break;
- default:
- assert(node.type == JSON_PATH_END);
- return 0;
- }
- if (rc != 0) {
- *field = NULL;
- return 0;
- }
- }
+ rc = tuple_field_go_to_path(field, path + parser.offset,
+ path_len - parser.offset);
+ if (rc == 0)
+ return 0;
+ /* Setup absolute error position. */
+ rc += parser.offset;
+
error:
assert(rc > 0);
diag_set(ClientError, ER_ILLEGAL_PARAMS,
--
2.7.4
* [PATCH v5 04/12] box: introduce tuple_format_add_key_part
2018-10-29 6:56 [PATCH v5 00/12] box: indexes by JSON path Kirill Shcherbatov
` (5 preceding siblings ...)
2018-10-29 6:56 ` [PATCH v5 03/12] box: introduce tuple_field_go_to_path Kirill Shcherbatov
@ 2018-10-29 6:56 ` Kirill Shcherbatov
2018-11-01 14:38 ` [tarantool-patches] " Konstantin Osipov
2018-11-19 17:50 ` Vladimir Davydov
2018-10-29 6:56 ` [PATCH v5 05/12] lib: implement JSON tree class for json library Kirill Shcherbatov
` (4 subsequent siblings)
11 siblings, 2 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-10-29 6:56 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
Introduced a new tuple_format_add_key_part routine that performs
format initialization for the specified key_part and
configuration. This decreases the complexity of the
tuple_format_create routine and will be used to initialize format
structures for JSON paths.
Need for #1012
---
src/box/tuple_format.c | 153 ++++++++++++++++++++++++++-----------------------
1 file changed, 82 insertions(+), 71 deletions(-)
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 6f76158..088579c 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -43,6 +43,84 @@ static const struct tuple_field tuple_field_default = {
ON_CONFLICT_ACTION_DEFAULT, NULL, COLL_NONE,
};
+static int
+tuple_format_add_key_part(struct tuple_format *format,
+ const struct field_def *fields, uint32_t field_count,
+ const struct key_part *part, bool is_sequential,
+ int *current_slot)
+{
+ assert(part->fieldno < format->field_count);
+ struct tuple_field *field = &format->fields[part->fieldno];
+ /*
+ * Field and part nullable actions may differ only
+ * if one of them is DEFAULT, in which case we use
+ * the non-default action *except* the case when
+ * the other one is NONE, in which case we assume
+ * DEFAULT. The latter is needed so that in case
+ * index definition and space format have different
+ * is_nullable flag, we will use the strictest option,
+ * i.e. DEFAULT.
+ */
+ if (part->fieldno >= field_count) {
+ field->nullable_action = part->nullable_action;
+ } else if (field->nullable_action == ON_CONFLICT_ACTION_DEFAULT) {
+ if (part->nullable_action != ON_CONFLICT_ACTION_NONE)
+ field->nullable_action = part->nullable_action;
+ } else if (part->nullable_action == ON_CONFLICT_ACTION_DEFAULT) {
+ if (field->nullable_action == ON_CONFLICT_ACTION_NONE)
+ field->nullable_action = part->nullable_action;
+ } else if (field->nullable_action != part->nullable_action) {
+ diag_set(ClientError, ER_ACTION_MISMATCH,
+ part->fieldno + TUPLE_INDEX_BASE,
+ on_conflict_action_strs[field->nullable_action],
+ on_conflict_action_strs[part->nullable_action]);
+ return -1;
+ }
+
+ /**
+ * Check that there are no conflicts between index part
+ * types and space fields. If a part type is compatible
+ * with field's one, then the part type is more strict
+ * and the part type must be used in tuple_format.
+ */
+ if (field_type1_contains_type2(field->type,
+ part->type)) {
+ field->type = part->type;
+ } else if (!field_type1_contains_type2(part->type,
+ field->type)) {
+ const char *name;
+ int fieldno = part->fieldno + TUPLE_INDEX_BASE;
+ if (part->fieldno >= field_count) {
+ name = tt_sprintf("%d", fieldno);
+ } else {
+ const struct field_def *def =
+ &fields[part->fieldno];
+ name = tt_sprintf("'%s'", def->name);
+ }
+ int errcode;
+ if (!field->is_key_part)
+ errcode = ER_FORMAT_MISMATCH_INDEX_PART;
+ else
+ errcode = ER_INDEX_PART_TYPE_MISMATCH;
+ diag_set(ClientError, errcode, name,
+ field_type_strs[field->type],
+ field_type_strs[part->type]);
+ return -1;
+ }
+ field->is_key_part = true;
+ /*
+ * In the tuple, store only offsets necessary to access
+ * fields of non-sequential keys. First field is always
+ * simply accessible, so we don't store an offset for it.
+ */
+ if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL &&
+ is_sequential == false && part->fieldno > 0) {
+ *current_slot = *current_slot - 1;
+ field->offset_slot = *current_slot;
+ }
+ return 0;
+}
+
/**
* Extract all available type info from keys and field
* definitions.
@@ -93,78 +171,11 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
const struct key_part *parts_end = part + key_def->part_count;
for (; part < parts_end; part++) {
- assert(part->fieldno < format->field_count);
- struct tuple_field *field = &format->fields[part->fieldno];
- /*
- * Field and part nullable actions may differ only
- * if one of them is DEFAULT, in which case we use
- * the non-default action *except* the case when
- * the other one is NONE, in which case we assume
- * DEFAULT. The latter is needed so that in case
- * index definition and space format have different
- * is_nullable flag, we will use the strictest option,
- * i.e. DEFAULT.
- */
- if (part->fieldno >= field_count) {
- field->nullable_action = part->nullable_action;
- } else if (field->nullable_action == ON_CONFLICT_ACTION_DEFAULT) {
- if (part->nullable_action != ON_CONFLICT_ACTION_NONE)
- field->nullable_action = part->nullable_action;
- } else if (part->nullable_action == ON_CONFLICT_ACTION_DEFAULT) {
- if (field->nullable_action == ON_CONFLICT_ACTION_NONE)
- field->nullable_action = part->nullable_action;
- } else if (field->nullable_action != part->nullable_action) {
- diag_set(ClientError, ER_ACTION_MISMATCH,
- part->fieldno + TUPLE_INDEX_BASE,
- on_conflict_action_strs[field->nullable_action],
- on_conflict_action_strs[part->nullable_action]);
+ if (tuple_format_add_key_part(format, fields,
+ field_count, part,
+ is_sequential,
+						      &current_slot) != 0)
return -1;
- }
-
- /*
- * Check that there are no conflicts
- * between index part types and space
- * fields. If a part type is compatible
- * with field's one, then the part type is
- * more strict and the part type must be
- * used in tuple_format.
- */
- if (field_type1_contains_type2(field->type,
- part->type)) {
- field->type = part->type;
- } else if (! field_type1_contains_type2(part->type,
- field->type)) {
- const char *name;
- int fieldno = part->fieldno + TUPLE_INDEX_BASE;
- if (part->fieldno >= field_count) {
- name = tt_sprintf("%d", fieldno);
- } else {
- const struct field_def *def =
- &fields[part->fieldno];
- name = tt_sprintf("'%s'", def->name);
- }
- int errcode;
- if (! field->is_key_part)
- errcode = ER_FORMAT_MISMATCH_INDEX_PART;
- else
- errcode = ER_INDEX_PART_TYPE_MISMATCH;
- diag_set(ClientError, errcode, name,
- field_type_strs[field->type],
- field_type_strs[part->type]);
- return -1;
- }
- field->is_key_part = true;
- /*
- * In the tuple, store only offsets necessary
- * to access fields of non-sequential keys.
- * First field is always simply accessible,
- * so we don't store an offset for it.
- */
- if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL &&
- is_sequential == false && part->fieldno > 0) {
-
- field->offset_slot = --current_slot;
- }
}
}
--
2.7.4
* [PATCH v5 05/12] lib: implement JSON tree class for json library
2018-10-29 6:56 [PATCH v5 00/12] box: indexes by JSON path Kirill Shcherbatov
` (6 preceding siblings ...)
2018-10-29 6:56 ` [PATCH v5 04/12] box: introduce tuple_format_add_key_part Kirill Shcherbatov
@ 2018-10-29 6:56 ` Kirill Shcherbatov
2018-11-01 15:08 ` [tarantool-patches] " Konstantin Osipov
2018-11-20 16:43 ` Vladimir Davydov
2018-10-29 6:56 ` [PATCH v5 06/12] box: manage format fields with JSON tree class Kirill Shcherbatov
` (3 subsequent siblings)
11 siblings, 2 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-10-29 6:56 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
The new JSON tree class stores the JSON paths of tuple fields for
registered non-plain indexes. It is a hierarchical data structure
that organizes the JSON nodes produced by the parser. The class
provides an API to look up a node by path and to iterate over the
tree.
The JSON indexes patch requires this functionality to look up
tuple fields by path, to initialize the field map, and to build
vinyl_stmt msgpack for secondary indexes via JSON tree iteration.
Need for #1012
---
src/lib/json/CMakeLists.txt | 2 +
src/lib/json/tree.c | 327 ++++++++++++++++++++++++++++++++++++++++++++
src/lib/json/tree.h | 224 ++++++++++++++++++++++++++++++
test/unit/json_path.c | 211 +++++++++++++++++++++++++++-
test/unit/json_path.result | 42 +++++-
5 files changed, 804 insertions(+), 2 deletions(-)
create mode 100644 src/lib/json/tree.c
create mode 100644 src/lib/json/tree.h
diff --git a/src/lib/json/CMakeLists.txt b/src/lib/json/CMakeLists.txt
index 203fe6f..9eaba37 100644
--- a/src/lib/json/CMakeLists.txt
+++ b/src/lib/json/CMakeLists.txt
@@ -1,6 +1,8 @@
set(lib_sources
path.c
+ tree.c
)
set_source_files_compile_flags(${lib_sources})
add_library(json_path STATIC ${lib_sources})
+target_link_libraries(json_path misc)
diff --git a/src/lib/json/tree.c b/src/lib/json/tree.c
new file mode 100644
index 0000000..e40c7ac
--- /dev/null
+++ b/src/lib/json/tree.c
@@ -0,0 +1,327 @@
+
+/*
+ * Copyright 2010-2018 Tarantool AUTHORS: please see AUTHORS file.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the
+ * following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <assert.h>
+#include <ctype.h>
+#include <string.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+#include "small/rlist.h"
+#include "trivia/util.h"
+#include "tree.h"
+#include "path.h"
+#include "third_party/PMurHash.h"
+
+
+/**
+ * Hash table: json_path_node => json_tree_node.
+ */
+struct mh_json_tree_node {
+ struct json_path_node key;
+ uint32_t key_hash;
+ struct json_tree_node *parent;
+ struct json_tree_node *node;
+};
+
+/**
+ * Compare hash records of two json tree nodes. Return 0 if equal.
+ */
+static inline int
+mh_json_tree_node_cmp(const struct mh_json_tree_node *a,
+ const struct mh_json_tree_node *b)
+{
+ if (a->key.type != b->key.type)
+ return a->key.type - b->key.type;
+ if (a->parent != b->parent)
+ return a->parent - b->parent;
+ if (a->key.type == JSON_PATH_STR) {
+ if (a->key.len != b->key.len)
+ return a->key.len - b->key.len;
+ return memcmp(a->key.str, b->key.str, a->key.len);
+ } else if (a->key.type == JSON_PATH_NUM) {
+ return a->key_hash - b->key_hash;
+ }
+ unreachable();
+}
+
+#define MH_SOURCE 1
+#define mh_name _json_tree_node
+#define mh_key_t struct mh_json_tree_node *
+#define mh_node_t struct mh_json_tree_node
+#define mh_arg_t void *
+#define mh_hash(a, arg) ((a)->key_hash)
+#define mh_hash_key(a, arg) ((a)->key_hash)
+#define mh_cmp(a, b, arg) (mh_json_tree_node_cmp((a), (b)))
+#define mh_cmp_key(a, b, arg) mh_cmp(a, b, arg)
+#include "salad/mhash.h"
+
+static const uint32_t json_path_node_hash_seed = 13U;
+
+uint32_t
+json_path_node_hash(struct json_path_node *key, uint32_t seed)
+{
+ uint32_t h = seed;
+ uint32_t carry = 0;
+ const void *data;
+ uint32_t data_size;
+ if (key->type == JSON_PATH_STR) {
+ data = key->str;
+ data_size = key->len;
+ } else if (key->type == JSON_PATH_NUM) {
+ data = &key->num;
+ data_size = sizeof(key->num);
+ } else {
+ unreachable();
+ }
+ PMurHash32_Process(&h, &carry, data, data_size);
+ return PMurHash32_Result(h, carry, data_size);
+}
+
+int
+json_tree_create(struct json_tree *tree)
+{
+ memset(tree, 0, sizeof(struct json_tree));
+ tree->root.rolling_hash = json_path_node_hash_seed;
+ tree->root.key.type = JSON_PATH_END;
+ tree->hash = mh_json_tree_node_new();
+ if (unlikely(tree->hash == NULL))
+ return -1;
+ return 0;
+}
+
+void
+json_tree_destroy(struct json_tree *tree)
+{
+ assert(tree->hash != NULL);
+ json_tree_node_destroy(&tree->root);
+ mh_json_tree_node_delete(tree->hash);
+}
+
+void
+json_tree_node_create(struct json_tree_node *node,
+ struct json_path_node *path_node)
+{
+ memset(node, 0, sizeof(struct json_tree_node));
+ node->key = *path_node;
+}
+
+void
+json_tree_node_destroy(struct json_tree_node *node)
+{
+ free(node->children);
+}
+
+struct json_tree_node *
+json_tree_lookup_by_path_node(struct json_tree *tree,
+ struct json_tree_node *parent,
+ struct json_path_node *path_node,
+ uint32_t rolling_hash)
+{
+ assert(parent != NULL);
+ assert(rolling_hash == json_path_node_hash(path_node,
+ parent->rolling_hash));
+ struct mh_json_tree_node info;
+ info.key = *path_node;
+ info.key_hash = rolling_hash;
+ info.parent = parent;
+ mh_int_t id = mh_json_tree_node_find(tree->hash, &info, NULL);
+ if (unlikely(id == mh_end(tree->hash)))
+ return NULL;
+ struct mh_json_tree_node *ht_node =
+ mh_json_tree_node_node(tree->hash, id);
+ assert(ht_node == NULL || ht_node->node != NULL);
+ struct json_tree_node *ret = ht_node != NULL ? ht_node->node : NULL;
+ assert(ret == NULL || ret->parent == parent);
+ return ret;
+}
+
+
+int
+json_tree_add(struct json_tree *tree, struct json_tree_node *parent,
+ struct json_tree_node *node, uint32_t rolling_hash)
+{
+ assert(parent != NULL);
+ assert(node->parent == NULL);
+ assert(rolling_hash ==
+ json_path_node_hash(&node->key, parent->rolling_hash));
+ assert(json_tree_lookup_by_path_node(tree, parent, &node->key,
+ rolling_hash) == NULL);
+ uint32_t insert_idx = (node->key.type == JSON_PATH_NUM) ?
+ (uint32_t)node->key.num - 1 :
+ parent->children_count;
+ if (insert_idx >= parent->children_count) {
+ uint32_t new_size = insert_idx + 1;
+ struct json_tree_node **children =
+ realloc(parent->children, new_size*sizeof(void *));
+ if (unlikely(children == NULL))
+ return -1;
+ memset(children + parent->children_count, 0,
+ (new_size - parent->children_count)*sizeof(void *));
+ parent->children = children;
+ parent->children_count = new_size;
+ }
+ assert(parent->children[insert_idx] == NULL);
+ parent->children[insert_idx] = node;
+ node->sibling_idx = insert_idx;
+ node->rolling_hash = rolling_hash;
+
+ struct mh_json_tree_node ht_node;
+ ht_node.key = node->key;
+ ht_node.key_hash = rolling_hash;
+ ht_node.node = node;
+ ht_node.parent = parent;
+ mh_int_t rc = mh_json_tree_node_put(tree->hash, &ht_node, NULL, NULL);
+ if (unlikely(rc == mh_end(tree->hash))) {
+ parent->children[insert_idx] = NULL;
+ return -1;
+ }
+ node->parent = parent;
+ assert(json_tree_lookup_by_path_node(tree, parent, &node->key,
+ rolling_hash) == node);
+ return 0;
+}
+
+void
+json_tree_remove(struct json_tree *tree, struct json_tree_node *parent,
+ struct json_tree_node *node, uint32_t rolling_hash)
+{
+ assert(parent != NULL);
+ assert(node->parent == parent);
+ assert(json_tree_lookup_by_path_node(tree, parent, &node->key,
+ rolling_hash) == node);
+	struct json_tree_node **child_slot = NULL;
+	if (node->key.type == JSON_PATH_NUM) {
+		child_slot = &parent->children[node->key.num - 1];
+	} else {
+		for (uint32_t idx = 0; idx < parent->children_count; idx++) {
+			if (parent->children[idx] == node) {
+				child_slot = &parent->children[idx];
+				break;
+			}
+		}
+	}
+ assert(child_slot != NULL && *child_slot == node);
+ *child_slot = NULL;
+
+ struct mh_json_tree_node info;
+ info.key = node->key;
+ info.key_hash = rolling_hash;
+ info.parent = parent;
+ mh_int_t id = mh_json_tree_node_find(tree->hash, &info, NULL);
+ assert(id != mh_end(tree->hash));
+ mh_json_tree_node_del(tree->hash, id, NULL);
+ assert(json_tree_lookup_by_path_node(tree, parent, &node->key,
+ rolling_hash) == NULL);
+}
+
+struct json_tree_node *
+json_tree_lookup_by_path(struct json_tree *tree, struct json_tree_node *parent,
+ const char *path, uint32_t path_len)
+{
+ int rc;
+ struct json_path_parser parser;
+ struct json_path_node path_node;
+ struct json_tree_node *ret = parent;
+ uint32_t rolling_hash = ret->rolling_hash;
+ json_path_parser_create(&parser, path, path_len);
+ while ((rc = json_path_next(&parser, &path_node)) == 0 &&
+ path_node.type != JSON_PATH_END && ret != NULL) {
+ rolling_hash = json_path_node_hash(&path_node, rolling_hash);
+ ret = json_tree_lookup_by_path_node(tree, ret, &path_node,
+ rolling_hash);
+ }
+ if (rc != 0 || path_node.type != JSON_PATH_END)
+ return NULL;
+ return ret;
+}
+
+static struct json_tree_node *
+json_tree_next_child(struct json_tree_node *parent, struct json_tree_node *pos)
+{
+ assert(pos == NULL || pos->parent == parent);
+ struct json_tree_node **arr = parent->children;
+ uint32_t arr_size = parent->children_count;
+ if (arr == NULL)
+ return NULL;
+ uint32_t idx = pos != NULL ? pos->sibling_idx + 1 : 0;
+ while (idx < arr_size && arr[idx] == NULL)
+ idx++;
+ if (idx >= arr_size)
+ return NULL;
+ return arr[idx];
+}
+
+static struct json_tree_node *
+json_tree_leftmost(struct json_tree_node *pos)
+{
+ struct json_tree_node *last;
+ do {
+ last = pos;
+ pos = json_tree_next_child(pos, NULL);
+ } while (pos != NULL);
+ return last;
+}
+
+struct json_tree_node *
+json_tree_next_pre(struct json_tree_node *parent, struct json_tree_node *pos)
+{
+ if (pos == NULL)
+ pos = parent;
+ struct json_tree_node *next = json_tree_next_child(pos, NULL);
+ if (next != NULL)
+ return next;
+ while (pos != parent) {
+ next = json_tree_next_child(pos->parent, pos);
+ if (next != NULL)
+ return next;
+ pos = pos->parent;
+ }
+ return NULL;
+}
+
+struct json_tree_node *
+json_tree_next_post(struct json_tree_node *parent, struct json_tree_node *pos)
+{
+ struct json_tree_node *next;
+ if (pos == NULL) {
+ next = json_tree_leftmost(parent);
+ return next != parent ? next : NULL;
+ }
+ if (pos == parent)
+ return NULL;
+ next = json_tree_next_child(pos->parent, pos);
+ if (next != NULL) {
+ next = json_tree_leftmost(next);
+ return next != parent ? next : NULL;
+ }
+ return pos->parent != parent ? pos->parent : NULL;
+}
diff --git a/src/lib/json/tree.h b/src/lib/json/tree.h
new file mode 100644
index 0000000..6166024
--- /dev/null
+++ b/src/lib/json/tree.h
@@ -0,0 +1,224 @@
+#ifndef TARANTOOL_JSON_TREE_H_INCLUDED
+#define TARANTOOL_JSON_TREE_H_INCLUDED
+/*
+ * Copyright 2010-2018 Tarantool AUTHORS: please see AUTHORS file.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the
+ * following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#include <stdbool.h>
+#include <stdint.h>
+#include "small/rlist.h"
+#include "path.h"
+#include <stdio.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+struct mh_json_tree_node_t;
+
+/**
+ * JSON tree is a hierarchical data structure that organizes JSON
+ * nodes produced by the parser. Key records point to the source
+ * string's memory.
+ */
+struct json_tree_node {
+ /** JSON path node produced by json_next_token. */
+ struct json_path_node key;
+ /**
+ * Rolling hash for node calculated with
+ * json_path_node_hash(key, parent).
+ */
+ uint32_t rolling_hash;
+	/**
+	 * Array of child nodes; for JSON_PATH_NUM keys the array
+	 * index matches key.num - 1.
+	 */
+ struct json_tree_node **children;
+ /** Size of children array. */
+ uint32_t children_count;
+ /** Index of node in parent children array. */
+ uint32_t sibling_idx;
+ /** Pointer to parent node. */
+ struct json_tree_node *parent;
+};
+
+/** JSON tree object managing data relations. */
+struct json_tree {
+ /** JSON tree root node. */
+ struct json_tree_node root;
+	/** Hashtable of all tree nodes. */
+ struct mh_json_tree_node_t *hash;
+};
+
+/** Create a JSON tree object to manage data relations. */
+int
+json_tree_create(struct json_tree *tree);
+
+/**
+ * Destroy JSON tree object. This routine doesn't destroy the
+ * subtree, so it should be called only after the subtree has
+ * been destroyed manually.
+ */
+void
+json_tree_destroy(struct json_tree *tree);
+
+/** Compute the hash value of a JSON path component. */
+uint32_t
+json_path_node_hash(struct json_path_node *key, uint32_t seed);
+
+/** Init a JSON tree node. */
+void
+json_tree_node_create(struct json_tree_node *node,
+ struct json_path_node *path_node);
+
+/** Destroy a JSON tree node. */
+void
+json_tree_node_destroy(struct json_tree_node *node);
+
+/**
+ * Look up a child of the given parent node by path_node. The
+ * rolling hash must be calculated for path_node and parent with
+ * json_path_node_hash.
+ */
+struct json_tree_node *
+json_tree_lookup_by_path_node(struct json_tree *tree,
+ struct json_tree_node *parent,
+ struct json_path_node *path_node,
+ uint32_t rolling_hash);
+
+/**
+ * Append a node at the given parent position in the JSON tree.
+ * The parent must not already have a child with the same key.
+ * The rolling hash must be calculated for the node's key and
+ * parent with json_path_node_hash.
+ */
+int
+json_tree_add(struct json_tree *tree, struct json_tree_node *parent,
+ struct json_tree_node *node, uint32_t rolling_hash);
+
+/**
+ * Delete a JSON tree node at the given parent position in the
+ * JSON tree. The parent must have such a child. The rolling hash
+ * must be calculated for the node's key and parent with
+ * json_path_node_hash.
+ */
+void
+json_tree_remove(struct json_tree *tree, struct json_tree_node *parent,
+ struct json_tree_node *node, uint32_t rolling_hash);
+
+/** Look up a descendant node by a JSON path relative to parent. */
+struct json_tree_node *
+json_tree_lookup_by_path(struct json_tree *tree, struct json_tree_node *parent,
+ const char *path, uint32_t path_len);
+
+/** Return the next node in pre-order JSON tree traversal. */
+struct json_tree_node *
+json_tree_next_pre(struct json_tree_node *parent,
+ struct json_tree_node *pos);
+
+/** Return the next node in post-order JSON tree traversal. */
+struct json_tree_node *
+json_tree_next_post(struct json_tree_node *parent,
+ struct json_tree_node *pos);
+
+/** Return the entry that contains a json_tree_node item. */
+#define json_tree_entry(item, type, member) ({ \
+ const typeof( ((type *)0)->member ) *__mptr = (item); \
+ (type *)( (char *)__mptr - ((size_t) &((type *)0)->member) ); })
+
+/** Return entry by json_tree_node item or NULL if item is NULL.*/
+#define json_tree_entry_safe(item, type, member) ({ \
+ (item) != NULL ? json_tree_entry((item), type, member) : NULL; })
+
+/** Make lookup in tree by path and return entry. */
+#define json_tree_lookup_entry_by_path(tree, parent, path, path_len, type, \
+ member) \
+({struct json_tree_node *__tree_node = \
+ json_tree_lookup_by_path((tree), (parent), path, path_len); \
+ __tree_node != NULL ? json_tree_entry(__tree_node, type, member) : NULL; })
+
+/** Make lookup in tree by path_node and return entry. */
+#define json_tree_lookup_entry_by_path_node(tree, parent, path_node, \
+ path_node_hash, type, member) \
+({struct json_tree_node *__tree_node = \
+ json_tree_lookup_by_path_node((tree), (parent), path_node, \
+ path_node_hash); \
+ __tree_node != NULL ? json_tree_entry(__tree_node, type, member) : \
+ NULL; })
+
+/** Make pre-order traversal in JSON tree. */
+#define json_tree_foreach_pre(item, root) \
+for ((item) = json_tree_next_pre((root), NULL); (item) != NULL; \
+ (item) = json_tree_next_pre((root), (item)))
+
+/** Make post-order traversal in JSON tree. */
+#define json_tree_foreach_post(item, root) \
+for ((item) = json_tree_next_post((root), NULL); (item) != NULL; \
+ (item) = json_tree_next_post((root), (item)))
+
+/**
+ * Make safe post-order traversal in JSON tree: the current node
+ * may be freed during iteration, so it suits destructors.
+ */
+#define json_tree_foreach_safe(item, root) \
+for (struct json_tree_node *__iter = json_tree_next_post((root), NULL); \
+ (((item) = __iter) && (__iter = json_tree_next_post((root), (item))), \
+ (item) != NULL);)
+
+/** Make pre-order traversal in JSON tree and return entries. */
+#define json_tree_foreach_entry_pre(item, root, type, member) \
+for (struct json_tree_node *__iter = \
+ json_tree_next_pre((root), NULL); \
+ __iter != NULL && ((item) = json_tree_entry(__iter, type, member)); \
+ __iter = json_tree_next_pre((root), __iter))
+
+/** Make post-order traversal in JSON tree and return entries. */
+#define json_tree_foreach_entry_post(item, root, type, member) \
+for (struct json_tree_node *__iter = \
+	json_tree_next_post((root), NULL); \
+ __iter != NULL && ((item) = json_tree_entry(__iter, type, member)); \
+ __iter = json_tree_next_post((root), __iter))
+
+/**
+ * Make safe post-order traversal in JSON tree and return
+ * entries.
+ */
+#define json_tree_foreach_entry_safe(item, root, type, member) \
+for (struct json_tree_node *__tmp_iter, *__iter = \
+ json_tree_next_post((root), NULL); \
+ __iter != NULL && ((item) = json_tree_entry(__iter, type, member)) && \
+ (__iter != NULL && (__tmp_iter = \
+ json_tree_next_post((root), __iter))), \
+ (__iter != NULL && ((item) = json_tree_entry(__iter, type, member))); \
+ __iter = __tmp_iter)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* TARANTOOL_JSON_TREE_H_INCLUDED */
diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index 1d7e7d3..75ca11b 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -1,7 +1,9 @@
#include "json/path.h"
+#include "json/tree.h"
#include "unit.h"
#include "trivia/util.h"
#include <string.h>
+#include <stdbool.h>
#define reset_to_new_path(value) \
path = value; \
@@ -159,14 +161,221 @@ test_errors()
footer();
}
+struct test_struct {
+ int value;
+ struct json_tree_node tree_node;
+};
+
+struct test_struct *
+test_add_path(struct json_tree *tree, const char *path, uint32_t path_len,
+ struct test_struct *records_pool, int *pool_idx)
+{
+ int rc;
+ struct json_path_parser parser;
+ struct json_path_node path_node;
+ struct json_tree_node *parent = &tree->root;
+ json_path_parser_create(&parser, path, path_len);
+ while ((rc = json_path_next(&parser, &path_node)) == 0 &&
+ path_node.type != JSON_PATH_END) {
+ uint32_t rolling_hash =
+ json_path_node_hash(&path_node, parent->rolling_hash);
+ struct json_tree_node *new_node =
+ json_tree_lookup_by_path_node(tree, parent, &path_node,
+ rolling_hash);
+ if (new_node == NULL) {
+ new_node = &records_pool[*pool_idx].tree_node;
+ *pool_idx = *pool_idx + 1;
+ json_tree_node_create(new_node, &path_node);
+ rc = json_tree_add(tree, parent, new_node,
+ rolling_hash);
+ fail_if(rc != 0);
+ }
+ parent = new_node;
+ }
+ fail_if(rc != 0 || path_node.type != JSON_PATH_END);
+ return json_tree_entry_safe(parent, struct test_struct, tree_node);
+}
+
+void
+test_tree()
+{
+ header();
+ plan(36);
+
+ struct json_tree tree;
+ int rc = json_tree_create(&tree);
+ fail_if(rc != 0);
+
+ struct test_struct records[6];
+ for (int i = 0; i < 6; i++)
+ records[i].value = i;
+
+ const char *path1 = "[1][10]";
+ const char *path2 = "[1][20].file";
+ const char *path3_to_remove = "[2]";
+ const char *path_unregistered = "[1][3]";
+
+ int records_idx = 1;
+ struct test_struct *node;
+ node = test_add_path(&tree, path1, strlen(path1), records,
+ &records_idx);
+ is(node, &records[2], "add path '%s'", path1);
+
+ node = test_add_path(&tree, path2, strlen(path2), records,
+ &records_idx);
+ is(node, &records[4], "add path '%s'", path2);
+
+ node = json_tree_lookup_entry_by_path(&tree, &tree.root, path1,
+ strlen(path1), struct test_struct,
+ tree_node);
+ is(node, &records[2], "lookup path '%s'", path1);
+
+ node = json_tree_lookup_entry_by_path(&tree, &tree.root, path2,
+ strlen(path2), struct test_struct,
+ tree_node);
+ is(node, &records[4], "lookup path '%s'", path2);
+
+ node = json_tree_lookup_entry_by_path(&tree, &tree.root,
+ path_unregistered,
+ strlen(path_unregistered),
+ struct test_struct, tree_node);
+ is(node, NULL, "lookup unregistered path '%s'", path_unregistered);
+
+ node = test_add_path(&tree, path3_to_remove, strlen(path3_to_remove),
+ records, &records_idx);
+ is(node, &records[5], "add path to remove '%s'", path3_to_remove);
+ if (node != NULL) {
+ uint32_t rolling_hash =
+ json_path_node_hash(&node->tree_node.key,
+ tree.root.rolling_hash);
+ json_tree_remove(&tree, &tree.root, &node->tree_node,
+ rolling_hash);
+ json_tree_node_destroy(&node->tree_node);
+ } else {
+ isnt(node, NULL, "can't remove empty node!");
+ }
+
+ /* Test iterators. */
+ struct json_tree_node *tree_record = NULL;
+ const struct json_tree_node *tree_nodes_preorder[] =
+ {&records[1].tree_node, &records[2].tree_node,
+ &records[3].tree_node, &records[4].tree_node};
+ int cnt = sizeof(tree_nodes_preorder)/sizeof(tree_nodes_preorder[0]);
+ int idx = 0;
+
+ json_tree_foreach_pre(tree_record, &tree.root) {
+ if (idx >= cnt)
+ break;
+ struct test_struct *t1 =
+ json_tree_entry(tree_record, struct test_struct,
+ tree_node);
+ struct test_struct *t2 =
+ json_tree_entry(tree_nodes_preorder[idx],
+ struct test_struct, tree_node);
+ is(tree_record, tree_nodes_preorder[idx],
+ "test foreach pre order %d: have %d expected of %d",
+ idx, t1->value, t2->value);
+ ++idx;
+ }
+ is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+ const struct json_tree_node *tree_nodes_postorder[] =
+ {&records[2].tree_node, &records[4].tree_node,
+ &records[3].tree_node, &records[1].tree_node};
+ cnt = sizeof(tree_nodes_postorder)/sizeof(tree_nodes_postorder[0]);
+ idx = 0;
+ json_tree_foreach_post(tree_record, &tree.root) {
+ if (idx >= cnt)
+ break;
+ struct test_struct *t1 =
+ json_tree_entry(tree_record, struct test_struct,
+ tree_node);
+ struct test_struct *t2 =
+ json_tree_entry(tree_nodes_postorder[idx],
+ struct test_struct, tree_node);
+ is(tree_record, tree_nodes_postorder[idx],
+ "test foreach post order %d: have %d expected of %d",
+ idx, t1->value, t2->value);
+ ++idx;
+ }
+ is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+ idx = 0;
+ json_tree_foreach_safe(tree_record, &tree.root) {
+ if (idx >= cnt)
+ break;
+ struct test_struct *t1 =
+ json_tree_entry(tree_record, struct test_struct,
+ tree_node);
+ struct test_struct *t2 =
+ json_tree_entry(tree_nodes_postorder[idx],
+ struct test_struct, tree_node);
+ is(tree_record, tree_nodes_postorder[idx],
+ "test foreach safe order %d: have %d expected of %d",
+ idx, t1->value, t2->value);
+ ++idx;
+ }
+ is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+ idx = 0;
+ json_tree_foreach_entry_pre(node, &tree.root, struct test_struct,
+ tree_node) {
+ if (idx >= cnt)
+ break;
+ struct test_struct *t =
+ json_tree_entry(tree_nodes_preorder[idx],
+ struct test_struct, tree_node);
+ is(&node->tree_node, tree_nodes_preorder[idx],
+ "test foreach entry pre order %d: have %d expected of %d",
+ idx, node->value, t->value);
+ idx++;
+ }
+ is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+ idx = 0;
+ json_tree_foreach_entry_post(node, &tree.root, struct test_struct,
+ tree_node) {
+ if (idx >= cnt)
+ break;
+ struct test_struct *t =
+ json_tree_entry(tree_nodes_postorder[idx],
+ struct test_struct, tree_node);
+ is(&node->tree_node, tree_nodes_postorder[idx],
+ "test foreach entry post order %d: have %d expected of %d",
+ idx, node->value, t->value);
+ idx++;
+ }
+ is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+ idx = 0;
+ json_tree_foreach_entry_safe(node, &tree.root, struct test_struct,
+ tree_node) {
+ if (idx >= cnt)
+ break;
+ struct test_struct *t =
+ json_tree_entry(tree_nodes_postorder[idx],
+ struct test_struct, tree_node);
+ is(&node->tree_node, tree_nodes_postorder[idx],
+ "test foreach entry safe order %d: have %d expected of %d",
+ idx, node->value, t->value);
+ json_tree_node_destroy(&node->tree_node);
+ idx++;
+ }
+ is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+ check_plan();
+ footer();
+}
+
int
main()
{
header();
- plan(2);
+ plan(3);
test_basic();
test_errors();
+ test_tree();
int rc = check_plan();
footer();
diff --git a/test/unit/json_path.result b/test/unit/json_path.result
index a2a2f82..5b44fd2 100644
--- a/test/unit/json_path.result
+++ b/test/unit/json_path.result
@@ -1,5 +1,5 @@
*** main ***
-1..2
+1..3
*** test_basic ***
1..71
ok 1 - parse <[0]>
@@ -99,4 +99,44 @@ ok 1 - subtests
ok 20 - tab inside identifier
ok 2 - subtests
*** test_errors: done ***
+ *** test_tree ***
+ 1..36
+ ok 1 - add path '[1][10]'
+ ok 2 - add path '[1][20].file'
+ ok 3 - lookup path '[1][10]'
+ ok 4 - lookup path '[1][20].file'
+ ok 5 - lookup unregistered path '[1][3]'
+ ok 6 - add path to remove '[2]'
+ ok 7 - test foreach pre order 0: have 1 expected of 1
+ ok 8 - test foreach pre order 1: have 2 expected of 2
+ ok 9 - test foreach pre order 2: have 3 expected of 3
+ ok 10 - test foreach pre order 3: have 4 expected of 4
+ ok 11 - records iterated count 4 of 4
+ ok 12 - test foreach post order 0: have 2 expected of 2
+ ok 13 - test foreach post order 1: have 4 expected of 4
+ ok 14 - test foreach post order 2: have 3 expected of 3
+ ok 15 - test foreach post order 3: have 1 expected of 1
+ ok 16 - records iterated count 4 of 4
+ ok 17 - test foreach safe order 0: have 2 expected of 2
+ ok 18 - test foreach safe order 1: have 4 expected of 4
+ ok 19 - test foreach safe order 2: have 3 expected of 3
+ ok 20 - test foreach safe order 3: have 1 expected of 1
+ ok 21 - records iterated count 4 of 4
+ ok 22 - test foreach entry pre order 0: have 1 expected of 1
+ ok 23 - test foreach entry pre order 1: have 2 expected of 2
+ ok 24 - test foreach entry pre order 2: have 3 expected of 3
+ ok 25 - test foreach entry pre order 3: have 4 expected of 4
+ ok 26 - records iterated count 4 of 4
+ ok 27 - test foreach entry post order 0: have 2 expected of 2
+ ok 28 - test foreach entry post order 1: have 4 expected of 4
+ ok 29 - test foreach entry post order 2: have 3 expected of 3
+ ok 30 - test foreach entry post order 3: have 1 expected of 1
+ ok 31 - records iterated count 4 of 4
+ ok 32 - test foreach entry safe order 0: have 2 expected of 2
+ ok 33 - test foreach entry safe order 1: have 4 expected of 4
+ ok 34 - test foreach entry safe order 2: have 3 expected of 3
+ ok 35 - test foreach entry safe order 3: have 1 expected of 1
+ ok 36 - records iterated count 4 of 4
+ok 3 - subtests
+ *** test_tree: done ***
*** main: done ***
--
2.7.4
* [PATCH v5 06/12] box: manage format fields with JSON tree class
2018-10-29 6:56 [PATCH v5 00/12] box: indexes by JSON path Kirill Shcherbatov
` (7 preceding siblings ...)
2018-10-29 6:56 ` [PATCH v5 05/12] lib: implement JSON tree class for json library Kirill Shcherbatov
@ 2018-10-29 6:56 ` Kirill Shcherbatov
2018-10-29 6:56 ` [PATCH v5 07/12] lib: introduce json_path_normalize routine Kirill Shcherbatov
` (2 subsequent siblings)
11 siblings, 0 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-10-29 6:56 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
As we are going to work with format fields in a unified way, we
start using the JSON tree class to manage the first-level format
fields.
Need for #1012
---
src/box/sql.c | 16 +++----
src/box/sql/build.c | 5 +-
src/box/tuple.c | 10 ++--
src/box/tuple_format.c | 121 ++++++++++++++++++++++++++++++++++---------------
src/box/tuple_format.h | 32 ++++++++++---
src/box/vy_stmt.c | 4 +-
test-run | 2 +-
7 files changed, 130 insertions(+), 60 deletions(-)
diff --git a/src/box/sql.c b/src/box/sql.c
index c7b87e5..3d08936 100644
--- a/src/box/sql.c
+++ b/src/box/sql.c
@@ -197,7 +197,8 @@ tarantoolSqlite3TupleColumnFast(BtCursor *pCur, u32 fieldno, u32 *field_size)
struct tuple_format *format = tuple_format(pCur->last_tuple);
assert(format->exact_field_count == 0
|| fieldno < format->exact_field_count);
- if (format->fields[fieldno].offset_slot == TUPLE_OFFSET_SLOT_NIL)
+ if (tuple_format_field(format, fieldno)->offset_slot ==
+ TUPLE_OFFSET_SLOT_NIL)
return NULL;
const char *field = tuple_field(pCur->last_tuple, fieldno);
const char *end = field;
@@ -891,7 +892,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
struct key_def *key_def;
const struct tuple *tuple;
const char *base;
- const struct tuple_format *format;
+ struct tuple_format *format;
const uint32_t *field_map;
uint32_t field_count, next_fieldno = 0;
const char *p, *field0;
@@ -909,7 +910,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
base = tuple_data(tuple);
format = tuple_format(tuple);
field_map = tuple_field_map(tuple);
- field_count = format->field_count;
+ field_count = tuple_format_field_count(format);
field0 = base; mp_decode_array(&field0); p = field0;
for (i = 0; i < n; i++) {
/*
@@ -927,9 +928,10 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
uint32_t fieldno = key_def->parts[i].fieldno;
if (fieldno != next_fieldno) {
+ struct tuple_field *field =
+ tuple_format_field(format, fieldno);
if (fieldno >= field_count ||
- format->fields[fieldno].offset_slot ==
- TUPLE_OFFSET_SLOT_NIL) {
+ field->offset_slot == TUPLE_OFFSET_SLOT_NIL) {
/* Outdated field_map. */
uint32_t j = 0;
@@ -937,9 +939,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
while (j++ != fieldno)
mp_next(&p);
} else {
- p = base + field_map[
- format->fields[fieldno].offset_slot
- ];
+ p = base + field_map[field->offset_slot];
}
}
next_fieldno = fieldno + 1;
diff --git a/src/box/sql/build.c b/src/box/sql/build.c
index 7556a78..83e89bb 100644
--- a/src/box/sql/build.c
+++ b/src/box/sql/build.c
@@ -864,8 +864,9 @@ sql_column_collation(struct space_def *def, uint32_t column, uint32_t *coll_id)
struct coll_id *collation = coll_by_id(*coll_id);
return collation != NULL ? collation->coll : NULL;
}
- *coll_id = space->format->fields[column].coll_id;
- return space->format->fields[column].coll;
+ struct tuple_field *field = tuple_format_field(space->format, column);
+ *coll_id = field->coll_id;
+ return field->coll;
}
struct ExprList *
diff --git a/src/box/tuple.c b/src/box/tuple.c
index ef4d16f..aae1c3c 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -138,7 +138,7 @@ runtime_tuple_delete(struct tuple_format *format, struct tuple *tuple)
int
tuple_validate_raw(struct tuple_format *format, const char *tuple)
{
- if (format->field_count == 0)
+ if (tuple_format_field_count(format) == 0)
return 0; /* Nothing to check */
/* Check to see if the tuple has a sufficient number of fields. */
@@ -158,10 +158,12 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple)
}
/* Check field types */
- struct tuple_field *field = &format->fields[0];
+ struct tuple_field *field = tuple_format_field(format, 0);
uint32_t i = 0;
- uint32_t defined_field_count = MIN(field_count, format->field_count);
- for (; i < defined_field_count; ++i, ++field) {
+ uint32_t defined_field_count =
+ MIN(field_count, tuple_format_field_count(format));
+ for (; i < defined_field_count; ++i) {
+ field = tuple_format_field(format, i);
if (key_mp_type_validate(field->type, mp_typeof(*tuple),
ER_FIELD_TYPE, i + TUPLE_INDEX_BASE,
tuple_field_is_nullable(field)))
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 088579c..9ffb5e4 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -38,10 +38,29 @@ static intptr_t recycled_format_ids = FORMAT_ID_NIL;
static uint32_t formats_size = 0, formats_capacity = 0;
-static const struct tuple_field tuple_field_default = {
- FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false,
- ON_CONFLICT_ACTION_DEFAULT, NULL, COLL_NONE,
-};
+static struct tuple_field *
+tuple_field_create(struct json_path_node *node)
+{
+ struct tuple_field *ret = calloc(1, sizeof(struct tuple_field));
+ if (ret == NULL) {
+ diag_set(OutOfMemory, sizeof(struct tuple_field), "malloc",
+ "ret");
+ return NULL;
+ }
+ ret->type = FIELD_TYPE_ANY;
+ ret->offset_slot = TUPLE_OFFSET_SLOT_NIL;
+ ret->coll_id = COLL_NONE;
+ ret->nullable_action = ON_CONFLICT_ACTION_DEFAULT;
+ json_tree_node_create(&ret->tree_node, node);
+ return ret;
+}
+
+static void
+tuple_field_destroy(struct tuple_field *field)
+{
+ json_tree_node_destroy(&field->tree_node);
+ free(field);
+}
static int
tuple_format_add_key_part(struct tuple_format *format,
@@ -49,8 +68,8 @@ tuple_format_add_key_part(struct tuple_format *format,
const struct key_part *part, bool is_sequential,
int *current_slot)
{
- assert(part->fieldno < format->field_count);
- struct tuple_field *field = &format->fields[part->fieldno];
+ assert(part->fieldno < tuple_format_field_count(format));
+ struct tuple_field *field = tuple_format_field(format, part->fieldno);
/*
* Field and part nullable actions may differ only
* if one of them is DEFAULT, in which case we use
@@ -133,16 +152,15 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
format->min_field_count =
tuple_format_min_field_count(keys, key_count, fields,
field_count);
- if (format->field_count == 0) {
+ if (tuple_format_field_count(format) == 0) {
format->field_map_size = 0;
return 0;
}
/* Initialize defined fields */
for (uint32_t i = 0; i < field_count; ++i) {
- format->fields[i].is_key_part = false;
- format->fields[i].type = fields[i].type;
- format->fields[i].offset_slot = TUPLE_OFFSET_SLOT_NIL;
- format->fields[i].nullable_action = fields[i].nullable_action;
+ struct tuple_field *field = tuple_format_field(format, i);
+ field->type = fields[i].type;
+ field->nullable_action = fields[i].nullable_action;
struct coll *coll = NULL;
uint32_t cid = fields[i].coll_id;
if (cid != COLL_NONE) {
@@ -154,12 +172,9 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
}
coll = coll_id->coll;
}
- format->fields[i].coll = coll;
- format->fields[i].coll_id = cid;
+ field->coll = coll;
+ field->coll_id = cid;
}
- /* Initialize remaining fields */
- for (uint32_t i = field_count; i < format->field_count; i++)
- format->fields[i] = tuple_field_default;
int current_slot = 0;
@@ -179,7 +194,8 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
}
}
- assert(format->fields[0].offset_slot == TUPLE_OFFSET_SLOT_NIL);
+ assert(tuple_format_field(format, 0)->offset_slot ==
+ TUPLE_OFFSET_SLOT_NIL);
size_t field_map_size = -current_slot * sizeof(uint32_t);
if (field_map_size > UINT16_MAX) {
/** tuple->data_offset is 16 bits */
@@ -253,39 +269,67 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
}
}
uint32_t field_count = MAX(space_field_count, index_field_count);
- uint32_t total = sizeof(struct tuple_format) +
- field_count * sizeof(struct tuple_field);
- struct tuple_format *format = (struct tuple_format *) malloc(total);
+ struct tuple_format *format = malloc(sizeof(struct tuple_format));
if (format == NULL) {
diag_set(OutOfMemory, sizeof(struct tuple_format), "malloc",
"tuple format");
return NULL;
}
+ if (json_tree_create(&format->field_tree) != 0) {
+ free(format);
+ return NULL;
+ }
+ struct json_path_node path_node;
+ memset(&path_node, 0, sizeof(path_node));
+ path_node.type = JSON_PATH_NUM;
+ for (int32_t i = field_count - 1; i >= 0; i--) {
+ path_node.num = i + TUPLE_INDEX_BASE;
+ struct tuple_field *field = tuple_field_create(&path_node);
+ if (field == NULL)
+ goto error;
+ uint32_t rolling_hash = format->field_tree.root.rolling_hash;
+ rolling_hash = json_path_node_hash(&path_node, rolling_hash);
+ if (json_tree_add(&format->field_tree, &format->field_tree.root,
+ &field->tree_node, rolling_hash) != 0) {
+ tuple_field_destroy(field);
+ goto error;
+ }
+ }
if (dict == NULL) {
assert(space_field_count == 0);
format->dict = tuple_dictionary_new(NULL, 0);
- if (format->dict == NULL) {
- free(format);
- return NULL;
- }
+ if (format->dict == NULL)
+ goto error;
} else {
format->dict = dict;
tuple_dictionary_ref(dict);
}
format->refs = 0;
format->id = FORMAT_ID_NIL;
- format->field_count = field_count;
format->index_field_count = index_field_count;
format->exact_field_count = 0;
format->min_field_count = 0;
return format;
+error:;
+ struct tuple_field *field;
+ json_tree_foreach_entry_safe(field, &format->field_tree.root,
+ struct tuple_field, tree_node)
+ tuple_field_destroy(field);
+ json_tree_destroy(&format->field_tree);
+ free(format);
+ return NULL;
}
/** Free tuple format resources, doesn't unregister. */
static inline void
tuple_format_destroy(struct tuple_format *format)
{
+ struct tuple_field *field;
+ json_tree_foreach_entry_safe(field, &format->field_tree.root,
+ struct tuple_field, tree_node)
+ tuple_field_destroy(field);
+ json_tree_destroy(&format->field_tree);
tuple_dictionary_unref(format->dict);
}
@@ -323,18 +367,21 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
}
bool
-tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
- const struct tuple_format *format2)
+tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
+ struct tuple_format *format2)
{
if (format1->exact_field_count != format2->exact_field_count)
return false;
- for (uint32_t i = 0; i < format1->field_count; ++i) {
- const struct tuple_field *field1 = &format1->fields[i];
+ uint32_t format1_field_count = tuple_format_field_count(format1);
+ uint32_t format2_field_count = tuple_format_field_count(format2);
+ for (uint32_t i = 0; i < format1_field_count; ++i) {
+ const struct tuple_field *field1 =
+ tuple_format_field(format1, i);
/*
* The field has a data type in format1, but has
* no data type in format2.
*/
- if (i >= format2->field_count) {
+ if (i >= format2_field_count) {
/*
* The field can get a name added
* for it, and this doesn't require a data
@@ -350,7 +397,8 @@ tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
else
return false;
}
- const struct tuple_field *field2 = &format2->fields[i];
+ const struct tuple_field *field2 =
+ tuple_format_field(format2, i);
if (! field_type1_contains_type2(field1->type, field2->type))
return false;
/*
@@ -369,7 +417,7 @@ int
tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
const char *tuple, bool validate)
{
- if (format->field_count == 0)
+ if (tuple_format_field_count(format) == 0)
return 0; /* Nothing to initialize */
const char *pos = tuple;
@@ -392,17 +440,17 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
/* first field is simply accessible, so we do not store offset to it */
enum mp_type mp_type = mp_typeof(*pos);
- const struct tuple_field *field = &format->fields[0];
+ const struct tuple_field *field =
+ tuple_format_field((struct tuple_format *)format, 0);
if (validate &&
key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
TUPLE_INDEX_BASE, tuple_field_is_nullable(field)))
return -1;
mp_next(&pos);
/* other fields...*/
- ++field;
uint32_t i = 1;
uint32_t defined_field_count = MIN(field_count, validate ?
- format->field_count :
+ tuple_format_field_count(format) :
format->index_field_count);
if (field_count < format->index_field_count) {
/*
@@ -412,7 +460,8 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
memset((char *)field_map - format->field_map_size, 0,
format->field_map_size);
}
- for (; i < defined_field_count; ++i, ++field) {
+ for (; i < defined_field_count; ++i) {
+ field = tuple_format_field((struct tuple_format *)format, i);
mp_type = mp_typeof(*pos);
if (validate &&
key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index 232df22..8426153 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -34,6 +34,7 @@
#include "key_def.h"
#include "field_def.h"
#include "errinj.h"
+#include "json/tree.h"
#include "tuple_dictionary.h"
#if defined(__cplusplus)
@@ -113,6 +114,8 @@ struct tuple_field {
struct coll *coll;
/** Collation identifier. */
uint32_t coll_id;
+ /** A JSON tree entry to organize the tree. */
+ struct json_tree_node tree_node;
};
/**
@@ -166,16 +169,29 @@ struct tuple_format {
* index_field_count <= min_field_count <= field_count.
*/
uint32_t min_field_count;
- /* Length of 'fields' array. */
- uint32_t field_count;
/**
* Shared names storage used by all formats of a space.
*/
struct tuple_dictionary *dict;
- /* Formats of the fields */
- struct tuple_field fields[0];
+ /** JSON tree of fields. */
+ struct json_tree field_tree;
};
+
+static inline uint32_t
+tuple_format_field_count(const struct tuple_format *format)
+{
+ return format->field_tree.root.children_count;
+}
+
+static inline struct tuple_field *
+tuple_format_field(struct tuple_format *format, uint32_t fieldno)
+{
+ assert(fieldno < tuple_format_field_count(format));
+ struct json_tree_node *tree_node = format->field_tree.root.children[fieldno];
+ return json_tree_entry(tree_node, struct tuple_field, tree_node);
+}
+
extern struct tuple_format **tuple_formats;
static inline uint32_t
@@ -238,8 +254,8 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
* @retval True, if @a format1 can store any tuples of @a format2.
*/
bool
-tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
- const struct tuple_format *format2);
+tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
+ struct tuple_format *format2);
/**
* Calculate minimal field count of tuples with specified keys and
@@ -333,7 +349,9 @@ tuple_field_raw(const struct tuple_format *format, const char *tuple,
return tuple;
}
- int32_t offset_slot = format->fields[field_no].offset_slot;
+ int32_t offset_slot =
+ tuple_format_field((struct tuple_format *)format,
+ field_no)->offset_slot;
if (offset_slot != TUPLE_OFFSET_SLOT_NIL) {
if (field_map[offset_slot] != 0)
return tuple + field_map[offset_slot];
diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
index d838404..3e60fec 100644
--- a/src/box/vy_stmt.c
+++ b/src/box/vy_stmt.c
@@ -411,7 +411,7 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
uint32_t *field_map = (uint32_t *) raw;
char *wpos = mp_encode_array(raw, field_count);
for (uint32_t i = 0; i < field_count; ++i) {
- const struct tuple_field *field = &format->fields[i];
+ const struct tuple_field *field = tuple_format_field(format, i);
if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
field_map[field->offset_slot] = wpos - raw;
if (iov[i].iov_base == NULL) {
@@ -465,7 +465,7 @@ vy_stmt_new_surrogate_delete_raw(struct tuple_format *format,
}
char *pos = mp_encode_array(data, field_count);
for (uint32_t i = 0; i < field_count; ++i) {
- const struct tuple_field *field = &format->fields[i];
+ const struct tuple_field *field = tuple_format_field(format, i);
if (! field->is_key_part) {
/* Unindexed field - write NIL. */
assert(i < src_count);
diff --git a/test-run b/test-run
index 670f330..b8764e1 160000
--- a/test-run
+++ b/test-run
@@ -1 +1 @@
-Subproject commit 670f330aacaf44bc8b1f969fa0cd5f811c5ceb1b
+Subproject commit b8764e17ccc79a26d1e661a0aaeaad90bd0aa1ea
--
2.7.4
* [PATCH v5 07/12] lib: introduce json_path_normalize routine
2018-10-29 6:56 [PATCH v5 00/12] box: indexes by JSON path Kirill Shcherbatov
` (8 preceding siblings ...)
2018-10-29 6:56 ` [PATCH v5 06/12] box: manage format fields with JSON tree class Kirill Shcherbatov
@ 2018-10-29 6:56 ` Kirill Shcherbatov
2018-11-01 15:22 ` [tarantool-patches] " Konstantin Osipov
2018-11-20 15:14 ` Vladimir Davydov
2018-10-29 6:56 ` [PATCH v5 08/12] box: introduce JSON indexes Kirill Shcherbatov
2018-10-29 6:56 ` [tarantool-patches] [PATCH v5 09/12] box: introduce has_json_paths flag in templates Kirill Shcherbatov
11 siblings, 2 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-10-29 6:56 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
Introduced a new routine json_path_normalize that converts a
JSON path to the 'canonical' form:
- all map keys are specified with the ["key"] operator
- all array indexes are specified with the [i] operator.
This notation is preferable because in the general case it can
be parsed unambiguously.
The JSON indexes patch needs this API to store all paths in
canonical form, to perform path uniqueness checks and to speed
up access via the JSON path hashtable.
Need for #1012
---
src/lib/json/path.c | 25 +++++++++++++++++++++++++
src/lib/json/path.h | 18 ++++++++++++++++++
test/unit/json_path.c | 41 ++++++++++++++++++++++++++++++++++++++++-
test/unit/json_path.result | 14 +++++++++++++-
4 files changed, 96 insertions(+), 2 deletions(-)
diff --git a/src/lib/json/path.c b/src/lib/json/path.c
index 2e72930..0eb5d49 100644
--- a/src/lib/json/path.c
+++ b/src/lib/json/path.c
@@ -242,3 +242,28 @@ json_path_next(struct json_path_parser *parser, struct json_path_node *node)
return json_parse_identifier(parser, node);
}
}
+
+int
+json_path_normalize(const char *path, uint32_t path_len, char *out)
+{
+ struct json_path_parser parser;
+ struct json_path_node node;
+ json_path_parser_create(&parser, path, path_len);
+ int rc;
+ while ((rc = json_path_next(&parser, &node)) == 0 &&
+ node.type != JSON_PATH_END) {
+ if (node.type == JSON_PATH_NUM) {
+ out += sprintf(out, "[%llu]",
+ (unsigned long long)node.num);
+ } else if (node.type == JSON_PATH_STR) {
+ out += sprintf(out, "[\"%.*s\"]", node.len, node.str);
+ } else {
+ unreachable();
+ }
+ };
+ if (rc != 0)
+ return rc;
+ *out = '\0';
+ assert(node.type == JSON_PATH_END);
+ return 0;
+}
diff --git a/src/lib/json/path.h b/src/lib/json/path.h
index c3c381a..f6b2ee2 100644
--- a/src/lib/json/path.h
+++ b/src/lib/json/path.h
@@ -105,6 +105,24 @@ json_path_parser_create(struct json_path_parser *parser, const char *src,
int
json_path_next(struct json_path_parser *parser, struct json_path_node *node);
+/**
+ * Convert path to the 'canonical' form:
+ * - all map keys are specified with the ["key"] operator
+ * - all array indexes are specified with the [i] operator.
+ * This notation is preferable because in the general case it can
+ * be parsed unambiguously.
+ * @param path Source path string to be converted.
+ * @param path_len The length of the @path.
+ * @param[out] out Memory to store the normalized string.
+ * The worst case requires a buffer of
+ * 2.5 * path_len + 1 bytes.
+ * @retval 0 On success.
+ * @retval > 0 Position of a syntax error. A position is 1-based
+ * and starts from a beginning of a source string.
+ */
+int
+json_path_normalize(const char *path, uint32_t path_len, char *out);
+
#ifdef __cplusplus
}
#endif
diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index 75ca11b..583101e 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -367,15 +367,54 @@ test_tree()
footer();
}
+void
+test_normalize_path()
+{
+ header();
+ plan(8);
+
+ const char *path_normalized = "[\"FIO\"][3][\"fname\"]";
+ const char *path1 = "FIO[3].fname";
+ const char *path2 = "[\"FIO\"][3].fname";
+ const char *path3 = "FIO[3][\"fname\"]";
+ char buff[strlen(path_normalized) + 1];
+ int rc;
+
+ rc = json_path_normalize(path_normalized, strlen(path_normalized),
+ buff);
+ is(rc, 0, "normalize '%s' path status", path_normalized);
+ is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare",
+ path_normalized);
+
+ rc = json_path_normalize(path1, strlen(path1), buff);
+ is(rc, 0, "normalize '%s' path status", path1);
+ is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare",
+ path1);
+
+ rc = json_path_normalize(path2, strlen(path2), buff);
+ is(rc, 0, "normalize '%s' path status", path2);
+ is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare",
+ path2);
+
+ rc = json_path_normalize(path3, strlen(path3), buff);
+ is(rc, 0, "normalize '%s' path status", path3);
+ is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare",
+ path3);
+
+ check_plan();
+ footer();
+}
+
int
main()
{
header();
- plan(3);
+ plan(4);
test_basic();
test_errors();
test_tree();
+ test_normalize_path();
int rc = check_plan();
footer();
diff --git a/test/unit/json_path.result b/test/unit/json_path.result
index 5b44fd2..1331f71 100644
--- a/test/unit/json_path.result
+++ b/test/unit/json_path.result
@@ -1,5 +1,5 @@
*** main ***
-1..3
+1..4
*** test_basic ***
1..71
ok 1 - parse <[0]>
@@ -139,4 +139,16 @@ ok 2 - subtests
ok 36 - records iterated count 4 of 4
ok 3 - subtests
*** test_tree: done ***
+ *** test_normalize_path ***
+ 1..8
+ ok 1 - normalize '["FIO"][3]["fname"]' path status
+ ok 2 - normalize '["FIO"][3]["fname"]' path compare
+ ok 3 - normalize 'FIO[3].fname' path status
+ ok 4 - normalize 'FIO[3].fname' path compare
+ ok 5 - normalize '["FIO"][3].fname' path status
+ ok 6 - normalize '["FIO"][3].fname' path compare
+ ok 7 - normalize 'FIO[3]["fname"]' path status
+ ok 8 - normalize 'FIO[3]["fname"]' path compare
+ok 4 - subtests
+ *** test_normalize_path: done ***
*** main: done ***
--
2.7.4
* [PATCH v5 08/12] box: introduce JSON indexes
2018-10-29 6:56 [PATCH v5 00/12] box: indexes by JSON path Kirill Shcherbatov
` (9 preceding siblings ...)
2018-10-29 6:56 ` [PATCH v5 07/12] lib: introduce json_path_normalize routine Kirill Shcherbatov
@ 2018-10-29 6:56 ` Kirill Shcherbatov
2018-11-20 16:52 ` Vladimir Davydov
2018-10-29 6:56 ` [tarantool-patches] [PATCH v5 09/12] box: introduce has_json_paths flag in templates Kirill Shcherbatov
11 siblings, 1 reply; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-10-29 6:56 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
New JSON-path-based indexes allow indexing document content.
As we need to store a user-defined JSON path in key_part
and key_part_def, we have introduced path and path_len
fields. The JSON path is verified and transformed to canonical
form when the index msgpack is unpacked.
Path strings are stored as part of the key_def allocation:
+-------+---------+-------+---------+-------+-------+-------+
|key_def|key_part1|  ...  |key_partN| path1 |  ...  | pathN |
+-------+---------+-------+---------+-------+-------+-------+
            |                          ^
            |-> path __________________|
On format creation, JSON paths are stored at the end of the
format allocation:
+------------+------------+-------+------------+-------+
|tuple_format|tuple_field1| ... |tuple_fieldN| pathK |
+------------+------------+-------+------------+-------+
Part of #1012
---
src/box/errcode.h | 2 +-
src/box/index_def.c | 6 +-
src/box/key_def.c | 190 ++++++++++++++++++---
src/box/key_def.h | 31 +++-
src/box/lua/space.cc | 5 +
src/box/memtx_engine.c | 2 +
src/box/sql.c | 1 +
src/box/sql/build.c | 1 +
src/box/sql/select.c | 6 +-
src/box/sql/where.c | 1 +
src/box/tuple.c | 38 +----
src/box/tuple_compare.cc | 13 +-
| 21 ++-
src/box/tuple_format.c | 397 +++++++++++++++++++++++++++++++++++++------
src/box/tuple_format.h | 38 ++++-
src/box/tuple_hash.cc | 2 +-
src/box/vinyl.c | 2 +
src/box/vy_log.c | 3 +-
src/box/vy_point_lookup.c | 2 -
src/box/vy_stmt.c | 166 ++++++++++++++----
test/box/misc.result | 1 +
test/engine/tuple.result | 377 ++++++++++++++++++++++++++++++++++++++++
test/engine/tuple.test.lua | 107 ++++++++++++
23 files changed, 1251 insertions(+), 161 deletions(-)
diff --git a/src/box/errcode.h b/src/box/errcode.h
index 04f4f34..6dab6d5 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -138,7 +138,7 @@ struct errcode_record {
/* 83 */_(ER_ROLE_EXISTS, "Role '%s' already exists") \
/* 84 */_(ER_CREATE_ROLE, "Failed to create role '%s': %s") \
/* 85 */_(ER_INDEX_EXISTS, "Index '%s' already exists") \
- /* 86 */_(ER_UNUSED6, "") \
+ /* 86 */_(ER_DATA_STRUCTURE_MISMATCH, "Tuple doesn't match document structure: %s") \
/* 87 */_(ER_ROLE_LOOP, "Granting role '%s' to role '%s' would create a loop") \
/* 88 */_(ER_GRANT, "Incorrect grant arguments: %s") \
/* 89 */_(ER_PRIV_GRANTED, "User '%s' already has %s access on %s '%s'") \
diff --git a/src/box/index_def.c b/src/box/index_def.c
index 45c74d9..6aaeeeb 100644
--- a/src/box/index_def.c
+++ b/src/box/index_def.c
@@ -298,8 +298,10 @@ index_def_is_valid(struct index_def *index_def, const char *space_name)
* Courtesy to a user who could have made
* a typo.
*/
- if (index_def->key_def->parts[i].fieldno ==
- index_def->key_def->parts[j].fieldno) {
+ struct key_part *part_a = &index_def->key_def->parts[i];
+ struct key_part *part_b = &index_def->key_def->parts[j];
+ if (part_a->fieldno == part_b->fieldno &&
+ key_part_path_cmp(part_a, part_b) == 0) {
diag_set(ClientError, ER_MODIFY_INDEX,
index_def->name, space_name,
"same key part is indexed twice");
diff --git a/src/box/key_def.c b/src/box/key_def.c
index 2119ca3..0043f4e 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -28,6 +28,8 @@
* THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
+#include "fiber.h"
+#include "json/path.h"
#include "key_def.h"
#include "tuple_compare.h"
#include "tuple_extract_key.h"
@@ -44,7 +46,8 @@ const struct key_part_def key_part_def_default = {
COLL_NONE,
false,
ON_CONFLICT_ACTION_DEFAULT,
- SORT_ORDER_ASC
+ SORT_ORDER_ASC,
+ NULL
};
static int64_t
@@ -59,6 +62,7 @@ part_type_by_name_wrapper(const char *str, uint32_t len)
#define PART_OPT_NULLABILITY "is_nullable"
#define PART_OPT_NULLABLE_ACTION "nullable_action"
#define PART_OPT_SORT_ORDER "sort_order"
+#define PART_OPT_PATH "path"
const struct opt_def part_def_reg[] = {
OPT_DEF_ENUM(PART_OPT_TYPE, field_type, struct key_part_def, type,
@@ -71,6 +75,7 @@ const struct opt_def part_def_reg[] = {
struct key_part_def, nullable_action, NULL),
OPT_DEF_ENUM(PART_OPT_SORT_ORDER, sort_order, struct key_part_def,
sort_order, NULL),
+ OPT_DEF(PART_OPT_PATH, OPT_STRPTR, struct key_part_def, path),
OPT_END,
};
@@ -106,13 +111,25 @@ const uint32_t key_mp_type[] = {
struct key_def *
key_def_dup(const struct key_def *src)
{
- size_t sz = key_def_sizeof(src->part_count);
- struct key_def *res = (struct key_def *)malloc(sz);
+ const struct key_part *parts = src->parts;
+ const struct key_part *parts_end = parts + src->part_count;
+ size_t sz = 0;
+ for (; parts < parts_end; parts++)
+ sz += parts->path != NULL ? parts->path_len + 1 : 0;
+ sz = key_def_sizeof(src->part_count, sz);
+ struct key_def *res = (struct key_def *)calloc(1, sz);
if (res == NULL) {
diag_set(OutOfMemory, sz, "malloc", "res");
return NULL;
}
memcpy(res, src, sz);
+ /* Update paths to point to the new memory chunk. */
+ for (uint32_t i = 0; i < src->part_count; i++) {
+ if (src->parts[i].path == NULL)
+ continue;
+ size_t path_offset = src->parts[i].path - (char *)src;
+ res->parts[i].path = (char *)res + path_offset;
+ }
return res;
}
@@ -120,8 +137,23 @@ void
key_def_swap(struct key_def *old_def, struct key_def *new_def)
{
assert(old_def->part_count == new_def->part_count);
- for (uint32_t i = 0; i < new_def->part_count; i++)
- SWAP(old_def->parts[i], new_def->parts[i]);
+ for (uint32_t i = 0; i < new_def->part_count; i++) {
+ if (old_def->parts[i].path == NULL) {
+ SWAP(old_def->parts[i], new_def->parts[i]);
+ } else {
+ /*
+ * Since the data is located in memory
+ * in the same order (otherwise rebuild
+ * would be called), just update the
+ * pointers.
+ */
+ size_t path_offset =
+ old_def->parts[i].path - (char *)old_def;
+ SWAP(old_def->parts[i], new_def->parts[i]);
+ old_def->parts[i].path = (char *)old_def + path_offset;
+ new_def->parts[i].path = (char *)new_def + path_offset;
+ }
+ }
SWAP(*old_def, *new_def);
}
@@ -144,24 +176,38 @@ static void
key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
enum field_type type, enum on_conflict_action nullable_action,
struct coll *coll, uint32_t coll_id,
- enum sort_order sort_order)
+ enum sort_order sort_order, const char *path,
+ uint32_t path_len)
{
assert(part_no < def->part_count);
assert(type < field_type_MAX);
def->is_nullable |= (nullable_action == ON_CONFLICT_ACTION_NONE);
+ def->has_json_paths |= path != NULL;
def->parts[part_no].nullable_action = nullable_action;
def->parts[part_no].fieldno = fieldno;
def->parts[part_no].type = type;
def->parts[part_no].coll = coll;
def->parts[part_no].coll_id = coll_id;
def->parts[part_no].sort_order = sort_order;
+ if (path != NULL) {
+ def->parts[part_no].path_len = path_len;
+ assert(def->parts[part_no].path != NULL);
+ memcpy(def->parts[part_no].path, path, path_len);
+ def->parts[part_no].path[path_len] = '\0';
+ } else {
+ def->parts[part_no].path_len = 0;
+ def->parts[part_no].path = NULL;
+ }
column_mask_set_fieldno(&def->column_mask, fieldno);
}
struct key_def *
key_def_new(const struct key_part_def *parts, uint32_t part_count)
{
- size_t sz = key_def_sizeof(part_count);
+ ssize_t sz = 0;
+ for (uint32_t i = 0; i < part_count; i++)
+ sz += parts[i].path != NULL ? strlen(parts[i].path) + 1 : 0;
+ sz = key_def_sizeof(part_count, sz);
struct key_def *def = calloc(1, sz);
if (def == NULL) {
diag_set(OutOfMemory, sz, "malloc", "struct key_def");
@@ -171,6 +217,7 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
def->part_count = part_count;
def->unique_part_count = part_count;
+ char *data = (char *)def + key_def_sizeof(part_count, 0);
for (uint32_t i = 0; i < part_count; i++) {
const struct key_part_def *part = &parts[i];
struct coll *coll = NULL;
@@ -184,16 +231,23 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
}
coll = coll_id->coll;
}
+ uint32_t path_len = 0;
+ if (part->path != NULL) {
+ path_len = strlen(part->path);
+ def->parts[i].path = data;
+ data += path_len + 1;
+ }
key_def_set_part(def, i, part->fieldno, part->type,
part->nullable_action, coll, part->coll_id,
- part->sort_order);
+ part->sort_order, part->path, path_len);
}
key_def_set_cmp(def);
return def;
}
-void
-key_def_dump_parts(const struct key_def *def, struct key_part_def *parts)
+int
+key_def_dump_parts(struct region *pool, const struct key_def *def,
+ struct key_part_def *parts)
{
for (uint32_t i = 0; i < def->part_count; i++) {
const struct key_part *part = &def->parts[i];
@@ -203,13 +257,27 @@ key_def_dump_parts(const struct key_def *def, struct key_part_def *parts)
part_def->is_nullable = key_part_is_nullable(part);
part_def->nullable_action = part->nullable_action;
part_def->coll_id = part->coll_id;
+ if (part->path != NULL) {
+ char *path = region_alloc(pool, part->path_len + 1);
+ if (path == NULL) {
+ diag_set(OutOfMemory, part->path_len + 1,
+ "region_alloc", "part_def->path");
+ return -1;
+ }
+ memcpy(path, part->path, part->path_len);
+ path[part->path_len] = '\0';
+ part_def->path = path;
+ } else {
+ part_def->path = NULL;
+ }
}
+ return 0;
}
box_key_def_t *
box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
{
- size_t sz = key_def_sizeof(part_count);
+ size_t sz = key_def_sizeof(part_count, 0);
struct key_def *key_def = calloc(1, sz);
if (key_def == NULL) {
diag_set(OutOfMemory, sz, "malloc", "struct key_def");
@@ -223,7 +291,7 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
key_def_set_part(key_def, item, fields[item],
(enum field_type)types[item],
ON_CONFLICT_ACTION_DEFAULT,
- NULL, COLL_NONE, SORT_ORDER_ASC);
+ NULL, COLL_NONE, SORT_ORDER_ASC, NULL, 0);
}
key_def_set_cmp(key_def);
return key_def;
@@ -272,6 +340,9 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1,
if (key_part_is_nullable(part1) != key_part_is_nullable(part2))
return key_part_is_nullable(part1) <
key_part_is_nullable(part2) ? -1 : 1;
+ int rc;
+ if ((rc = key_part_path_cmp(part1, part2)) != 0)
+ return rc;
}
return part_count1 < part_count2 ? -1 : part_count1 > part_count2;
}
@@ -303,8 +374,15 @@ key_def_snprint_parts(char *buf, int size, const struct key_part_def *parts,
for (uint32_t i = 0; i < part_count; i++) {
const struct key_part_def *part = &parts[i];
assert(part->type < field_type_MAX);
- SNPRINT(total, snprintf, buf, size, "%d, '%s'",
- (int)part->fieldno, field_type_strs[part->type]);
+ if (part->path != NULL) {
+ SNPRINT(total, snprintf, buf, size, "%d, '%s', '%s'",
+ (int)part->fieldno, part->path,
+ field_type_strs[part->type]);
+ } else {
+ SNPRINT(total, snprintf, buf, size, "%d, '%s'",
+ (int)part->fieldno,
+ field_type_strs[part->type]);
+ }
if (i < part_count - 1)
SNPRINT(total, snprintf, buf, size, ", ");
}
@@ -323,6 +401,8 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
count++;
if (part->is_nullable)
count++;
+ if (part->path != NULL)
+ count++;
size += mp_sizeof_map(count);
size += mp_sizeof_str(strlen(PART_OPT_FIELD));
size += mp_sizeof_uint(part->fieldno);
@@ -337,6 +417,10 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
size += mp_sizeof_str(strlen(PART_OPT_NULLABILITY));
size += mp_sizeof_bool(part->is_nullable);
}
+ if (part->path != NULL) {
+ size += mp_sizeof_str(strlen(PART_OPT_PATH));
+ size += mp_sizeof_str(strlen(part->path));
+ }
}
return size;
}
@@ -352,6 +436,8 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
count++;
if (part->is_nullable)
count++;
+ if (part->path != NULL)
+ count++;
data = mp_encode_map(data, count);
data = mp_encode_str(data, PART_OPT_FIELD,
strlen(PART_OPT_FIELD));
@@ -371,6 +457,12 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
strlen(PART_OPT_NULLABILITY));
data = mp_encode_bool(data, part->is_nullable);
}
+ if (part->path != NULL) {
+ data = mp_encode_str(data, PART_OPT_PATH,
+ strlen(PART_OPT_PATH));
+ data = mp_encode_str(data, part->path,
+ strlen(part->path));
+ }
}
return data;
}
@@ -432,6 +524,7 @@ key_def_decode_parts_166(struct key_part_def *parts, uint32_t part_count,
fields[part->fieldno].is_nullable :
key_part_def_default.is_nullable);
part->coll_id = COLL_NONE;
+ part->path = NULL;
}
return 0;
}
@@ -445,6 +538,7 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
return key_def_decode_parts_166(parts, part_count, data,
fields, field_count);
}
+ struct region *region = &fiber()->gc;
for (uint32_t i = 0; i < part_count; i++) {
struct key_part_def *part = &parts[i];
if (mp_typeof(**data) != MP_MAP) {
@@ -468,7 +562,7 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
const char *key = mp_decode_str(data, &key_len);
if (opts_parse_key(part, part_def_reg, key, key_len, data,
ER_WRONG_INDEX_OPTIONS,
- i + TUPLE_INDEX_BASE, NULL,
+ i + TUPLE_INDEX_BASE, region,
false) != 0)
return -1;
if (is_action_missing &&
@@ -514,6 +608,34 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
"index part: unknown sort order");
return -1;
}
+ /*
+ * Convert the JSON path to canonical form
+ * to prevent indexing the same field twice.
+ */
+ if (part->path != NULL) {
+ uint32_t path_len = strlen(part->path);
+ char *path_normalized =
+ region_alloc(region, 2.5 * path_len + 1);
+ if (path_normalized == NULL) {
+ diag_set(OutOfMemory, 2.5 * path_len + 1,
+ "region_alloc", "path");
+ return -1;
+ }
+ int rc = json_path_normalize(part->path, path_len,
+ path_normalized);
+ if (rc != 0) {
+ const char *err_msg =
+ tt_sprintf("invalid JSON path '%s': "
+ "path has invalid structure "
+ "(error at position %d)",
+ part->path, rc);
+ diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
+ part->fieldno + TUPLE_INDEX_BASE,
+ err_msg);
+ return -1;
+ }
+ part->path = path_normalized;
+ }
}
return 0;
}
@@ -533,7 +655,8 @@ key_def_find(const struct key_def *key_def, const struct key_part *to_find)
const struct key_part *part = key_def->parts;
const struct key_part *end = part + key_def->part_count;
for (; part != end; part++) {
- if (part->fieldno == to_find->fieldno)
+ if (part->fieldno == to_find->fieldno &&
+ key_part_path_cmp(part, to_find) == 0)
return part;
}
return NULL;
@@ -559,18 +682,27 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
* Find and remove part duplicates, i.e. parts counted
* twice since they are present in both key defs.
*/
- const struct key_part *part = second->parts;
- const struct key_part *end = part + second->part_count;
+ size_t sz = 0;
+ const struct key_part *part = first->parts;
+ const struct key_part *end = part + first->part_count;
+ for (; part != end; part++) {
+ if (part->path != NULL)
+ sz += part->path_len + 1;
+ }
+ part = second->parts;
+ end = part + second->part_count;
for (; part != end; part++) {
if (key_def_find(first, part) != NULL)
--new_part_count;
+ else if (part->path != NULL)
+ sz += part->path_len + 1;
}
+ sz = key_def_sizeof(new_part_count, sz);
struct key_def *new_def;
- new_def = (struct key_def *)calloc(1, key_def_sizeof(new_part_count));
+ new_def = (struct key_def *)calloc(1, sz);
if (new_def == NULL) {
- diag_set(OutOfMemory, key_def_sizeof(new_part_count), "malloc",
- "new_def");
+ diag_set(OutOfMemory, sz, "malloc", "new_def");
return NULL;
}
new_def->part_count = new_part_count;
@@ -578,15 +710,22 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
new_def->is_nullable = first->is_nullable || second->is_nullable;
new_def->has_optional_parts = first->has_optional_parts ||
second->has_optional_parts;
+ /* Path data write position in the new key_def. */
+ char *data = (char *)new_def + key_def_sizeof(new_part_count, 0);
/* Write position in the new key def. */
uint32_t pos = 0;
/* Append first key def's parts to the new index_def. */
part = first->parts;
end = part + first->part_count;
for (; part != end; part++) {
+ if (part->path != NULL) {
+ new_def->parts[pos].path = data;
+ data += part->path_len + 1;
+ }
key_def_set_part(new_def, pos++, part->fieldno, part->type,
part->nullable_action, part->coll,
- part->coll_id, part->sort_order);
+ part->coll_id, part->sort_order, part->path,
+ part->path_len);
}
/* Set-append second key def's part to the new key def. */
@@ -595,9 +734,14 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
for (; part != end; part++) {
if (key_def_find(first, part) != NULL)
continue;
+ if (part->path != NULL) {
+ new_def->parts[pos].path = data;
+ data += part->path_len + 1;
+ }
key_def_set_part(new_def, pos++, part->fieldno, part->type,
part->nullable_action, part->coll,
- part->coll_id, part->sort_order);
+ part->coll_id, part->sort_order, part->path,
+ part->path_len);
}
key_def_set_cmp(new_def);
return new_def;
diff --git a/src/box/key_def.h b/src/box/key_def.h
index cfed3f1..5c0dfe3 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -68,6 +68,8 @@ struct key_part_def {
enum on_conflict_action nullable_action;
/** Part sort order. */
enum sort_order sort_order;
+ /** JSON path to data. */
+ const char *path;
};
extern const struct key_part_def key_part_def_default;
@@ -92,6 +94,13 @@ struct key_part {
enum on_conflict_action nullable_action;
/** Part sort order. */
enum sort_order sort_order;
+ /**
+ * JSON path to data in 'canonical' form.
+ * See json_path_normalize for details.
+ */
+ char *path;
+ /** The length of JSON path. */
+ uint32_t path_len;
};
struct key_def;
@@ -158,6 +167,8 @@ struct key_def {
uint32_t unique_part_count;
/** True, if at least one part can store NULL. */
bool is_nullable;
+ /** True, if some key part has JSON path. */
+ bool has_json_paths;
/**
* True, if some key parts can be absent in a tuple. These
* fields assumed to be MP_NIL.
@@ -251,9 +262,10 @@ box_tuple_compare_with_key(const box_tuple_t *tuple_a, const char *key_b,
/** \endcond public */
static inline size_t
-key_def_sizeof(uint32_t part_count)
+key_def_sizeof(uint32_t part_count, uint32_t paths_size)
{
- return sizeof(struct key_def) + sizeof(struct key_part) * part_count;
+ return sizeof(struct key_def) + sizeof(struct key_part) * part_count +
+ paths_size;
}
/**
@@ -266,8 +278,9 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count);
/**
* Dump part definitions of the given key def.
*/
-void
-key_def_dump_parts(const struct key_def *def, struct key_part_def *parts);
+int
+key_def_dump_parts(struct region *pool, const struct key_def *def,
+ struct key_part_def *parts);
/**
* Update 'has_optional_parts' of @a key_def with correspondence
@@ -374,6 +387,8 @@ key_validate_parts(const struct key_def *key_def, const char *key,
static inline bool
key_def_is_sequential(const struct key_def *key_def)
{
+ if (key_def->has_json_paths)
+ return false;
for (uint32_t part_id = 0; part_id < key_def->part_count; part_id++) {
if (key_def->parts[part_id].fieldno != part_id)
return false;
@@ -424,6 +439,14 @@ key_mp_type_validate(enum field_type key_type, enum mp_type mp_type,
return 0;
}
+static inline int
+key_part_path_cmp(const struct key_part *part1, const struct key_part *part2)
+{
+ if (part1->path_len != part2->path_len)
+ return part1->path_len - part2->path_len;
+ return memcmp(part1->path, part2->path, part1->path_len);
+}
+
/**
* Compare two key part arrays.
*
diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc
index c75ba47..6d67c03 100644
--- a/src/box/lua/space.cc
+++ b/src/box/lua/space.cc
@@ -296,6 +296,11 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i)
lua_pushnumber(L, part->fieldno + TUPLE_INDEX_BASE);
lua_setfield(L, -2, "fieldno");
+ if (part->path != NULL) {
+ lua_pushstring(L, part->path);
+ lua_setfield(L, -2, "path");
+ }
+
lua_pushboolean(L, key_part_is_nullable(part));
lua_setfield(L, -2, "is_nullable");
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 6a91d0b..c572751 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -1317,6 +1317,8 @@ memtx_index_def_change_requires_rebuild(struct index *index,
return true;
if (old_part->coll != new_part->coll)
return true;
+ if (key_part_path_cmp(old_part, new_part) != 0)
+ return true;
}
return false;
}
diff --git a/src/box/sql.c b/src/box/sql.c
index 3d08936..fdbb071 100644
--- a/src/box/sql.c
+++ b/src/box/sql.c
@@ -379,6 +379,7 @@ sql_ephemeral_space_create(uint32_t field_count, struct sql_key_info *key_info)
part->nullable_action = ON_CONFLICT_ACTION_NONE;
part->is_nullable = true;
part->sort_order = SORT_ORDER_ASC;
+ part->path = NULL;
if (def != NULL && i < def->part_count)
part->coll_id = def->parts[i].coll_id;
else
diff --git a/src/box/sql/build.c b/src/box/sql/build.c
index 83e89bb..5802add 100644
--- a/src/box/sql/build.c
+++ b/src/box/sql/build.c
@@ -2336,6 +2336,7 @@ index_fill_def(struct Parse *parse, struct index *index,
part->is_nullable = part->nullable_action == ON_CONFLICT_ACTION_NONE;
part->sort_order = SORT_ORDER_ASC;
part->coll_id = coll_id;
+ part->path = NULL;
}
key_def = key_def_new(key_parts, expr_list->nExpr);
if (key_def == NULL)
diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index 77e0c5d..65959da 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -1348,6 +1348,7 @@ sql_key_info_new(sqlite3 *db, uint32_t part_count)
part->is_nullable = false;
part->nullable_action = ON_CONFLICT_ACTION_ABORT;
part->sort_order = SORT_ORDER_ASC;
+ part->path = NULL;
}
return key_info;
}
@@ -1355,6 +1356,9 @@ sql_key_info_new(sqlite3 *db, uint32_t part_count)
struct sql_key_info *
sql_key_info_new_from_key_def(sqlite3 *db, const struct key_def *key_def)
{
+ /* SQL key_parts cannot have JSON paths. */
+ for (uint32_t i = 0; i < key_def->part_count; i++)
+ assert(key_def->parts[i].path == NULL);
struct sql_key_info *key_info = sqlite3DbMallocRawNN(db,
sql_key_info_sizeof(key_def->part_count));
if (key_info == NULL) {
@@ -1365,7 +1369,7 @@ sql_key_info_new_from_key_def(sqlite3 *db, const struct key_def *key_def)
key_info->key_def = NULL;
key_info->refs = 1;
key_info->part_count = key_def->part_count;
- key_def_dump_parts(key_def, key_info->parts);
+ key_def_dump_parts(&fiber()->gc, key_def, key_info->parts);
return key_info;
}
diff --git a/src/box/sql/where.c b/src/box/sql/where.c
index 713caba..a01392a 100644
--- a/src/box/sql/where.c
+++ b/src/box/sql/where.c
@@ -2808,6 +2808,7 @@ whereLoopAddBtree(WhereLoopBuilder * pBuilder, /* WHERE clause information */
part.is_nullable = false;
part.sort_order = SORT_ORDER_ASC;
part.coll_id = COLL_NONE;
+ part.path = NULL;
struct key_def *key_def = key_def_new(&part, 1);
if (key_def == NULL) {
diff --git a/src/box/tuple.c b/src/box/tuple.c
index aae1c3c..62e06e7 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -138,38 +138,18 @@ runtime_tuple_delete(struct tuple_format *format, struct tuple *tuple)
int
tuple_validate_raw(struct tuple_format *format, const char *tuple)
{
- if (tuple_format_field_count(format) == 0)
- return 0; /* Nothing to check */
-
- /* Check to see if the tuple has a sufficient number of fields. */
- uint32_t field_count = mp_decode_array(&tuple);
- if (format->exact_field_count > 0 &&
- format->exact_field_count != field_count) {
- diag_set(ClientError, ER_EXACT_FIELD_COUNT,
- (unsigned) field_count,
- (unsigned) format->exact_field_count);
+ struct region *region = &fiber()->gc;
+ uint32_t used = region_used(region);
+ uint32_t *field_map = region_alloc(region, format->field_map_size);
+ if (field_map == NULL) {
+ diag_set(OutOfMemory, format->field_map_size, "region_alloc",
+ "field_map");
return -1;
}
- if (unlikely(field_count < format->min_field_count)) {
- diag_set(ClientError, ER_MIN_FIELD_COUNT,
- (unsigned) field_count,
- (unsigned) format->min_field_count);
+ field_map = (uint32_t *)((char *)field_map + format->field_map_size);
+ if (tuple_init_field_map(format, field_map, tuple, true) != 0)
return -1;
- }
-
- /* Check field types */
- struct tuple_field *field = tuple_format_field(format, 0);
- uint32_t i = 0;
- uint32_t defined_field_count =
- MIN(field_count, tuple_format_field_count(format));
- for (; i < defined_field_count; ++i) {
- field = tuple_format_field(format, i);
- if (key_mp_type_validate(field->type, mp_typeof(*tuple),
- ER_FIELD_TYPE, i + TUPLE_INDEX_BASE,
- tuple_field_is_nullable(field)))
- return -1;
- mp_next(&tuple);
- }
+ region_truncate(region, used);
return 0;
}
diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
index e21b009..554c29f 100644
--- a/src/box/tuple_compare.cc
+++ b/src/box/tuple_compare.cc
@@ -469,7 +469,8 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
struct key_part *part = key_def->parts;
const char *tuple_a_raw = tuple_data(tuple_a);
const char *tuple_b_raw = tuple_data(tuple_b);
- if (key_def->part_count == 1 && part->fieldno == 0) {
+ if (key_def->part_count == 1 && part->fieldno == 0 &&
+ part->path == NULL) {
/*
* First field can not be optional - empty tuples
* can not exist.
@@ -493,8 +494,8 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
}
bool was_null_met = false;
- const struct tuple_format *format_a = tuple_format(tuple_a);
- const struct tuple_format *format_b = tuple_format(tuple_b);
+ struct tuple_format *format_a = tuple_format(tuple_a);
+ struct tuple_format *format_b = tuple_format(tuple_b);
const uint32_t *field_map_a = tuple_field_map(tuple_a);
const uint32_t *field_map_b = tuple_field_map(tuple_b);
struct key_part *end;
@@ -585,7 +586,7 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
assert(key != NULL || part_count == 0);
assert(part_count <= key_def->part_count);
struct key_part *part = key_def->parts;
- const struct tuple_format *format = tuple_format(tuple);
+ struct tuple_format *format = tuple_format(tuple);
const char *tuple_raw = tuple_data(tuple);
const uint32_t *field_map = tuple_field_map(tuple);
enum mp_type a_type, b_type;
@@ -1027,7 +1028,7 @@ tuple_compare_create(const struct key_def *def)
}
}
assert(! def->has_optional_parts);
- if (!key_def_has_collation(def)) {
+ if (!key_def_has_collation(def) && !def->has_json_paths) {
/* Precalculated comparators don't use collation */
for (uint32_t k = 0;
k < sizeof(cmp_arr) / sizeof(cmp_arr[0]); k++) {
@@ -1247,7 +1248,7 @@ tuple_compare_with_key_create(const struct key_def *def)
}
}
assert(! def->has_optional_parts);
- if (!key_def_has_collation(def)) {
+ if (!key_def_has_collation(def) && !def->has_json_paths) {
/* Precalculated comparators don't use collation */
for (uint32_t k = 0;
k < sizeof(cmp_wk_arr) / sizeof(cmp_wk_arr[0]);
--git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
index e9d7cac..04c5463 100644
--- a/src/box/tuple_extract_key.cc
+++ b/src/box/tuple_extract_key.cc
@@ -10,7 +10,8 @@ key_def_parts_are_sequential(const struct key_def *def, int i)
{
uint32_t fieldno1 = def->parts[i].fieldno + 1;
uint32_t fieldno2 = def->parts[i + 1].fieldno;
- return fieldno1 == fieldno2;
+ return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
+ def->parts[i + 1].path == NULL;
}
/** True, if a key can contain two or more parts in sequence. */
@@ -111,7 +112,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
const char *data = tuple_data(tuple);
uint32_t part_count = key_def->part_count;
uint32_t bsize = mp_sizeof_array(part_count);
- const struct tuple_format *format = tuple_format(tuple);
+ struct tuple_format *format = tuple_format(tuple);
const uint32_t *field_map = tuple_field_map(tuple);
const char *tuple_end = data + tuple->bsize;
@@ -241,7 +242,8 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
if (!key_def_parts_are_sequential(key_def, i))
break;
}
- uint32_t end_fieldno = key_def->parts[i].fieldno;
+ const struct key_part *part = &key_def->parts[i];
+ uint32_t end_fieldno = part->fieldno;
if (fieldno < current_fieldno) {
/* Rewind. */
@@ -283,6 +285,15 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
current_fieldno++;
}
}
+ const char *field_last, *field_end_last;
+ if (part->path != NULL) {
+ field_last = field;
+ field_end_last = field_end;
+ (void)tuple_field_go_to_path(&field, part->path,
+ part->path_len);
+ field_end = field;
+ mp_next(&field_end);
+ }
memcpy(key_buf, field, field_end - field);
key_buf += field_end - field;
if (has_optional_parts && null_count != 0) {
@@ -291,6 +302,10 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
} else {
assert(key_buf - key <= data_end - data);
}
+ if (part->path != NULL) {
+ field = field_last;
+ field_end = field_end_last;
+ }
}
if (key_size != NULL)
*key_size = (uint32_t)(key_buf - key);
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 9ffb5e4..151d9e5 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -28,6 +28,7 @@
* THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
+#include "fiber.h"
#include "json/path.h"
#include "tuple_format.h"
#include "coll_id_cache.h"
@@ -62,14 +63,110 @@ tuple_field_destroy(struct tuple_field *field)
free(field);
}
+/** Build a JSON tree branch for the specified path. */
+static struct tuple_field *
+tuple_field_tree_add_path(struct tuple_format *format, const char *path,
+ uint32_t path_len, uint32_t fieldno)
+{
+ int rc = 0;
+ struct json_tree *tree = &format->field_tree;
+ struct tuple_field *field = tuple_format_field(format, fieldno);
+ enum field_type iterm_node_type = FIELD_TYPE_ANY;
+
+ struct json_path_parser parser;
+ struct json_path_node path_node;
+ bool is_last_new = false;
+ json_path_parser_create(&parser, path, path_len);
+ while ((rc = json_path_next(&parser, &path_node)) == 0 &&
+ path_node.type != JSON_PATH_END) {
+ iterm_node_type = path_node.type == JSON_PATH_STR ?
+ FIELD_TYPE_MAP : FIELD_TYPE_ARRAY;
+ if (field->type != FIELD_TYPE_ANY &&
+ field->type != iterm_node_type)
+ goto error_type_mistmatch;
+ uint32_t rolling_hash =
+ json_path_node_hash(&path_node,
+ field->tree_node.rolling_hash);
+ struct tuple_field *next_field =
+ json_tree_lookup_entry_by_path_node(tree,
+ &field->tree_node,
+ &path_node,
+ rolling_hash,
+ struct tuple_field,
+ tree_node);
+ if (next_field == NULL) {
+ next_field = tuple_field_create(&path_node);
+ rc = json_tree_add(tree, &field->tree_node,
+ &next_field->tree_node,
+ rolling_hash);
+ if (rc != 0) {
+ diag_set(OutOfMemory,
+ sizeof(struct json_tree_node),
+ "json_tree_add", "hashtable");
+ return NULL;
+ }
+ is_last_new = true;
+ } else {
+ is_last_new = false;
+ }
+ field->type = iterm_node_type;
+ field = next_field;
+ }
+ /*
+ * The key part paths have already been checked and
+ * normalized, so a parse error cannot occur here.
+ */
+ assert(rc == 0 && path_node.type == JSON_PATH_END);
+ assert(field != NULL);
+ if (is_last_new) {
+ uint32_t depth = 1;
+ for (struct json_tree_node *iter = field->tree_node.parent;
+ iter != &format->field_tree.root;
+ iter = iter->parent, ++depth) {
+ struct tuple_field *record =
+ json_tree_entry(iter, struct tuple_field,
+ tree_node);
+ record->subtree_depth =
+ MAX(record->subtree_depth, depth);
+ }
+ }
+ return field;
+
+error_type_mistmatch: ;
+ const char *name = tt_sprintf("[%d]%.*s", fieldno, path_len, path);
+ diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH, name,
+ field_type_strs[field->type],
+ field_type_strs[iterm_node_type]);
+ return NULL;
+}
+
static int
tuple_format_add_key_part(struct tuple_format *format,
const struct field_def *fields, uint32_t field_count,
const struct key_part *part, bool is_sequential,
- int *current_slot)
+ int *current_slot, char **path_data)
{
assert(part->fieldno < tuple_format_field_count(format));
struct tuple_field *field = tuple_format_field(format, part->fieldno);
+ if (unlikely(part->path != NULL)) {
+ assert(!is_sequential);
+ /*
+ * Copy the JSON path data into the area reserved at
+ * the end of the format allocation.
+ */
+ memcpy(*path_data, part->path, part->path_len);
+ (*path_data)[part->path_len] = '\0';
+ struct tuple_field *root = field;
+ field = tuple_field_tree_add_path(format, *path_data,
+ part->path_len,
+ part->fieldno);
+ if (field == NULL)
+ return -1;
+ format->subtree_depth =
+ MAX(format->subtree_depth, root->subtree_depth + 1);
+ field->is_key_part = true;
+ *path_data += part->path_len + 1;
+ }
/*
* Field and part nullable actions may differ only
* if one of them is DEFAULT, in which case we use
@@ -109,7 +206,10 @@ tuple_format_add_key_part(struct tuple_format *format,
field->type)) {
const char *name;
int fieldno = part->fieldno + TUPLE_INDEX_BASE;
- if (part->fieldno >= field_count) {
+ if (unlikely(part->path != NULL)) {
+ name = tt_sprintf("[%d]%.*s", fieldno, part->path_len,
+ part->path);
+ } else if (part->fieldno >= field_count) {
name = tt_sprintf("%d", fieldno);
} else {
const struct field_def *def =
@@ -133,10 +233,9 @@ tuple_format_add_key_part(struct tuple_format *format,
* simply accessible, so we don't store an offset for it.
*/
if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL &&
- is_sequential == false && part->fieldno > 0) {
- *current_slot = *current_slot - 1;
- field->offset_slot = *current_slot;
- }
+ is_sequential == false &&
+ (part->fieldno > 0 || part->path != NULL))
+ field->offset_slot = (*current_slot = *current_slot - 1);
return 0;
}
@@ -177,7 +276,7 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
}
int current_slot = 0;
-
+ char *paths_data = (char *)format + sizeof(struct tuple_format);
/* extract field type info */
for (uint16_t key_no = 0; key_no < key_count; ++key_no) {
const struct key_def *key_def = keys[key_no];
@@ -189,7 +288,8 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
if (tuple_format_add_key_part(format, fields,
field_count, part,
is_sequential,
- ¤t_slot) != 0)
+ ¤t_slot,
+ &paths_data) != 0)
return -1;
}
}
@@ -257,6 +357,8 @@ static struct tuple_format *
tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
uint32_t space_field_count, struct tuple_dictionary *dict)
{
+ /* Size of area to store paths. */
+ uint32_t paths_size = 0;
uint32_t index_field_count = 0;
/* find max max field no */
for (uint16_t key_no = 0; key_no < key_count; ++key_no) {
@@ -266,13 +368,16 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
for (; part < pend; part++) {
index_field_count = MAX(index_field_count,
part->fieldno + 1);
+ if (part->path != NULL)
+ paths_size += part->path_len + 1;
}
}
uint32_t field_count = MAX(space_field_count, index_field_count);
- struct tuple_format *format = malloc(sizeof(struct tuple_format));
+ uint32_t allocation_size = sizeof(struct tuple_format) + paths_size;
+ struct tuple_format *format = malloc(allocation_size);
if (format == NULL) {
- diag_set(OutOfMemory, sizeof(struct tuple_format), "malloc",
+ diag_set(OutOfMemory, allocation_size, "malloc",
"tuple format");
return NULL;
}
@@ -280,6 +385,7 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
free(format);
return NULL;
}
+ format->subtree_depth = 1;
struct json_path_node path_node;
memset(&path_node, 0, sizeof(path_node));
path_node.type = JSON_PATH_NUM;
@@ -305,6 +411,7 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
format->dict = dict;
tuple_dictionary_ref(dict);
}
+ format->allocation_size = allocation_size;
format->refs = 0;
format->id = FORMAT_ID_NIL;
format->index_field_count = index_field_count;
@@ -412,6 +519,77 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
return true;
}
+/** Find a field in format by offset slot. */
+static struct tuple_field *
+tuple_field_by_offset_slot(const struct tuple_format *format,
+ int32_t offset_slot)
+{
+ struct tuple_field *field;
+ struct json_tree_node *root =
+ (struct json_tree_node *)&format->field_tree.root;
+ json_tree_foreach_entry_pre(field, root, struct tuple_field,
+ tree_node) {
+ if (field->offset_slot == offset_slot)
+ return field;
+ }
+ return NULL;
+}
+
+/**
+ * Verify the field_map and raise an error if some indexed
+ * field has not been initialized. The routine relies on the
+ * field_map having been pre-filled with the UINT32_MAX marker
+ * before initialization.
+ */
+static int
+tuple_field_map_validate(const struct tuple_format *format, uint32_t *field_map)
+{
+ struct json_tree_node *tree_node =
+ (struct json_tree_node *)&format->field_tree.root;
+ /* Lookup for absent not-nullable fields. */
+ int32_t field_map_items =
+ (int32_t)(format->field_map_size/sizeof(field_map[0]));
+ for (int32_t i = -1; i >= -field_map_items; i--) {
+ if (field_map[i] != UINT32_MAX)
+ continue;
+
+ struct tuple_field *field =
+ tuple_field_by_offset_slot(format, i);
+ assert(field != NULL);
+ /* Lookup for field number in tree. */
+ struct json_tree_node *parent = &field->tree_node;
+ while (parent->parent != &format->field_tree.root)
+ parent = parent->parent;
+ assert(parent->key.type == JSON_PATH_NUM);
+ uint32_t fieldno = parent->key.num;
+
+ tree_node = &field->tree_node;
+ const char *err_msg;
+ if (field->tree_node.key.type == JSON_PATH_STR) {
+ err_msg = tt_sprintf("invalid field %d document "
+ "content: map doesn't contain a "
+ "key '%.*s' defined in index",
+ fieldno, tree_node->key.len,
+ tree_node->key.str);
+ } else if (field->tree_node.key.type == JSON_PATH_NUM) {
+ err_msg = tt_sprintf("invalid field %d document "
+ "content: array size %d is less "
+ "than size %d defined in index",
+ fieldno, tree_node->key.num,
+ tree_node->parent->children_count);
+ }
+ diag_set(ClientError, ER_DATA_STRUCTURE_MISMATCH, err_msg);
+ return -1;
+ }
+ return 0;
+}
+
+struct parse_ctx {
+ enum json_path_type child_type;
+ uint32_t items;
+ uint32_t curr;
+};
+
/** @sa declaration for details. */
int
tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
@@ -437,44 +615,128 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
(unsigned) format->min_field_count);
return -1;
}
-
- /* first field is simply accessible, so we do not store offset to it */
- enum mp_type mp_type = mp_typeof(*pos);
- const struct tuple_field *field =
- tuple_format_field((struct tuple_format *)format, 0);
- if (validate &&
- key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
- TUPLE_INDEX_BASE, tuple_field_is_nullable(field)))
- return -1;
- mp_next(&pos);
- /* other fields...*/
- uint32_t i = 1;
uint32_t defined_field_count = MIN(field_count, validate ?
tuple_format_field_count(format) :
format->index_field_count);
- if (field_count < format->index_field_count) {
- /*
- * Nullify field map to be able to detect by 0,
- * which key fields are absent in tuple_field().
- */
- memset((char *)field_map - format->field_map_size, 0,
- format->field_map_size);
- }
- for (; i < defined_field_count; ++i) {
- field = tuple_format_field((struct tuple_format *)format, i);
- mp_type = mp_typeof(*pos);
- if (validate &&
- key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
- i + TUPLE_INDEX_BASE,
- tuple_field_is_nullable(field)))
- return -1;
- if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
- field_map[field->offset_slot] =
- (uint32_t) (pos - tuple);
- }
- mp_next(&pos);
+ /*
+ * Fill the field_map with a marker so that the routine
+ * tuple_field_map_validate can detect absent fields.
+ */
+ memset((char *)field_map - format->field_map_size,
+ validate ? UINT32_MAX : 0, format->field_map_size);
+
+ struct region *region = &fiber()->gc;
+ uint32_t mp_stack_items = format->subtree_depth + 1;
+ uint32_t mp_stack_size = mp_stack_items * sizeof(struct parse_ctx);
+ struct parse_ctx *mp_stack = region_alloc(region, mp_stack_size);
+ if (unlikely(mp_stack == NULL)) {
+ diag_set(OutOfMemory, mp_stack_size, "region_alloc",
+ "mp_stack");
+ return -1;
}
- return 0;
+ mp_stack[0] = (struct parse_ctx){
+ .child_type = JSON_PATH_NUM,
+ .items = defined_field_count,
+ .curr = 0,
+ };
+ uint32_t mp_stack_idx = 0;
+ struct json_tree *tree = (struct json_tree *)&format->field_tree;
+ struct json_tree_node *parent = &tree->root;
+ while (mp_stack[0].curr <= mp_stack[0].items) {
+ /* Prepare key for tree lookup. */
+ struct json_path_node node;
+ node.type = mp_stack[mp_stack_idx].child_type;
+ ++mp_stack[mp_stack_idx].curr;
+ if (node.type == JSON_PATH_NUM) {
+ node.num = mp_stack[mp_stack_idx].curr;
+ } else if (node.type == JSON_PATH_STR) {
+ if (mp_typeof(*pos) != MP_STR) {
+ /*
+ * We do not support non-string
+ * keys in maps.
+ */
+ mp_next(&pos);
+ mp_next(&pos);
+ continue;
+ }
+ node.str = mp_decode_str(&pos, (uint32_t *)&node.len);
+ } else {
+ unreachable();
+ }
+ uint32_t rolling_hash =
+ json_path_node_hash(&node, parent->rolling_hash);
+ struct tuple_field *field =
+ json_tree_lookup_entry_by_path_node(tree, parent, &node,
+ rolling_hash,
+ struct tuple_field,
+ tree_node);
+ enum mp_type type = mp_typeof(*pos);
+ if (field != NULL) {
+ bool is_nullable = tuple_field_is_nullable(field);
+ if (validate &&
+ key_mp_type_validate(field->type, type,
+ ER_FIELD_TYPE,
+ mp_stack[0].curr,
+ is_nullable) != 0)
+ return -1;
+ if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
+ field_map[field->offset_slot] =
+ (uint32_t)(pos - tuple);
+ }
+ }
+ /* Prepare stack info for next iteration. */
+ if (field != NULL && type == MP_ARRAY &&
+ mp_stack_idx + 1 < format->subtree_depth) {
+ uint32_t size = mp_decode_array(&pos);
+ if (unlikely(size == 0))
+ continue;
+ parent = &field->tree_node;
+ mp_stack[++mp_stack_idx] = (struct parse_ctx){
+ .child_type = JSON_PATH_NUM,
+ .items = size,
+ .curr = 0,
+ };
+ } else if (field != NULL && type == MP_MAP &&
+ mp_stack_idx + 1 < format->subtree_depth) {
+ uint32_t size = mp_decode_map(&pos);
+ if (unlikely(size == 0))
+ continue;
+ parent = &field->tree_node;
+ mp_stack[++mp_stack_idx] = (struct parse_ctx){
+ .child_type = JSON_PATH_STR,
+ .items = size,
+ .curr = 0,
+ };
+ } else {
+ mp_next(&pos);
+ while (mp_stack[mp_stack_idx].curr >=
+ mp_stack[mp_stack_idx].items) {
+ assert(parent != NULL);
+ parent = parent->parent;
+ if (mp_stack_idx-- == 0)
+ goto end;
+ }
+ }
+ };
+end:;
+ /* Nullify absent nullable fields in field_map. */
+ struct tuple_field *field;
+ struct json_tree_node *tree_node =
+ (struct json_tree_node *)&format->field_tree.root;
+ /*
+ * Field map has already been initialized with zeros when
+ * no validation is required.
+ */
+ if (!validate)
+ return 0;
+ json_tree_foreach_entry_pre(field, tree_node, struct tuple_field,
+ tree_node) {
+ if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL &&
+ tuple_field_is_nullable(field) &&
+ field_map[field->offset_slot] == UINT32_MAX)
+ field_map[field->offset_slot] = 0;
+ }
+ return tuple_field_map_validate(format, field_map);
}
uint32_t
@@ -612,15 +874,7 @@ tuple_field_go_to_key(const char **field, const char *key, int len)
return -1;
}
-/**
- * Retrieve msgpack data by JSON path.
- * @param data Pointer to msgpack with data.
- * @param path The path to process.
- * @param path_len The length of the @path.
- * @retval 0 On success.
- * @retval >0 On path parsing error, invalid character position.
- */
-static int
+int
tuple_field_go_to_path(const char **data, const char *path, uint32_t path_len)
{
int rc;
@@ -723,3 +977,40 @@ error:
tt_sprintf("error in path on position %d", rc));
return -1;
}
+
+const char *
+tuple_field_by_part_raw(struct tuple_format *format, const char *data,
+ const uint32_t *field_map, struct key_part *part)
+{
+ if (likely(part->path == NULL))
+ return tuple_field_raw(format, data, field_map, part->fieldno);
+
+ uint32_t field_count = tuple_format_field_count(format);
+ struct tuple_field *root_field =
+ likely(part->fieldno < field_count) ?
+ tuple_format_field(format, part->fieldno) : NULL;
+ struct tuple_field *field =
+ unlikely(root_field == NULL) ? NULL:
+ tuple_format_field_by_path(format, root_field, part->path,
+ part->path_len);
+ if (unlikely(field == NULL)) {
+ /*
+ * A legacy tuple that has no field map entry for the
+ * JSON index requires a full path parse.
+ */
+ const char *field_raw =
+ tuple_field_raw(format, data, field_map, part->fieldno);
+ if (unlikely(field_raw == NULL))
+ return NULL;
+ if (tuple_field_go_to_path(&field_raw, part->path,
+ part->path_len) != 0)
+ return NULL;
+ return field_raw;
+ }
+ int32_t offset_slot = field->offset_slot;
+ assert(offset_slot < 0);
+ assert(-offset_slot * sizeof(uint32_t) <= format->field_map_size);
+ if (unlikely(field_map[offset_slot] == 0))
+ return NULL;
+ return data + field_map[offset_slot];
+}
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index 8426153..e82af67 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -116,6 +116,8 @@ struct tuple_field {
uint32_t coll_id;
/** A JSON tree entry to organize the tree. */
struct json_tree_node tree_node;
+ /** The maximum depth of the field's subtree. */
+ uint32_t subtree_depth;
};
/**
@@ -169,12 +171,16 @@ struct tuple_format {
* index_field_count <= min_field_count <= field_count.
*/
uint32_t min_field_count;
+ /** Size of format allocation. */
+ uint32_t allocation_size;
/**
* Shared names storage used by all formats of a space.
*/
struct tuple_dictionary *dict;
/** JSON tree of fields. */
struct json_tree field_tree;
+ /** The maximum depth of the fields' subtrees. */
+ uint32_t subtree_depth;
};
@@ -192,6 +198,17 @@ tuple_format_field(struct tuple_format *format, uint32_t fieldno)
return json_tree_entry(tree_node, struct tuple_field, tree_node);
}
+static inline struct tuple_field *
+tuple_format_field_by_path(struct tuple_format *format,
+ struct tuple_field *root, const char *path,
+ uint32_t path_len)
+{
+ return json_tree_lookup_entry_by_path(&format->field_tree,
+ &root->tree_node, path, path_len,
+ struct tuple_field, tree_node);
+}
+
+
extern struct tuple_format **tuple_formats;
static inline uint32_t
@@ -393,6 +410,18 @@ tuple_field_raw_by_name(struct tuple_format *format, const char *tuple,
}
/**
+ * Retrieve msgpack data by JSON path.
+ * @param data Pointer to msgpack with data.
+ * @param path The path to process.
+ * @param path_len The length of the @path.
+ * @retval 0 On success.
+ * @retval >0 On path parsing error, invalid character position.
+ */
+int
+tuple_field_go_to_path(const char **data, const char *path,
+ uint32_t path_len);
+
+/**
* Get tuple field by its path.
* @param format Tuple format.
* @param tuple MessagePack tuple's body.
@@ -419,12 +448,9 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
* @param part Index part to use.
* @retval Field data if the field exists or NULL.
*/
-static inline const char *
-tuple_field_by_part_raw(const struct tuple_format *format, const char *data,
- const uint32_t *field_map, struct key_part *part)
-{
- return tuple_field_raw(format, data, field_map, part->fieldno);
-}
+const char *
+tuple_field_by_part_raw(struct tuple_format *format, const char *data,
+ const uint32_t *field_map, struct key_part *part);
#if defined(__cplusplus)
} /* extern "C" */
diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
index b394804..3486ce1 100644
--- a/src/box/tuple_hash.cc
+++ b/src/box/tuple_hash.cc
@@ -222,7 +222,7 @@ key_hash_slowpath(const char *key, struct key_def *key_def);
void
tuple_hash_func_set(struct key_def *key_def) {
- if (key_def->is_nullable)
+ if (key_def->is_nullable || key_def->has_json_paths)
goto slowpath;
/*
* Check that key_def defines sequential a key without holes
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index b09b6ad..eb40ef1 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -982,6 +982,8 @@ vinyl_index_def_change_requires_rebuild(struct index *index,
return true;
if (!field_type1_contains_type2(new_part->type, old_part->type))
return true;
+ if (key_part_path_cmp(old_part, new_part) != 0)
+ return true;
}
return false;
}
diff --git a/src/box/vy_log.c b/src/box/vy_log.c
index 0615dcc..a93840f 100644
--- a/src/box/vy_log.c
+++ b/src/box/vy_log.c
@@ -711,7 +711,8 @@ vy_log_record_dup(struct region *pool, const struct vy_log_record *src)
"struct key_part_def");
goto err;
}
- key_def_dump_parts(src->key_def, dst->key_parts);
+ if (key_def_dump_parts(pool, src->key_def, dst->key_parts) != 0)
+ goto err;
dst->key_part_count = src->key_def->part_count;
dst->key_def = NULL;
}
diff --git a/src/box/vy_point_lookup.c b/src/box/vy_point_lookup.c
index 7b704b8..9d5e220 100644
--- a/src/box/vy_point_lookup.c
+++ b/src/box/vy_point_lookup.c
@@ -196,8 +196,6 @@ vy_point_lookup(struct vy_lsm *lsm, struct vy_tx *tx,
const struct vy_read_view **rv,
struct tuple *key, struct tuple **ret)
{
- assert(tuple_field_count(key) >= lsm->cmp_def->part_count);
-
*ret = NULL;
double start_time = ev_monotonic_now(loop());
int rc = 0;
diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
index 3e60fec..86f7e48 100644
--- a/src/box/vy_stmt.c
+++ b/src/box/vy_stmt.c
@@ -29,6 +29,7 @@
* SUCH DAMAGE.
*/
+#include "assoc.h"
#include "vy_stmt.h"
#include <stdlib.h>
@@ -370,6 +371,85 @@ vy_stmt_replace_from_upsert(const struct tuple *upsert)
return replace;
}
+/**
+ * Construct a tuple or calculate its size. fields_iov_ht is a
+ * hashtable linking leaf field records of the field path tree
+ * with iovs containing their raw data. The function also fills
+ * the tuple field_map when the write_data flag is set.
+ */
+static void
+vy_stmt_tuple_restore_raw(struct tuple_format *format, char *tuple_raw,
+ uint32_t *field_map, char **offset,
+ struct mh_i64ptr_t *fields_iov_ht, bool write_data)
+{
+ struct tuple_field *prev = NULL;
+ struct tuple_field *curr;
+ json_tree_foreach_entry_pre(curr, &format->field_tree.root,
+ struct tuple_field, tree_node) {
+ struct json_tree_node *curr_node = &curr->tree_node;
+ struct tuple_field *parent =
+ unlikely(curr_node->parent == NULL) ? NULL :
+ json_tree_entry(curr_node->parent, struct tuple_field,
+ tree_node);
+ if (parent != NULL && parent->type == FIELD_TYPE_ARRAY &&
+ curr_node->sibling_idx > 0) {
+ /*
+ * Fill unindexed array items with nulls. The
+ * gap size is calculated as the difference
+ * between sibling node indexes.
+ */
+ for (uint32_t i = curr_node->sibling_idx - 1;
+ curr_node->parent->children[i] == NULL &&
+ i > 0; i--) {
+ *offset = !write_data ?
+ (*offset += mp_sizeof_nil()) :
+ mp_encode_nil(*offset);
+ }
+ } else if (parent != NULL && parent->type == FIELD_TYPE_MAP) {
+ /* Set map key. */
+ const char *str = curr_node->key.str;
+ uint32_t len = curr_node->key.len;
+ *offset = !write_data ?
+ (*offset += mp_sizeof_str(len)) :
+ mp_encode_str(*offset, str, len);
+ }
+ /* Fill data. */
+ uint32_t children_count = curr_node->children_count;
+ if (curr->type == FIELD_TYPE_ARRAY) {
+ *offset = !write_data ?
+ (*offset += mp_sizeof_array(children_count)) :
+ mp_encode_array(*offset, children_count);
+ } else if (curr->type == FIELD_TYPE_MAP) {
+ *offset = !write_data ?
+ (*offset += mp_sizeof_map(children_count)) :
+ mp_encode_map(*offset, children_count);
+ } else {
+ /* Leaf record. */
+ mh_int_t k = mh_i64ptr_find(fields_iov_ht,
+ (uint64_t)curr, NULL);
+ struct iovec *iov =
+ k != mh_end(fields_iov_ht) ?
+ mh_i64ptr_node(fields_iov_ht, k)->val : NULL;
+ if (iov == NULL) {
+ *offset = !write_data ?
+ (*offset += mp_sizeof_nil()) :
+ mp_encode_nil(*offset);
+ } else {
+ uint32_t data_offset = *offset - tuple_raw;
+ int32_t slot = curr->offset_slot;
+ if (write_data) {
+ memcpy(*offset, iov->iov_base,
+ iov->iov_len);
+ if (slot != TUPLE_OFFSET_SLOT_NIL)
+ field_map[slot] = data_offset;
+ }
+ *offset += iov->iov_len;
+ }
+ }
+ prev = curr;
+ }
+}
+
static struct tuple *
vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
const struct key_def *cmp_def,
@@ -378,51 +458,79 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
/* UPSERT can't be surrogate. */
assert(type != IPROTO_UPSERT);
struct region *region = &fiber()->gc;
+ struct tuple *stmt = NULL;
uint32_t field_count = format->index_field_count;
- struct iovec *iov = region_alloc(region, sizeof(*iov) * field_count);
+ uint32_t part_count = mp_decode_array(&key);
+ assert(part_count == cmp_def->part_count);
+ struct iovec *iov = region_alloc(region, sizeof(*iov) * part_count);
if (iov == NULL) {
- diag_set(OutOfMemory, sizeof(*iov) * field_count,
- "region", "iov for surrogate key");
+ diag_set(OutOfMemory, sizeof(*iov) * part_count, "region",
+ "iov for surrogate key");
return NULL;
}
- memset(iov, 0, sizeof(*iov) * field_count);
- uint32_t part_count = mp_decode_array(&key);
- assert(part_count == cmp_def->part_count);
- assert(part_count <= field_count);
- uint32_t nulls_count = field_count - cmp_def->part_count;
- uint32_t bsize = mp_sizeof_array(field_count) +
- mp_sizeof_nil() * nulls_count;
- for (uint32_t i = 0; i < part_count; ++i) {
- const struct key_part *part = &cmp_def->parts[i];
+ /* Hashtable linking each leaf field to its corresponding iov. */
+ struct mh_i64ptr_t *fields_iov_ht = mh_i64ptr_new();
+ if (fields_iov_ht == NULL) {
+ diag_set(OutOfMemory, sizeof(struct mh_i64ptr_t),
+ "mh_i64ptr_new", "fields_iov_ht");
+ return NULL;
+ }
+ if (mh_i64ptr_reserve(fields_iov_ht, part_count, NULL) != 0) {
+ diag_set(OutOfMemory, part_count, "mh_i64ptr_reserve",
+ "fields_iov_ht");
+ goto end;
+ }
+ memset(iov, 0, sizeof(*iov) * part_count);
+ const struct key_part *part = cmp_def->parts;
+ for (uint32_t i = 0; i < part_count; ++i, ++part) {
assert(part->fieldno < field_count);
const char *svp = key;
- iov[part->fieldno].iov_base = (char *) key;
+ iov[i].iov_base = (char *) key;
mp_next(&key);
- iov[part->fieldno].iov_len = key - svp;
- bsize += key - svp;
+ iov[i].iov_len = key - svp;
+ struct tuple_field *field;
+ field = tuple_format_field(format, part->fieldno);
+ assert(field != NULL);
+ if (unlikely(part->path != NULL)) {
+ field = tuple_format_field_by_path(format, field,
+ part->path,
+ part->path_len);
+ }
+ assert(field != NULL);
+ struct mh_i64ptr_node_t node = {(uint64_t)field, &iov[i]};
+ mh_int_t k = mh_i64ptr_put(fields_iov_ht, &node, NULL, NULL);
+ if (unlikely(k == mh_end(fields_iov_ht))) {
+ diag_set(OutOfMemory, part_count, "mh_i64ptr_put",
+ "fields_iov_ht");
+ goto end;
+ }
+ k = mh_i64ptr_find(fields_iov_ht, (uint64_t)field, NULL);
+ assert(k != mh_end(fields_iov_ht));
}
+ /* Calculate tuple size to make allocation. */
+ char *data = NULL;
+ vy_stmt_tuple_restore_raw(format, NULL, NULL, &data, fields_iov_ht,
+ false);
+ uint32_t bsize = mp_sizeof_array(field_count) + data - (char *)NULL;
- struct tuple *stmt = vy_stmt_alloc(format, bsize);
+ stmt = vy_stmt_alloc(format, bsize);
if (stmt == NULL)
- return NULL;
+ goto end;
+ /* Construct tuple. */
char *raw = (char *) tuple_data(stmt);
uint32_t *field_map = (uint32_t *) raw;
+ memset((char *)field_map - format->field_map_size, 0,
+ format->field_map_size);
char *wpos = mp_encode_array(raw, field_count);
- for (uint32_t i = 0; i < field_count; ++i) {
- const struct tuple_field *field = tuple_format_field(format, i);
- if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
- field_map[field->offset_slot] = wpos - raw;
- if (iov[i].iov_base == NULL) {
- wpos = mp_encode_nil(wpos);
- } else {
- memcpy(wpos, iov[i].iov_base, iov[i].iov_len);
- wpos += iov[i].iov_len;
- }
- }
- assert(wpos == raw + bsize);
+ vy_stmt_tuple_restore_raw(format, raw, field_map, &wpos, fields_iov_ht,
+ true);
+
+ assert(wpos <= raw + bsize);
vy_stmt_set_type(stmt, type);
+end:
+ mh_i64ptr_delete(fields_iov_ht);
return stmt;
}
diff --git a/test/box/misc.result b/test/box/misc.result
index 3d7317c..3e0bdc6 100644
--- a/test/box/misc.result
+++ b/test/box/misc.result
@@ -415,6 +415,7 @@ t;
83: box.error.ROLE_EXISTS
84: box.error.CREATE_ROLE
85: box.error.INDEX_EXISTS
+ 86: box.error.DATA_STRUCTURE_MISMATCH
87: box.error.ROLE_LOOP
88: box.error.GRANT
89: box.error.PRIV_GRANTED
diff --git a/test/engine/tuple.result b/test/engine/tuple.result
index 35c700e..9a1ceb8 100644
--- a/test/engine/tuple.result
+++ b/test/engine/tuple.result
@@ -954,6 +954,383 @@ type(tuple:tomap().fourth)
s:drop()
---
...
+--
+-- gh-1012: Indexes for JSON-defined paths.
+--
+box.cfg()
+---
+...
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO["fname"]'}, {3, 'str', path = '["FIO"].fname'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': same key
+ part is indexed twice'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '["FIO"]["fname"]'}}})
+---
+- error: 'Wrong index options (field 2): ''path'' must be string'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = 'FIO'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type
+ ''map'' is not supported'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[1]'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type
+ ''array'' is not supported'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO'}, {3, 'str', path = '["FIO"].fname'}}})
+---
+- error: Field [2]["FIO"]["fname"] has type 'string' in one index, but type 'map'
+ in another
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[1].sname'}, {3, 'str', path = '["FIO"].fname'}}})
+---
+- error: Field [2]["FIO"]["fname"] has type 'array' in one index, but type 'map' in
+ another
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO....fname'}}})
+---
+- error: 'Wrong index options (field 3): invalid JSON path ''FIO....fname'': path
+ has invalid structure (error at position 5)'
+...
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO.fname'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+...
+assert(idx ~= nil)
+---
+- true
+...
+assert(idx.parts[2].path == "[\"FIO\"][\"fname\"]")
+---
+- true
+...
+s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
+---
+- error: 'Tuple field 3 type does not match one required by operation: expected map'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
+---
+- error: 'Tuple field 3 type does not match one required by operation: expected string'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
+---
+- error: 'Tuple doesn''t match document structure: invalid field 3 document content:
+ map doesn''t contain a key ''sname'' defined in index'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5}
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5}
+---
+- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+ 4, 5]
+...
+idx:select()
+---
+- - [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+ - [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+ 4, 5]
+...
+idx:min()
+---
+- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:max()
+---
+- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+ 4, 5]
+...
+s:drop()
+---
+...
+s = box.schema.create_space('withdata', {engine = engine})
+---
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[2]'}
+---
+...
+pk = s:create_index('pk', {parts = parts})
+---
+...
+s:insert{{1, 2}, 3}
+---
+- [[1, 2], 3]
+...
+s:upsert({{box.null, 2}}, {{'+', 2, 5}})
+---
+...
+s:get(2)
+---
+- [[1, 2], 8]
+...
+s:drop()
+---
+...
+-- Create index on space with data
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+pk = s:create_index('primary', { type = 'tree' })
+---
+...
+s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5}
+---
+- [1, 7, {'town': 'London', 'FIO': 1234}, 4, 5]
+...
+s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [2, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:insert{4, 7, {town = 'London', FIO = {1,2,3}}, 4, 5}
+---
+- [4, 7, {'town': 'London', 'FIO': [1, 2, 3]}, 4, 5]
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+- error: 'Tuple field 3 type does not match one required by operation: expected map'
+...
+_ = s:delete(1)
+---
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+_ = s:delete(2)
+---
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+- error: 'Tuple field 3 type does not match one required by operation: expected map'
+...
+_ = s:delete(4)
+---
+...
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]', is_nullable = true}, {3, 'str', path = '["FIO"]["sname"]'}, {3, 'str', path = '["FIO"]["extra"]', is_nullable = true}}})
+---
+...
+assert(idx ~= nil)
+---
+- true
+...
+s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"]["fname"]'}}})
+---
+- error: Field [3]["FIO"]["fname"] has type 'string' in one index, but type 'number'
+ in another
+...
+idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}})
+---
+...
+assert(idx2 ~= nil)
+---
+- true
+...
+t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
+---
+...
+idx:select()
+---
+- - [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5]
+ - [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:min()
+---
+- [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5]
+...
+idx:max()
+---
+- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:drop()
+---
+...
+s:drop()
+---
+...
+-- Test complex JSON indexes
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+parts = {}
+---
+...
+parts[1] = {1, 'str', path='[3][2].a'}
+---
+...
+parts[2] = {1, 'unsigned', path = '[3][1]'}
+---
+...
+parts[3] = {2, 'str', path = '[2].d[1]'}
+---
+...
+pk = s:create_index('primary', { type = 'tree', parts = parts})
+---
+...
+s:insert{{1, 2, {3, {3, a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {1, 2, 3}}
+---
+- [[1, 2, [3, {1: 3, 'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6,
+ [1, 2, 3]]
+...
+s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+---
+- error: Duplicate key exists in unique index 'primary' in space 'withdata'
+...
+parts = {}
+---
+...
+parts[1] = {4, 'unsigned', path='[1]', is_nullable = false}
+---
+...
+parts[2] = {4, 'unsigned', path='[2]', is_nullable = true}
+---
+...
+parts[3] = {4, 'unsigned', path='[4]', is_nullable = true}
+---
+...
+trap_idx = s:create_index('trap', { type = 'tree', parts = parts})
+---
+...
+s:insert{{1, 2, {3, {3, a = 'str2', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {}}
+---
+- error: 'Tuple doesn''t match document structure: invalid field 4 document content:
+ array size 1 is less than size 4 defined in index'
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[3][2].b' }
+---
+...
+parts[2] = {3, 'unsigned'}
+---
+...
+crosspart_idx = s:create_index('crosspart', { parts = parts})
+---
+...
+s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {9, 2, 3}}
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+ 2, 3]]
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[3][2].b'}
+---
+...
+num_idx = s:create_index('numeric', {parts = parts})
+---
+...
+s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {0}}
+---
+- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]]
+...
+num_idx:get(2)
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+ 2, 3]]
+...
+num_idx:select()
+---
+- - [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [
+ 9, 2, 3]]
+ - [[1, 2, [3, {1: 3, 'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}],
+ 6, [1, 2, 3]]
+ - [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [
+ 0]]
+...
+num_idx:max()
+---
+- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]]
+...
+num_idx:min()
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+ 2, 3]]
+...
+assert(crosspart_idx:max() == num_idx:max())
+---
+- true
+...
+assert(crosspart_idx:min() == num_idx:min())
+---
+- true
+...
+trap_idx:max()
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+ 2, 3]]
+...
+trap_idx:min()
+---
+- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]]
+...
+s:drop()
+---
+...
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+pk_simplified = s:create_index('primary', { type = 'tree', parts = {{1, 'unsigned'}}})
+---
+...
+assert(pk_simplified.path == box.NULL)
+---
+- true
+...
+idx = s:create_index('idx', {parts = {{2, 'integer', path = 'a'}}})
+---
+...
+s:insert{31, {a = 1, aa = -1}}
+---
+- [31, {'a': 1, 'aa': -1}]
+...
+s:insert{22, {a = 2, aa = -2}}
+---
+- [22, {'a': 2, 'aa': -2}]
+...
+s:insert{13, {a = 3, aa = -3}}
+---
+- [13, {'a': 3, 'aa': -3}]
+...
+idx:select()
+---
+- - [31, {'a': 1, 'aa': -1}]
+ - [22, {'a': 2, 'aa': -2}]
+ - [13, {'a': 3, 'aa': -3}]
+...
+idx:alter({parts = {{2, 'integer', path = 'aa'}}})
+---
+...
+idx:select()
+---
+- - [13, {'a': 3, 'aa': -3}]
+ - [22, {'a': 2, 'aa': -2}]
+ - [31, {'a': 1, 'aa': -1}]
+...
+s:drop()
+---
+...
engine = nil
---
...
diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua
index edc3dab..f1000dd 100644
--- a/test/engine/tuple.test.lua
+++ b/test/engine/tuple.test.lua
@@ -312,5 +312,112 @@ tuple:tomap().fourth
type(tuple:tomap().fourth)
s:drop()
+--
+-- gh-1012: Indexes for JSON-defined paths.
+--
+box.cfg()
+s = box.schema.space.create('withdata', {engine = engine})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO["fname"]'}, {3, 'str', path = '["FIO"].fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '["FIO"]["fname"]'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = 'FIO'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[1]'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO'}, {3, 'str', path = '["FIO"].fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[1].sname'}, {3, 'str', path = '["FIO"].fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO....fname'}}})
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO.fname'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+assert(idx ~= nil)
+assert(idx.parts[2].path == "[\"FIO\"][\"fname\"]")
+s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5}
+s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5}
+idx:select()
+idx:min()
+idx:max()
+s:drop()
+
+s = box.schema.create_space('withdata', {engine = engine})
+parts = {}
+parts[1] = {1, 'unsigned', path='[2]'}
+pk = s:create_index('pk', {parts = parts})
+s:insert{{1, 2}, 3}
+s:upsert({{box.null, 2}}, {{'+', 2, 5}})
+s:get(2)
+s:drop()
+
+-- Create index on space with data
+s = box.schema.space.create('withdata', {engine = engine})
+pk = s:create_index('primary', { type = 'tree' })
+s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5}
+s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{4, 7, {town = 'London', FIO = {1,2,3}}, 4, 5}
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+_ = s:delete(1)
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+_ = s:delete(2)
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+_ = s:delete(4)
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]', is_nullable = true}, {3, 'str', path = '["FIO"]["sname"]'}, {3, 'str', path = '["FIO"]["extra"]', is_nullable = true}}})
+assert(idx ~= nil)
+s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"]["fname"]'}}})
+idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}})
+assert(idx2 ~= nil)
+t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
+idx:select()
+idx:min()
+idx:max()
+idx:drop()
+s:drop()
+
+-- Test complex JSON indexes
+s = box.schema.space.create('withdata', {engine = engine})
+parts = {}
+parts[1] = {1, 'str', path='[3][2].a'}
+parts[2] = {1, 'unsigned', path = '[3][1]'}
+parts[3] = {2, 'str', path = '[2].d[1]'}
+pk = s:create_index('primary', { type = 'tree', parts = parts})
+s:insert{{1, 2, {3, {3, a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {1, 2, 3}}
+s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+parts = {}
+parts[1] = {4, 'unsigned', path='[1]', is_nullable = false}
+parts[2] = {4, 'unsigned', path='[2]', is_nullable = true}
+parts[3] = {4, 'unsigned', path='[4]', is_nullable = true}
+trap_idx = s:create_index('trap', { type = 'tree', parts = parts})
+s:insert{{1, 2, {3, {3, a = 'str2', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {}}
+parts = {}
+parts[1] = {1, 'unsigned', path='[3][2].b' }
+parts[2] = {3, 'unsigned'}
+crosspart_idx = s:create_index('crosspart', { parts = parts})
+s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {9, 2, 3}}
+parts = {}
+parts[1] = {1, 'unsigned', path='[3][2].b'}
+num_idx = s:create_index('numeric', {parts = parts})
+s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {0}}
+num_idx:get(2)
+num_idx:select()
+num_idx:max()
+num_idx:min()
+assert(crosspart_idx:max() == num_idx:max())
+assert(crosspart_idx:min() == num_idx:min())
+trap_idx:max()
+trap_idx:min()
+s:drop()
+
+s = box.schema.space.create('withdata', {engine = engine})
+pk_simplified = s:create_index('primary', { type = 'tree', parts = {{1, 'unsigned'}}})
+assert(pk_simplified.path == box.NULL)
+idx = s:create_index('idx', {parts = {{2, 'integer', path = 'a'}}})
+s:insert{31, {a = 1, aa = -1}}
+s:insert{22, {a = 2, aa = -2}}
+s:insert{13, {a = 3, aa = -3}}
+idx:select()
+idx:alter({parts = {{2, 'integer', path = 'aa'}}})
+idx:select()
+s:drop()
+
engine = nil
test_run = nil
--
2.7.4
* [tarantool-patches] [PATCH v5 09/12] box: introduce has_json_paths flag in templates
2018-10-29 6:56 [PATCH v5 00/12] box: indexes by JSON path Kirill Shcherbatov
` (10 preceding siblings ...)
2018-10-29 6:56 ` [PATCH v5 08/12] box: introduce JSON indexes Kirill Shcherbatov
@ 2018-10-29 6:56 ` Kirill Shcherbatov
11 siblings, 0 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-10-29 6:56 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
Introduced a has_json_paths flag for the compare, hash and key extraction
functions (which are really hot) so that they can avoid looking at the
path field for flat indexes without any JSON paths.
Part of #1012
---
src/box/tuple_compare.cc | 112 +++++++++++++++++++++++++++++++------------
src/box/tuple_extract_key.cc | 104 ++++++++++++++++++++++++++--------------
src/box/tuple_hash.cc | 45 ++++++++++++-----
3 files changed, 182 insertions(+), 79 deletions(-)
diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
index 554c29f..97963d0 100644
--- a/src/box/tuple_compare.cc
+++ b/src/box/tuple_compare.cc
@@ -458,11 +458,12 @@ tuple_common_key_parts(const struct tuple *tuple_a, const struct tuple *tuple_b,
return i;
}
-template<bool is_nullable, bool has_optional_parts>
+template<bool is_nullable, bool has_optional_parts, bool has_json_path>
static inline int
tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
struct key_def *key_def)
{
+ assert(has_json_path == key_def->has_json_paths);
assert(!has_optional_parts || is_nullable);
assert(is_nullable == key_def->is_nullable);
assert(has_optional_parts == key_def->has_optional_parts);
@@ -508,10 +509,19 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
end = part + key_def->part_count;
for (; part < end; part++) {
- field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
- field_map_a, part);
- field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
- field_map_b, part);
+ if (!has_json_path) {
+ field_a = tuple_field_raw(format_a, tuple_a_raw,
+ field_map_a,
+ part->fieldno);
+ field_b = tuple_field_raw(format_b, tuple_b_raw,
+ field_map_b,
+ part->fieldno);
+ } else {
+ field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
+ field_map_a, part);
+ field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
+ field_map_b, part);
+ }
assert(has_optional_parts ||
(field_a != NULL && field_b != NULL));
if (! is_nullable) {
@@ -558,10 +568,19 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
*/
end = key_def->parts + key_def->part_count;
for (; part < end; ++part) {
- field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
- field_map_a, part);
- field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
- field_map_b, part);
+ if (!has_json_path) {
+ field_a = tuple_field_raw(format_a, tuple_a_raw,
+ field_map_a,
+ part->fieldno);
+ field_b = tuple_field_raw(format_b, tuple_b_raw,
+ field_map_b,
+ part->fieldno);
+ } else {
+ field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
+ field_map_a, part);
+ field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
+ field_map_b, part);
+ }
/*
* Extended parts are primary, and they can not
* be absent or be NULLs.
@@ -575,11 +594,12 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
return 0;
}
-template<bool is_nullable, bool has_optional_parts>
+template<bool is_nullable, bool has_optional_parts, bool has_json_paths>
static inline int
tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
uint32_t part_count, struct key_def *key_def)
{
+ assert(has_json_paths == key_def->has_json_paths);
assert(!has_optional_parts || is_nullable);
assert(is_nullable == key_def->is_nullable);
assert(has_optional_parts == key_def->has_optional_parts);
@@ -591,9 +611,14 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
const uint32_t *field_map = tuple_field_map(tuple);
enum mp_type a_type, b_type;
if (likely(part_count == 1)) {
- const char *field =
- tuple_field_by_part_raw(format, tuple_raw, field_map,
- part);
+ const char *field;
+ if (!has_json_paths) {
+ field = tuple_field_raw(format, tuple_raw, field_map,
+ part->fieldno);
+ } else {
+ field = tuple_field_by_part_raw(format, tuple_raw,
+ field_map, part);
+ }
if (! is_nullable) {
return tuple_compare_field(field, key, part->type,
part->coll);
@@ -617,9 +642,14 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
struct key_part *end = part + part_count;
int rc;
for (; part < end; ++part, mp_next(&key)) {
- const char *field =
- tuple_field_by_part_raw(format, tuple_raw,
- field_map, part);
+ const char *field;
+ if (!has_json_paths) {
+ field = tuple_field_raw(format, tuple_raw, field_map,
+ part->fieldno);
+ } else {
+ field = tuple_field_by_part_raw(format, tuple_raw,
+ field_map, part);
+ }
if (! is_nullable) {
rc = tuple_compare_field(field, key, part->type,
part->coll);
@@ -1012,19 +1042,31 @@ static const comparator_signature cmp_arr[] = {
#undef COMPARATOR
+static const tuple_compare_t compare_slowpath_funcs[] = {
+ tuple_compare_slowpath<false, false, false>,
+ tuple_compare_slowpath<true, false, false>,
+ tuple_compare_slowpath<false, true, false>,
+ tuple_compare_slowpath<true, true, false>,
+ tuple_compare_slowpath<false, false, true>,
+ tuple_compare_slowpath<true, false, true>,
+ tuple_compare_slowpath<false, true, true>,
+ tuple_compare_slowpath<true, true, true>
+};
+
tuple_compare_t
tuple_compare_create(const struct key_def *def)
{
+ int cmp_func_idx = (def->is_nullable ? 1 : 0) +
+ 2 * (def->has_optional_parts ? 1 : 0) +
+ 4 * (def->has_json_paths ? 1 : 0);
if (def->is_nullable) {
if (key_def_is_sequential(def)) {
if (def->has_optional_parts)
return tuple_compare_sequential<true, true>;
else
return tuple_compare_sequential<true, false>;
- } else if (def->has_optional_parts) {
- return tuple_compare_slowpath<true, true>;
} else {
- return tuple_compare_slowpath<true, false>;
+ return compare_slowpath_funcs[cmp_func_idx];
}
}
assert(! def->has_optional_parts);
@@ -1044,10 +1086,9 @@ tuple_compare_create(const struct key_def *def)
return cmp_arr[k].f;
}
}
- if (key_def_is_sequential(def))
- return tuple_compare_sequential<false, false>;
- else
- return tuple_compare_slowpath<false, false>;
+ return key_def_is_sequential(def) ?
+ tuple_compare_sequential<false, false> :
+ compare_slowpath_funcs[cmp_func_idx];
}
/* }}} tuple_compare */
@@ -1229,9 +1270,23 @@ static const comparator_with_key_signature cmp_wk_arr[] = {
#undef KEY_COMPARATOR
+static const tuple_compare_with_key_t compare_with_key_slowpath_funcs[] = {
+ tuple_compare_with_key_slowpath<false, false, false>,
+ tuple_compare_with_key_slowpath<true, false, false>,
+ tuple_compare_with_key_slowpath<false, true, false>,
+ tuple_compare_with_key_slowpath<true, true, false>,
+ tuple_compare_with_key_slowpath<false, false, true>,
+ tuple_compare_with_key_slowpath<true, false, true>,
+ tuple_compare_with_key_slowpath<false, true, true>,
+ tuple_compare_with_key_slowpath<true, true, true>
+};
+
tuple_compare_with_key_t
tuple_compare_with_key_create(const struct key_def *def)
{
+ int cmp_func_idx = (def->is_nullable ? 1 : 0) +
+ 2 * (def->has_optional_parts ? 1 : 0) +
+ 4 * (def->has_json_paths ? 1 : 0);
if (def->is_nullable) {
if (key_def_is_sequential(def)) {
if (def->has_optional_parts) {
@@ -1241,10 +1296,8 @@ tuple_compare_with_key_create(const struct key_def *def)
return tuple_compare_with_key_sequential<true,
false>;
}
- } else if (def->has_optional_parts) {
- return tuple_compare_with_key_slowpath<true, true>;
} else {
- return tuple_compare_with_key_slowpath<true, false>;
+ return compare_with_key_slowpath_funcs[cmp_func_idx];
}
}
assert(! def->has_optional_parts);
@@ -1267,10 +1320,9 @@ tuple_compare_with_key_create(const struct key_def *def)
return cmp_wk_arr[k].f;
}
}
- if (key_def_is_sequential(def))
- return tuple_compare_with_key_sequential<false, false>;
- else
- return tuple_compare_with_key_slowpath<false, false>;
+ return key_def_is_sequential(def) ?
+ tuple_compare_with_key_sequential<false, false> :
+ compare_with_key_slowpath_funcs[cmp_func_idx];
}
/* }}} tuple_compare_with_key */
diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
index 04c5463..63ad970 100644
--- a/src/box/tuple_extract_key.cc
+++ b/src/box/tuple_extract_key.cc
@@ -5,13 +5,18 @@
enum { MSGPACK_NULL = 0xc0 };
/** True if key part i and i+1 are sequential. */
+template <bool has_json_paths>
static inline bool
key_def_parts_are_sequential(const struct key_def *def, int i)
{
uint32_t fieldno1 = def->parts[i].fieldno + 1;
uint32_t fieldno2 = def->parts[i + 1].fieldno;
- return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
- def->parts[i + 1].path == NULL;
+ if (!has_json_paths) {
+ return fieldno1 == fieldno2;
+ } else {
+ return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
+ def->parts[i + 1].path == NULL;
+ }
}
/** True if a key can contain two or more parts in sequence. */
@@ -19,7 +24,7 @@ static bool
key_def_contains_sequential_parts(const struct key_def *def)
{
for (uint32_t i = 0; i < def->part_count - 1; ++i) {
- if (key_def_parts_are_sequential(def, i))
+ if (key_def_parts_are_sequential<true>(def, i))
return true;
}
return false;
@@ -99,11 +104,13 @@ tuple_extract_key_sequential(const struct tuple *tuple, struct key_def *key_def,
* General-purpose implementation of tuple_extract_key()
* @copydoc tuple_extract_key()
*/
-template <bool contains_sequential_parts, bool has_optional_parts>
+template <bool contains_sequential_parts, bool has_optional_parts,
+ bool has_json_paths>
static char *
tuple_extract_key_slowpath(const struct tuple *tuple,
struct key_def *key_def, uint32_t *key_size)
{
+ assert(has_json_paths == key_def->has_json_paths);
assert(!has_optional_parts || key_def->is_nullable);
assert(has_optional_parts == key_def->has_optional_parts);
assert(contains_sequential_parts ==
@@ -118,9 +125,14 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
/* Calculate the key size. */
for (uint32_t i = 0; i < part_count; ++i) {
- const char *field =
- tuple_field_by_part_raw(format, data, field_map,
- &key_def->parts[i]);
+ const char *field;
+ if (!has_json_paths) {
+ field = tuple_field_raw(format, data, field_map,
+ key_def->parts[i].fieldno);
+ } else {
+ field = tuple_field_by_part_raw(format, data, field_map,
+ &key_def->parts[i]);
+ }
if (has_optional_parts && field == NULL) {
bsize += mp_sizeof_nil();
continue;
@@ -133,7 +145,8 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
* minimize tuple_field_raw() calls.
*/
for (; i < part_count - 1; i++) {
- if (!key_def_parts_are_sequential(key_def, i)) {
+ if (!key_def_parts_are_sequential
+ <has_json_paths>(key_def, i)) {
/*
* End of sequential part.
*/
@@ -159,9 +172,14 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
}
char *key_buf = mp_encode_array(key, part_count);
for (uint32_t i = 0; i < part_count; ++i) {
- const char *field =
- tuple_field_by_part_raw(format, data, field_map,
- &key_def->parts[i]);
+ const char *field;
+ if (!has_json_paths) {
+ field = tuple_field_raw(format, data, field_map,
+ key_def->parts[i].fieldno);
+ } else {
+ field = tuple_field_by_part_raw(format, data, field_map,
+ &key_def->parts[i]);
+ }
if (has_optional_parts && field == NULL) {
key_buf = mp_encode_nil(key_buf);
continue;
@@ -174,7 +192,8 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
* minimize tuple_field_raw() calls.
*/
for (; i < part_count - 1; i++) {
- if (!key_def_parts_are_sequential(key_def, i)) {
+ if (!key_def_parts_are_sequential
+ <has_json_paths>(key_def, i)) {
/*
* End of sequential part.
*/
@@ -207,11 +226,12 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
* General-purpose version of tuple_extract_key_raw()
* @copydoc tuple_extract_key_raw()
*/
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool has_json_paths>
static char *
tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
struct key_def *key_def, uint32_t *key_size)
{
+ assert(has_json_paths == key_def->has_json_paths);
assert(!has_optional_parts || key_def->is_nullable);
assert(has_optional_parts == key_def->has_optional_parts);
assert(mp_sizeof_nil() == 1);
@@ -239,7 +259,8 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
uint32_t fieldno = key_def->parts[i].fieldno;
uint32_t null_count = 0;
for (; i < key_def->part_count - 1; i++) {
- if (!key_def_parts_are_sequential(key_def, i))
+ if (!key_def_parts_are_sequential
+ <has_json_paths>(key_def, i))
break;
}
const struct key_part *part = &key_def->parts[i];
@@ -312,6 +333,17 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
return key;
}
+static const tuple_extract_key_t extract_key_slowpath_funcs[] = {
+ tuple_extract_key_slowpath<false, false, false>,
+ tuple_extract_key_slowpath<true, false, false>,
+ tuple_extract_key_slowpath<false, true, false>,
+ tuple_extract_key_slowpath<true, true, false>,
+ tuple_extract_key_slowpath<false, false, true>,
+ tuple_extract_key_slowpath<true, false, true>,
+ tuple_extract_key_slowpath<false, true, true>,
+ tuple_extract_key_slowpath<true, true, true>
+};
+
/**
* Initialize tuple_extract_key() and tuple_extract_key_raw()
*/
@@ -332,32 +364,30 @@ tuple_extract_key_set(struct key_def *key_def)
tuple_extract_key_sequential_raw<false>;
}
} else {
- if (key_def->has_optional_parts) {
- assert(key_def->is_nullable);
- if (key_def_contains_sequential_parts(key_def)) {
- key_def->tuple_extract_key =
- tuple_extract_key_slowpath<true, true>;
- } else {
- key_def->tuple_extract_key =
- tuple_extract_key_slowpath<false, true>;
- }
- } else {
- if (key_def_contains_sequential_parts(key_def)) {
- key_def->tuple_extract_key =
- tuple_extract_key_slowpath<true, false>;
- } else {
- key_def->tuple_extract_key =
- tuple_extract_key_slowpath<false,
- false>;
- }
- }
+ int func_idx =
+ (key_def_contains_sequential_parts(key_def) ? 1 : 0) +
+ 2 * (key_def->has_optional_parts ? 1 : 0) +
+ 4 * (key_def->has_json_paths ? 1 : 0);
+ key_def->tuple_extract_key =
+ extract_key_slowpath_funcs[func_idx];
+ assert(!key_def->has_optional_parts || key_def->is_nullable);
}
if (key_def->has_optional_parts) {
assert(key_def->is_nullable);
- key_def->tuple_extract_key_raw =
- tuple_extract_key_slowpath_raw<true>;
+ if (key_def->has_json_paths) {
+ key_def->tuple_extract_key_raw =
+ tuple_extract_key_slowpath_raw<true, true>;
+ } else {
+ key_def->tuple_extract_key_raw =
+ tuple_extract_key_slowpath_raw<true, false>;
+ }
} else {
- key_def->tuple_extract_key_raw =
- tuple_extract_key_slowpath_raw<false>;
+ if (key_def->has_json_paths) {
+ key_def->tuple_extract_key_raw =
+ tuple_extract_key_slowpath_raw<false, true>;
+ } else {
+ key_def->tuple_extract_key_raw =
+ tuple_extract_key_slowpath_raw<false, false>;
+ }
}
}
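As a standalone illustration of the dispatch-table technique used in tuple_extract_key_set above (the names below are illustrative stand-ins, not the real tarantool symbols), the three boolean key_def properties are packed into an index into an array of template instantiations, replacing the nested if/else selection:

```cpp
#include <cassert>

// Illustrative stand-in for the real key_def flags.
struct KeyDefSketch {
    bool sequential_parts;
    bool optional_parts;
    bool json_paths;
};

// Each instantiation stands in for one specialized extractor; it just
// reports its own table index here so the selection can be verified.
template <bool SEQ, bool OPT, bool JSON>
static int extract_sketch() { return SEQ * 1 + OPT * 2 + JSON * 4; }

using extract_fn = int (*)();

// Order matches the index formula: seq + 2 * opt + 4 * json.
static const extract_fn extract_table[] = {
    extract_sketch<false, false, false>, extract_sketch<true, false, false>,
    extract_sketch<false, true, false>,  extract_sketch<true, true, false>,
    extract_sketch<false, false, true>,  extract_sketch<true, false, true>,
    extract_sketch<false, true, true>,   extract_sketch<true, true, true>,
};

static extract_fn select_extract(const KeyDefSketch &def)
{
    int idx = (def.sequential_parts ? 1 : 0) +
              2 * (def.optional_parts ? 1 : 0) +
              4 * (def.json_paths ? 1 : 0);
    return extract_table[idx];
}
```

The same encoding keeps the table and the index formula in lockstep: adding a fourth boolean property only doubles the table and adds one term to the formula.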
diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
index 3486ce1..8ede290 100644
--- a/src/box/tuple_hash.cc
+++ b/src/box/tuple_hash.cc
@@ -213,7 +213,7 @@ static const hasher_signature hash_arr[] = {
#undef HASHER
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool has_json_paths>
uint32_t
tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def);
@@ -256,10 +256,17 @@ tuple_hash_func_set(struct key_def *key_def) {
}
slowpath:
- if (key_def->has_optional_parts)
- key_def->tuple_hash = tuple_hash_slowpath<true>;
- else
- key_def->tuple_hash = tuple_hash_slowpath<false>;
+ if (key_def->has_optional_parts) {
+ if (key_def->has_json_paths)
+ key_def->tuple_hash = tuple_hash_slowpath<true, true>;
+ else
+ key_def->tuple_hash = tuple_hash_slowpath<true, false>;
+ } else {
+ if (key_def->has_json_paths)
+ key_def->tuple_hash = tuple_hash_slowpath<false, true>;
+ else
+ key_def->tuple_hash = tuple_hash_slowpath<false, false>;
+ }
key_def->key_hash = key_hash_slowpath;
}
@@ -319,10 +326,11 @@ tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry, const struct tuple *tuple,
return tuple_hash_field(ph1, pcarry, &field, part->coll);
}
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool has_json_paths>
uint32_t
tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def)
{
+ assert(has_json_paths == key_def->has_json_paths);
assert(has_optional_parts == key_def->has_optional_parts);
uint32_t h = HASH_SEED;
uint32_t carry = 0;
@@ -331,9 +339,13 @@ tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def)
struct tuple_format *format = tuple_format(tuple);
const char *tuple_raw = tuple_data(tuple);
const uint32_t *field_map = tuple_field_map(tuple);
- const char *field =
- tuple_field_by_part_raw(format, tuple_raw, field_map,
- key_def->parts);
+ const char *field;
+ if (!has_json_paths) {
+ field = tuple_field(tuple, prev_fieldno);
+ } else {
+ field = tuple_field_by_part_raw(format, tuple_raw, field_map,
+ key_def->parts);
+ }
const char *end = (char *)tuple + tuple_size(tuple);
if (has_optional_parts && field == NULL) {
total_size += tuple_hash_null(&h, &carry);
@@ -347,9 +359,18 @@ tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def)
* need of tuple_field
*/
if (prev_fieldno + 1 != key_def->parts[part_id].fieldno) {
- struct key_part *part = &key_def->parts[part_id];
- field = tuple_field_by_part_raw(format, tuple_raw,
- field_map, part);
+ if (!has_json_paths) {
+ field = tuple_field(tuple,
+ key_def->parts[part_id].
+ fieldno);
+ } else {
+ struct key_part *part =
+ &key_def->parts[part_id];
+ field = tuple_field_by_part_raw(format,
+ tuple_raw,
+ field_map,
+ part);
+ }
}
if (has_optional_parts && (field == NULL || field >= end)) {
total_size += tuple_hash_null(&h, &carry);
--
2.7.4
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [tarantool-patches] [PATCH v5 11/12] box: introduce offset slot cache in key_part
2018-10-29 6:56 ` [PATCH v5 11/12] box: introduce offset slot cache in key_part Kirill Shcherbatov
@ 2018-11-01 13:32 ` Konstantin Osipov
2018-11-06 12:15 ` [tarantool-patches] " Kirill Shcherbatov
0 siblings, 1 reply; 39+ messages in thread
From: Konstantin Osipov @ 2018-11-01 13:32 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
* Kirill Shcherbatov <kshcherbatov@tarantool.org> [18/10/29 20:25]:
> Same key_part could be used in different formats multiple
> times
I don't understand this comment. Could you please rephrase?
key_part is a part of key_def. How can it be used in a format at
all?
>, so different field->offset_slot would be allocated.
> In most scenarios we work with series of tuples of same
> format, and (in general) format lookup for field would be
> expensive operation for JSON-paths defined in key_part.
I don't understand this statement either. Could you give an
example?
> New offset_slot_cache field in key_part structure and epoch-based
> mechanism to validate it's actuality should be effective
> approach to improve performance.
Did you consider storing it elsewhere, e.g. in some kind of
index search context?
> - alter->new_space = space_new_xc(alter->space_def, &alter->key_list);
> + alter->new_space =
> + space_new_xc(alter->space_def, &alter->key_list,
> + alter->old_space->format != NULL ?
> + alter->old_space->format->epoch + 1 : 1);
Can't we make it simpler and simply increase epoch id every
time we create a new space? This is only an optimization,
by leaking it into alter.cc you are making alter worry about
stuff which should not be its concern.
> + struct space *space = engine_create_space(engine, def, key_list, epoch);
Passing epoch id around explicitly is ugly.
> - key_def_set_part(new_def, pos++, part->fieldno, part->type,
> + key_def_set_part(new_def, pos, part->fieldno, part->type,
> part->nullable_action, part->coll,
> part->coll_id, part->sort_order, part->path,
> part->path_len);
> + new_def->parts[pos].offset_slot_cache = part->offset_slot_cache;
> + new_def->parts[pos].format_cache = part->format_cache;
> + pos++;
Why can't you do it in key_def_set_part?
> - key_def_set_part(new_def, pos++, part->fieldno, part->type,
> + key_def_set_part(new_def, pos, part->fieldno, part->type,
> part->nullable_action, part->coll,
> part->coll_id, part->sort_order, part->path,
> part->path_len);
> + new_def->parts[pos].offset_slot_cache = part->offset_slot_cache;
> + new_def->parts[pos].format_cache = part->format_cache;
> + pos++;
Lack of code reuse, abstraction leak.
> +++ b/src/box/key_def.h
> @@ -101,6 +101,14 @@ struct key_part {
> char *path;
> /** The length of JSON path. */
> uint32_t path_len;
> + /**
> + * Source format for offset_slot_cache actuality
> + * validations. Cache is expected to use "the format with
The source format to check that offset_slot_epoch is not stale.
Please avoid using the word "actuality".
> + * the newest epoch is most relevant" strategy.
> + */
> + struct tuple_format *format_cache;
> + /** Cache with format's field offset slot. */
> + int32_t offset_slot_cache;
> };
>
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
* Re: [tarantool-patches] [PATCH v5 12/12] box: specify indexes in user-friendly form
2018-10-29 6:56 ` [PATCH v5 12/12] box: specify indexes in user-friendly form Kirill Shcherbatov
@ 2018-11-01 13:34 ` Konstantin Osipov
2018-11-01 14:18 ` Konstantin Osipov
1 sibling, 0 replies; 39+ messages in thread
From: Konstantin Osipov @ 2018-11-01 13:34 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
* Kirill Shcherbatov <kshcherbatov@tarantool.org> [18/10/29 20:25]:
> Since now it is possible to create indexes by JSON-path
> using field names specified in format.
It is now possible to create indexes by JSON path and using
field names specified in the space format.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
* Re: [tarantool-patches] [PATCH v5 12/12] box: specify indexes in user-friendly form
2018-10-29 6:56 ` [PATCH v5 12/12] box: specify indexes in user-friendly form Kirill Shcherbatov
2018-11-01 13:34 ` [tarantool-patches] " Konstantin Osipov
@ 2018-11-01 14:18 ` Konstantin Osipov
2018-11-06 12:15 ` [tarantool-patches] " Kirill Shcherbatov
1 sibling, 1 reply; 39+ messages in thread
From: Konstantin Osipov @ 2018-11-01 14:18 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
* Kirill Shcherbatov <kshcherbatov@tarantool.org> [18/10/29 20:25]:
>
> +static int
> +lbox_index_resolve_path(struct lua_State *L)
> +{
No comment for the function. What does this function do and what
is it used for?
> + if (lua_gettop(L) != 3 ||
> + !lua_isnumber(L, 1) || !lua_isnumber(L, 2) || !lua_isstring(L, 3)) {
> + return luaL_error(L, "Usage box.internal."
> + "path_resolve(part_id, space_id, path)");
Why is it called path_resolve in one place and resolve_path in
another?
> - end
> - end
> - if type(part.field) == 'string' then
> + local idx, path = box.internal.path_resolve(i, space_id, part.field)
> + if part.path ~= nil and part.path ~= path then
> box.error(box.error.ILLEGAL_PARAMS,
> - "options.parts[" .. i .. "]: field was not found by name '" .. part.field .. "'")
> + "options.parts[" .. i .. "]: field path '"..
> + part.path.." doesn't math path resolved by name '" ..
Doesn't match the path
Please check with the docs team all the new messages this patch is
adding to the server.
I don't see how this cross-check help. I can change space format
later on. Looks like we need to push the validation to alter.cc to
ensure any kind of consistency.
Generally, as a rule, we try to avoid referencing anything by
name, and prefer referencing by id, even though the user can use
the name in box.* api. This spares us from the responsibility to
cross-check all the referencing objects whenever a referenced
object changes.
What is the strategy for json paths here? Could you describe it in
a comment?
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
* Re: [tarantool-patches] [PATCH v5 02/12] box: introduce key_def_parts_are_sequential
2018-10-29 6:56 ` [PATCH v5 02/12] box: introduce key_def_parts_are_sequential Kirill Shcherbatov
@ 2018-11-01 14:23 ` Konstantin Osipov
2018-11-06 12:14 ` [tarantool-patches] " Kirill Shcherbatov
2018-11-19 17:48 ` Vladimir Davydov
1 sibling, 1 reply; 39+ messages in thread
From: Konstantin Osipov @ 2018-11-01 14:23 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
* Kirill Shcherbatov <kshcherbatov@tarantool.org> [18/10/29 20:25]:
> Introduced a new key_def_parts_are_sequential routine that test,
> does specified key_def have sequential fields. This would be
> useful with introducing JSON path as there would be another
> complex criteria as fields with JSONs can't be 'sequential' in
> this meaning.
They still can:
Paths foo.bar[0] and foo.bar[1] refer to sequential fields.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
* Re: [tarantool-patches] [PATCH v5 04/12] box: introduce tuple_format_add_key_part
2018-10-29 6:56 ` [PATCH v5 04/12] box: introduce tuple_format_add_key_part Kirill Shcherbatov
@ 2018-11-01 14:38 ` Konstantin Osipov
2018-11-06 12:15 ` [tarantool-patches] " Kirill Shcherbatov
2018-11-19 17:50 ` Vladimir Davydov
1 sibling, 1 reply; 39+ messages in thread
From: Konstantin Osipov @ 2018-11-01 14:38 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
* Kirill Shcherbatov <kshcherbatov@tarantool.org> [18/10/29 20:25]:
> Introduced a new tuple_format_add_key_part that makes format
> initialization for specified key_part and configuration.
> This decrease tuple_format_create routine complexity and would
> be used to initialize structures in format for JSON path.
The patch is OK to push but I object to the name.
A format doesn't contain key parts. A key def contains key part. A
format is consistent with key parts.
tuple_format_update_with_key_part() perhaps?
tuple_format_combine_with_key_part()?
tuple_format_mix_in_key_part()?
tuple_format_use_key_part()?
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
* Re: [tarantool-patches] [PATCH v5 05/12] lib: implement JSON tree class for json library
2018-10-29 6:56 ` [PATCH v5 05/12] lib: implement JSON tree class for json library Kirill Shcherbatov
@ 2018-11-01 15:08 ` Konstantin Osipov
2018-11-06 12:15 ` [tarantool-patches] " Kirill Shcherbatov
2018-11-20 16:43 ` Vladimir Davydov
1 sibling, 1 reply; 39+ messages in thread
From: Konstantin Osipov @ 2018-11-01 15:08 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
* Kirill Shcherbatov <kshcherbatov@tarantool.org> [18/10/29 20:25]:
> + mh_int_t id = mh_json_tree_node_find(tree->hash, &info, NULL);
> + if (unlikely(id == mh_end(tree->hash)))
> + return NULL;
> + struct mh_json_tree_node *ht_node =
> + mh_json_tree_node_node(tree->hash, id);
This is hard to read. It's hard to see whether it's a hash lookup
or a tree lookup. Let's perhaps rename mh_node_t to mh_entry_t.
Even more confusing is json_path_node and json_tree_node.
Let's take time to come up with better names.
> +json_tree_lookup_by_path(struct json_tree *tree, struct json_tree_node *parent,
> + const char *path, uint32_t path_len)
This function could be called simply json_tree_lookup().
> + struct json_tree_node **arr = parent->children;
> + uint32_t arr_size = parent->children_count;
child_count.
> +struct json_tree_node *
> +json_tree_next_pre(struct json_tree_node *parent, struct json_tree_node *pos)
I would call it preorder/postorder to avoid confusion between pre
and prev.
> +struct json_tree_node *
> +json_tree_next_post(struct json_tree_node *parent, struct json_tree_node *pos)
json_tree_postorder_next
> +/** Compute the hash value of a JSON path component. */
> +uint32_t
> +json_path_node_hash(struct json_path_node *key, uint32_t seed);
This is json_path_fragment or json_path_segment.
> +#define json_tree_entry(item, type, member) ({ \
> + const typeof( ((type *)0)->member ) *__mptr = (item); \
> + (type *)( (char *)__mptr - ((size_t) &((type *)0)->member) ); })
> +
> +/** Return entry by json_tree_node item or NULL if item is NULL.*/
> +#define json_tree_entry_safe(item, type, member) ({ \
> + (item) != NULL ? json_tree_entry((item), type, member) : NULL; })
What is a node item? I'm in hell.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
* Re: [tarantool-patches] [PATCH v5 07/12] lib: introduce json_path_normalize routine
2018-10-29 6:56 ` [PATCH v5 07/12] lib: introduce json_path_normalize routine Kirill Shcherbatov
@ 2018-11-01 15:22 ` Konstantin Osipov
2018-11-01 15:27 ` [tarantool-patches] " Kirill Shcherbatov
2018-11-20 15:14 ` Vladimir Davydov
1 sibling, 1 reply; 39+ messages in thread
From: Konstantin Osipov @ 2018-11-01 15:22 UTC (permalink / raw)
To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov
* Kirill Shcherbatov <kshcherbatov@tarantool.org> [18/10/29 20:25]:
> Introduced a new routine json_path_normalize that makes a
> conversion of JSON path to the 'canonical' form:
> - all maps keys are specified with operator ["key"] form
> - all array indexes are specified with operator [i] form.
> This notation is preferable because in the general case it can
> be uniquely parsed.
> We need such API in JSON indexes patch to store all paths in
> 'canonical' form to commit the path uniqueness checks and
> to tune access with JSON path hashtable.
>
> Need for #1012
Let's try to avoid this altogether. We could use parent references
to check if path1 equals to path2.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
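For illustration, the canonical-form conversion described in the quoted commit message can be sketched as follows. This is a minimal sketch with assumed names, not the library routine; quoting variants, escapes, and error handling that a real json_path_normalize must cover are omitted:

```cpp
#include <string>

// Sketch of the normalization idea: rewrite dot-notation map keys as
// the ["key"] operator form and keep array indexes as [i], so
// "foo.bar[0]" becomes "[\"foo\"][\"bar\"][0]".
static std::string normalize_sketch(const std::string &path)
{
    std::string out;
    size_t i = 0;
    while (i < path.size()) {
        if (path[i] == '.') {
            ++i; /* a dot just separates the next map key */
        } else if (path[i] == '[') {
            /* bracketed segment is already canonical: copy as-is */
            size_t j = path.find(']', i);
            out.append(path, i, j - i + 1);
            i = j + 1;
        } else {
            /* bare identifier: wrap it into the ["key"] form */
            size_t j = i;
            while (j < path.size() && path[j] != '.' && path[j] != '[')
                ++j;
            out += "[\"" + path.substr(i, j - i) + "\"]";
            i = j;
        }
    }
    return out;
}
```

Once every path is in this form, byte-wise string comparison is enough for the uniqueness checks and hashtable lookups the commit message mentions.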
* [tarantool-patches] Re: [PATCH v5 07/12] lib: introduce json_path_normalize routine
2018-11-01 15:22 ` [tarantool-patches] " Konstantin Osipov
@ 2018-11-01 15:27 ` Kirill Shcherbatov
2018-11-20 15:13 ` Vladimir Davydov
0 siblings, 1 reply; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-11-01 15:27 UTC (permalink / raw)
To: tarantool-patches, Kostya Osipov
> Let's try to avoid this altogether. We could use parent references
> to check if path1 equals to path2.
This is required in index_def_is_valid to raise an error on key_part_path_cmp == 0
before tuple_format creation; no tree exists at that point, so we need to compare the path strings stored in the key_part(s).
* Re: [tarantool-patches] Re: [PATCH v5 02/12] box: introduce key_def_parts_are_sequential
2018-11-01 14:23 ` [tarantool-patches] " Konstantin Osipov
@ 2018-11-06 12:14 ` Kirill Shcherbatov
0 siblings, 0 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-11-06 12:14 UTC (permalink / raw)
To: tarantool-patches, Kostya Osipov; +Cc: Vladimir Davydov
> They still can:
> Paths foo.bar[0] and foo.bar[1] refer to sequential fields.
Yes, you are right. In general it should be possible to implement a correct
is_sequential test for JSON indexes, but such a test is not trivial
and requires an already-built JSON tree, like the one the format has. Two
things obstruct introducing it:
1) the tuple_extract_key_slowpath_raw usage scenario that calls the
key_def_parts_are_sequential routine does not operate with a tuple format
entity (we can work around this with a key_def->has_json_path flag lookup);
2) at key_def preparation time (key_def_new/key_def_set_cmp/tuple_extract_key_set)
no format object containing the JSON paths exists in tarantool at all. I don't like the idea of a temporary JSON tree object
built from the definitions at the key_def_new step just to set this unlikely optimization flag.
* Re: [tarantool-patches] Re: [PATCH v5 04/12] box: introduce tuple_format_add_key_part
2018-11-01 14:38 ` [tarantool-patches] " Konstantin Osipov
@ 2018-11-06 12:15 ` Kirill Shcherbatov
0 siblings, 0 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-11-06 12:15 UTC (permalink / raw)
To: tarantool-patches, Kostya Osipov, Vladimir Davydov
> The patch is OK to push but I object to the name.
>
> A format doesn't contain key parts. A key def contains key part. A
> format is consistent with key parts.
>
> tuple_format_update_with_key_part() perhaps?
> tuple_format_combine_with_key_part()?
> tuple_format_mix_in_key_part()?
> tuple_format_use_key_part()?
Ok, I'd like to use tuple_format_use_key_part routine name.
* Re: [tarantool-patches] Re: [PATCH v5 05/12] lib: implement JSON tree class for json library
2018-11-01 15:08 ` [tarantool-patches] " Konstantin Osipov
@ 2018-11-06 12:15 ` Kirill Shcherbatov
2018-11-19 17:53 ` Vladimir Davydov
0 siblings, 1 reply; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-11-06 12:15 UTC (permalink / raw)
To: tarantool-patches, Kostya Osipov; +Cc: Vladimir Davydov
> This is hard to read. It's hard to see whether it's a hash lookup
> or a tree lookup. Let's perhaps rename mh_node_t to mh_entry_t.
Ok, done.
> Even more confusing is json_path_node and json_tree_node.
> Let's take time to come up with better names.
Maybe "json_tree_entry" is a better name for the JSON tree record structure?
As for json_path_node, this is an existing class that my patch does not touch at all.
>> +json_tree_lookup_by_path(struct json_tree *tree, struct json_tree_node *parent,
>> + const char *path, uint32_t path_len)
> This function could be called simply json_tree_lookup().
Done.
>> + uint32_t arr_size = parent->children_count;
>
> child_count.
Ok.
>
>> +struct json_tree_node *
>> +json_tree_next_pre(struct json_tree_node *parent, struct json_tree_node *pos)
>
> I would call it preorder/postorder to avoid confusion between pre
> and prev.
Ok, don't mind.
> json_tree_postorder_next
Ok, same for json_tree_preorder_next
>
>> +/** Compute the hash value of a JSON path component. */
>> +uint32_t
>> +json_path_node_hash(struct json_path_node *key, uint32_t seed);
>
> This is json_path_fragment or json_path_segment.
uint32_t
json_path_fragment_hash(struct json_path_node *key, uint32_t seed)
> What is a node item? I'm in hell.
/**
 * Return container entry by json_tree_entry item or NULL if
 * item is NULL.
 */
* Re: [tarantool-patches] Re: [PATCH v5 12/12] box: specify indexes in user-friendly form
2018-11-01 14:18 ` Konstantin Osipov
@ 2018-11-06 12:15 ` Kirill Shcherbatov
0 siblings, 0 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-11-06 12:15 UTC (permalink / raw)
To: tarantool-patches, Kostya Osipov; +Cc: Vladimir Davydov
> No comment for the function. What does this function do and what
> is it used for?
/**
* Resolve field index by absolute JSON path first component and
* return relative JSON path.
*/
>
>> + if (lua_gettop(L) != 3 ||
>> + !lua_isnumber(L, 1) || !lua_isnumber(L, 2) || !lua_isstring(L, 3)) {
>> + return luaL_error(L, "Usage box.internal."
>> + "path_resolve(part_id, space_id, path)");
>
> Why is it called path_resolve in one place and resolve_path in
> another?
Ok, lbox_index_path_resolve
>
>> - end
>> - end
>> - if type(part.field) == 'string' then
>> + local idx, path = box.internal.path_resolve(i, space_id, part.field)
>> + if part.path ~= nil and part.path ~= path then
>> box.error(box.error.ILLEGAL_PARAMS,
>> - "options.parts[" .. i .. "]: field was not found by name '" .. part.field .. "'")
>> + "options.parts[" .. i .. "]: field path '"..
>> + part.path.." doesn't math path resolved by name '" ..
>
> Doesn't match the path
Ok.
>
> Please check with the docs team all the new messages this patch is
> adding to the server.
I'll ask Lena when the patch is ready.
> I don't see how this cross-check help. I can change space format
> later on. Looks like we need to push the validation to alter.cc to
> ensure any kind of consistency.
>
> Generally, as a rule, we try to avoid referencing anything by
> name, and prefer referencing by id, even though the user can use
> the name in box.* api. This spares us from the responsibility to
> cross-check all the referencing objects whenever a referenced
> object changes.
>
> What is the strategy for json paths here? Could you describe it in
> a comment?
There are two ways to define a JSON path:
1) manually specify the fieldno and a relative path -- this is the canonical approach introduced in the
previous commits:
idx = s:create_index('test1', {parts = {{3, 'str', path = 'FIO.fname'}}})
2) use the fastpath introduced with this patch:
idx = s:create_index('test1', {parts = {{"user.FIO.fname", 'str'}}})
The second way is just a user-friendly interface; the string "user.FIO.fname" is not persisted in
Tarantool and is indirectly converted to form (1) by this patch.
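The first-component resolution that path_resolve performs can be sketched like this. split_first_component is a hypothetical helper introduced only for illustration; the actual fieldno lookup of the first component against the space format is stubbed out:

```cpp
#include <string>
#include <utility>

// Split a user-supplied absolute path into the first component (a field
// name to be resolved against the space format) and the relative JSON
// path that would be stored in the key part definition.
static std::pair<std::string, std::string>
split_first_component(const std::string &path)
{
    size_t pos = path.find_first_of(".[");
    if (pos == std::string::npos)
        return {path, ""};
    /* drop a leading '.' from the remainder, but keep a leading '[' */
    size_t rest = (path[pos] == '.') ? pos + 1 : pos;
    return {path.substr(0, pos), path.substr(rest)};
}
```

So "user.FIO.fname" splits into the field name "user" and the relative path "FIO.fname", matching the conversion of form (2) into form (1) described above.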
* Re: [tarantool-patches] Re: [PATCH v5 11/12] box: introduce offset slot cache in key_part
2018-11-01 13:32 ` [tarantool-patches] " Konstantin Osipov
@ 2018-11-06 12:15 ` Kirill Shcherbatov
0 siblings, 0 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-11-06 12:15 UTC (permalink / raw)
To: tarantool-patches, Kostya Osipov; +Cc: Vladimir Davydov
> I don't understand this comment. Could you please rephrase?
> I don't understand this statement either. Could you give an
> example?
> Did you consider storing it elsewhere, e.g. in some kind of
> index search context?
Tuned the tuple_field_by_part_raw routine with the key_part's
offset_slot_cache. Introduced a tuple_format epoch to test the validity
of this cache. The key_part caches the source format of its last
offset_slot to make the epoch comparison possible, because the same
space may have multiple formats of the same epoch with different
key_parts and a different distribution of offset_slots.
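The cache scheme described above can be sketched as follows. Structure and function names here are assumptions for illustration; this sketch validates the cache by a format pointer match (which implies a matching epoch), while the real patch also carries an explicit epoch counter:

```cpp
#include <cstdint>

// Illustrative structures only: names mirror the patch, the logic is a
// simplified assumption of how the cache validation could work.
struct FormatSketch {
    uint64_t epoch;
    int32_t slot_for_part; /* stand-in for the real per-field lookup */
};

struct KeyPartSketch {
    const FormatSketch *format_cache; /* format the cache was filled from */
    int32_t offset_slot_cache;
};

// Reuse the cached slot when the tuple's format is exactly the one the
// cache was filled from; otherwise fall back to the (expensive) format
// lookup and refill the cache for the next tuple of the same format.
static int32_t
offset_slot(KeyPartSketch *part, const FormatSketch *format)
{
    if (part->format_cache == format)
        return part->offset_slot_cache; /* cache hit: skip the lookup */
    part->format_cache = format;
    part->offset_slot_cache = format->slot_for_part;
    return part->offset_slot_cache;
}
```

In the typical scenario of a series of tuples sharing one format, every call after the first is a cache hit.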
>> - alter->new_space = space_new_xc(alter->space_def, &alter->key_list);
>> + alter->new_space =
>> + space_new_xc(alter->space_def, &alter->key_list,
>> + alter->old_space->format != NULL ?
>> + alter->old_space->format->epoch + 1 : 1);
>
> Can't we make it simpler and simply increase epoch id every
> time we create a new space? This is only an optimization,
> by leaking it into alter.cc you are making alter worry about
> stuff which should not be its concern.> Passing epoch id around explicitly is ugly.
The main problem we faced is that there are two formats in vinyl - the regular
and the disk format. Alter.cc does not know anything about the internal structure of
the vinyl space object, so it cannot update the disk epoch on its own.
> Why can't you do it in key_def_set_part?
> Lack of code reuse, abstraction leak.
Like this for now:
key_def_set_part(new_def, pos++, part->fieldno, part->type,
part->nullable_action, part->coll,
part->coll_id, part->sort_order, part->path,
part->path_len, part->offset_slot_cache,
part->format_cache);
> The source format to check that offset_slot_epoch is not stale.
>
> Please avoid using the word "actuality".
"hit"
* Re: [PATCH v5 01/12] box: refactor key_def_find routine
2018-10-29 6:56 ` [PATCH v5 01/12] box: refactor key_def_find routine Kirill Shcherbatov
@ 2018-11-19 17:48 ` Vladimir Davydov
0 siblings, 0 replies; 39+ messages in thread
From: Vladimir Davydov @ 2018-11-19 17:48 UTC (permalink / raw)
To: Kirill Shcherbatov; +Cc: tarantool-patches
On Mon, Oct 29, 2018 at 09:56:30AM +0300, Kirill Shcherbatov wrote:
> Refactored key_def_find routine to use key_part as a second
> argument. Introduced key_def_find_by_fieldno helper to use in
> scenarios where no key_part exists.
> New API is more convenient for complex key_part that will appear
> with JSON paths introduction.
>
> Need for #1012
> ---
> src/box/alter.cc | 7 ++++---
> src/box/key_def.c | 19 ++++++++++++++-----
> src/box/key_def.h | 9 ++++++++-
> src/box/sql/build.c | 2 +-
> src/box/sql/pragma.c | 2 +-
> 5 files changed, 28 insertions(+), 11 deletions(-)
Pushed to 2.1.
* Re: [PATCH v5 02/12] box: introduce key_def_parts_are_sequential
2018-10-29 6:56 ` [PATCH v5 02/12] box: introduce key_def_parts_are_sequential Kirill Shcherbatov
2018-11-01 14:23 ` [tarantool-patches] " Konstantin Osipov
@ 2018-11-19 17:48 ` Vladimir Davydov
1 sibling, 0 replies; 39+ messages in thread
From: Vladimir Davydov @ 2018-11-19 17:48 UTC (permalink / raw)
To: Kirill Shcherbatov; +Cc: tarantool-patches
On Mon, Oct 29, 2018 at 09:56:34AM +0300, Kirill Shcherbatov wrote:
> Introduced a new key_def_parts_are_sequential routine that test,
> does specified key_def have sequential fields. This would be
> useful with introducing JSON path as there would be another
> complex criteria as fields with JSONs can't be 'sequential' in
> this meaning.
>
> Need for #1012
> ---
> src/box/tuple_extract_key.cc | 20 +++++++++++++-------
> 1 file changed, 13 insertions(+), 7 deletions(-)
Pushed to 2.1.
* Re: [PATCH v5 03/12] box: introduce tuple_field_go_to_path
2018-10-29 6:56 ` [PATCH v5 03/12] box: introduce tuple_field_go_to_path Kirill Shcherbatov
@ 2018-11-19 17:48 ` Vladimir Davydov
0 siblings, 0 replies; 39+ messages in thread
From: Vladimir Davydov @ 2018-11-19 17:48 UTC (permalink / raw)
To: Kirill Shcherbatov; +Cc: tarantool-patches
On Mon, Oct 29, 2018 at 09:56:35AM +0300, Kirill Shcherbatov wrote:
> The new tuple_field_go_to_path routine is used in function
> tuple_field_raw_by_path to retrieve data by JSON path from field.
> We need this routine exported in future to access data by JSON
> path specified in key_part.
>
> Need for #1012
> ---
> src/box/tuple_format.c | 59 +++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 42 insertions(+), 17 deletions(-)
Pushed to 2.1.
* Re: [PATCH v5 04/12] box: introduce tuple_format_add_key_part
2018-10-29 6:56 ` [PATCH v5 04/12] box: introduce tuple_format_add_key_part Kirill Shcherbatov
2018-11-01 14:38 ` [tarantool-patches] " Konstantin Osipov
@ 2018-11-19 17:50 ` Vladimir Davydov
1 sibling, 0 replies; 39+ messages in thread
From: Vladimir Davydov @ 2018-11-19 17:50 UTC (permalink / raw)
To: Kirill Shcherbatov; +Cc: tarantool-patches
On Mon, Oct 29, 2018 at 09:56:36AM +0300, Kirill Shcherbatov wrote:
> Introduced a new tuple_format_add_key_part that makes format
> initialization for specified key_part and configuration.
> This decrease tuple_format_create routine complexity and would
> be used to initialize structures in format for JSON path.
>
> Need for #1012
> ---
> src/box/tuple_format.c | 153 ++++++++++++++++++++++++++-----------------------
> 1 file changed, 82 insertions(+), 71 deletions(-)
>
> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
> index 6f76158..088579c 100644
> --- a/src/box/tuple_format.c
> +++ b/src/box/tuple_format.c
> @@ -43,6 +43,84 @@ static const struct tuple_field tuple_field_default = {
> ON_CONFLICT_ACTION_DEFAULT, NULL, COLL_NONE,
> };
>
> +static int
> +tuple_format_add_key_part(struct tuple_format *format,
> + const struct field_def *fields, uint32_t field_count,
> + const struct key_part *part, bool is_sequential,
> + int *current_slot)
Pushed to 2.1.
Note, tuple_format_add_key_part was renamed to tuple_format_use_key_part
in the latest version.
* Re: [tarantool-patches] Re: [PATCH v5 05/12] lib: implement JSON tree class for json library
2018-11-06 12:15 ` [tarantool-patches] " Kirill Shcherbatov
@ 2018-11-19 17:53 ` Vladimir Davydov
0 siblings, 0 replies; 39+ messages in thread
From: Vladimir Davydov @ 2018-11-19 17:53 UTC (permalink / raw)
To: Kirill Shcherbatov; +Cc: tarantool-patches, Kostya Osipov
On Tue, Nov 06, 2018 at 03:15:08PM +0300, Kirill Shcherbatov wrote:
> > This is hard to read. It's hard to see whether it's a hash lookup
> > or a tree lookup. Let's perhaps rename mh_node_t to mh_entry_t.
> Ok, done.
>
> > Even more confusing is json_path_node and json_tree_node.
> > Let's take time to come up with better names.
> Maybe the "json_tree_entry" is a better name for JSON tree record structure?
> As for json_tree_node, this is an existent class that I don't touch in my patch at all.
I think Kostja meant that we need to rename json_path_node as well so as
to avoid confusion with json_tree_node. What about json_path_token? Then
json_path_parser would be called json_path_tokenizer.
Also, this patch doesn't compile on my laptop:
In file included from /home/vlad/src/tarantool/src/box/tuple_format.h:37:0,
from /home/vlad/src/tarantool/src/box/tuple_format.c:33:
/home/vlad/src/tarantool/src/box/tuple_format.c: In function ‘tuple_format_alloc’:
/home/vlad/src/tarantool/src/lib/json/tree.h:221:22: error: value computed is not used [-Werror=unused-value]
(__iter != NULL && (__tmp_iter = \
^
/home/vlad/src/tarantool/src/box/tuple_format.c:430:2: note: in expansion of macro ‘json_tree_foreach_entry_safe’
json_tree_foreach_entry_safe(field, &format->field_tree.root,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/vlad/src/tarantool/src/box/tuple_format.c: In function ‘tuple_format_destroy’:
/home/vlad/src/tarantool/src/lib/json/tree.h:221:22: error: value computed is not used [-Werror=unused-value]
(__iter != NULL && (__tmp_iter = \
^
/home/vlad/src/tarantool/src/box/tuple_format.c:443:2: note: in expansion of macro ‘json_tree_foreach_entry_safe’
json_tree_foreach_entry_safe(field, &format->field_tree.root,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please fix and rebase on top of the latest 2.1.
* Re: [tarantool-patches] Re: [PATCH v5 07/12] lib: introduce json_path_normalize routine
2018-11-01 15:27 ` [tarantool-patches] " Kirill Shcherbatov
@ 2018-11-20 15:13 ` Vladimir Davydov
2018-11-26 10:50 ` Kirill Shcherbatov
0 siblings, 1 reply; 39+ messages in thread
From: Vladimir Davydov @ 2018-11-20 15:13 UTC (permalink / raw)
To: Kirill Shcherbatov; +Cc: tarantool-patches, Kostya Osipov
On Thu, Nov 01, 2018 at 06:27:55PM +0300, Kirill Shcherbatov wrote:
> > Let's try to avoid this altogether. We could use parent references
> > to check if path1 equals to path2.
> This required in index_def_is_valid to raise error on key_part_path_cmp == 0
> before tuple_format creation. We need to compare path strings in key_part(s)
TBH, I find the idea that we have to normalize JSON paths just to compare
them a bit of overkill. Why not simply introduce json_path_cmp() and
use it instead? All those functions using key_part_path_cmp() are cold
paths so that wouldn't degrade performance.
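A token-by-token comparison along these lines can be sketched as follows. This is a simplified illustration, not the actual Tarantool lexer API: the helper `next_tok` and the token struct here are assumptions standing in for the real json_lexer, and only the `.key`, `["key"]` and `[123]` component forms are handled. The point is that two spellings of the same path compare equal without prior normalization:

```c
#include <ctype.h>
#include <string.h>

enum tok_type { TOK_END, TOK_NUM, TOK_STR, TOK_ERR };

struct tok {
	enum tok_type type;
	const char *str;	/* key bytes for TOK_STR */
	int len;		/* key length for TOK_STR */
	long num;		/* index for TOK_NUM */
};

/* Toy lexer: recognizes .key, ["key"] and [123] components. */
static const char *
next_tok(const char *p, struct tok *t)
{
	if (*p == '\0') {
		t->type = TOK_END;
		return p;
	}
	if (*p == '[') {
		p++;
		if (*p == '"') {
			p++;
			t->type = TOK_STR;
			t->str = p;
			while (*p != '\0' && *p != '"')
				p++;
			t->len = p - t->str;
			if (*p != '"' || p[1] != ']') {
				t->type = TOK_ERR;
				return p;
			}
			return p + 2;
		}
		if (!isdigit((unsigned char)*p)) {
			t->type = TOK_ERR;
			return p;
		}
		t->type = TOK_NUM;
		t->num = 0;
		while (isdigit((unsigned char)*p))
			t->num = t->num * 10 + (*p++ - '0');
		if (*p != ']') {
			t->type = TOK_ERR;
			return p;
		}
		return p + 1;
	}
	/* .key or a bare leading identifier. */
	if (*p == '.')
		p++;
	t->type = TOK_STR;
	t->str = p;
	while (isalnum((unsigned char)*p) || *p == '_')
		p++;
	t->len = p - t->str;
	if (t->len == 0)
		t->type = TOK_ERR;
	return p;
}

/*
 * Compare two JSON paths token by token, so ".a[1]" and
 * ["a"][1] compare equal with no canonical form required.
 */
int
json_path_cmp(const char *a, const char *b)
{
	struct tok ta, tb;
	for (;;) {
		a = next_tok(a, &ta);
		b = next_tok(b, &tb);
		if (ta.type != tb.type)
			return (int)ta.type - (int)tb.type;
		if (ta.type == TOK_END || ta.type == TOK_ERR)
			return 0;
		if (ta.type == TOK_NUM) {
			if (ta.num != tb.num)
				return ta.num < tb.num ? -1 : 1;
		} else {
			int rc = ta.len != tb.len ? ta.len - tb.len :
				 memcmp(ta.str, tb.str, ta.len);
			if (rc != 0)
				return rc;
		}
	}
}
```

Since comparison only happens on cold paths (index_def validation and the like), the per-token parsing cost is irrelevant, which is exactly the argument above.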
* Re: [PATCH v5 07/12] lib: introduce json_path_normalize routine
2018-10-29 6:56 ` [PATCH v5 07/12] lib: introduce json_path_normalize routine Kirill Shcherbatov
2018-11-01 15:22 ` [tarantool-patches] " Konstantin Osipov
@ 2018-11-20 15:14 ` Vladimir Davydov
1 sibling, 0 replies; 39+ messages in thread
From: Vladimir Davydov @ 2018-11-20 15:14 UTC (permalink / raw)
To: Kirill Shcherbatov; +Cc: tarantool-patches
On Mon, Oct 29, 2018 at 09:56:39AM +0300, Kirill Shcherbatov wrote:
> Introduced a new routine json_path_normalize that makes a
> conversion of JSON path to the 'canonical' form:
> - all maps keys are specified with operator ["key"] form
> - all array indexes are specified with operator [i] form.
> This notation is preferable because in the general case it can
> be uniquely parsed.
> We need such API in JSON indexes patch to store all paths in
> 'canonical' form to commit the path uniqueness checks and
> to tune access with JSON path hashtable.
BTW the comment is misleading. I've looked through the following patches
and failed to find any hash table that would map canonical paths to
tuple fields. Looks like you normalize paths solely to compare them in
key_part_path_cmp().
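The canonicalization the commit message describes can be sketched like this. It is a deliberately simplified stand-in, not the real implementation: the actual json_path_normalize works on the parser's token stream, and only the `.key`, `["key"]` and `[123]` component forms are handled here:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Rewrite a JSON path into canonical form: every map key becomes
 * ["key"] and every array index becomes [i]. Returns 0 on success,
 * -1 on a malformed path or when dst overflows.
 */
static int
json_path_normalize(const char *src, char *dst, size_t dst_size)
{
	size_t used = 0;
	const char *p = src;
	while (*p != '\0') {
		if (*p == '.' || isalnum((unsigned char)*p) || *p == '_') {
			/* .key or bare leading identifier -> ["key"] */
			if (*p == '.')
				p++;
			const char *begin = p;
			while (isalnum((unsigned char)*p) || *p == '_')
				p++;
			if (p == begin)
				return -1;
			used += snprintf(dst + used, dst_size - used,
					 "[\"%.*s\"]",
					 (int)(p - begin), begin);
		} else if (*p == '[' && p[1] == '"') {
			/* already canonical ["key"] - copy verbatim */
			const char *begin = p;
			p += 2;
			while (*p != '\0' && *p != '"')
				p++;
			if (*p != '"' || p[1] != ']')
				return -1;
			p += 2;
			used += snprintf(dst + used, dst_size - used, "%.*s",
					 (int)(p - begin), begin);
		} else if (*p == '[' && isdigit((unsigned char)p[1])) {
			/* array index -> [i] */
			const char *begin = ++p;
			while (isdigit((unsigned char)*p))
				p++;
			if (*p != ']')
				return -1;
			used += snprintf(dst + used, dst_size - used, "[%.*s]",
					 (int)(p - begin), begin);
			p++;
		} else {
			return -1;
		}
		if (used >= dst_size)
			return -1;
	}
	return 0;
}
```

With json_path_cmp() doing token-wise comparison instead, this whole pass becomes unnecessary, which is the point of the objection above.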
* Re: [PATCH v5 05/12] lib: implement JSON tree class for json library
2018-10-29 6:56 ` [PATCH v5 05/12] lib: implement JSON tree class for json library Kirill Shcherbatov
2018-11-01 15:08 ` [tarantool-patches] " Konstantin Osipov
@ 2018-11-20 16:43 ` Vladimir Davydov
2018-11-21 10:37 ` [tarantool-patches] " Kirill Shcherbatov
2018-11-26 10:50 ` Kirill Shcherbatov
1 sibling, 2 replies; 39+ messages in thread
From: Vladimir Davydov @ 2018-11-20 16:43 UTC (permalink / raw)
To: Kirill Shcherbatov; +Cc: tarantool-patches
On Mon, Oct 29, 2018 at 09:56:37AM +0300, Kirill Shcherbatov wrote:
> New JSON tree class would store JSON paths for tuple fields
> for registered non-plain indexes. It is a hierarchical data
> structure that organize JSON nodes produced by parser.
> Class provides API to lookup node by path and iterate over the
> tree.
> JSON Indexes patch require such functionality to make lookup
> for tuple_fields by path, make initialization of field map and
> build vynyl_stmt msgpack for secondary index via JSON tree
> iteration.
>
> Need for #1012
> ---
> src/lib/json/CMakeLists.txt | 2 +
> src/lib/json/tree.c | 327 ++++++++++++++++++++++++++++++++++++++++++++
> src/lib/json/tree.h | 224 ++++++++++++++++++++++++++++++
> test/unit/json_path.c | 211 +++++++++++++++++++++++++++-
> test/unit/json_path.result | 42 +++++-
> 5 files changed, 804 insertions(+), 2 deletions(-)
> create mode 100644 src/lib/json/tree.c
> create mode 100644 src/lib/json/tree.h
First, I agree with Kostja that naming looks confusing. Please try to
come up with better names and send them to us for review before
reworking the patch so that we could agree first.
Also, once we agree, don't forget to fix comments, function and local
variable names. For example, if you decide to rename json_path_node to
json_path_token and json_path_parser to json_path_tokenizer, you should
also rename all local variables referring to them respectively, e.g.
node => token, parser => tokenizer and so forth. Please avoid calling
local variable 'entry' if the struct is called json_path_node or vice
versa, in other words be consistent.
Another concern is the comments on the code and struct members. They
look better than they used to, but they're still not perfect. E.g. take this
comment:
> + /**
> + * Rolling hash for node calculated with
> + * json_path_node_hash(key, parent).
> + */
> + uint32_t rolling_hash;
It took me a while to realize how you calculate rolling hash - I had to
dive deep into the implementation.
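For readers decoding the same scheme: the "rolling" part means each node's hash is seeded with its parent's hash, so a node's hash effectively covers the whole path from the root. A toy illustration of the chaining, using an FNV-1a-style mix in place of the real PMurHash32 calls:

```c
#include <stdint.h>
#include <string.h>

/*
 * Toy stand-in for json_path_node_hash(): hash one path component,
 * seeded with the parent's hash. Chaining the seeds is what makes
 * the hash "rolling" - the same key under different parents hashes
 * differently, so (parent, key) pairs can share one hash table.
 */
static uint32_t
path_component_hash(const char *key, uint32_t seed)
{
	uint32_t h = seed; /* FNV-1a-style mix, illustration only */
	for (size_t i = 0; i < strlen(key); i++) {
		h ^= (unsigned char)key[i];
		h *= 16777619u;
	}
	return h;
}
```

The seed constant and function name are assumptions for the sketch; the real code starts from json_path_node_hash_seed at the root.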
Another example:
> + /** Array of child records. Match indexes. */
> + struct json_tree_node **children;
What is the order of entries in this array for a json array/map?
Please be more thorough when writing comments. Try to explain things
that might not be obvious to the reader. There are quite a few typos
and grammar mistakes, too. I don't want to go over every place in the
code - please try to do some self-review.
Plus, see a few comments inline.
> +/**
> + * Compare hash records of two json tree nodes. Return 0 if equal.
> + */
> +static inline int
> +mh_json_tree_node_cmp(const struct mh_json_tree_node *a,
> + const struct mh_json_tree_node *b)
> +{
> + if (a->key.type != b->key.type)
> + return a->key.type - b->key.type;
> + if (a->parent != b->parent)
> + return a->parent - b->parent;
> + if (a->key.type == JSON_PATH_STR) {
> + if (a->key.len != b->key.len)
> + return a->key.len - b->key.len;
> + return memcmp(a->key.str, b->key.str, a->key.len);
> + } else if (a->key.type == JSON_PATH_NUM) {
> + return a->key_hash - b->key_hash;
> + }
> + unreachable();
unreachable() may turn into nothing in release mode, in which case the
compiler will probably complain that the return value is missing.
> +}
> +
> +#define MH_SOURCE 1
> +#define mh_name _json_tree_node
> +#define mh_key_t struct mh_json_tree_node *
> +#define mh_node_t struct mh_json_tree_node
> +#define mh_arg_t void *
> +#define mh_hash(a, arg) ((a)->key_hash)
> +#define mh_hash_key(a, arg) ((a)->key_hash)
> +#define mh_cmp(a, b, arg) (mh_json_tree_node_cmp((a), (b)))
> +#define mh_cmp_key(a, b, arg) mh_cmp(a, b, arg)
> +#include "salad/mhash.h"
> +
> +static const uint32_t json_path_node_hash_seed = 13U;
> +
> +uint32_t
> +json_path_node_hash(struct json_path_node *key, uint32_t seed)
> +{
> + uint32_t h = seed;
> + uint32_t carry = 0;
> + const void *data;
> + uint32_t data_size;
> + if (key->type == JSON_PATH_STR) {
> + data = key->str;
> + data_size = key->len;
> + } else if (key->type == JSON_PATH_NUM) {
> + data = &key->num;
> + data_size = sizeof(key->num);
> + } else {
> + unreachable();
> + }
> + PMurHash32_Process(&h, &carry, data, data_size);
> + return PMurHash32_Result(h, carry, data_size);
> +}
> +
> +int
> +json_tree_create(struct json_tree *tree)
> +{
> + memset(tree, 0, sizeof(struct json_tree));
> + tree->root.rolling_hash = json_path_node_hash_seed;
> + tree->root.key.type = JSON_PATH_END;
> + tree->hash = mh_json_tree_node_new();
mh_json_tree_node_new() - but it's not a node, it's a hash table.
Please fix the name.
Maybe it's worth allocating the hash table on demand? It'd simplify
the API.
> + if (unlikely(tree->hash == NULL))
Please don't use 'unlikely' on cold paths - it is meaningless.
> + return -1;
> + return 0;
> +}
> +
> +void
> +json_tree_destroy(struct json_tree *tree)
> +{
> + assert(tree->hash != NULL);
> + json_tree_node_destroy(&tree->root);
> + mh_json_tree_node_delete(tree->hash);
> +}
> +
> +void
> +json_tree_node_create(struct json_tree_node *node,
> + struct json_path_node *path_node)
> +{
> + memset(node, 0, sizeof(struct json_tree_node));
> + node->key = *path_node;
> +}
> +
> +void
> +json_tree_node_destroy(struct json_tree_node *node)
> +{
> + free(node->children);
> +}
> +
> +struct json_tree_node *
> +json_tree_lookup_by_path_node(struct json_tree *tree,
> + struct json_tree_node *parent,
> + struct json_path_node *path_node,
> + uint32_t rolling_hash)
> +{
> + assert(parent != NULL);
> + assert(rolling_hash == json_path_node_hash(path_node,
> + parent->rolling_hash));
> + struct mh_json_tree_node info;
> + info.key = *path_node;
> + info.key_hash = rolling_hash;
> + info.parent = parent;
> + mh_int_t id = mh_json_tree_node_find(tree->hash, &info, NULL);
> + if (unlikely(id == mh_end(tree->hash)))
> + return NULL;
Wouldn't it be worth looking up in the children array directly in case
of JSON_PATH_NUM? BTW, then we wouldn't have to access 'children'
directly in tuple_format_field() (next patch) thus violating
encapsulation - instead we could use this helper there as well.
> + struct mh_json_tree_node *ht_node =
> + mh_json_tree_node_node(tree->hash, id);
> + assert(ht_node == NULL || ht_node->node != NULL);
> + struct json_tree_node *ret = ht_node != NULL ? ht_node->node : NULL;
> + assert(ret == NULL || ret->parent == parent);
> + return ret;
> +}
> +
> +
> +int
> +json_tree_add(struct json_tree *tree, struct json_tree_node *parent,
> + struct json_tree_node *node, uint32_t rolling_hash)
> +{
> + assert(parent != NULL);
> + assert(node->parent == NULL);
> + assert(rolling_hash ==
> + json_path_node_hash(&node->key, parent->rolling_hash));
> + assert(json_tree_lookup_by_path_node(tree, parent, &node->key,
> + rolling_hash) == NULL);
> + uint32_t insert_idx = (node->key.type == JSON_PATH_NUM) ?
> + (uint32_t)node->key.num - 1 :
> + parent->children_count;
> + if (insert_idx >= parent->children_count) {
> + uint32_t new_size = insert_idx + 1;
We usually double the size with each allocation. If this is intentional,
please add a comment.
> + struct json_tree_node **children =
> + realloc(parent->children, new_size*sizeof(void *));
Coding style: spaces missing.
> + if (unlikely(children == NULL))
> + return -1;
> + memset(children + parent->children_count, 0,
> + (new_size - parent->children_count)*sizeof(void *));
> + parent->children = children;
> + parent->children_count = new_size;
> + }
> + assert(parent->children[insert_idx] == NULL);
> + parent->children[insert_idx] = node;
> + node->sibling_idx = insert_idx;
> + node->rolling_hash = rolling_hash;
> +
> + struct mh_json_tree_node ht_node;
> + ht_node.key = node->key;
> + ht_node.key_hash = rolling_hash;
> + ht_node.node = node;
> + ht_node.parent = parent;
> + mh_int_t rc = mh_json_tree_node_put(tree->hash, &ht_node, NULL, NULL);
> + if (unlikely(rc == mh_end(tree->hash))) {
> + parent->children[insert_idx] = NULL;
> + return -1;
> + }
> + node->parent = parent;
> + assert(json_tree_lookup_by_path_node(tree, parent, &node->key,
> + rolling_hash) == node);
> + return 0;
> +}
> +
> +void
> +json_tree_remove(struct json_tree *tree, struct json_tree_node *parent,
> + struct json_tree_node *node, uint32_t rolling_hash)
I don't think you really need this function - it's not used anywhere in
the following patches.
> diff --git a/src/lib/json/tree.h b/src/lib/json/tree.h
> new file mode 100644
> index 0000000..6166024
> --- /dev/null
> +++ b/src/lib/json/tree.h
> @@ -0,0 +1,224 @@
> +#ifndef TARANTOOL_JSON_TREE_H_INCLUDED
> +#define TARANTOOL_JSON_TREE_H_INCLUDED
> +/*
> + * Copyright 2010-2018 Tarantool AUTHORS: please see AUTHORS file.
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above
> + * copyright notice, this list of conditions and the
> + * following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer in the documentation and/or other materials
> + * provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#include <stdbool.h>
> +#include <stdint.h>
> +#include "small/rlist.h"
rlist isn't used anymore
> +#include "path.h"
> +#include <stdio.h>
> +#include <inttypes.h>
why do you need inttypes?
> +#include <stdint.h>
stdint is included twice
Please cleanup the headers.
> +#include <stdlib.h>
> +
> +#ifdef __cplusplus
> +extern "C"
> +{
> +#endif
> +
> +struct mh_json_tree_node_t;
> +
> +/**
> + * JSON tree is a hierarchical data structure that organize JSON
> + * nodes produced by parser. Key record point to source strings
> + * memory.
> + */
> +struct json_tree_node {
> + /** JSON path node produced by json_next_token. */
> + struct json_path_node key;
> + /**
> + * Rolling hash for node calculated with
> + * json_path_node_hash(key, parent).
> + */
> + uint32_t rolling_hash;
> + /** Array of child records. Match indexes. */
> + struct json_tree_node **children;
> + /** Size of children array. */
> + uint32_t children_count;
> + /** Index of node in parent children array. */
> + uint32_t sibling_idx;
> + /** Pointer to parent node. */
> + struct json_tree_node *parent;
> +};
> +
> +/** JSON tree object managing data relations. */
> +struct json_tree {
> + /** JSON tree root node. */
> + struct json_tree_node root;
> + /** Hashtable af all tree nodes. */
> + struct mh_json_tree_node_t *hash;
> +};
> +
> +/** Create a JSON tree object to manage data relations. */
> +int
> +json_tree_create(struct json_tree *tree);
> +
> +/**
> + * Destroy JSON tree object. This routine doesn't subtree so
> + * should be called at the end of it's manual destruction.
> + */
> +void
> +json_tree_destroy(struct json_tree *tree);
> +
> +/** Compute the hash value of a JSON path component. */
> +uint32_t
> +json_path_node_hash(struct json_path_node *key, uint32_t seed);
> +
> +/** Init a JSON tree node. */
> +void
> +json_tree_node_create(struct json_tree_node *node,
> + struct json_path_node *path_node);
> +
> +/** Destroy a JSON tree node. */
> +void
> +json_tree_node_destroy(struct json_tree_node *node);
> +
> +/**
> + * Make child lookup in tree by path_node at position specified
> + * with parent. Rolling hash should be calculated for path_node
> + * and parent with json_path_node_hash.
> + */
> +struct json_tree_node *
> +json_tree_lookup_by_path_node(struct json_tree *tree,
> + struct json_tree_node *parent,
> + struct json_path_node *path_node,
> + uint32_t rolling_hash);
I really don't like that the caller has to pass rolling_hash explicitly
as this obfuscates the API. Judging by the following patches, you always
compute it as json_path_node_hash(path_node, parent->rolling_hash) so
why not just hide it under the hood? Same goes for json_tree_add() and
json_tree_remove().
Also, I'd rather call this function simply json_tree_lookup().
> +
> +/**
> + * Append node to the given parent position in JSON tree. The
> + * parent mustn't have a child with such content. Rolling
> + * hash should be calculated for path_node and parent with
> + * json_path_node_hash.
> + */
> +int
> +json_tree_add(struct json_tree *tree, struct json_tree_node *parent,
> + struct json_tree_node *node, uint32_t rolling_hash);
> +
> +/**
> + * Delete a JSON tree node at the given parent position in JSON
> + * tree. The parent node must have such child. Rolling hash should
> + * be calculated for path_node and parent with
> + * json_path_node_hash.
> + */
> +void
> +json_tree_remove(struct json_tree *tree, struct json_tree_node *parent,
> + struct json_tree_node *node, uint32_t rolling_hash);
> +
> +/** Make child lookup by path in JSON tree. */
> +struct json_tree_node *
> +json_tree_lookup_by_path(struct json_tree *tree, struct json_tree_node *parent,
> + const char *path, uint32_t path_len);
> +
> +/** Make pre order traversal in JSON tree. */
> +struct json_tree_node *
> +json_tree_next_pre(struct json_tree_node *parent,
> + struct json_tree_node *pos);
> +
> +/** Make post order traversal in JSON tree. */
> +struct json_tree_node *
> +json_tree_next_post(struct json_tree_node *parent,
> + struct json_tree_node *pos);
> +
> +/** Return entry by json_tree_node item. */
> +#define json_tree_entry(item, type, member) ({ \
> + const typeof( ((type *)0)->member ) *__mptr = (item); \
> + (type *)( (char *)__mptr - ((size_t) &((type *)0)->member) ); })
> +
> +/** Return entry by json_tree_node item or NULL if item is NULL.*/
> +#define json_tree_entry_safe(item, type, member) ({ \
> + (item) != NULL ? json_tree_entry((item), type, member) : NULL; })
> +
> +/** Make lookup in tree by path and return entry. */
> +#define json_tree_lookup_entry_by_path(tree, parent, path, path_len, type, \
> + member) \
> +({struct json_tree_node *__tree_node = \
> + json_tree_lookup_by_path((tree), (parent), path, path_len); \
> + __tree_node != NULL ? json_tree_entry(__tree_node, type, member) : NULL; })
> +
> +/** Make lookup in tree by path_node and return entry. */
> +#define json_tree_lookup_entry_by_path_node(tree, parent, path_node, \
> + path_node_hash, type, member) \
> +({struct json_tree_node *__tree_node = \
> + json_tree_lookup_by_path_node((tree), (parent), path_node, \
> + path_node_hash); \
> + __tree_node != NULL ? json_tree_entry(__tree_node, type, member) : \
> + NULL; })
> +
> +/** Make pre-order traversal in JSON tree. */
> +#define json_tree_foreach_pre(item, root) \
> +for ((item) = json_tree_next_pre((root), NULL); (item) != NULL; \
> + (item) = json_tree_next_pre((root), (item)))
> +
> +/** Make post-order traversal in JSON tree. */
> +#define json_tree_foreach_post(item, root) \
> +for ((item) = json_tree_next_post((root), NULL); (item) != NULL; \
> + (item) = json_tree_next_post((root), (item)))
> +
> +/**
> + * Make safe post-order traversal in JSON tree.
> + * Safe for building destructors.
> + */
> +#define json_tree_foreach_safe(item, root) \
> +for (struct json_tree_node *__iter = json_tree_next_post((root), NULL); \
> + (((item) = __iter) && (__iter = json_tree_next_post((root), (item))), \
> + (item) != NULL);)
> +
> +/** Make post-order traversal in JSON tree and return entry. */
> +#define json_tree_foreach_entry_pre(item, root, type, member) \
> +for (struct json_tree_node *__iter = \
> + json_tree_next_pre((root), NULL); \
> + __iter != NULL && ((item) = json_tree_entry(__iter, type, member)); \
> + __iter = json_tree_next_pre((root), __iter))
> +
> +/** Make pre-order traversal in JSON tree and return entry. */
> +#define json_tree_foreach_entry_post(item, root, type, member) \
> +for (struct json_tree_node *__iter = \
> + json_tree_next_post((root), NULL ); \
> + __iter != NULL && ((item) = json_tree_entry(__iter, type, member)); \
> + __iter = json_tree_next_post((root), __iter))
> +
> +/**
> + * Make secure post-order traversal in JSON tree and return entry.
> + */
> +#define json_tree_foreach_entry_safe(item, root, type, member) \
> +for (struct json_tree_node *__tmp_iter, *__iter = \
> + json_tree_next_post((root), NULL); \
> + __iter != NULL && ((item) = json_tree_entry(__iter, type, member)) && \
> + (__iter != NULL && (__tmp_iter = \
> + json_tree_next_post((root), __iter))), \
> + (__iter != NULL && ((item) = json_tree_entry(__iter, type, member))); \
> + __iter = __tmp_iter)
This is barely understandable. Please rewrite without introducing extra
loop variable, similarly to how rlist_foreach_entry_safe is implemented.
Also, we use tabs, not spaces for indentation.
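The pattern being asked for - cache the successor before the loop body runs, so the body may free the current node - is easiest to see on a plain singly-linked list. This is a generic illustration of the idiom, not the tree macro itself:

```c
#include <stdlib.h>

struct node {
	int val;
	struct node *next;
};

/*
 * Deletion-safe traversal in the spirit of rlist_foreach_entry_safe:
 * the successor is fetched into (tmp) before the body executes, so
 * the body is free to free() or unlink (item).
 */
#define list_foreach_safe(item, tmp, head)				\
	for ((item) = (head),						\
	     (tmp) = (item) != NULL ? (item)->next : NULL;		\
	     (item) != NULL;						\
	     (item) = (tmp),						\
	     (tmp) = (tmp) != NULL ? (tmp)->next : NULL)
```

Applied to the JSON tree, (tmp) would hold json_tree_next_post() of the current node, evaluated in the loop header rather than via the extra assignments inside the condition.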
* Re: [PATCH v5 08/12] box: introduce JSON indexes
2018-10-29 6:56 ` [PATCH v5 08/12] box: introduce JSON indexes Kirill Shcherbatov
@ 2018-11-20 16:52 ` Vladimir Davydov
2018-11-26 10:50 ` [tarantool-patches] " Kirill Shcherbatov
0 siblings, 1 reply; 39+ messages in thread
From: Vladimir Davydov @ 2018-11-20 16:52 UTC (permalink / raw)
To: Kirill Shcherbatov; +Cc: tarantool-patches
On Mon, Oct 29, 2018 at 09:56:40AM +0300, Kirill Shcherbatov wrote:
> @@ -305,6 +411,7 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
> format->dict = dict;
> tuple_dictionary_ref(dict);
> }
> + format->allocation_size = allocation_size;
> format->refs = 0;
> format->id = FORMAT_ID_NIL;
> format->index_field_count = index_field_count;
> @@ -412,6 +519,77 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
> return true;
> }
You didn't patch this function, but you probably had to.
Here's something for you to fix:
box.cfg{}
s = box.schema.space.create('test')
i = s:create_index('pk', {parts = {{1, 'integer', path = '[1]'}}})
s:insert{{-1}}
i:alter{parts = {{1, 'string', path = '[1]'}}} -- success
s:insert{{'a'}} -- crash
* Re: [tarantool-patches] Re: [PATCH v5 05/12] lib: implement JSON tree class for json library
2018-11-20 16:43 ` Vladimir Davydov
@ 2018-11-21 10:37 ` Kirill Shcherbatov
2018-11-26 10:50 ` Kirill Shcherbatov
1 sibling, 0 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-11-21 10:37 UTC (permalink / raw)
To: tarantool-patches, Vladimir Davydov; +Cc: Kostya Osipov
> First, I agree with Kostja that naming looks confusing. Please try to
> come up with better names and send them to us for review before
> reworking the patch so that we could agree first.
I like the following names:
/*********************************************/
/*****/ JSON Parser class: /*****/
/*********************************************/
json_path_token, json_path_parser,
json_path_parser_create()..
/*********************************************/
/*****/ MHash for JSON Tree class: /*****/
/*********************************************/
mh_json_tree_t
mh_json_tree_cmp(), mh_json_tree_find(),
mh_json_tree_node().....
/*********************************************/
/*****/ JSON Tree class: /*****/
/*********************************************/
json_path_node_hash
json_tree, json_tree_node, json_tree_create(...), json_tree_node_create(...),
struct json_tree_node *
json_tree_lookup(struct json_tree *tree, struct json_tree_node *parent,
const char *path, uint32_t path_len)
--- we need parent entry as our paths are relative, i.e. tuple[fieldno][{jsonpath}]
struct json_tree_entry *
json_tree_token_lookup(struct json_tree *tree, struct json_tree_node *parent,
json_path_token *token)
type *
json_tree_lookup_entry(tree, parent, path, path_len, type, member)
type *
json_tree_token_lookup_entry(tree, parent, token, type, member)
json_tree_foreach_entry_preorder(node, &tree.root, struct test_struct,
tree_node) {
}
* Re: [tarantool-patches] Re: [PATCH v5 08/12] box: introduce JSON indexes
2018-11-20 16:52 ` Vladimir Davydov
@ 2018-11-26 10:50 ` Kirill Shcherbatov
0 siblings, 0 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 10:50 UTC (permalink / raw)
To: tarantool-patches, Vladimir Davydov; +Cc: Kostya Osipov
> You didn't patch this function, but you probably had to.
> Here's something for you to fix:
>
> box.cfg{}
> s = box.schema.space.create('test')
> i = s:create_index('pk', {parts = {{1, 'integer', path = '[1]'}}})
> s:insert{{-1}}
> i:alter{parts = {{1, 'string', path = '[1]'}}} -- success
> s:insert{{'a'}} -- crash
Thank you for this case. I've introduced new checks in tuple_format1_can_store_format2_tuples
and have written a few new tests in the test suite.
bool
tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
struct tuple_format *format2)
{
if (format1->exact_field_count != format2->exact_field_count)
return false;
struct tuple_field *field1;
struct json_token *field2_prev_token = NULL;
struct json_token *skip_root_token = NULL;
struct json_token *field1_prev_token = &format1->tree.root;
json_tree_foreach_entry_preorder(field1, &format1->tree.root,
struct tuple_field, token) {
/* Test if subtree skip is required. */
if (skip_root_token != NULL) {
struct json_token *tmp = &field1->token;
while (tmp->parent != NULL &&
tmp->parent != skip_root_token)
tmp = tmp->parent;
if (tmp->parent == skip_root_token)
continue;
}
skip_root_token = NULL;
/* Lookup for a valid parent node in new tree. */
while (field1_prev_token != field1->token.parent) {
field1_prev_token = field1_prev_token->parent;
field2_prev_token = field2_prev_token->parent;
assert(field1_prev_token != NULL);
}
struct tuple_field *field2 =
json_tree_lookup_entry(&format2->tree, field2_prev_token,
&field1->token,
struct tuple_field, token);
/*
* The field has a data type in format1, but has
* no data type in format2.
*/
if (field2 == NULL) {
/*
* The field can get a name added
* for it, and this doesn't require a data
* check.
* If the field is defined as not
* nullable, however, we need a data
* check, since old data may contain
* NULLs or miss the subject field.
*/
if (field1->type == FIELD_TYPE_ANY &&
tuple_field_is_nullable(field1)) {
skip_root_token = &field1->token;
continue;
} else {
return false;
}
}
if (! field_type1_contains_type2(field1->type, field2->type))
return false;
/*
* Do not allow transition from nullable to non-nullable:
* it would require a check of all data in the space.
*/
if (tuple_field_is_nullable(field2) &&
!tuple_field_is_nullable(field1))
return false;
field2_prev_token = &field2->token;
field1_prev_token = &field1->token;
}
return true;
}
-- incompatible format change
s = box.schema.space.create('test')
i = s:create_index('pk', {parts = {{1, 'integer', path = '[1]'}}})
s:insert{{-1}}
i:alter{parts = {{1, 'string', path = '[1]'}}}
s:insert{{'a'}}
i:drop()
i = s:create_index('pk', {parts = {{1, 'integer', path = '[1].FIO'}}})
s:insert{{{FIO=-1}}}
i:alter{parts = {{1, 'integer', path = '[1][1]'}}}
i:alter{parts = {{1, 'integer', path = '[1].FIO[1]'}}}
s:drop()
* Re: [tarantool-patches] Re: [PATCH v5 07/12] lib: introduce json_path_normalize routine
2018-11-20 15:13 ` Vladimir Davydov
@ 2018-11-26 10:50 ` Kirill Shcherbatov
0 siblings, 0 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 10:50 UTC (permalink / raw)
To: tarantool-patches, Vladimir Davydov; +Cc: Kostya Osipov
> TBO I find the idea that we have to normalize json paths just to compare
> them a bit of an overkill. Why not simply introduce json_path_cmp() and
> use it instead? All those functions using key_part_path_cmp() are cold
> paths so that wouldn't degrade performance.
Ugum, we now use a new json_path_cmp routine in such scenarios. This allows
us to keep paths in the form the user specified.
There is an interesting consequence for error handling. The first path passed to
the routine is assumed to be valid. When the second path contains an error, the
first path is considered greater. In practice this means that during index_def
validation we only detect that such paths differ, and the actual error is raised
later, during format creation, when the path is parsed again. We have tests for
this case.
* Re: [tarantool-patches] Re: [PATCH v5 05/12] lib: implement JSON tree class for json library
2018-11-20 16:43 ` Vladimir Davydov
2018-11-21 10:37 ` [tarantool-patches] " Kirill Shcherbatov
@ 2018-11-26 10:50 ` Kirill Shcherbatov
1 sibling, 0 replies; 39+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 10:50 UTC (permalink / raw)
To: tarantool-patches, Vladimir Davydov; +Cc: Kostya Osipov
> First, I agree with Kostja that naming looks confusing. Please try to
> come up with better names and send them to us for review before
> reworking the patch so that we could agree first.
>
> Also, once we agree, don't forget to fix comments, function and local
> variable names. For example, if you decide to rename json_path_node to
> json_path_token and json_path_parser to json_path_tokenizer, you should
> also rename all local variables referring to them respectively, e.g.
> node => token, parser => tokenizer and so forth. Please avoid calling
> local variable 'entry' if the struct is called json_path_node or vice
> versa, in other words be consistent.
I've renamed the parser class in a separate patch:
json_path_parser parser --> json_lexer lexer
json_path_node node --> json_token token
Then I've moved the lexer class to the json.{c, h} headers and made the old
token internals the key part of the new json_token class that is used to
organize the JSON tree.
So we have a json_lexer class that emits json_token(s), which may be added
to the json_tree class, which manages them with an mh_json hashtable.
>
> My another concern is comments to the code and function members. They
> look better than they used to be, but still not perfect. E.g. take this
> comment:
I've tried to be more thorough in describing the details.
> What is the order of entries in this array for a json array/map?
>
> Please be more thorough when writing comments. Try to explain things
> that might not be obvious to the reader. There are quite a few typos
> and grammar mistakes, too. I don't want to go over every place in the
> code - please try to do some self-review.
>
> Plus, see a few comments inline.
Ugum
> unreachable() may turn into nothing in release mode, in which case the
> compiler will probably complain that the return value is missing.
Ok, fixed.
> mh_json_tree_node_new() - but it's not a node, it's a hash table.
> Please fix the name.
mh_json_new()
> May be, it's worth allocating the hash table on demand? It'd simplify
> the API.
> Please don't use 'unlikely' on cold paths - it is meaningless.
Ok.
> Wouldn't it be worth looking up in the children array directly in case
> of JSON_PATH_NUM? BTW, then we wouldn't have to access 'children'
> directly in tuple_format_field() (next patch) thus violating
> encapsulation - instead we could use this helper there as well.
Now works this way.
> We usually double the size with each allocation. If this is intentional,
> please add a comment.
Ok, this required an additional field, child_size, to be introduced.
> I don't think you really need this function - it's not used anywhere in
> the following patches.
Now json_tree_del is an important part of the destructors, and we have a
consistent API: a json_token emitted by the lexer doesn't require explicit
construction. Dynamic members are constructed in the json_tree_add routine,
and the paired json_tree_del releases that memory accordingly.
> rlist isn't used anymore
> why do you need inttypes?
> stdint is included twice
> Please cleanup the headers.
Done.
> I really don't like that the caller has to pass rolling_hash explicitly
> as this obfuscates the API. Judging by the following patches, you always
> compute it as json_path_node_hash(path_node, parent->rolling_hash) so
> why not just hide it under the hood? Same goes for json_tree_add() and
> json_tree_remove().
>
> Also, I'd rather call this function simply json_tree_lookup().
Already done.
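The point about hiding the rolling hash can be sketched as follows; the hash function itself (a Horner-style fold over the parent's hash) is illustrative, not the library's actual one:

```c
#include <assert.h>
#include <stdint.h>

/*
 * The hash of a path node folds in the parent's hash, so a lookup
 * routine can derive it internally from (parent, path_node) and
 * callers never pass a hash explicitly.
 */
static uint32_t
path_node_hash(const char *str, int len, uint32_t parent_hash)
{
	uint32_t h = parent_hash;
	for (int i = 0; i < len; i++)
		h = h * 31 + (unsigned char)str[i];
	return h;
}
```

Since the hash is a pure function of the parent and the node, keeping it out of the public signatures loses nothing and makes json_tree_lookup(), json_tree_add(), and json_tree_remove() harder to misuse.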
> This is barely understandable. Please rewrite without introducing extra
> loop variable, similarly to how rlist_foreach_entry_safe is implemented.
> Also, we use tabs, not spaces for indentation.
Like this:
/** Make post-order traversal in JSON tree and return entry. */
#define json_tree_foreach_entry_postorder(node, root, type, member) \
for ((node) = json_tree_postorder_next_entry(NULL, (root), type, member); \
(node) != NULL; \
(node) = json_tree_postorder_next_entry(&(node)->member, (root), type, \
member))
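To illustrate the property the macro relies on, here is a toy recursive stand-in (the real macro uses an iterative json_tree_postorder_next_entry helper instead): post-order visits every child before its parent, which is exactly what a destructor needs so children can be freed first.

```c
#include <assert.h>
#include <stddef.h>

/* Toy binary tree; ids are collected in post-order visit sequence. */
struct tnode {
	int id;
	struct tnode *left, *right;
};

static int
postorder_collect(const struct tnode *n, int *out, int pos)
{
	if (n == NULL)
		return pos;
	pos = postorder_collect(n->left, out, pos);
	pos = postorder_collect(n->right, out, pos);
	out[pos++] = n->id; /* parent emitted only after both subtrees */
	return pos;
}
```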
2018-10-29 6:56 [PATCH v5 00/12] box: indexes by JSON path Kirill Shcherbatov
2018-10-29 6:56 ` [PATCH v5 01/12] box: refactor key_def_find routine Kirill Shcherbatov
2018-11-19 17:48 ` Vladimir Davydov
2018-10-29 6:56 ` [PATCH v5 10/12] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
2018-10-29 6:56 ` [PATCH v5 11/12] box: introduce offset slot cache in key_part Kirill Shcherbatov
2018-11-01 13:32 ` [tarantool-patches] " Konstantin Osipov
2018-11-06 12:15 ` [tarantool-patches] " Kirill Shcherbatov
2018-10-29 6:56 ` [PATCH v5 12/12] box: specify indexes in user-friendly form Kirill Shcherbatov
2018-11-01 13:34 ` [tarantool-patches] " Konstantin Osipov
2018-11-01 14:18 ` Konstantin Osipov
2018-11-06 12:15 ` [tarantool-patches] " Kirill Shcherbatov
2018-10-29 6:56 ` [PATCH v5 02/12] box: introduce key_def_parts_are_sequential Kirill Shcherbatov
2018-11-01 14:23 ` [tarantool-patches] " Konstantin Osipov
2018-11-06 12:14 ` [tarantool-patches] " Kirill Shcherbatov
2018-11-19 17:48 ` Vladimir Davydov
2018-10-29 6:56 ` [PATCH v5 03/12] box: introduce tuple_field_go_to_path Kirill Shcherbatov
2018-11-19 17:48 ` Vladimir Davydov
2018-10-29 6:56 ` [PATCH v5 04/12] box: introduce tuple_format_add_key_part Kirill Shcherbatov
2018-11-01 14:38 ` [tarantool-patches] " Konstantin Osipov
2018-11-06 12:15 ` [tarantool-patches] " Kirill Shcherbatov
2018-11-19 17:50 ` Vladimir Davydov
2018-10-29 6:56 ` [PATCH v5 05/12] lib: implement JSON tree class for json library Kirill Shcherbatov
2018-11-01 15:08 ` [tarantool-patches] " Konstantin Osipov
2018-11-06 12:15 ` [tarantool-patches] " Kirill Shcherbatov
2018-11-19 17:53 ` Vladimir Davydov
2018-11-20 16:43 ` Vladimir Davydov
2018-11-21 10:37 ` [tarantool-patches] " Kirill Shcherbatov
2018-11-26 10:50 ` Kirill Shcherbatov
2018-10-29 6:56 ` [PATCH v5 06/12] box: manage format fields with JSON tree class Kirill Shcherbatov
2018-10-29 6:56 ` [PATCH v5 07/12] lib: introduce json_path_normalize routine Kirill Shcherbatov
2018-11-01 15:22 ` [tarantool-patches] " Konstantin Osipov
2018-11-01 15:27 ` [tarantool-patches] " Kirill Shcherbatov
2018-11-20 15:13 ` Vladimir Davydov
2018-11-26 10:50 ` Kirill Shcherbatov
2018-11-20 15:14 ` Vladimir Davydov
2018-10-29 6:56 ` [PATCH v5 08/12] box: introduce JSON indexes Kirill Shcherbatov
2018-11-20 16:52 ` Vladimir Davydov
2018-11-26 10:50 ` [tarantool-patches] " Kirill Shcherbatov
2018-10-29 6:56 ` [tarantool-patches] [PATCH v5 09/12] box: introduce has_json_paths flag in templates Kirill Shcherbatov