[tarantool-patches] Re: [PATCH v1 1/1] rfc: describe a Tarantool JSON indexes

Kirill Shcherbatov kshcherbatov at tarantool.org
Mon Jul 30 19:14:36 MSK 2018

Hi! Thank you for review. I've accounted all your suggestions.

On 30.07.2018 16:45, Vladislav Shpilevoy wrote:
> Hi! Thanks for the new version! See 12 comments below.
> Vova, please, look at my comments and say what do you think.
We have verbally discuss all of this with Vova.

> 1. It is not correct. I can declare JSON index on enclosed arrays like this
Fields having complex document structure should have 'map' or 'array' type in format if specified.

> 2. As I remember from verbal discussion, we've decided to do not store
> offsets for intermediate nodes. It is too expensive. You actually purpose
> to store an offset for each tuple field, even non-indexed. In such a case
> the field_map would become bigger than the tuple payload. Field_map is
> very expensive storage and should not store non-needed offsets. So you should
> not have an offset on [name], on [birthday]. Only on [first] and [last].
I've already answered with previous letter that this is not slot_offset that allocated as a part of tuple. 
"off. cache" is only implementation-specific detail that allows start parsing with most relevant offset 
on tree traversal.

> What is more, this example do not match the code above. Firstly you
> operate with name/town, then with name/birthday, then again with
> name/town. It confuses. Do you have an index on 'birthday'? If not,
> then why do you need an offset on it?
I've fixed this example. And have improved all schemas to be more associative.
> 3. The table is malformed. Some problems with white spaces. 3 points
> into the middle of 'Richard', additional '+' up to 'Feynman' etc.
Uguhm, fixed "+"s; But this not a middle of offset; Take a look to a new version

> 4. Offset in tuple_format is number of offset slot in a tuple. Index
> in the field_map array actually. So in the format offset slots always
> are sequence of natural numbers + 0: 0, 1, 2, 3 ... with no holes. So
> I do not understand why do you have 0, 2, 3, 4 in the format instead
> of 0, 1, 2, 3.
This is some sample offsets; Same as (3).
> 5. Looks like you have a problem with offset_slot/slot_offset naming. Please,
> use one of them, not both.
Renamed all to offset_slot.

> 6. What is the same as tuple_format? Looks like the first part of the
> sentence is lost.
And extend *tuple_format* where such epoch would be initialized with a new bigger(when source key_parts have changed) value on creation.

> 7. It is not for JSON indexes only and not for offsets only as well. It
> is a tree to validate any JSON space format constructed from
> space:format and indexes. And not all of the nodes will have offset
> slots.

> 8. This is the array of trees. It is not array + tree in a separate
> field. You have array of trees where i-th tree describes format of
> the i-th field and its internals. Some of tree-nodes have offsets
> and some are just to validate the format. Do not forget that these
> trees are going to be used for space:format validation. Offset_slot
> is a part of tuple_field, even now, and is filled optionally if the
> field is a part of an index.
struct tuple_format {

  /** Epoch of tuple format. */
  uint32_t epoch;
  /** Array of data_path trees built for indexes. */
  TREE index_tree[0];

> 9. On 'NO' branch the key_part cache will oscillate. Example: you have a
> tuple1 with format1 and key_part synced with format1. Then you add new
> index and insert new tuples: tuple[2-N] with format2. Then index compares
> the tuples. Each time when you compare tuple1 vs any of tuple[2-N] you
> will change key_part cache version to the old one and make slower either
> new tuples or old tuples. It is not ok.
> Moreover you did not take into account that on comparison of tuples
> you have two formats in a common case. Up to which will you update
> key_part cache version?
> I think that fast fix is to update cache version only to the biggest one.
> And the best solution is find a way how to update key_defs on alter and
> do not handle this case during comparison. We need Vova's opinion here.
Discussed with Vova, fixed.
PARSE def->parts[idx]->data_path and observe tuple_format(tuple)->json_tree structure,  
UPDATE def->parts[idx].slot_epoch and def->parts[idx].slot_cache IF format epoch is bigger

> 10. HASTABLE? Maybe 'hash', not 'has'?

> 11. The real problem of hashes is not linked with the one you described here.
> Hash will work for any formats and will be used on comparison only. Not on
> insertion. On insertion you need to decode the whole tuple to build offset
> map and validate the format going along tuple field trees. Not by JSON path
> of an index. On insertion you have no any paths. You have only tuple and the
> array of trees of tuple fields. The real problem is that these paths can be
> logically identical, but actually have some differences. For example, use
> ["ident"] in one index and .ident in another one.
I've used yor explanation in RFC.

> 12. No such word 're-allocatable'.

Hm, perhaps it is the time to include you and Vova to RFC authors? Don't know does it matter.

More information about the Tarantool-patches mailing list