[tarantool-patches] Re: [PATCH v1 1/1] rfc: describe a Tarantool JSON indexes

Mon Jul 30 21:46:28 MSK 2018

On 30/07/2018 19:14, Kirill Shcherbatov wrote:
> Hi! Thank you for the review. I've taken all your suggestions into account.
> 
> On 30.07.2018 16:45, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the new version! See 12 comments below.
>>
>> Vova, please look at my comments and tell me what you think.
> We have discussed all of this with Vova verbally.

It is cool, but I do not know what you have discussed. I guess
such things should be done either via email, or verbally with
all participants.

> 
>> 2. As I remember from our verbal discussion, we decided not to store
>> offsets for intermediate nodes: it is too expensive. You actually propose
>> to store an offset for each tuple field, even a non-indexed one. In that
>> case the field_map would become bigger than the tuple payload. Field_map
>> is very expensive storage and should not hold offsets that are not needed.
>> So you should not have an offset on [name] or [birthday], only on [first]
>> and [last].
> I've already answered in the previous letter that this is not a slot_offset allocated as a part of the tuple.
> The "off. cache" is only an implementation-specific detail that allows parsing to start from the most relevant
> offset during tree traversal.

I still do not clearly understand what you are talking about. The same
field can be reached through differently written paths: 1) [1][2][3].field
and 2) [1][2][3]["field"]. Here 'field' starts at position 10 of the path
string in the first case and at position 11 in the second one.

If you want to use the prefix length from off.cache during comparison, to
walk the tuple_field trees along the key_part path when the cache versions
do not match, then you should explain more clearly how you want to use
off.cache.

Let's suppose you have a key_part with a JSON path and an array of trees.
To determine which tree to enter first to find offset_slot, you have to
parse the first path part. The same holds for each next part: you have to
parse it to go down the tree. So you do not know which tuple_field you go
into until the next part of the path is parsed. And how does
tuple_field.off_cache help here?

And what will you do when you meet a format that does not have
an offset for the needed field? For example, I have created an
index, inserted multiple tuples, then created another index. The
format has changed, but the old tuples still have the old format,
which does not have offsets for the parts of the new index.

>> 8. This is an array of trees. It is not an array plus a tree in a separate
>> field. You have an array of trees where the i-th tree describes the format
>> of the i-th field and its internals. Some tree nodes have offsets
>> and some exist just to validate the format. Do not forget that these
>> trees are going to be used for space:format validation. Offset_slot
>> is a part of tuple_field even now, and is filled only if the
>> field is a part of an index.
> struct tuple_format {
>    ...
> 
>    /** Epoch of tuple format. */
>    uint32_t epoch;
>    /** Array of data_path trees built for indexes. */
>    TREE index_tree[0];

As I said, it is not a special index tree. It can describe a space
format with tens of fields of which only one is indexed.
Index_tree is not a correct name. It is the tuple_field array as it is
now, where each field is just a struct tuple_field. And a struct
tuple_field can contain nested tuple_fields, either as an array
(if the field type is array) or in a tree/hash (if it is a map).

> };
> ```
> 
> Hm, perhaps it is time to add you and Vova to the RFC authors? I don't know whether it matters.

Up to you %) If you want to be the sole author, you are welcome.