From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Subject: Re: [tarantool-patches] Re: [PATCH v1 1/1] rfc: describe a Tarantool JSON indexes References: <7192ba6c28bf9cd637f7e1e5263bbf9771cc6f44.1532603654.git.kshcherbatov@tarantool.org> <20180727151013.goyfa4uuf7nl7nou@esperanza> From: Kirill Shcherbatov Message-ID: <3a467b72-cff9-cc19-7dec-358b5a020e62@tarantool.org> Date: Mon, 30 Jul 2018 19:14:36 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit To: tarantool-patches@freelists.org, Vladislav Shpilevoy , Vladimir Davydov List-ID: Hi! Thank you for review. I've accounted all your suggestions. On 30.07.2018 16:45, Vladislav Shpilevoy wrote: > Hi! Thanks for the new version! See 12 comments below. > > Vova, please, look at my comments and say what do you think. We have verbally discuss all of this with Vova. > 1. It is not correct. I can declare JSON index on enclosed arrays like this Fields having complex document structure should have 'map' or 'array' type in format if specified. > 2. As I remember from verbal discussion, we've decided to do not store > offsets for intermediate nodes. It is too expensive. You actually purpose > to store an offset for each tuple field, even non-indexed. In such a case > the field_map would become bigger than the tuple payload. Field_map is > very expensive storage and should not store non-needed offsets. So you should > not have an offset on [name], on [birthday]. Only on [first] and [last]. I've already answered with previous letter that this is not slot_offset that allocated as a part of tuple. "off. cache" is only implementation-specific detail that allows start parsing with most relevant offset on tree traversal. > What is more, this example do not match the code above. Firstly you > operate with name/town, then with name/birthday, then again with > name/town. It confuses. Do you have an index on 'birthday'? If not, > then why do you need an offset on it? I've fixed this example. And have improved all schemas to be more associative. ^ > 3. The table is malformed. Some problems with white spaces. 3 points > into the middle of 'Richard', additional '+' up to 'Feynman' etc. Uguhm, fixed "+"s; But this not a middle of offset; Take a look to a new version > 4. Offset in tuple_format is number of offset slot in a tuple. Index > in the field_map array actually. So in the format offset slots always > are sequence of natural numbers + 0: 0, 1, 2, 3 ... with no holes. So > I do not understand why do you have 0, 2, 3, 4 in the format instead > of 0, 1, 2, 3. This is some sample offsets; Same as (3). > 5. Looks like you have a problem with offset_slot/slot_offset naming. Please, > use one of them, not both. Renamed all to offset_slot. > 6. What is the same as tuple_format? Looks like the first part of the > sentence is lost. Fixed. And extend *tuple_format* where such epoch would be initialized with a new bigger(when source key_parts have changed) value on creation. > 7. It is not for JSON indexes only and not for offsets only as well. It > is a tree to validate any JSON space format constructed from > space:format and indexes. And not all of the nodes will have offset > slots. Fixed. > 8. This is the array of trees. It is not array + tree in a separate > field. You have array of trees where i-th tree describes format of > the i-th field and its internals. Some of tree-nodes have offsets > and some are just to validate the format. Do not forget that these > trees are going to be used for space:format validation. Offset_slot > is a part of tuple_field, even now, and is filled optionally if the > field is a part of an index. struct tuple_format { ... /** Epoch of tuple format. */ uint32_t epoch; /** Array of data_path trees built for indexes. */ TREE index_tree[0]; }; ``` > 9. On 'NO' branch the key_part cache will oscillate. Example: you have a > tuple1 with format1 and key_part synced with format1. Then you add new > index and insert new tuples: tuple[2-N] with format2. Then index compares > the tuples. Each time when you compare tuple1 vs any of tuple[2-N] you > will change key_part cache version to the old one and make slower either > new tuples or old tuples. It is not ok. > > Moreover you did not take into account that on comparison of tuples > you have two formats in a common case. Up to which will you update > key_part cache version? > > I think that fast fix is to update cache version only to the biggest one. > > And the best solution is find a way how to update key_defs on alter and > do not handle this case during comparison. We need Vova's opinion here. Discussed with Vova, fixed. PARSE def->parts[idx]->data_path and observe tuple_format(tuple)->json_tree structure, UPDATE def->parts[idx].slot_epoch and def->parts[idx].slot_cache IF format epoch is bigger > 10. HASTABLE? Maybe 'hash', not 'has'? Yep. > 11. The real problem of hashes is not linked with the one you described here. > Hash will work for any formats and will be used on comparison only. Not on > insertion. On insertion you need to decode the whole tuple to build offset > map and validate the format going along tuple field trees. Not by JSON path > of an index. On insertion you have no any paths. You have only tuple and the > array of trees of tuple fields. The real problem is that these paths can be > logically identical, but actually have some differences. For example, use > ["ident"] in one index and .ident in another one. I've used yor explanation in RFC. > 12. No such word 're-allocatable'. Reallocatable. Hm, perhaps it is the time to include you and Vova to RFC authors? Don't know does it matter.