From: Kirill Shcherbatov <kshcherbatov@tarantool.org>
To: tarantool-patches@freelists.org, Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: [tarantool-patches] Re: [PATCH v1 1/1] rfc: describe a Tarantool JSON indexes
Date: Mon, 30 Jul 2018 19:14:36 +0300
Message-ID: <3a467b72-cff9-cc19-7dec-358b5a020e62@tarantool.org>
In-Reply-To: <fcfdcde0-3ffc-9a89-5400-4be2f1769fe1@tarantool.org>

Hi! Thank you for the review. I've taken all your suggestions into account.

On 30.07.2018 16:45, Vladislav Shpilevoy wrote:
> Hi! Thanks for the new version! See 12 comments below.
>
> Vova, please, look at my comments and say what you think.

We have discussed all of this with Vova verbally.

> 1. It is not correct. I can declare JSON index on enclosed arrays like this

Fields with a complex document structure should have the 'map' or 'array' type in the format, if a format is specified.

> 2. As I remember from verbal discussion, we've decided not to store
> offsets for intermediate nodes. It is too expensive. You actually propose
> to store an offset for each tuple field, even non-indexed. In such a case
> the field_map would become bigger than the tuple payload. Field_map is
> very expensive storage and should not store non-needed offsets. So you should
> not have an offset on [name], on [birthday]. Only on [first] and [last].

As I already answered in my previous letter, this is not a slot_offset allocated as part of the tuple. The "off. cache" is only an implementation-specific detail that lets parsing start from the most relevant offset during tree traversal.

> What is more, this example does not match the code above. Firstly you
> operate with name/town, then with name/birthday, then again with
> name/town. It confuses. Do you have an index on 'birthday'? If not,
> then why do you need an offset on it?

I've fixed this example and reworked all the schemas to be easier to follow.

> 3. The table is malformed. Some problems with white spaces.
> 3 points into the middle of 'Richard', additional '+' up to 'Feynman' etc.

Uh-huh, I've fixed the "+"s. But 3 does not point into the middle of anything; it is an offset. Please take a look at the new version.

> 4. Offset in tuple_format is number of offset slot in a tuple. Index
> in the field_map array actually. So in the format offset slots always
> are sequence of natural numbers + 0: 0, 1, 2, 3 ... with no holes. So
> I do not understand why do you have 0, 2, 3, 4 in the format instead
> of 0, 1, 2, 3.

These are just sample offsets; same as (3).

> 5. Looks like you have a problem with offset_slot/slot_offset naming. Please,
> use one of them, not both.

Renamed everything to offset_slot.

> 6. What is the same as tuple_format? Looks like the first part of the
> sentence is lost.

Fixed:
    And extend *tuple_format* so that such an epoch is initialized on
    creation with a new, bigger value (when the source key_parts have changed).

> 7. It is not for JSON indexes only and not for offsets only as well. It
> is a tree to validate any JSON space format constructed from
> space:format and indexes. And not all of the nodes will have offset
> slots.

Fixed.

> 8. This is the array of trees. It is not array + tree in a separate
> field. You have array of trees where i-th tree describes format of
> the i-th field and its internals. Some of tree-nodes have offsets
> and some are just to validate the format. Do not forget that these
> trees are going to be used for space:format validation. Offset_slot
> is a part of tuple_field, even now, and is filled optionally if the
> field is a part of an index.

```
struct tuple_format {
	...
	/** Epoch of tuple format. */
	uint32_t epoch;
	/** Array of data_path trees built for indexes. */
	TREE index_tree[0];
};
```

> 9. On 'NO' branch the key_part cache will oscillate. Example: you have a
> tuple1 with format1 and key_part synced with format1. Then you add new
> index and insert new tuples: tuple[2-N] with format2. Then index compares
> the tuples.
> Each time when you compare tuple1 vs any of tuple[2-N] you
> will change key_part cache version to the old one and make slower either
> new tuples or old tuples. It is not ok.
>
> Moreover you did not take into account that on comparison of tuples
> you have two formats in a common case. Up to which will you update
> key_part cache version?
>
> I think that fast fix is to update cache version only to the biggest one.
>
> And the best solution is find a way how to update key_defs on alter and
> do not handle this case during comparison. We need Vova's opinion here.

Discussed with Vova, fixed:
    PARSE def->parts[idx]->data_path and observe the tuple_format(tuple)->json_tree structure,
    UPDATE def->parts[idx].slot_epoch and def->parts[idx].slot_cache IF the format epoch is bigger

> 10. HASTABLE? Maybe 'hash', not 'has'?

Yep.

> 11. The real problem of hashes is not linked with the one you described here.
> Hash will work for any formats and will be used on comparison only. Not on
> insertion. On insertion you need to decode the whole tuple to build offset
> map and validate the format going along tuple field trees. Not by JSON path
> of an index. On insertion you don't have any paths. You have only tuple and the
> array of trees of tuple fields. The real problem is that these paths can be
> logically identical, but actually have some differences. For example, use
> ["ident"] in one index and .ident in another one.

I've used your explanation in the RFC.

> 12. No such word 're-allocatable'.

Reallocatable.

Hm, perhaps it is time to include you and Vova as RFC authors? I don't know whether it matters.
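For illustration only, here is a minimal C sketch of the epoch rule from (9). It rests on simplifying assumptions: `struct format`, `struct key_part`, `format_resolve_path()` and `slow_path_count` are hypothetical stand-ins, not Tarantool's real `tuple_format`/`key_part` code, and the JSON tree walk is stubbed by a field that already holds the resolved slot. What it shows is that the cache only ever moves forward to a bigger epoch, so comparing tuple1 (format1) against tuple[2-N] (format2) no longer flips it back and forth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for tuple_format: only the epoch and
 * the offset slot that the part's JSON path resolves to in this format. */
struct format {
	uint32_t epoch;
	int32_t path_offset_slot;
};

/* Hypothetical stand-in for a key_part carrying the slot_epoch and
 * slot_cache fields described in the RFC. */
struct key_part {
	uint32_t slot_epoch;
	int32_t slot_cache;
};

/* Counts slow-path lookups (path parsing), for illustration only. */
static unsigned slow_path_count;

/* Stub for "PARSE data_path and observe the format's json_tree": here the
 * result is simply precomputed in the hypothetical format struct. */
static int32_t
format_resolve_path(const struct format *format)
{
	assert(format != NULL);
	slow_path_count++;
	return format->path_offset_slot;
}

/* Returns the offset slot of the part in the given format. The cache is
 * used only when it was filled from this exact format epoch, and it is
 * overwritten only by a bigger epoch, never by a smaller one. */
static int32_t
key_part_offset_slot(struct key_part *part, const struct format *format)
{
	if (format->epoch == part->slot_epoch)
		return part->slot_cache;	/* fast path: cache matches */
	int32_t slot = format_resolve_path(format);
	if (format->epoch > part->slot_epoch) {
		/* Newer format: move the cache forward. */
		part->slot_epoch = format->epoch;
		part->slot_cache = slot;
	}
	return slot;
}
```

With an old format at epoch 1 and a new one at epoch 2, alternating lookups leave the cache at epoch 2: tuples of the new format always hit the cache, while tuples of the old format take the slow path. That is the cost of the "update only to the biggest epoch" fix; as suggested in (9), updating key_defs on alter would remove that remaining slow path.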