Re: [tarantool-patches] Re: [PATCH v1 1/1] rfc: describe a Tarantool JSON indexes

From: Kirill Shcherbatov <kshcherbatov@tarantool.org>
To: tarantool-patches@freelists.org,
	Vladislav Shpilevoy <v.shpilevoy@tarantool.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: [tarantool-patches] Re: [PATCH v1 1/1] rfc: describe a Tarantool JSON indexes
Date: Mon, 30 Jul 2018 19:14:36 +0300
Message-ID: <3a467b72-cff9-cc19-7dec-358b5a020e62@tarantool.org>
In-Reply-To: <fcfdcde0-3ffc-9a89-5400-4be2f1769fe1@tarantool.org>

Hi! Thank you for the review. I've taken all your suggestions into account.

On 30.07.2018 16:45, Vladislav Shpilevoy wrote:
> Hi! Thanks for the new version! See 12 comments below.
> 
> Vova, please, look at my comments and say what do you think.
We have discussed all of this verbally with Vova.

> 1. It is not correct. I can declare JSON index on enclosed arrays like this
Fields with a complex document structure should have the 'map' or 'array' type in the format, if a format is specified.

> 2. As I remember from verbal discussion, we've decided to do not store
> offsets for intermediate nodes. It is too expensive. You actually purpose
> to store an offset for each tuple field, even non-indexed. In such a case
> the field_map would become bigger than the tuple payload. Field_map is
> very expensive storage and should not store non-needed offsets. So you should
> not have an offset on [name], on [birthday]. Only on [first] and [last].
As I already answered in my previous letter, this is not an offset_slot allocated as a part of the tuple.
"off. cache" is only an implementation-specific detail that allows parsing to start from the most relevant offset
during tree traversal.

> What is more, this example do not match the code above. Firstly you
> operate with name/town, then with name/birthday, then again with
> name/town. It confuses. Do you have an index on 'birthday'? If not,
> then why do you need an offset on it?
I've fixed this example and improved all the schemas to be more associative.

> 3. The table is malformed. Some problems with white spaces. 3 points
> into the middle of 'Richard', additional '+' up to 'Feynman' etc.
Uh-huh, fixed the "+"s. But this is not the middle of an offset; take a look at the new version.

> 4. Offset in tuple_format is number of offset slot in a tuple. Index
> in the field_map array actually. So in the format offset slots always
> are sequence of natural numbers + 0: 0, 1, 2, 3 ... with no holes. So
> I do not understand why do you have 0, 2, 3, 4 in the format instead
> of 0, 1, 2, 3.
These are just sample offsets; same as (3).

> 5. Looks like you have a problem with offset_slot/slot_offset naming. Please,
> use one of them, not both.
Renamed all to offset_slot.

> 6. What is the same as tuple_format? Looks like the first part of the
> sentence is lost.
Fixed:
And extend *tuple_format*, where such an epoch would be initialized with a new, bigger value (when the source key_parts have changed) on creation.

> 7. It is not for JSON indexes only and not for offsets only as well. It
> is a tree to validate any JSON space format constructed from
> space:format and indexes. And not all of the nodes will have offset
> slots.
Fixed.

> 8. This is the array of trees. It is not array + tree in a separate
> field. You have array of trees where i-th tree describes format of
> the i-th field and its internals. Some of tree-nodes have offsets
> and some are just to validate the format. Do not forget that these
> trees are going to be used for space:format validation. Offset_slot
> is a part of tuple_field, even now, and is filled optionally if the
> field is a part of an index.
```
struct tuple_format {
  ...

  /** Epoch of tuple format. */
  uint32_t epoch;
  /** Array of data_path trees built for indexes. */
  TREE index_tree[0];
};
```
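
To make the "array of trees" idea from (8) more concrete, here is a minimal sketch of one per-field tree. All names in it (format_node, OFFSET_SLOT_NIL, format_node_child, format_node_slot_count) are hypothetical and are not taken from the patch; it only shows that some nodes carry an offset slot while others exist purely for format validation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker: this node has no slot in the field_map. */
enum { OFFSET_SLOT_NIL = INT32_MAX };

/* Hypothetical node of a per-field format tree. */
struct format_node {
	/* Map key that leads to this node. */
	const char *key;
	/* Index in the field_map, or OFFSET_SLOT_NIL if not indexed. */
	int32_t offset_slot;
	/* Direct children of this node. */
	struct format_node *children;
	size_t child_count;
};

/* Find a direct child by its map key; return NULL if there is none. */
static struct format_node *
format_node_child(struct format_node *node, const char *key)
{
	for (size_t i = 0; i < node->child_count; i++) {
		if (strcmp(node->children[i].key, key) == 0)
			return &node->children[i];
	}
	return NULL;
}

/* Count the nodes of the subtree that really own an offset slot. */
static uint32_t
format_node_slot_count(const struct format_node *node)
{
	uint32_t count = node->offset_slot != OFFSET_SLOT_NIL ? 1 : 0;
	for (size_t i = 0; i < node->child_count; i++)
		count += format_node_slot_count(&node->children[i]);
	return count;
}
```

With a tree for [name] whose children are [first] and [last], only the two leaves would own slots, so the count is 2 and the intermediate [name] node costs nothing in the field_map.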

> 9. On 'NO' branch the key_part cache will oscillate. Example: you have a
> tuple1 with format1 and key_part synced with format1. Then you add new
> index and insert new tuples: tuple[2-N] with format2. Then index compares
> the tuples. Each time when you compare tuple1 vs any of tuple[2-N] you
> will change key_part cache version to the old one and make slower either
> new tuples or old tuples. It is not ok.
> 
> Moreover you did not take into account that on comparison of tuples
> you have two formats in a common case. Up to which will you update
> key_part cache version?
> 
> I think that fast fix is to update cache version only to the biggest one.
> 
> And the best solution is find a way how to update key_defs on alter and
> do not handle this case during comparison. We need Vova's opinion here.
Discussed with Vova, fixed:
PARSE def->parts[idx].data_path and observe tuple_format(tuple)->json_tree structure,
UPDATE def->parts[idx].slot_epoch and def->parts[idx].slot_cache IF format epoch is bigger
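
The "update only if the format epoch is bigger" rule could look roughly like the sketch below. The field names slot_epoch and slot_cache come from the step above; the struct name key_part_cache and the function key_part_cache_update are hypothetical, and the slot is assumed to be already resolved by parsing the data_path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder of the per-key_part cache fields. */
struct key_part_cache {
	/* Epoch of the format the cached slot was resolved for. */
	uint32_t slot_epoch;
	/* Cached offset slot for this key part. */
	int32_t slot_cache;
};

/*
 * Refresh the cache only when the tuple's format is newer than
 * the one the cache was built for. Older formats never overwrite
 * it, so the cache cannot oscillate between format versions.
 * Returns true if the cache was updated.
 */
static bool
key_part_cache_update(struct key_part_cache *part, uint32_t format_epoch,
		      int32_t resolved_slot)
{
	if (format_epoch <= part->slot_epoch)
		return false;
	part->slot_epoch = format_epoch;
	part->slot_cache = resolved_slot;
	return true;
}
```

In the scenario from (9), comparing tuple1 (old format) against tuple[2-N] (new format) would move the cache to the new epoch once and then leave it there.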

> 10. HASTABLE? Maybe 'hash', not 'has'?
Yep.

> 11. The real problem of hashes is not linked with the one you described here.
> Hash will work for any formats and will be used on comparison only. Not on
> insertion. On insertion you need to decode the whole tuple to build offset
> map and validate the format going along tuple field trees. Not by JSON path
> of an index. On insertion you have no any paths. You have only tuple and the
> array of trees of tuple fields. The real problem is that these paths can be
> logically identical, but actually have some differences. For example, use
> ["ident"] in one index and .ident in another one.
I've used your explanation in the RFC.
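
One way to deal with logically identical paths such as ["ident"] and .ident is to bring every path to one canonical spelling before hashing or comparing it. Below is a minimal sketch of that idea, not the approach chosen in the RFC. The function name json_path_normalize is hypothetical, and the accepted syntax is an assumption: only .key, ["key"], [N] and a bare leading key are handled, and empty keys are rejected.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Rewrite a JSON path so that every map key uses the ["key"] form.
 * Returns 0 on success, -1 on a syntax error or if dst is too small.
 */
static int
json_path_normalize(const char *src, char *dst, size_t dst_size)
{
	size_t out = 0;
	const char *p = src;
	if (dst_size == 0)
		return -1;
	dst[0] = '\0';
	while (*p != '\0') {
		const char *tok;
		size_t len;
		int is_index = 0;
		if (*p == '[') {
			p++;
			if (*p == '"') {
				/* ["key"] form. */
				tok = ++p;
				while (*p != '\0' && *p != '"')
					p++;
				if (*p != '"' || p[1] != ']')
					return -1;
				len = (size_t)(p - tok);
				p += 2;
			} else {
				/* [N] form: an array index. */
				tok = p;
				while (isdigit((unsigned char)*p))
					p++;
				if (*p != ']')
					return -1;
				len = (size_t)(p - tok);
				p++;
				is_index = 1;
			}
		} else {
			/* .key form, or a bare key at the path start. */
			if (*p == '.')
				p++;
			tok = p;
			while (*p != '\0' && *p != '.' && *p != '[')
				p++;
			len = (size_t)(p - tok);
		}
		if (len == 0)
			return -1;
		int n = snprintf(dst + out, dst_size - out,
				 is_index ? "[%.*s]" : "[\"%.*s\"]",
				 (int)len, tok);
		if (n < 0 || (size_t)n >= dst_size - out)
			return -1;
		out += (size_t)n;
	}
	return 0;
}
```

After normalization both spellings from (11) become the same string, so two indexes declared with different but equivalent paths would map to the same key.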

> 12. No such word 're-allocatable'.
Reallocatable.


Hm, perhaps it is time to add you and Vova to the RFC authors? I don't know whether it matters.
