Tarantool development patches archive
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Kirill Shcherbatov <kshcherbatov@tarantool.org>,
	tarantool-patches@freelists.org,
	Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: [tarantool-patches] Re: [PATCH v1 1/1] rfc: describe a Tarantool JSON indexes
Date: Tue, 31 Jul 2018 02:17:56 +0300
Message-ID: <568589a8-6108-dd3e-688c-cb2477caa69f@tarantool.org>
In-Reply-To: <659ea582-a614-6e5b-2b09-c283f8b30460@tarantool.org>

I think we can resurrect the hash of JSON path strings if we
store the paths in a canonical form. A hash improves
performance for fields that have offsets in both the old and
the new formats: for an old format we look the offset up by
hash, and for the new one we use the cache slot. This avoids
walking down the tuple_field trees and decoding JSON on each
comparison.

It is easy to convert a path into a canonical form. For
example, you can convert each identifier in the path
to ["ident"] form.

     .ident    -> ["ident"]
     ["ident"] -> unchanged
     [number]  -> unchanged

With these rules we can store all paths in the same
way.
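
A minimal sketch of these rules, in Python only for brevity. The
function name canonical_path and the slot number below are
hypothetical, not part of the RFC or of the Tarantool code base, and
the token grammar is an assumption: identifiers are [A-Za-z_][A-Za-z0-9_]*,
quoted names contain no quotes or backslashes, indexes are decimal.

```python
import re

# One path token. Assumed grammar, for illustration only.
_TOKEN = re.compile(
    r'''
    \.?(?P<ident>[A-Za-z_][A-Za-z0-9_]*)   # .ident (the dot is optional in this sketch)
  | \["(?P<quoted>[^"\\]*)"\]              # ["ident"]
  | \[(?P<index>\d+)\]                     # [number]
    ''',
    re.VERBOSE,
)


def canonical_path(path):
    """Rewrite every .ident token as ["ident"]; keep other tokens as is."""
    out = []
    pos = 0
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError("bad JSON path at offset %d: %r" % (pos, path))
        if m.group("index") is not None:
            # [number] -> unchanged
            out.append("[%s]" % m.group("index"))
        else:
            # .ident and ["ident"] both end up as ["ident"]
            name = m.group("ident")
            if name is None:
                name = m.group("quoted")
            out.append('["%s"]' % name)
        pos = m.end()
    return "".join(out)


# Both spellings of the same path give one canonical string, so they
# hash to the same key. The slot value 7 is made up for the example.
slots = {canonical_path('[1][2][3].field'): 7}
found = slots[canonical_path('[1][2][3]["field"]')]
```

Here both [1][2][3].field and [1][2][3]["field"] become
[1][2][3]["field"], so a hash keyed by the canonical string finds the
same entry for either spelling.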

On 30/07/2018 23:19, Vladislav Shpilevoy wrote:
> Thanks for the fixes! Now I have no major remarks on
> the RFC or on the implementation method.
> 
> On 30/07/2018 22:23, Kirill Shcherbatov wrote:
>> Actual link
>> https://github.com/tarantool/tarantool/blob/db468f172e642d3d830018a20e545eff77c655e3/doc/rfc/1012-json-indexes.md
>>
>>> I still do not clearly understand what you are talking about. You can have
>>> different offsets in the same path: 1) [1][2][3].field and 2) [1][2][3]["field"].
>>> Here 'field' has offset 10 in the first case and 11 in the second
>>> one.
>>>
>>> If you want to use the prefix length in off.cache on comparison to walk the
>>> tuple_field trees along with the path in key_part on a mismatch of cache
>>> versions, then you should explain more clearly how you want to use
>>> off.cache.
>>>
>>> Let's suppose you have a key_part with a JSON path and a trees array.
>>> To determine into which tree you first go to find offset_slot, you
>>> should parse the first path part. Same for each next part - you should
>>> parse it to go down the tree. So you just do not know into which
>>> tuple_field you go until the next part of the path is parsed. And how
>>> does tuple_field.off_cache help here?
>>>
>>> And what will you do when you meet a format which does not have
>>> an offset for the needed field? For example, I have created an
>>> index, inserted multiple tuples, then created another index. The
>>> format is changed, but the old tuples have the old format that
>>> does not have an offset to the parts of the new index.
>> You are right. This feature won't work.
>>
>>> As I said, it is not a special index tree. It can describe a space
>>> format with tens of fields among which only one is indexed.
>>> Index_tree is not a correct name. It is a tuple_field array as it is
>>> now, where each field is just a struct tuple_field. And struct
>>> tuple_field can contain more tuple_fields inside either as an array
>>> (if the field type is array) or inside a tree/hash if it is map.
>> Ok, fixed.
>>
> 
