Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: Konstantin Osipov <kostja@tarantool.org>
Cc: tarantool-patches@freelists.org
Subject: Re: [tarantool-patches] Re: [PATCH v1 1/1] lib: implement JSON tree class for json library
Date: Fri, 5 Oct 2018 12:50:12 +0300	[thread overview]
Message-ID: <20181005095012.jirfl46btgrphyct@esperanza> (raw)
In-Reply-To: <20181004225908.GR22855@chai>

On Fri, Oct 05, 2018 at 01:59:08AM +0300, Konstantin Osipov wrote:
> * Kirill Shcherbatov <kshcherbatov@tarantool.org> [18/10/03 15:28]:
> 
> Before we proceed with everything else, please, why do you need to
> parse a path to look it up?

Let me try to explain, because I'm backing this implementation.

How else can you look up a json path in a tree without splitting it in
nodes?

The only way I see is maintaining a hash table mapping full paths to
tree nodes.

There will be such a hash table in each root tuple_field object (it's
not a part of the generic json tree infrastructure). It will be used to
speed up data indexing. Why would we need to bother implementing lookup
in a tree then? Because I assume we want to speed up lookups over all
sub-paths, e.g. suppose we have a tuple

t = {
     {
      customer = {
       name = {
        first = ...,
        last = ...,
       },
       ...
      }
     },
     ...
    }

Then we can lookup 'customer.name.first' in Lua using any of the
following ways:

  t['[1].customer.name.first']
  t[1]['customer.name.first']
  t[1].customer['name.first']
  t[1].customer.name.first

  etc

I assume we want to use tuple field map so as to avoid msgpack decode
whenever possible, not just for cases 1 or 2. Moreover, I guess
t[1].customer.name.first will be more commonly used.

We could of course store a hash table over all full paths in each
intermediate node, but that would result in maintaining way too many
hash entries, e.g. for a tuple field corresponding to 'customer', we
would have to store all the following strings in the hash: 'name',
'name.first' and so on for other descendants. So I think that we only
want to use the hash table for data indexing, i.e. when a field is
looked up by full path like t[1]['customer.name.first']. For all other
references we will split the path into nodes and descend down the tree
to find the appropriate tuple field map slot.

Note, we need the tree in any case, even if we decide not to use it for
lookups at all - it's necessary for building a tuple field map without
unpacking the same prefix multiple times, e.g. if we have two indexed
paths

  a.b.c.d
  a.b.c.e

Then we want to decode 'a.b.c' only once, not twice. The tree helps
achieve that.

  reply	other threads:[~2018-10-05  9:50 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-01 12:27 Kirill Shcherbatov
2018-10-01 21:34 ` Vladimir Davydov
2018-10-03  7:15   ` [tarantool-patches] " Kirill Shcherbatov
2018-10-04 22:59     ` Konstantin Osipov
2018-10-05  9:50       ` Vladimir Davydov [this message]
2018-10-06 12:37         ` Konstantin Osipov
2018-10-08 10:07           ` Vladimir Davydov
2018-10-04 22:54 ` [tarantool-patches] " Konstantin Osipov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181005095012.jirfl46btgrphyct@esperanza \
    --to=vdavydov.dev@gmail.com \
    --cc=kostja@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [tarantool-patches] Re: [PATCH v1 1/1] lib: implement JSON tree class for json library' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox