[tarantool-patches] Re: [PATCH v1 1/1] lib: implement JSON tree class for json library

Fri Oct 5 12:50:12 MSK 2018

On Fri, Oct 05, 2018 at 01:59:08AM +0300, Konstantin Osipov wrote:
> * Kirill Shcherbatov <kshcherbatov at tarantool.org> [18/10/03 15:28]:
> 
> Before we proceed with everything else, please, why do you need to
> parse a path to look it up?

Let me try to explain, because I'm backing this implementation.

How else can you look up a json path in a tree without splitting it in
nodes?

The only way I see is maintaining a hash table mapping full paths to
tree nodes.

There will be such a hash table in each root tuple_field object (it's
not a part of the generic json tree infrastructure). It will be used to
speed up data indexing. Why would we need to bother implementing lookup
in a tree then? Because I assume we want to speed up lookups over all
sub-paths, e.g. suppose we have a tuple

t = {
     {
      customer = {
       name = {
        first = ...,
        last = ...,
       },
       ...
      }
     },
     ...
    }

Then we can lookup 'customer.name.first' in Lua using any of the
following ways:

  t['[1].customer.name.first']
  t[1]['customer.name.first']
  t[1].customer['name.first']
  t[1].customer.name.first

  etc

I assume we want to use tuple field map so as to avoid msgpack decode
whenever possible, not just for cases 1 or 2. Moreover, I guess
t[1].customer.name.first will be more commonly used.

We could of course store a hash table over all full paths in each
intermediate node, but that would result in maintaining way too many
hash entries, e.g. for a tuple field corresponding to 'customer', we
would have to store all the following strings in the hash: 'name',
'name.first' and so on for other descendants. So I think that we only
want to use the hash table for data indexing, i.e. when a field is
looked up by full path like t[1]['customer.name.first']. For all other
references we will split the path into nodes and descend down the tree
to find the appropriate tuple field map slot.

Note, we need the tree in any case, even if we decide not to use it for
lookups at all - it's necessary for building a tuple field map without
unpacking the same prefix multiple times, e.g. if we have two indexed
paths

  a.b.c.d
  a.b.c.e

Then we want to decode 'a.b.c' only once, not twice. The tree helps
achieve that.