From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounce@freelists.org>
Received: from localhost (localhost [127.0.0.1])
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id EE0872B26E
	for <tarantool-patches@freelists.org>; Sat,  6 Oct 2018 08:37:19 -0400 (EDT)
Received: from turing.freelists.org ([127.0.0.1])
	by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id nL8M3SFxJm1Y for <tarantool-patches@freelists.org>;
	Sat,  6 Oct 2018 08:37:19 -0400 (EDT)
Received: from smtp57.i.mail.ru (smtp57.i.mail.ru [217.69.128.37])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 3EBB52AE04
	for <tarantool-patches@freelists.org>; Sat,  6 Oct 2018 08:37:19 -0400 (EDT)
Received: by smtp57.i.mail.ru with esmtpa (envelope-from <kostja@tarantool.org>)
	id 1g8lpc-0000t5-Si
	for tarantool-patches@freelists.org; Sat, 06 Oct 2018 15:37:17 +0300
Date: Sat, 6 Oct 2018 15:37:14 +0300
From: Konstantin Osipov <kostja@tarantool.org>
Subject: [tarantool-patches] Re: [PATCH v1 1/1] lib: implement JSON tree
 class for json library
Message-ID: <20181006123714.GC1380@chai>
References: <51204844504cf430967af4414fb98b552c8f07f7.1538396782.git.kshcherbatov@tarantool.org>
 <20181001213433.i3btyzgp6p4agfpm@esperanza>
 <3e073ef2-bc89-f4a3-0154-110b68cd0139@tarantool.org>
 <20181004225908.GR22855@chai>
 <20181005095012.jirfl46btgrphyct@esperanza>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20181005095012.jirfl46btgrphyct@esperanza>
Sender: tarantool-patches-bounce@freelists.org
Errors-to: tarantool-patches-bounce@freelists.org
Reply-To: tarantool-patches@freelists.org
List-help: <mailto:ecartis@freelists.org?Subject=help>
List-unsubscribe: <tarantool-patches-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: tarantool-patches <tarantool-patches.freelists.org>
List-subscribe: <tarantool-patches-request@freelists.org?Subject=subscribe>
List-owner: <mailto:>
List-post: <mailto:tarantool-patches@freelists.org>
List-archive: <http://www.freelists.org/archives/tarantool-patches>
To: tarantool-patches@freelists.org

* Vladimir Davydov <vdavydov.dev@gmail.com> [18/10/05 12:52]:
> There will be such a hash table in each root tuple_field object (it's
> not a part of the generic json tree infrastructure). It will be used to
> speed up data indexing. Why would we need to bother implementing lookup
> in a tree then? Because I assume we want to speed up lookups over all
> sub-paths, e.g. suppose we have a tuple

Why do we maintain a separate hash for tuple field names and json
paths within a tuple?

json paths are just a continuation of existing ways to access a
tuple field.

Looks like we need to rebuild existing implementation of field
names support, not add something on a side.

> 
> t = {
>      {
>       customer = {
>        name = {
>         first = ...,
>         last = ...,
>        },
>        ...
>       }
>      },
>      ...
>     }
> 
> Then we can lookup 'customer.name.first' in Lua using any of the
> following ways:
> 
>   t['[1].customer.name.first']
>   t[1]['customer.name.first']
>   t[1].customer['name.first']
>   t[1].customer.name.first
> 
>   etc
> 
> I assume we want to use tuple field map so as to avoid msgpack decode
> whenever possible, not just for cases 1 or 2. Moreover, I guess
> t[1].customer.name.first will be more commonly used.


> 
> We could of course store a hash table over all full paths in each
> intermediate node, but that would result in maintaining way too many
> hash entries, e.g. for a tuple field corresponding to 'customer', we
> would have to store all the following strings in the hash: 'name',
> 'name.first' and so on for other descendants. So I think that we only
> want to use the hash table for data indexing, i.e. when a field is
> looked up by full path like t[1]['customer.name.first']. For all other
> references we will split the path into nodes and descend down the tree
> to find the appropriate tuple field map slot.

I would try to create a single hash table for all possible paths,
including intermediate paths, and store it in the tuple format.

The hash table would store pointers to json tree nodes.

And I also think we need json tree nodes for top level fields -
they are just the same as intermediate fields.

I would also keep in mind that in the future we will automatically
flatten tuples according to a schema before storing in a space.

Flattening, from path access perspective, is simply a replacement
of one path with another  (foo.bar.baz -> [1][1][3]).

> Note, we need the tree in any case, even if we decide not to use it for
> lookups at all - it's necessary for building a tuple field map without
> unpacking the same prefix multiple times, e.g. if we have two indexed
> paths
> 
>   a.b.c.d
>   a.b.c.e
> 
> Then we want to decode 'a.b.c' only once, not twice. The tree helps
> achieve that.

OK, this is clear.

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov