From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id EE0872B26E for ; Sat, 6 Oct 2018 08:37:19 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id nL8M3SFxJm1Y for ; Sat, 6 Oct 2018 08:37:19 -0400 (EDT) Received: from smtp57.i.mail.ru (smtp57.i.mail.ru [217.69.128.37]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 3EBB52AE04 for ; Sat, 6 Oct 2018 08:37:19 -0400 (EDT) Received: by smtp57.i.mail.ru with esmtpa (envelope-from ) id 1g8lpc-0000t5-Si for tarantool-patches@freelists.org; Sat, 06 Oct 2018 15:37:17 +0300 Date: Sat, 6 Oct 2018 15:37:14 +0300 From: Konstantin Osipov Subject: [tarantool-patches] Re: [PATCH v1 1/1] lib: implement JSON tree class for json library Message-ID: <20181006123714.GC1380@chai> References: <51204844504cf430967af4414fb98b552c8f07f7.1538396782.git.kshcherbatov@tarantool.org> <20181001213433.i3btyzgp6p4agfpm@esperanza> <3e073ef2-bc89-f4a3-0154-110b68cd0139@tarantool.org> <20181004225908.GR22855@chai> <20181005095012.jirfl46btgrphyct@esperanza> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20181005095012.jirfl46btgrphyct@esperanza> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: tarantool-patches@freelists.org * Vladimir Davydov [18/10/05 12:52]: > There will be such a hash table in each root tuple_field object (it's > not a part of the generic json tree infrastructure). It will be used to > speed up data indexing. Why would we need to bother implementing lookup > in a tree then? Because I assume we want to speed up lookups over all > sub-paths, e.g. suppose we have a tuple Why do we maintain a separate hash for tuple field names and json paths within a tuple? json paths are just a continuation of existing ways to access a tuple field. Looks like we need to rebuild existing implementation of field names support, not add something on a side. > > t = { > { > customer = { > name = { > first = ..., > last = ..., > }, > ... > } > }, > ... > } > > Then we can lookup 'customer.name.first' in Lua using any of the > following ways: > > t['[1].customer.name.first'] > t[1]['customer.name.first'] > t[1].customer['name.first'] > t[1].customer.name.first > > etc > > I assume we want to use tuple field map so as to avoid msgpack decode > whenever possible, not just for cases 1 or 2. Moreover, I guess > t[1].customer.name.first will be more commonly used. > > We could of course store a hash table over all full paths in each > intermediate node, but that would result in maintaining way too many > hash entries, e.g. for a tuple field corresponding to 'customer', we > would have to store all the following strings in the hash: 'name', > 'name.first' and so on for other descendants. So I think that we only > want to use the hash table for data indexing, i.e. when a field is > looked up by full path like t[1]['customer.name.first']. For all other > references we will split the path into nodes and descend down the tree > to find the appropriate tuple field map slot. I would try to create a single hash table for all possible paths, including intermediate paths, and store it in the tuple format. The hash table would store pointers to json tree nodes. And I also think we need json tree nodes for top level fields - they are just the same as intermediate fields. I would also keep in mind that in the future we will automatically flatten tuples according to a schema before storing in a space. Flattening, from path access perspective, is simply a replacement of one path with another (foo.bar.baz -> [1][1][3]). > Note, we need the tree in any case, even if we decide not to use it for > lookups at all - it's necessary for building a tuple field map without > unpacking the same prefix multiple times, e.g. if we have two indexed > paths > > a.b.c.d > a.b.c.e > > Then we want to decode 'a.b.c' only once, not twice. The tree helps > achieve that. OK, this is clear. -- Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 http://tarantool.io - www.twitter.com/kostja_osipov