From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Fri, 5 Oct 2018 12:50:12 +0300 From: Vladimir Davydov Subject: Re: [tarantool-patches] Re: [PATCH v1 1/1] lib: implement JSON tree class for json library Message-ID: <20181005095012.jirfl46btgrphyct@esperanza> References: <51204844504cf430967af4414fb98b552c8f07f7.1538396782.git.kshcherbatov@tarantool.org> <20181001213433.i3btyzgp6p4agfpm@esperanza> <3e073ef2-bc89-f4a3-0154-110b68cd0139@tarantool.org> <20181004225908.GR22855@chai> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20181004225908.GR22855@chai> To: Konstantin Osipov Cc: tarantool-patches@freelists.org List-ID: On Fri, Oct 05, 2018 at 01:59:08AM +0300, Konstantin Osipov wrote: > * Kirill Shcherbatov [18/10/03 15:28]: > > Before we proceed with everything else, please, why do you need to > parse a path to look it up? Let me try to explain, because I'm backing this implementation. How else can you look up a json path in a tree without splitting it in nodes? The only way I see is maintaining a hash table mapping full paths to tree nodes. There will be such a hash table in each root tuple_field object (it's not a part of the generic json tree infrastructure). It will be used to speed up data indexing. Why would we need to bother implementing lookup in a tree then? Because I assume we want to speed up lookups over all sub-paths, e.g. suppose we have a tuple t = { { customer = { name = { first = ..., last = ..., }, ... } }, ... } Then we can lookup 'customer.name.first' in Lua using any of the following ways: t['[1].customer.name.first'] t[1]['customer.name.first'] t[1].customer['name.first'] t[1].customer.name.first etc I assume we want to use tuple field map so as to avoid msgpack decode whenever possible, not just for cases 1 or 2. Moreover, I guess t[1].customer.name.first will be more commonly used. We could of course store a hash table over all full paths in each intermediate node, but that would result in maintaining way too many hash entries, e.g. for a tuple field corresponding to 'customer', we would have to store all the following strings in the hash: 'name', 'name.first' and so on for other descendants. So I think that we only want to use the hash table for data indexing, i.e. when a field is looked up by full path like t[1]['customer.name.first']. For all other references we will split the path into nodes and descend down the tree to find the appropriate tuple field map slot. Note, we need the tree in any case, even if we decide not to use it for lookups at all - it's necessary for building a tuple field map without unpacking the same prefix multiple times, e.g. if we have two indexed paths a.b.c.d a.b.c.e Then we want to decode 'a.b.c' only once, not twice. The tree helps achieve that.