Subject: Re: [tarantool-patches] Re: [PATCH v1 1/1] rfc: describe a Tarantool JSON indexes
From: Vladislav Shpilevoy
Reply-To: tarantool-patches@freelists.org
To: Kirill Shcherbatov, tarantool-patches@freelists.org, Vladimir Davydov
Date: Tue, 31 Jul 2018 02:17:56 +0300
Message-ID: <568589a8-6108-dd3e-688c-cb2477caa69f@tarantool.org>
In-Reply-To: <659ea582-a614-6e5b-2b09-c283f8b30460@tarantool.org>
References: <7192ba6c28bf9cd637f7e1e5263bbf9771cc6f44.1532603654.git.kshcherbatov@tarantool.org>
 <20180727151013.goyfa4uuf7nl7nou@esperanza>
 <3a467b72-cff9-cc19-7dec-358b5a020e62@tarantool.org>
 <275eccb5-b77d-a2ed-7ee2-c002a28cd096@tarantool.org>
 <634b65cd-f902-6a9e-c3bb-580a38dbb9c9@tarantool.org>
 <659ea582-a614-6e5b-2b09-c283f8b30460@tarantool.org>

I think we can resurrect the hash of JSON path strings if we store the
paths in a canonical form. The hash improves performance for fields
that have offsets in both the old and the new formats: for the old
format we look the offset up by hash, and for the new one we use the
cached slot. This avoids going down the tuple_field trees and decoding
JSON on each comparison.

It is easy to convert a path into a canonical form. For example, you
can convert each identifier in the path to the ["ident"] form:

.ident    -> ["ident"]
["ident"] -> unchanged
[number]  -> unchanged

With these rules we can store all paths in the same way.

On 30/07/2018 23:19, Vladislav Shpilevoy wrote:
> Thanks for the fixes! Now I have no any major remark for
> the RFC and the implementation method as well.
>
> On 30/07/2018 22:23, Kirill Shcherbatov wrote:
>> Actual link
>> https://github.com/tarantool/tarantool/blob/db468f172e642d3d830018a20e545eff77c655e3/doc/rfc/1012-json-indexes.md
>>
>>> I still do not clearly understand what are you talking about.
>>> You can have different offsets in the same path:
>>> 1) [1][2][3].field and 2) [1][2][3]["field"].
>>> Here 'field' has offset 10 in the first case and 11 in the second
>>> one.
>>>
>>> If you want to use prefix length in off.cache on comparison to walk the
>>> tuple_field trees along with the path in key_part on mismatch of cache
>>> versions, then you should explain more clear how do you want to use
>>> off.cache.
>>>
>>> Lets suppose you have a key_part with a JSON path and a trees array.
>>> To determine into which tree you first go to find offset_slot, you
>>> should parse first path part. Same for each next part - you should
>>> parse it to go down the tree. So you just do not know into which
>>> tuple_field you go until the next part of the path is parsed. And how
>>> does tuple_field.off_cache help here?
>>>
>>> And what will you do when you met a format which does not have
>>> an offset for the needed field? For example, I have created an
>>> index, inserted multiple tuples, then created another index. The
>>> format is changed, but the old tuples have the old format that
>>> does not have an offset to the parts of the new index.
>> You are right. This feature won't work.
>>
>>> As I said, it is not special index tree. It can describe a space
>>> format with tens of fields among which only one is indexed.
>>> Index_tree is not correct name. It is tuple_field array as it is
>>> now, where each field is just a struct tuple_field. And struct
>>> tuple_field can contain more tuple_fields inside either as an array
>>> (if the field type is array) or inside a tree/hash if it is map.
>> Ok, fixed.
>>
>
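
For illustration, the canonicalization rules from the top of this
message (.ident -> ["ident"], the two bracketed forms unchanged) could
be sketched roughly as below. This is only a sketch under assumptions:
json_path_canonize() is a hypothetical name and not an existing
Tarantool function, the path is assumed to be a zero-terminated string
made only of the three token kinds listed above, and escape sequences
inside quoted keys are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Tarantool API): rewrite a JSON path so that
 * every map key is spelled as ["key"]. Array indexes [number] are
 * copied as is. Returns the length of the canonical path without the
 * terminating zero, or -1 on a syntax error or if dst is too small.
 */
static int
json_path_canonize(const char *src, char *dst, size_t dst_size)
{
	assert(src != NULL && dst != NULL);
	size_t out = 0;
	const char *p = src;
	while (*p != '\0') {
		const char *tok;
		size_t len;
		int is_key;
		if (*p == '[') {
			p++;
			if (*p == '"') {
				/* ["ident"]: take the text between the quotes. */
				tok = ++p;
				while (*p != '\0' && *p != '"')
					p++;
				if (*p != '"')
					return -1;
				len = (size_t)(p - tok);
				p++;
				is_key = 1;
			} else {
				/* [number]: a run of decimal digits. */
				tok = p;
				while (isdigit((unsigned char)*p))
					p++;
				len = (size_t)(p - tok);
				is_key = 0;
			}
			if (len == 0 || *p != ']')
				return -1;
			p++;
		} else if (*p == '.') {
			/* .ident: letters, digits and underscores. */
			tok = ++p;
			while (isalnum((unsigned char)*p) || *p == '_')
				p++;
			len = (size_t)(p - tok);
			if (len == 0)
				return -1;
			is_key = 1;
		} else {
			return -1;
		}
		/* Emit [token] or ["token"], keeping room for '\0'. */
		size_t need = len + (is_key ? 4 : 2);
		if (out + need + 1 > dst_size)
			return -1;
		dst[out++] = '[';
		if (is_key)
			dst[out++] = '"';
		memcpy(dst + out, tok, len);
		out += len;
		if (is_key)
			dst[out++] = '"';
		dst[out++] = ']';
	}
	if (out + 1 > dst_size)
		return -1;
	dst[out] = '\0';
	return (int)out;
}
```

With such a helper, both spellings from the quoted example,
[1][2][3].field and [1][2][3]["field"], come out as the same string
[1][2][3]["field"], so a hash computed over the canonical form would
be the same for both.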