From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Sat, 22 Sep 2018 22:15:31 +0300 From: Vladimir Davydov Subject: Re: [PATCH v3.1 3/5] box: introduce JSON indexes Message-ID: <20180922191531.srfjdd27fur3m3l5@esperanza> References: <5888720ea2e84d6b148ff65c7a7877ce934f33ce.1537429335.git.kshcherbatov@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5888720ea2e84d6b148ff65c7a7877ce934f33ce.1537429335.git.kshcherbatov@tarantool.org> To: Kirill Shcherbatov Cc: tarantool-patches@freelists.org List-ID: Hello, First, the RFC got lost somewhere between reviews. Please resurrect it. (Personally, I'm against committing RFCs to the repository, but rules are rules). Second, there are a few places in the patch where you simultaneously move and modify huge chunks of code. Please don't do that, because it makes the patch very difficult to review. Try to avoid moving chunks of code, but if you absolutely have to, then do it in a separate patch. Also, don't ever leave stray hunks left from previous versions (there are a few, too). I won't bother pointing you to exact places in the patch - you'll have to review your patch and find them by yourself. Now, about the patch itself. I've a few major comments, which will probably force you to rework a big part of your patch. I won't give you a line-by-line review until those are resolved or you convince me to leave them be: 1. For some reason, you require that the first component of the json path points to the tuple field, i.e. if a tuple field has index 3, then the json path specified by the user and stored in the _index system space must start with '[3]' too. Such a redundancy looks unnecessary and dubious. I assume that you do that so that you can maintain a single hash mapping normalized path to tuple_field in tuple_format. If so, that doesn't justify this decision, as you can as well keep a hash map in each root tuple_field. Please make the path point to data within a field, excluding the field itself. 2. AFAIU (I may be wrong, but it looks like it to me) the hash mapping a normalized path to a tuple field (the one that is in tuple_format) is a pure optimization. Hence it makes sense to submit it in a separate patch, applied on top of the main one. Since the sole purpose of path normalization is this optimization, it should be factored out and submitted separately, too. 3. Allocation of json paths after key_def and tuple_format structures looks completely pointless to me. Apart from complicating the code, this micro optimization achieves nothing. It looks especially ridiculous in case of tuple_format, which has to maintain a hash table, which implies making multiple allocations anyway. Please remove it and allocate the paths in a normal way, with malloc(). BTW, there's another place where you micro manage memory - it's json path normalization procedure, where you try to reuse a buffer left from the previous loop iteration. Again, don't do that, it isn't worth the complexity it brings to the code. 4. All json related methods should be defined in src/lib/json as independent entities. This concerns json path validation and normalization, as well as json tree construction and traversal. And each of them should be submitted in a separate patch accompanied by a unit test. Designing the json tree class and the corresponding traversal method will probably require some mental effort, but it'll be worth it, because it'll let you remove some duplicate code, as well as split tuple format validation and field map construction, which are currently done by the same function depending on the arguments (this looks bad). Not mentioning that it will also make the code much easier to follow and review. Those are the major points, but there are also a few minor ones concerning the coding technique that I couldn't help noticing: - Very poor comments - it looks like you write them just to follow the code style guideline. There are a lot of subtleties in this patch (to name a few, what the canonical form of a json path is and why it is needed, what the tuple_field tree looks like and why it exists, how json paths are allocated and where they are stored), but you never elaborate in the comments - you only briefly mention what a function does or what a struct field stores, but never explain why or what's the catch. I understand that writing comments in English is difficult (it is for me too! Sometimes I spend more time on writing a comment than on the code), but it is a lame excuse. Please take your time, use a dictionary, consult with your colleagues to write good comments that are easy to read and that shed some light on implementation details so that one can understand what's going on without diving deep in the implementation. - Weird function names - what's tuple_field_bypass_and_init supposed to mean, for example? If you looked up "bypass" you'd see that it means "go past/around" or "avoid/evade". This is far from what this function is about, which is traversing the tuple field tree. Probably this is related to my previous point... Please be more scrupulous when it comes to function and variable naming, otherwise your code will sound hilarious. - Rather strange way to divide the code into functions. For instance, take tuple_field_dig_with_parser. Why would you beget such a monster and let it out in the open? Outside tuple_format.c you only need two ways of looking up data by path - absolute (i.e. including the field number) for Lua and relative (i.e. without the field number) for extracting a key. So introduce two functions with appropriate names, say tuple_field_by_relative_path and tuple_field_by_absolute_path (think about the names, please). If we ever need the full power of tuple_field_dig_with_parser (the name sucks BTW) somewhere else, we will refactor the code again. Another example is tuple_field_by_part. I'd expect it to simply call tuple_field or tuple_field_by_path, depending on whether the part has a json path attached to it or not. However, this function also does an optimistic lookup in tuple_format->path_hash. Why can't we do this lookup in tuple_field_by_path? Don't we want to optimize Lua calls to tuple_field_by_path too? Oh wait, it does a lookup in the hash... What's going on there? Those are the minor points. Again, I won't be pointing to exact places in the patch, because that would be too tedious. I expect you to go over your patch and fix them on your own. I'll correct you in the next review round if you miss anything. And please don't forget to update the comment to the patch and probably the RFC when you fix the major points (provided we agree on them, of course). Thanks, Vladimir