[tarantool-patches] Re: [PATCH v1 4/5] box: introduce path_hash and tuple_field tree

Kirill Shcherbatov kshcherbatov at tarantool.org
Wed Aug 15 15:14:45 MSK 2018


> 1. A commit of a patchset should not fix errors introduced in
> the same patchset. Please, merge this hunk into the previous commit.
No such problem already.

> 2. For flags and flag calculators please use 'is/are'.
Ok, fixed.

> 3. The 'for' consists of two lines, so should have {}.
Ok, fixed.

> 4. I found a problem with your keys extractor and tried to
> reproduce it with a simple test but got another crash even
> earlier than I expected:
Thanks, fixed as a part of global reworking.

> But this is not what I had wanted to test. The initial
> problem I found is about how you extract keys: on extract key
> slowpath functions (both raw and not) you iterate over
> the very first level of a tuple and treat its entire
> fields as key parts, but it is wrong now. In tuple key
> extractor you should dig into the field when it is a
> JSON path part. And this is where you again need the
> template parameter 'is_flat' to omit such check for flat
> indexes.
> 
> The original test I wanted to run was this:
> 
> 	box.cfg{}
> 	s = box.schema.create_space('test')
> 	parts = {}
> 	parts[1] = {'[1][1]', 'unsigned'}
> 	pk = s:create_index('pk', {parts = parts})
> 	s:insert{{1, 1}}
> 	s:upsert({{1}}, {{'+', 2, 1}})
> 
> Here, I think, upsert will try to extract key 1, but
> will actually extract entire {1} and crash in some place.
No such problem already. I've included such tests in my patchset.

> 5. Exactly the same hash already exists in assoc.h as mh_strnptr, as
> I said you verbally.
Ok, I use it now.

> 6. You do not need to announce these functions. Just move the
> implementation here.
I need them defined in header for now. 

> 7. Use mh_strnptr_find_inp.
Ok, everywhere when possible.

> 8. Realloc works ok with NULL in the first argument, so you
> can merge this if-else branching. If part->num > size then realloc
> + memset of the gap. No special calloc.
Ok, done.

> 9. Please, preallocate the needed hash elements count once, using
> mh reserve() function.
Ok, I do this for format hash now.

> 10. Please, allocate json paths memory in the same block as the format.
> See tuple_format_alloc. This allows to do not do strdups here, do not
> use a 'free_path' flag in json_path_hash_delete, and reduces memory
> fragmentation.
Ok, done.

> 11. When a field is array, you can iterate over its fields and check
> if corresponding field->array[i] is not null. This allows to iterate
> over the array only once, while now you rescan it O(N) times where
> N is size of the field->array.
Good idea, I've done it.
> 
> 12. For the map above you can do another thing. Each time you do
> go_to_key() you can remember the key index (number of iterations
> inside go_to_key()) and the result pointer. During the cycle you
> calculate the maximal key index and its pointer. And then you
> finish the field decoding starting from this point. This is a
> strong optimization. For example, now for such field
> 
> {a = {b = {c = d, e = f}, g = h}}
> 
> and key in a.b.c you decode it as
> 
> {a = {b = {c = d, e = f}, g = h}}
> {b = {c = d, e = f}
> {c = d, e = f}
> 
> This is because after you found the key you still need to
> skip the entire map field to find the next one (see the next
> comment).
> 
> With my method you decode it as
> 
> {a = {b = {c = d, e = f}, g = h}}
If I've correctly understood this feature is already implemented in my patch.

> 13. Now you rescan the field pointed by pos twice - first
> time inside tuple_init_json_field_map, second in the mp_next below.
> You should not.
External mp_next moves across fields; at the same time  tuple_init_json_field_map (called 
json_field_tree_exec_routine now) iterates iside of it.
I don't know, how could I use internal offset to go to next field.
[{}, {}]
Believe that this would not the only thing that I'll fix next time.

> 14. As I said you verbally, instead of recalculating hash from
> the path you can store it in the key_part together with slot cache
> and epoch. And use light version of json_path_hash_get which takes
> a path with already calculated hash.
Ok, done.

>> +struct tuple_field_map;
> 15. What is it?
crap. dropped.
> 16. What is it? I do not see where is it
> assigned or used.
dropped.

>> -	 * Offset slot in field map in tuple. Normally tuple
>> +	 * Offset slot ix`n field map in tuple. Normally tuple
> 
> 17. !?
Fixed.
> 18. Sorry, but it is too little number of tests. You even did not
> test primary JSON index. As well as key extractions (upsert, vinyl replace/update etc),
> tuple hashes (via dumps in vinyl for example) etc.
I've wrote many pretty tests.




More information about the Tarantool-patches mailing list