[tarantool-patches] Re: [PATCH v2 4/8] tuple: account the whole array in field.data and size

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Sat Sep 28 01:29:35 MSK 2019


Thanks for the review!

I've rewrote the commit message. Perhaps now it makes more
sense for you:


    tuple: account the whole array in field.data and size
    
    Before the patch a struct update_field object didn't account
    array header in its .size and .data members. Indeed, it was not
    needed, because anyway updates could be only 'flat'.
    For example, consider the tuple:
    
        [mp_array, mp_uint, mp_uint, mp_uint]
                  ^                         ^
                 pos1                      pos2
    
    Struct update_field.size and .data accounted memory from pos1 to
    pos2, without the array header. Number of fields was stored inside
    a rope object. This is why it made no sense to keep array header
    pointer.
    
    But now updates are going to be not flat, and not only for array.
    There will be an update tree. Each node of that tree will describe
    update of some part of a tuple.
    
    Some of the nodes will need to know exact border of their
    children, including headers. It is going to be used for fast
    copying of neighbours of such children. Consider an example.
    
    Tuple with one field consisting of nested maps:
    
        tuple = {}
        tuple[1] = {
            a = {
                b = {
                    c = {
                        d = {1, 2, 3}
                    }
                }
            }
        }
    
    Update:
    
        {{'+', '[1].a.b.c.d[1]', 1}, {'+', '[1].a.b.c.d[2]', 1}}
    
    To update such a tuple a simple tree will be built:
    
                root: [ [1] ]
                         |
     isolated path: [ 'a.b.c' ]
                         |
          leaves: [ [1] [2] [3] ]
                    +1  +1   -
    
    Root node keeps the whole tuple borders. It is a rope with single
    field.
    This single field is a deeply updated map. Such deep multiple
    updates with long common prefixes are stored as an isolated path +
    map/array in the end. Here the isolated path is 'a.b.c'. It ends
    with the terminal array update.
    
    Assume, that operations are applied and it is time to save the
    result. Save starts from the root.
    Root rope will encode root array header, and will try to save the
    single field. The single field is an isolated update. It needs to
    save everything before old {1,2,3}, the new array {2,2,3}, and
    everything after the old array. The simplest way to do it - know
    exact borders of the old array {1,2,3} and memcpy all memory
    before and after.
    
    This is exactly what this patch allows to do. Everything before
    update_field.data, and after update_field.data + .size can be
    safely copied, and is not related to the field. To copy adjacent
    memory it is not even needed to know field type.
    Update_field.data and .size have the same meaning for all field
    types.
    
    Part of #1261

> 
>> of accounting would not be able to see exact borders of each
>> child.
> 
> Generally, it took a serious effort - trying to understand this comment - please rephrase.
> 

I agree, my comments are shit in this patchset. But it appeared
to be really hard to explain all these JSON update things.
I will rephrase lots of them in subsequent patches.

>>
>> Part of #1261
>> ---
>>  src/box/tuple_update.c        | 28 ++++++++++++++++------------
>>  src/box/update/update_array.c |  9 +++++----
>>  src/box/update/update_field.h |  6 ++++--
>>  3 files changed, 25 insertions(+), 18 deletions(-)
>>
>> diff --git a/src/box/tuple_update.c b/src/box/tuple_update.c
>> index 87946919d..81e1f7e97 100644
>> --- a/src/box/tuple_update.c
>> +++ b/src/box/tuple_update.c
>> @@ -193,11 +193,12 @@ update_read_ops(struct tuple_update *update, const char *expr,
>>   * @retval -1 Error.
>>   */
>>  static int
>> -update_do_ops(struct tuple_update *update, const char *old_data,
>> -	      const char *old_data_end, uint32_t part_count)
>> +update_do_ops(struct tuple_update *update, const char *header,
>> +	      const char *old_data, const char *old_data_end,
>> +	      uint32_t part_count)
> 
> Looks like you're lacking an new struct/object for update tree
> node, which would store the newly introduced state. 
> 
> Instead of storing a new header, it could store the new number of
> elements, this would make the code easier to understand. The idea
> of update_write_tuple is that it generates the right header, now
> you break that separation of concerns.
> 

I need a header not to keep track of elements number. I need to
know borders of a field. See my example above.




More information about the Tarantool-patches mailing list