[Tarantool-patches] [PATCH 2/2] uuid: fix unaligned memory	access
    Vladislav Shpilevoy 
    v.shpilevoy at tarantool.org
       
    Tue May 19 00:17:28 MSK 2020
    
    
  
Thanks for the review!
> On 5/16/20 2:03 AM, Vladislav Shpilevoy wrote:
>>   tt_uuid_is_nil(const struct tt_uuid *uu)
>>   {
>> -    const uint64_t *p = (const uint64_t *) uu;
>> -    return !p[0] && !p[1];
>> +    const uint32_t *p = (const uint32_t *) uu;
>> +    return p[0] == 0 && p[1] == 0 && p[2] == 0 && p[3] == 0;
>>   }
>>     /**
>> @@ -172,9 +172,10 @@ tt_uuid_is_nil(const struct tt_uuid *uu)
>>   inline bool
>>   tt_uuid_is_equal(const struct tt_uuid *lhs, const struct tt_uuid *rhs)
>>   {
>> -    const uint64_t *lp = (const uint64_t *) lhs;
>> -    const uint64_t *rp = (const uint64_t *) rhs;
>> -    return lp[0] == rp[0] && lp[1] == rp[1];
>> +    const uint32_t *lp = (const uint32_t *) lhs;
>> +    const uint32_t *rp = (const uint32_t *) rhs;
>> +    return lp[0] == rp[0] && lp[1] == rp[1] && lp[2] == rp[2] &&
>> +           lp[3] == rp[3];
> 
> It seems that we degrade performance just for clang to be happy..
Yeah, well. This is the same like saying that we degrade performance
when we do OOM checks. Unaligned memory access is UB. This is a bug.
> I would suggest to use memcmp in this case.
> It's portable and allows a compiler to generate the best possible code.
> I've measured it (gcc) and memcmp version is twice faster than your solution.
> Even for _is_nil method it's better to use memcmp with statically allocated zero buffer.
Could you please show the benchmark? I did my own, and I can't see any
significant difference. The present difference is so small, that it
looks like jitter. Both in is_nil and is_eq.
I did a very simple bench in Lua, without any GCed objects.
    uuid = require('uuid')
    uuid1 = uuid.new()
    uuid2 = uuid.new()
    clock = require('clock')
    function bench1()
        local res = 0
        for i = 1, 10000000 do
            res = res + ((uuid1 == uuid2) and 1 or 0)
        end
        return res
    end
    function bench2()
        local res = 0
        for i = 1, 100000000 do
            res = res + ((uuid1:isnil() and uuid2:isnil()) and 1 or 0)
        end
        return res
    end
    clock.bench(bench2)
Bench1 is to check is_eq(). Bench2 is to check is_nil(). The build
was Release. Here is what I've got:
    uuid is eq:
        4 x uint32 (my solution):
            2.691714
            2.727981
            2.852932
            2.764013
        memcmp (your solution):
            2.701216
            2.731426
            2.641924
            2.628619
        2 x uint64 (old solution):
            2.671254
            2.644062
            2.639914
            2.66742
    uuid is nil:
        4 x uint32:
            0.2599
            0.256139
            0.258356
            0.259495
        memcmp:
            0.26042
            0.263186
            0.262118
            0.261504
        2 x uint64:
            0.256665
            0.259839
            0.259746
            0.258651
    
    
More information about the Tarantool-patches
mailing list