[Tarantool-patches] [PATCH 2/2] uuid: fix unaligned memory access

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Tue May 19 00:17:28 MSK 2020


Thanks for the review!

> On 5/16/20 2:03 AM, Vladislav Shpilevoy wrote:
>>   tt_uuid_is_nil(const struct tt_uuid *uu)
>>   {
>> -    const uint64_t *p = (const uint64_t *) uu;
>> -    return !p[0] && !p[1];
>> +    const uint32_t *p = (const uint32_t *) uu;
>> +    return p[0] == 0 && p[1] == 0 && p[2] == 0 && p[3] == 0;
>>   }
>>     /**
>> @@ -172,9 +172,10 @@ tt_uuid_is_nil(const struct tt_uuid *uu)
>>   inline bool
>>   tt_uuid_is_equal(const struct tt_uuid *lhs, const struct tt_uuid *rhs)
>>   {
>> -    const uint64_t *lp = (const uint64_t *) lhs;
>> -    const uint64_t *rp = (const uint64_t *) rhs;
>> -    return lp[0] == rp[0] && lp[1] == rp[1];
>> +    const uint32_t *lp = (const uint32_t *) lhs;
>> +    const uint32_t *rp = (const uint32_t *) rhs;
>> +    return lp[0] == rp[0] && lp[1] == rp[1] && lp[2] == rp[2] &&
>> +           lp[3] == rp[3];
> 
> It seems that we degrade performance just for clang to be happy..

Yeah, well. This is the same like saying that we degrade performance
when we do OOM checks. Unaligned memory access is UB. This is a bug.

> I would suggest to use memcmp in this case.
> It's portable and allows a compiler to generate the best possible code.
> I've measured it (gcc) and memcmp version is twice faster than your solution.
> Even for _is_nil method it's better to use memcmp with statically allocated zero buffer.

Could you please show the benchmark? I did my own, and I can't see any
significant difference. The present difference is so small, that it
looks like jitter. Both in is_nil and is_eq.

I did a very simple bench in Lua, without any GCed objects.

    uuid = require('uuid')
    uuid1 = uuid.new()
    uuid2 = uuid.new()
    clock = require('clock')
    function bench1()
        local res = 0
        for i = 1, 10000000 do
            res = res + ((uuid1 == uuid2) and 1 or 0)
        end
        return res
    end
    function bench2()
        local res = 0
        for i = 1, 100000000 do
            res = res + ((uuid1:isnil() and uuid2:isnil()) and 1 or 0)
        end
        return res
    end
    clock.bench(bench2)


Bench1 is to check is_eq(). Bench2 is to check is_nil(). The build
was Release. Here is what I've got:

    uuid is eq:

        4 x uint32 (my solution):

            2.691714
            2.727981
            2.852932
            2.764013

        memcmp (your solution):

            2.701216
            2.731426
            2.641924
            2.628619

        2 x uint64 (old solution):

            2.671254
            2.644062
            2.639914
            2.66742


    uuid is nil:

        4 x uint32:

            0.2599
            0.256139
            0.258356
            0.259495

        memcmp:

            0.26042
            0.263186
            0.262118
            0.261504

        2 x uint64:

            0.256665
            0.259839
            0.259746
            0.258651


More information about the Tarantool-patches mailing list