[Tarantool-patches] [PATCH vshard 1/1] router: bucket_id_strcrc32 and bucket_id_mpcrc32
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Thu Feb 27 03:22:07 MSK 2020
Thanks for the review!
> On 26/02/2020 02:52, Vladislav Shpilevoy wrote:
>> Closes #207
>>
>> @TarantoolBot document
>> Title: vshard.router.bucket_id_strcrc32() and .bucket_id_mpcrc32()
>> vshard.router.bucket_id() is deprecated, each its usage logs a
>> warning. It still works, but will be deleted in future.
>
> I think it's quite agressive to print message in logs for each time. Moreover it could lead to unexpected behaviour for some people.
> IMO the most proper way print such message only once. Consider using something like that - https://github.com/tarantool/errors/blob/master/errors/deprecate.lua#L21
Yeah, ok, will print once.
>> Behaviour of the old bucket_id() function is now available as
>> vshard.router.bucket_id_strcrc32(). It works exactly like the old
>> function, but does not log a warning.
>>
>> The reason why there is a new function bucket_id_mpcrc32() is that
>> the old bucket_id() and the new bucket_id_strcrc32() are not
>> consistent for cdata numbers. In particular, they return 3
>> different values for normal Lua numbers like 123, for unsigned
>> long long cdata (like 123ULL, or ffi.cast('unsigned long long',
>> 123)), and for signed long long cdata (like 123LL, or
>> ffi.cast('long long', 123)). Note, this is important!
>>
>> vshard.router.bucket_id(123)
>> vshard.router.bucket_id(123LL)
>> vshard.router.bucket_id(123ULL)
>>
>> Return 3 different values!!!
>>
>> For float and double cdata (ffi.cast('float', number),
>> ffi.cast('double', number)) these functions return different
>> values even for the same numbers of the same floating point type.
>> This is because tostring() on a floating point cdata number
>> returns not the number, but a pointer at it. Different on each
>> call.
>>
>> vshard.router.bucket_id_strcrc32() behaves exactly the same.
>>
>> vshard.router.bucket_id_mpcrc32() is safer.
>
> Looks like it's not enough safe for new "double" type:
>
> | tarantool> box.space.test:create_index('double', {type='tree', parts = {1, 'double'}})
> | tarantool> box.space.test:insert({ffi.cast('double', 1)}) -- cdata
> | ---
> | - [1]
> | ...
> | tarantool> type(box.space.test:select()[1][1])
> | ---
> | - number
> | ...
>
> And results of msgpack.encode are different.
Yep, I knew that. Precision loss is why I discarded my first draft
patch, using tonumber() for all cdata numbers. The point is that
if you insert ffi.cast(double), then you should pass ffi.cast(double)
to bucket_id function as well. You calculate hash from what is
inserted. Not from what is returned. From how I imagine this.
>
> I think you can return to your idea (https://github.com/tarantool/vshard/issues/207#issuecomment-589976204) with "tonumber":
> |tarantool> tonumber(ffi.cast('double', 100))
> | ---
> | - 100
> | ...
>
> Such approach will lead to precision lost for huge unsigned/signed integers. But it will be predictable and determined for all cases.
This also makes MessagePack encoding useless, because if we
have only Lua numbers and strings, then crc32(tostring()) will
work fine too.
I can add a third function bucket_id_nstrcrc32. Stands for
Number String Crc32 - n str crc32. This will call tonumber()
for all cdata numbers before calling crc32(tostring(...)) on
them.
Sounds good?
>
> ---
> Oleg Babin
More information about the Tarantool-patches
mailing list