Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Oleg Babin <olegrok@tarantool.org>, tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH vshard 1/1] router: bucket_id_strcrc32 and bucket_id_mpcrc32
Date: Thu, 27 Feb 2020 01:22:07 +0100	[thread overview]
Message-ID: <14136e7a-c642-8c59-4371-b30e1604446f@tarantool.org> (raw)
In-Reply-To: <08e87903-c00a-9f8a-ff25-5199eb6be0f8@tarantool.org>

Thanks for the review!

> On 26/02/2020 02:52, Vladislav Shpilevoy wrote:
>> Closes #207
>>
>> @TarantoolBot document
>> Title: vshard.router.bucket_id_strcrc32() and .bucket_id_mpcrc32()
>> vshard.router.bucket_id() is deprecated, each its usage logs a
>> warning. It still works, but will be deleted in future.
> 
> I think it's quite agressive to print message in logs for each time. Moreover it could lead to unexpected behaviour for some people.
> IMO the most proper way print such message only once. Consider using something like that - https://github.com/tarantool/errors/blob/master/errors/deprecate.lua#L21

Yeah, ok, will print once.

>> Behaviour of the old bucket_id() function is now available as
>> vshard.router.bucket_id_strcrc32(). It works exactly like the old
>> function, but does not log a warning.
>>
>> The reason why there is a new function bucket_id_mpcrc32() is that
>> the old bucket_id() and the new bucket_id_strcrc32() are not
>> consistent for cdata numbers. In particular, they return 3
>> different values for normal Lua numbers like 123, for unsigned
>> long long cdata (like 123ULL, or ffi.cast('unsigned long long',
>> 123)), and for signed long long cdata (like 123LL, or
>> ffi.cast('long long', 123)). Note, this is important!
>>
>>      vshard.router.bucket_id(123)
>>      vshard.router.bucket_id(123LL)
>>      vshard.router.bucket_id(123ULL)
>>
>>      Return 3 different values!!!
>>
>> For float and double cdata (ffi.cast('float', number),
>> ffi.cast('double', number)) these functions return different
>> values even for the same numbers of the same floating point type.
>> This is because tostring() on a floating point cdata number
>> returns not the number, but a pointer at it. Different on each
>> call.
>>
>> vshard.router.bucket_id_strcrc32() behaves exactly the same.
>>
>> vshard.router.bucket_id_mpcrc32() is safer. 
> 
> Looks like it's not enough safe for new "double" type:
> 
> | tarantool> box.space.test:create_index('double', {type='tree', parts = {1, 'double'}})
> | tarantool> box.space.test:insert({ffi.cast('double', 1)}) -- cdata
> | ---
> | - [1]
> | ...
> | tarantool> type(box.space.test:select()[1][1])
> | ---
> | - number
> | ...
> 
> And results of msgpack.encode are different.

Yep, I knew that. Precision loss is why I discarded my first draft
patch, using tonumber() for all cdata numbers. The point is that
if you insert ffi.cast(double), then you should pass ffi.cast(double)
to bucket_id function as well. You calculate hash from what is
inserted. Not from what is returned. From how I imagine this.

> 
> I think you can return to your idea (https://github.com/tarantool/vshard/issues/207#issuecomment-589976204) with "tonumber":
> |tarantool> tonumber(ffi.cast('double', 100))
> | ---
> | - 100
> | ...
> 
> Such approach will lead to precision lost for huge unsigned/signed integers. But it will be predictable and determined for all cases.

This also makes MessagePack encoding useless, because if we
have only Lua numbers and strings, then crc32(tostring()) will
work fine too.

I can add a third function bucket_id_nstrcrc32. Stands for
Number String Crc32 - n str crc32. This will call tonumber()
for all cdata numbers before calling crc32(tostring(...)) on
them.

Sounds good?

> 
> ---
> Oleg Babin

  reply	other threads:[~2020-02-27  0:22 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-25 23:52 Vladislav Shpilevoy
2020-02-26  7:15 ` Oleg Babin
2020-02-27  0:22   ` Vladislav Shpilevoy [this message]
2020-02-27 10:12     ` Oleg Babin
2020-02-29 17:09   ` Vladislav Shpilevoy
2020-03-02  8:10     ` Oleg Babin
2020-03-02 21:06       ` Vladislav Shpilevoy
2020-02-26  8:20 ` Konstantin Osipov
2020-02-27  0:25   ` Vladislav Shpilevoy
2020-02-27  6:53     ` Konstantin Osipov
2020-02-29 17:10       ` Vladislav Shpilevoy
2020-03-01 16:52         ` Konstantin Osipov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=14136e7a-c642-8c59-4371-b30e1604446f@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=olegrok@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH vshard 1/1] router: bucket_id_strcrc32 and bucket_id_mpcrc32' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox