From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> To: Oleg Babin <olegrok@tarantool.org>, tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH vshard 1/1] router: bucket_id_strcrc32 and bucket_id_mpcrc32 Date: Thu, 27 Feb 2020 01:22:07 +0100 [thread overview] Message-ID: <14136e7a-c642-8c59-4371-b30e1604446f@tarantool.org> (raw) In-Reply-To: <08e87903-c00a-9f8a-ff25-5199eb6be0f8@tarantool.org> Thanks for the review! > On 26/02/2020 02:52, Vladislav Shpilevoy wrote: >> Closes #207 >> >> @TarantoolBot document >> Title: vshard.router.bucket_id_strcrc32() and .bucket_id_mpcrc32() >> vshard.router.bucket_id() is deprecated, each its usage logs a >> warning. It still works, but will be deleted in future. > > I think it's quite agressive to print message in logs for each time. Moreover it could lead to unexpected behaviour for some people. > IMO the most proper way print such message only once. Consider using something like that - https://github.com/tarantool/errors/blob/master/errors/deprecate.lua#L21 Yeah, ok, will print once. >> Behaviour of the old bucket_id() function is now available as >> vshard.router.bucket_id_strcrc32(). It works exactly like the old >> function, but does not log a warning. >> >> The reason why there is a new function bucket_id_mpcrc32() is that >> the old bucket_id() and the new bucket_id_strcrc32() are not >> consistent for cdata numbers. In particular, they return 3 >> different values for normal Lua numbers like 123, for unsigned >> long long cdata (like 123ULL, or ffi.cast('unsigned long long', >> 123)), and for signed long long cdata (like 123LL, or >> ffi.cast('long long', 123)). Note, this is important! >> >> vshard.router.bucket_id(123) >> vshard.router.bucket_id(123LL) >> vshard.router.bucket_id(123ULL) >> >> Return 3 different values!!! >> >> For float and double cdata (ffi.cast('float', number), >> ffi.cast('double', number)) these functions return different >> values even for the same numbers of the same floating point type. >> This is because tostring() on a floating point cdata number >> returns not the number, but a pointer at it. Different on each >> call. >> >> vshard.router.bucket_id_strcrc32() behaves exactly the same. >> >> vshard.router.bucket_id_mpcrc32() is safer. > > Looks like it's not enough safe for new "double" type: > > | tarantool> box.space.test:create_index('double', {type='tree', parts = {1, 'double'}}) > | tarantool> box.space.test:insert({ffi.cast('double', 1)}) -- cdata > | --- > | - [1] > | ... > | tarantool> type(box.space.test:select()[1][1]) > | --- > | - number > | ... > > And results of msgpack.encode are different. Yep, I knew that. Precision loss is why I discarded my first draft patch, using tonumber() for all cdata numbers. The point is that if you insert ffi.cast(double), then you should pass ffi.cast(double) to bucket_id function as well. You calculate hash from what is inserted. Not from what is returned. From how I imagine this. > > I think you can return to your idea (https://github.com/tarantool/vshard/issues/207#issuecomment-589976204) with "tonumber": > |tarantool> tonumber(ffi.cast('double', 100)) > | --- > | - 100 > | ... > > Such approach will lead to precision lost for huge unsigned/signed integers. But it will be predictable and determined for all cases. This also makes MessagePack encoding useless, because if we have only Lua numbers and strings, then crc32(tostring()) will work fine too. I can add a third function bucket_id_nstrcrc32. Stands for Number String Crc32 - n str crc32. This will call tonumber() for all cdata numbers before calling crc32(tostring(...)) on them. Sounds good? > > --- > Oleg Babin
next prev parent reply other threads:[~2020-02-27 0:22 UTC|newest] Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-02-25 23:52 Vladislav Shpilevoy 2020-02-26 7:15 ` Oleg Babin 2020-02-27 0:22 ` Vladislav Shpilevoy [this message] 2020-02-27 10:12 ` Oleg Babin 2020-02-29 17:09 ` Vladislav Shpilevoy 2020-03-02 8:10 ` Oleg Babin 2020-03-02 21:06 ` Vladislav Shpilevoy 2020-02-26 8:20 ` Konstantin Osipov 2020-02-27 0:25 ` Vladislav Shpilevoy 2020-02-27 6:53 ` Konstantin Osipov 2020-02-29 17:10 ` Vladislav Shpilevoy 2020-03-01 16:52 ` Konstantin Osipov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=14136e7a-c642-8c59-4371-b30e1604446f@tarantool.org \ --to=v.shpilevoy@tarantool.org \ --cc=olegrok@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH vshard 1/1] router: bucket_id_strcrc32 and bucket_id_mpcrc32' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox