From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp34.i.mail.ru (smtp34.i.mail.ru [94.100.177.94]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id E3F2B469719 for ; Thu, 27 Feb 2020 03:22:08 +0300 (MSK) References: <08e87903-c00a-9f8a-ff25-5199eb6be0f8@tarantool.org> From: Vladislav Shpilevoy Message-ID: <14136e7a-c642-8c59-4371-b30e1604446f@tarantool.org> Date: Thu, 27 Feb 2020 01:22:07 +0100 MIME-Version: 1.0 In-Reply-To: <08e87903-c00a-9f8a-ff25-5199eb6be0f8@tarantool.org> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [Tarantool-patches] [PATCH vshard 1/1] router: bucket_id_strcrc32 and bucket_id_mpcrc32 List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Oleg Babin , tarantool-patches@dev.tarantool.org Thanks for the review! > On 26/02/2020 02:52, Vladislav Shpilevoy wrote: >> Closes #207 >> >> @TarantoolBot document >> Title: vshard.router.bucket_id_strcrc32() and .bucket_id_mpcrc32() >> vshard.router.bucket_id() is deprecated, each its usage logs a >> warning. It still works, but will be deleted in future. > > I think it's quite agressive to print message in logs for each time. Moreover it could lead to unexpected behaviour for some people. > IMO the most proper way print such message only once. Consider using something like that - https://github.com/tarantool/errors/blob/master/errors/deprecate.lua#L21 Yeah, ok, will print once. >> Behaviour of the old bucket_id() function is now available as >> vshard.router.bucket_id_strcrc32(). It works exactly like the old >> function, but does not log a warning. >> >> The reason why there is a new function bucket_id_mpcrc32() is that >> the old bucket_id() and the new bucket_id_strcrc32() are not >> consistent for cdata numbers. In particular, they return 3 >> different values for normal Lua numbers like 123, for unsigned >> long long cdata (like 123ULL, or ffi.cast('unsigned long long', >> 123)), and for signed long long cdata (like 123LL, or >> ffi.cast('long long', 123)). Note, this is important! >> >>      vshard.router.bucket_id(123) >>      vshard.router.bucket_id(123LL) >>      vshard.router.bucket_id(123ULL) >> >>      Return 3 different values!!! >> >> For float and double cdata (ffi.cast('float', number), >> ffi.cast('double', number)) these functions return different >> values even for the same numbers of the same floating point type. >> This is because tostring() on a floating point cdata number >> returns not the number, but a pointer at it. Different on each >> call. >> >> vshard.router.bucket_id_strcrc32() behaves exactly the same. >> >> vshard.router.bucket_id_mpcrc32() is safer. > > Looks like it's not enough safe for new "double" type: > > | tarantool> box.space.test:create_index('double', {type='tree', parts = {1, 'double'}}) > | tarantool> box.space.test:insert({ffi.cast('double', 1)}) -- cdata > | --- > | - [1] > | ... > | tarantool> type(box.space.test:select()[1][1]) > | --- > | - number > | ... > > And results of msgpack.encode are different. Yep, I knew that. Precision loss is why I discarded my first draft patch, using tonumber() for all cdata numbers. The point is that if you insert ffi.cast(double), then you should pass ffi.cast(double) to bucket_id function as well. You calculate hash from what is inserted. Not from what is returned. From how I imagine this. > > I think you can return to your idea (https://github.com/tarantool/vshard/issues/207#issuecomment-589976204) with "tonumber": > |tarantool> tonumber(ffi.cast('double', 100)) > | --- > | - 100 > | ... > > Such approach will lead to precision lost for huge unsigned/signed integers. But it will be predictable and determined for all cases. This also makes MessagePack encoding useless, because if we have only Lua numbers and strings, then crc32(tostring()) will work fine too. I can add a third function bucket_id_nstrcrc32. Stands for Number String Crc32 - n str crc32. This will call tonumber() for all cdata numbers before calling crc32(tostring(...)) on them. Sounds good? > > --- > Oleg Babin