Re: [Tarantool-patches] [PATCH 4/4] box: introduce indices by UUID

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Serge Petrenko <sergepetrenko@tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tml <tarantool-patches@dev.tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH 4/4] box: introduce indices by UUID
Date: Fri, 10 Apr 2020 02:46:51 +0300	[thread overview]
Message-ID: <7FF9C776-045A-4ADF-B44B-0CFB7FD746F9@tarantool.org> (raw)
In-Reply-To: <f57d9ff7-7e8e-73f0-ee47-4372d9033fac@tarantool.org>





> 6 апр. 2020 г., в 00:29, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> написал(а):
> 
> Thanks for the patch!


Hi! Thanks for the review!

> 
> See 6 comments below.
> 
> On 04/04/2020 01:02, Serge Petrenko wrote:
>> It is now possible to create an index over UUID values, either returned
>> by something like `uuid.new()` or 36- and 32- byte strings in format
>> "88ebbb9a-8c18-480f-bd04-dd5345f1573c" or "88ebbb9a8c18480fbd04dd5345f1573c".
>> 16-byte binary strings are also supported.
>> 
>> Closes #4268
>> Closes #2916
>> 
>> @TarantoolBot document
>> Title: Document uuid field type.
>> 
>> There's a new field type -- UUID, it accepts values returned by
>> `uuid.new()`, as well as strings representing uuids, like
>> `88ebbb9a-8c18-480f-bd04-dd5345f1573c` and
>> `88ebbb9a8c18480fbd04dd5345f1573c` and 16 byte binary values, encoded in
>> msgpack.
>> 
>> The index may be either unique or non-unique, nullable or non-nullable,
>> and may be a primary key.
>> 
>> To create an index over a uuid field for space `test`, say:
>> ```
>> box.space.test:create_index("pk", {parts={1, 'uuid'}})
>> ```
>> Now you may insert uuids into the space:
>> ```
>> tarantool> box.space.test:insert{uuid.new()}
>> ---
>> - [e631fdcc-0e8a-4d2f-83fd-b0ce6762b13f]
>> ...
>> 
>> tarantool> box.space.test:insert{"64d22e4d-ac92-4a23-899a-e59f34af5479"}
>> ---
>> - ['64d22e4d-ac92-4a23-899a-e59f34af5479']
>> ...
>> 
>> tarantool> box.space.test:insert{"856dc1177dcc49969f1407f8c6c8a371"}
>> ---
>> - ['856dc1177dcc49969f1407f8c6c8a371']
>> ...
>> 
>> tarantool> box.space.test:select{}
>> ---
>> - - ['64d22e4d-ac92-4a23-899a-e59f34af5479']
>>  - ['856dc1177dcc49969f1407f8c6c8a371']
>>  - [e631fdcc-0e8a-4d2f-83fd-b0ce6762b13f]
> 
> 1. Wait. Why are they printed differently? They should be all the
> same 36 strings when you print them, no?
> 
> Seems like you understood Mons to the letter. I believe he meant
> we should provide a way to insert strings to simplify moving data
> to a new space, but it does not mean we should store them as strings.
> It makes comparators slower. sscanf() to extract uuid from a string
> on *each* comparison is a huge performance hole. From what I measured
> recently (not in Tarantool), sscanf() is a *really* slow thing. Its
> usage in comparators is dangerous.
> 
> On the other hand ability to insert strings means we need to convert
> them to MP_EXT after insertion, which means full scan of each tuple
> by field types and possibly its recreation. We had a similar problem
> with 'double' type. After all it was decided to keep as is and let
> users enforce the needed type when they want using ffi.cast('double').
> 
> IMO, we should just disallow both ability to store strings, and to
> insert strings. I mean, you can't insert a string into a decimal field,
> for example. So a user anyway will never be able to insert only Lua
> simple types. Cdata is enforced for decimal and double right now.
> 
> Better keep it simple. If someone wants to convert their existing spaces
> to mp uuid, they easily can write a trivial function which would replace
> string uuid with cdata uuid before insertion into the new space. We
> should not sacrifice perf for that.
> 
> This is only my opinion, you may want to ask Mons and Kirill.

So, I consulted. Mons, he’s ok with allowing only cdata.

> 
> 
> 2. It is worth mentioning comparison order. Are they compared as
> strings, or individual uuid fields are compared? In the latter case
> in what order the fields are compared?

Ok, mentioned in commit message. The order’s lexicographical.
> 
> Also probably string comparison vs by-field comparison vs memcmp
> comparison will be the same in case fields are stored in msgpack
> in the same order as in the string, and in big-endian. I didn't check
> but it is worth investigation. It would be good to be able to use
> memcmp.
> 
>> diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
>> index 3f8a0ce24..aad69340d 100644
>> --- a/src/box/tuple_compare.cc
>> +++ b/src/box/tuple_compare.cc
>> @@ -378,6 +382,61 @@ mp_compare_bin(const char *field_a, const char *field_b)
>> 	return COMPARE_RESULT(size_a, size_b);
>> }
>> 
>> +static int
>> +mp_compare_uuid_with_type(const char *field_a, enum mp_type type_a,
>> +			  const char *field_b, enum mp_type type_b)
>> +{
>> +	struct tt_uuid uuid_a, uuid_b;
>> +	struct tt_uuid *ret;
>> +	const char *str;
>> +	uint32_t len;
>> +	int rc;
>> +	switch (type_a) {
>> +	case MP_STR:
>> +		str = mp_decode_str(&field_a, &len);
>> +		rc = tt_uuid_from_lstring(str, len, &uuid_a);
> 
> 3. This is perf killer. Basically it would be faster to store
> uuids as a string, or just as 16 bytes binary. Because memcmp
> could be used directly then.

Switched to memcmp.

> 
>> +		assert(rc == 0);
>> +		break;
>> +	case MP_BIN:
>> +		str = mp_decode_bin(&field_a, &len);
>> +		ret = uuid_unpack(&str, len, &uuid_a);
>> +		assert(ret != NULL);
>> +		break;
>> +	case MP_EXT:
>> +		ret = mp_decode_uuid(&field_a, &uuid_a);
>> +		assert(ret != NULL);
>> +		break;
>> +	default:
>> +		unreachable();
>> +	}
>> +	switch (type_b) {
>> +	case MP_STR:
>> +		str = mp_decode_str(&field_b, &len);
>> +		rc = tt_uuid_from_lstring(str, len, &uuid_b);
>> +		assert(rc == 0);
>> +		break;
>> +	case MP_BIN:
>> +		str = mp_decode_bin(&field_b, &len);
>> +		ret = uuid_unpack(&str, len, &uuid_b);
>> +		assert(ret != NULL);
>> +		break;
>> +	case MP_EXT:
>> +		ret = mp_decode_uuid(&field_b, &uuid_b);
>> +		assert(ret != NULL);
>> +		break;
>> +	default:
>> +		unreachable();
>> +	}
>> +	return tt_uuid_compare(&uuid_a, &uuid_b);
> 
> 4. If we will decide to store uuids as MP_EXT, it may be
> useful to introduce a function mp_compare_uuid, by analogue
> with other mp_compare_* functions. To compare them directly
> using memcmp, without unpacking. But see above my comment
> about order. memcmp validness depends on how uuid is stored
> in MessagePack.
> 

Done. uuid fields are stored in big-endian manner and in the same
order they’re compared, so memcmp is valid.

>> +}
>> +
>> +static inline int
>> +mp_compare_uuid(const char *field_a, const char *field_b)
>> +{
>> +	return mp_compare_uuid_with_type(field_a, mp_typeof(*field_a),
>> +					 field_b, mp_typeof(*field_b));
>> +}
>> @@ -1578,6 +1642,21 @@ hint_decimal(decimal_t *dec)
>> 	return hint_create(MP_CLASS_NUMBER, val);
>> }
>> 
>> +static inline hint_t
>> +hint_uuid(struct tt_uuid *uuid)
>> +{
>> +	/* Simply take the first part of the UUID as hint. */
> 
> 5. Why only first?

I meant, take the ‘high’ part of the UUID. The one which’s compared first.

> 
>> +	uint64_t val = 0;
>> +	val |= uuid->time_low;
>> +	val <<= sizeof(uuid->time_mid) * CHAR_BIT;
>> +	val |= uuid->time_mid;
>> +	val <<= sizeof(uuid->time_hi_and_version) * CHAR_BIT;
>> +	val |= uuid->time_hi_and_version;
>> +	/* Make space for class representation. */
>> +	val >>= HINT_CLASS_BITS;
>> +	return hint_create(MP_CLASS_UUID, val);
>> +}
>> diff --git a/test/engine/gh-4268-uuid.result b/test/engine/gh-4268-uuid.result
>> new file mode 100644
>> index 000000000..928204507
>> --- /dev/null
>> +++ b/test/engine/gh-4268-uuid.result
> 
> 6. Since this is a feature, we probably should not call the
> test file using name pattern used for bugs. Just uuid.test.lua.

Done.


--
Serge Petrenko
sergepetrenko@tarantool.org

next prev parent reply	other threads:[~2020-04-09 23:46 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-03 23:02 [Tarantool-patches] [PATCH 0/4] introduce indices over UUID Serge Petrenko
2020-04-03 23:02 ` [Tarantool-patches] [PATCH 1/4] decimal: fix comment typo Serge Petrenko
2020-04-05 21:22   ` Vladislav Shpilevoy
2020-04-03 23:02 ` [Tarantool-patches] [PATCH 2/4] uuid: expose additional from_string constructors Serge Petrenko
2020-04-05 21:22   ` Vladislav Shpilevoy
2020-04-09 23:46     ` Serge Petrenko
2020-04-10 16:56       ` Vladislav Shpilevoy
2020-04-11 13:35         ` Serge Petrenko
2020-04-03 23:02 ` [Tarantool-patches] [PATCH 3/4] box: add MsgPack encoding/decoding for UUID Serge Petrenko
2020-04-05 21:26   ` Vladislav Shpilevoy
2020-04-09 23:46     ` Serge Petrenko
2020-04-03 23:02 ` [Tarantool-patches] [PATCH 4/4] box: introduce indices by UUID Serge Petrenko
2020-04-05 21:29   ` Vladislav Shpilevoy
2020-04-09 23:46     ` Serge Petrenko [this message]
2020-04-10 16:56       ` Vladislav Shpilevoy
2020-04-05 21:21 ` [Tarantool-patches] [PATCH 0/4] introduce indices over UUID Vladislav Shpilevoy
2020-04-09 23:46   ` Serge Petrenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7FF9C776-045A-4ADF-B44B-0CFB7FD746F9@tarantool.org \
    --to=sergepetrenko@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 4/4] box: introduce indices by UUID' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox