From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp52.i.mail.ru (smtp52.i.mail.ru [94.100.177.112]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id A2B024696C4 for ; Fri, 10 Apr 2020 02:46:51 +0300 (MSK) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.80.23.2.2\)) From: Serge Petrenko In-Reply-To: Date: Fri, 10 Apr 2020 02:46:51 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <7FF9C776-045A-4ADF-B44B-0CFB7FD746F9@tarantool.org> References: <0bbcecda4f9f2a016eef86f780138ce9ed6afee8.1585954494.git.sergepetrenko@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH 4/4] box: introduce indices by UUID List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tml > 6 =D0=B0=D0=BF=D1=80. 2020 =D0=B3., =D0=B2 00:29, Vladislav Shpilevoy = =D0=BD=D0=B0=D0=BF=D0=B8=D1=81=D0=B0=D0=BB(=D0= =B0): >=20 > Thanks for the patch! Hi! Thanks for the review! >=20 > See 6 comments below. >=20 > On 04/04/2020 01:02, Serge Petrenko wrote: >> It is now possible to create an index over UUID values, either = returned >> by something like `uuid.new()` or 36- and 32- byte strings in format >> "88ebbb9a-8c18-480f-bd04-dd5345f1573c" or = "88ebbb9a8c18480fbd04dd5345f1573c". >> 16-byte binary strings are also supported. >>=20 >> Closes #4268 >> Closes #2916 >>=20 >> @TarantoolBot document >> Title: Document uuid field type. >>=20 >> There's a new field type -- UUID, it accepts values returned by >> `uuid.new()`, as well as strings representing uuids, like >> `88ebbb9a-8c18-480f-bd04-dd5345f1573c` and >> `88ebbb9a8c18480fbd04dd5345f1573c` and 16 byte binary values, encoded = in >> msgpack. >>=20 >> The index may be either unique or non-unique, nullable or = non-nullable, >> and may be a primary key. >>=20 >> To create an index over a uuid field for space `test`, say: >> ``` >> box.space.test:create_index("pk", {parts=3D{1, 'uuid'}}) >> ``` >> Now you may insert uuids into the space: >> ``` >> tarantool> box.space.test:insert{uuid.new()} >> --- >> - [e631fdcc-0e8a-4d2f-83fd-b0ce6762b13f] >> ... >>=20 >> tarantool> = box.space.test:insert{"64d22e4d-ac92-4a23-899a-e59f34af5479"} >> --- >> - ['64d22e4d-ac92-4a23-899a-e59f34af5479'] >> ... >>=20 >> tarantool> box.space.test:insert{"856dc1177dcc49969f1407f8c6c8a371"} >> --- >> - ['856dc1177dcc49969f1407f8c6c8a371'] >> ... >>=20 >> tarantool> box.space.test:select{} >> --- >> - - ['64d22e4d-ac92-4a23-899a-e59f34af5479'] >> - ['856dc1177dcc49969f1407f8c6c8a371'] >> - [e631fdcc-0e8a-4d2f-83fd-b0ce6762b13f] >=20 > 1. Wait. Why are they printed differently? They should be all the > same 36 strings when you print them, no? >=20 > Seems like you understood Mons to the letter. I believe he meant > we should provide a way to insert strings to simplify moving data > to a new space, but it does not mean we should store them as strings. > It makes comparators slower. sscanf() to extract uuid from a string > on *each* comparison is a huge performance hole. =46rom what I = measured > recently (not in Tarantool), sscanf() is a *really* slow thing. Its > usage in comparators is dangerous. >=20 > On the other hand ability to insert strings means we need to convert > them to MP_EXT after insertion, which means full scan of each tuple > by field types and possibly its recreation. We had a similar problem > with 'double' type. After all it was decided to keep as is and let > users enforce the needed type when they want using ffi.cast('double'). >=20 > IMO, we should just disallow both ability to store strings, and to > insert strings. I mean, you can't insert a string into a decimal = field, > for example. So a user anyway will never be able to insert only Lua > simple types. Cdata is enforced for decimal and double right now. >=20 > Better keep it simple. If someone wants to convert their existing = spaces > to mp uuid, they easily can write a trivial function which would = replace > string uuid with cdata uuid before insertion into the new space. We > should not sacrifice perf for that. >=20 > This is only my opinion, you may want to ask Mons and Kirill. So, I consulted. Mons, he=E2=80=99s ok with allowing only cdata. >=20 >=20 > 2. It is worth mentioning comparison order. Are they compared as > strings, or individual uuid fields are compared? In the latter case > in what order the fields are compared? Ok, mentioned in commit message. The order=E2=80=99s lexicographical. >=20 > Also probably string comparison vs by-field comparison vs memcmp > comparison will be the same in case fields are stored in msgpack > in the same order as in the string, and in big-endian. I didn't check > but it is worth investigation. It would be good to be able to use > memcmp. >=20 >> diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc >> index 3f8a0ce24..aad69340d 100644 >> --- a/src/box/tuple_compare.cc >> +++ b/src/box/tuple_compare.cc >> @@ -378,6 +382,61 @@ mp_compare_bin(const char *field_a, const char = *field_b) >> return COMPARE_RESULT(size_a, size_b); >> } >>=20 >> +static int >> +mp_compare_uuid_with_type(const char *field_a, enum mp_type type_a, >> + const char *field_b, enum mp_type type_b) >> +{ >> + struct tt_uuid uuid_a, uuid_b; >> + struct tt_uuid *ret; >> + const char *str; >> + uint32_t len; >> + int rc; >> + switch (type_a) { >> + case MP_STR: >> + str =3D mp_decode_str(&field_a, &len); >> + rc =3D tt_uuid_from_lstring(str, len, &uuid_a); >=20 > 3. This is perf killer. Basically it would be faster to store > uuids as a string, or just as 16 bytes binary. Because memcmp > could be used directly then. Switched to memcmp. >=20 >> + assert(rc =3D=3D 0); >> + break; >> + case MP_BIN: >> + str =3D mp_decode_bin(&field_a, &len); >> + ret =3D uuid_unpack(&str, len, &uuid_a); >> + assert(ret !=3D NULL); >> + break; >> + case MP_EXT: >> + ret =3D mp_decode_uuid(&field_a, &uuid_a); >> + assert(ret !=3D NULL); >> + break; >> + default: >> + unreachable(); >> + } >> + switch (type_b) { >> + case MP_STR: >> + str =3D mp_decode_str(&field_b, &len); >> + rc =3D tt_uuid_from_lstring(str, len, &uuid_b); >> + assert(rc =3D=3D 0); >> + break; >> + case MP_BIN: >> + str =3D mp_decode_bin(&field_b, &len); >> + ret =3D uuid_unpack(&str, len, &uuid_b); >> + assert(ret !=3D NULL); >> + break; >> + case MP_EXT: >> + ret =3D mp_decode_uuid(&field_b, &uuid_b); >> + assert(ret !=3D NULL); >> + break; >> + default: >> + unreachable(); >> + } >> + return tt_uuid_compare(&uuid_a, &uuid_b); >=20 > 4. If we will decide to store uuids as MP_EXT, it may be > useful to introduce a function mp_compare_uuid, by analogue > with other mp_compare_* functions. To compare them directly > using memcmp, without unpacking. But see above my comment > about order. memcmp validness depends on how uuid is stored > in MessagePack. >=20 Done. uuid fields are stored in big-endian manner and in the same order they=E2=80=99re compared, so memcmp is valid. >> +} >> + >> +static inline int >> +mp_compare_uuid(const char *field_a, const char *field_b) >> +{ >> + return mp_compare_uuid_with_type(field_a, mp_typeof(*field_a), >> + field_b, mp_typeof(*field_b)); >> +} >> @@ -1578,6 +1642,21 @@ hint_decimal(decimal_t *dec) >> return hint_create(MP_CLASS_NUMBER, val); >> } >>=20 >> +static inline hint_t >> +hint_uuid(struct tt_uuid *uuid) >> +{ >> + /* Simply take the first part of the UUID as hint. */ >=20 > 5. Why only first? I meant, take the =E2=80=98high=E2=80=99 part of the UUID. The one = which=E2=80=99s compared first. >=20 >> + uint64_t val =3D 0; >> + val |=3D uuid->time_low; >> + val <<=3D sizeof(uuid->time_mid) * CHAR_BIT; >> + val |=3D uuid->time_mid; >> + val <<=3D sizeof(uuid->time_hi_and_version) * CHAR_BIT; >> + val |=3D uuid->time_hi_and_version; >> + /* Make space for class representation. */ >> + val >>=3D HINT_CLASS_BITS; >> + return hint_create(MP_CLASS_UUID, val); >> +} >> diff --git a/test/engine/gh-4268-uuid.result = b/test/engine/gh-4268-uuid.result >> new file mode 100644 >> index 000000000..928204507 >> --- /dev/null >> +++ b/test/engine/gh-4268-uuid.result >=20 > 6. Since this is a feature, we probably should not call the > test file using name pattern used for bugs. Just uuid.test.lua. Done. -- Serge Petrenko sergepetrenko@tarantool.org