From: Serge Petrenko <sergepetrenko@tarantool.org> To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Cc: tml <tarantool-patches@dev.tarantool.org> Subject: Re: [Tarantool-patches] [PATCH 4/4] box: introduce indices by UUID Date: Fri, 10 Apr 2020 02:46:51 +0300 [thread overview] Message-ID: <7FF9C776-045A-4ADF-B44B-0CFB7FD746F9@tarantool.org> (raw) In-Reply-To: <f57d9ff7-7e8e-73f0-ee47-4372d9033fac@tarantool.org> > 6 апр. 2020 г., в 00:29, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> написал(а): > > Thanks for the patch! Hi! Thanks for the review! > > See 6 comments below. > > On 04/04/2020 01:02, Serge Petrenko wrote: >> It is now possible to create an index over UUID values, either returned >> by something like `uuid.new()` or 36- and 32- byte strings in format >> "88ebbb9a-8c18-480f-bd04-dd5345f1573c" or "88ebbb9a8c18480fbd04dd5345f1573c". >> 16-byte binary strings are also supported. >> >> Closes #4268 >> Closes #2916 >> >> @TarantoolBot document >> Title: Document uuid field type. >> >> There's a new field type -- UUID, it accepts values returned by >> `uuid.new()`, as well as strings representing uuids, like >> `88ebbb9a-8c18-480f-bd04-dd5345f1573c` and >> `88ebbb9a8c18480fbd04dd5345f1573c` and 16 byte binary values, encoded in >> msgpack. >> >> The index may be either unique or non-unique, nullable or non-nullable, >> and may be a primary key. >> >> To create an index over a uuid field for space `test`, say: >> ``` >> box.space.test:create_index("pk", {parts={1, 'uuid'}}) >> ``` >> Now you may insert uuids into the space: >> ``` >> tarantool> box.space.test:insert{uuid.new()} >> --- >> - [e631fdcc-0e8a-4d2f-83fd-b0ce6762b13f] >> ... >> >> tarantool> box.space.test:insert{"64d22e4d-ac92-4a23-899a-e59f34af5479"} >> --- >> - ['64d22e4d-ac92-4a23-899a-e59f34af5479'] >> ... >> >> tarantool> box.space.test:insert{"856dc1177dcc49969f1407f8c6c8a371"} >> --- >> - ['856dc1177dcc49969f1407f8c6c8a371'] >> ... >> >> tarantool> box.space.test:select{} >> --- >> - - ['64d22e4d-ac92-4a23-899a-e59f34af5479'] >> - ['856dc1177dcc49969f1407f8c6c8a371'] >> - [e631fdcc-0e8a-4d2f-83fd-b0ce6762b13f] > > 1. Wait. Why are they printed differently? They should be all the > same 36 strings when you print them, no? > > Seems like you understood Mons to the letter. I believe he meant > we should provide a way to insert strings to simplify moving data > to a new space, but it does not mean we should store them as strings. > It makes comparators slower. sscanf() to extract uuid from a string > on *each* comparison is a huge performance hole. From what I measured > recently (not in Tarantool), sscanf() is a *really* slow thing. Its > usage in comparators is dangerous. > > On the other hand ability to insert strings means we need to convert > them to MP_EXT after insertion, which means full scan of each tuple > by field types and possibly its recreation. We had a similar problem > with 'double' type. After all it was decided to keep as is and let > users enforce the needed type when they want using ffi.cast('double'). > > IMO, we should just disallow both ability to store strings, and to > insert strings. I mean, you can't insert a string into a decimal field, > for example. So a user anyway will never be able to insert only Lua > simple types. Cdata is enforced for decimal and double right now. > > Better keep it simple. If someone wants to convert their existing spaces > to mp uuid, they easily can write a trivial function which would replace > string uuid with cdata uuid before insertion into the new space. We > should not sacrifice perf for that. > > This is only my opinion, you may want to ask Mons and Kirill. So, I consulted. Mons, he’s ok with allowing only cdata. > > > 2. It is worth mentioning comparison order. Are they compared as > strings, or individual uuid fields are compared? In the latter case > in what order the fields are compared? Ok, mentioned in commit message. The order’s lexicographical. > > Also probably string comparison vs by-field comparison vs memcmp > comparison will be the same in case fields are stored in msgpack > in the same order as in the string, and in big-endian. I didn't check > but it is worth investigation. It would be good to be able to use > memcmp. > >> diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc >> index 3f8a0ce24..aad69340d 100644 >> --- a/src/box/tuple_compare.cc >> +++ b/src/box/tuple_compare.cc >> @@ -378,6 +382,61 @@ mp_compare_bin(const char *field_a, const char *field_b) >> return COMPARE_RESULT(size_a, size_b); >> } >> >> +static int >> +mp_compare_uuid_with_type(const char *field_a, enum mp_type type_a, >> + const char *field_b, enum mp_type type_b) >> +{ >> + struct tt_uuid uuid_a, uuid_b; >> + struct tt_uuid *ret; >> + const char *str; >> + uint32_t len; >> + int rc; >> + switch (type_a) { >> + case MP_STR: >> + str = mp_decode_str(&field_a, &len); >> + rc = tt_uuid_from_lstring(str, len, &uuid_a); > > 3. This is perf killer. Basically it would be faster to store > uuids as a string, or just as 16 bytes binary. Because memcmp > could be used directly then. Switched to memcmp. > >> + assert(rc == 0); >> + break; >> + case MP_BIN: >> + str = mp_decode_bin(&field_a, &len); >> + ret = uuid_unpack(&str, len, &uuid_a); >> + assert(ret != NULL); >> + break; >> + case MP_EXT: >> + ret = mp_decode_uuid(&field_a, &uuid_a); >> + assert(ret != NULL); >> + break; >> + default: >> + unreachable(); >> + } >> + switch (type_b) { >> + case MP_STR: >> + str = mp_decode_str(&field_b, &len); >> + rc = tt_uuid_from_lstring(str, len, &uuid_b); >> + assert(rc == 0); >> + break; >> + case MP_BIN: >> + str = mp_decode_bin(&field_b, &len); >> + ret = uuid_unpack(&str, len, &uuid_b); >> + assert(ret != NULL); >> + break; >> + case MP_EXT: >> + ret = mp_decode_uuid(&field_b, &uuid_b); >> + assert(ret != NULL); >> + break; >> + default: >> + unreachable(); >> + } >> + return tt_uuid_compare(&uuid_a, &uuid_b); > > 4. If we will decide to store uuids as MP_EXT, it may be > useful to introduce a function mp_compare_uuid, by analogue > with other mp_compare_* functions. To compare them directly > using memcmp, without unpacking. But see above my comment > about order. memcmp validness depends on how uuid is stored > in MessagePack. > Done. uuid fields are stored in big-endian manner and in the same order they’re compared, so memcmp is valid. >> +} >> + >> +static inline int >> +mp_compare_uuid(const char *field_a, const char *field_b) >> +{ >> + return mp_compare_uuid_with_type(field_a, mp_typeof(*field_a), >> + field_b, mp_typeof(*field_b)); >> +} >> @@ -1578,6 +1642,21 @@ hint_decimal(decimal_t *dec) >> return hint_create(MP_CLASS_NUMBER, val); >> } >> >> +static inline hint_t >> +hint_uuid(struct tt_uuid *uuid) >> +{ >> + /* Simply take the first part of the UUID as hint. */ > > 5. Why only first? I meant, take the ‘high’ part of the UUID. The one which’s compared first. > >> + uint64_t val = 0; >> + val |= uuid->time_low; >> + val <<= sizeof(uuid->time_mid) * CHAR_BIT; >> + val |= uuid->time_mid; >> + val <<= sizeof(uuid->time_hi_and_version) * CHAR_BIT; >> + val |= uuid->time_hi_and_version; >> + /* Make space for class representation. */ >> + val >>= HINT_CLASS_BITS; >> + return hint_create(MP_CLASS_UUID, val); >> +} >> diff --git a/test/engine/gh-4268-uuid.result b/test/engine/gh-4268-uuid.result >> new file mode 100644 >> index 000000000..928204507 >> --- /dev/null >> +++ b/test/engine/gh-4268-uuid.result > > 6. Since this is a feature, we probably should not call the > test file using name pattern used for bugs. Just uuid.test.lua. Done. -- Serge Petrenko sergepetrenko@tarantool.org
next prev parent reply other threads:[~2020-04-09 23:46 UTC|newest] Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-04-03 23:02 [Tarantool-patches] [PATCH 0/4] introduce indices over UUID Serge Petrenko 2020-04-03 23:02 ` [Tarantool-patches] [PATCH 1/4] decimal: fix comment typo Serge Petrenko 2020-04-05 21:22 ` Vladislav Shpilevoy 2020-04-03 23:02 ` [Tarantool-patches] [PATCH 2/4] uuid: expose additional from_string constructors Serge Petrenko 2020-04-05 21:22 ` Vladislav Shpilevoy 2020-04-09 23:46 ` Serge Petrenko 2020-04-10 16:56 ` Vladislav Shpilevoy 2020-04-11 13:35 ` Serge Petrenko 2020-04-03 23:02 ` [Tarantool-patches] [PATCH 3/4] box: add MsgPack encoding/decoding for UUID Serge Petrenko 2020-04-05 21:26 ` Vladislav Shpilevoy 2020-04-09 23:46 ` Serge Petrenko 2020-04-03 23:02 ` [Tarantool-patches] [PATCH 4/4] box: introduce indices by UUID Serge Petrenko 2020-04-05 21:29 ` Vladislav Shpilevoy 2020-04-09 23:46 ` Serge Petrenko [this message] 2020-04-10 16:56 ` Vladislav Shpilevoy 2020-04-05 21:21 ` [Tarantool-patches] [PATCH 0/4] introduce indices over UUID Vladislav Shpilevoy 2020-04-09 23:46 ` Serge Petrenko
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=7FF9C776-045A-4ADF-B44B-0CFB7FD746F9@tarantool.org \ --to=sergepetrenko@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH 4/4] box: introduce indices by UUID' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox