From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 906AA2D414 for ; Tue, 13 Nov 2018 17:37:49 -0500 (EST) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Eu4Qylt3UhPB for ; Tue, 13 Nov 2018 17:37:49 -0500 (EST) Received: from smtp37.i.mail.ru (smtp37.i.mail.ru [94.100.177.97]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 760DA2D1C7 for ; Tue, 13 Nov 2018 17:37:46 -0500 (EST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 12.0 \(3445.100.39\)) Subject: [tarantool-patches] Re: [PATCH v2 2/3] Introduce "none" and "binary" collations From: "n.pettik" In-Reply-To: Date: Wed, 14 Nov 2018 01:37:37 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <03A13ECD-6BA4-448F-8952-EE8E5C753130@tarantool.org> References: <8e9bca5d0f620a5cbd3bdffe521cecdfd1ad0ffd.1542066357.git.korablev@tarantool.org> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: tarantool-patches@freelists.org Cc: Vladislav Shpilevoy > On 13 Nov 2018, at 15:02, Vladislav Shpilevoy = wrote: >=20 > Hi! Thanks for the fixes! >=20 > Please, remember to put branch/issue link > into a covering letter next time. I=E2=80=99m sorry, it was late at night... > See 5 comments below, my fixes at the end > of the email and on the branch. >=20 >> diff --git a/src/box/alter.cc b/src/box/alter.cc >> index 6d2c59bbc..2eb9f53e8 100644 >> --- a/src/box/alter.cc >> +++ b/src/box/alter.cc >> @@ -403,7 +403,7 @@ field_def_decode(struct field_def *field, const = char **data, >> "nullable action properties", = fieldno + >> TUPLE_INDEX_BASE)); >> } >> - if (field->coll_id !=3D COLL_NONE && >> + if (field->coll_id !=3D 0 && >=20 > 1. Why not to leave enum COLL_NONE, but change its value to 0? > I guess, it is a purpose of define/enum features - define a name > for a constant to be able to change the value under the hood. It > could have reduced the diff. For me it looks more convenient. But I don=E2=80=99t mind applying your = fixes. But there is one more point to avoid using COLL_NONE. See my comment at the end of review. >> field->type !=3D FIELD_TYPE_STRING && >> field->type !=3D FIELD_TYPE_SCALAR && >> field->type !=3D FIELD_TYPE_ANY) { >> diff --git a/src/box/sql/callback.c b/src/box/sql/callback.c >> index 3cf3a835d..352745e0e 100644 >> --- a/src/box/sql/callback.c >> +++ b/src/box/sql/callback.c >> @@ -42,13 +42,22 @@ >> struct coll * >> sql_get_coll_seq(Parse *parser, const char *name, uint32_t *coll_id) >> { >> - if (name =3D=3D NULL || strcasecmp(name, "binary") =3D=3D 0) { >> - *coll_id =3D COLL_NONE; >> - return NULL; >> + if (name =3D=3D NULL) { >> + *coll_id =3D 0; >> + return coll_by_id(0)->coll; >> } >> - struct coll_id *p =3D coll_by_name(name, strlen(name)); >> + struct coll_id *p; >> + /* >> + * In SQL all identifiers should be uppercased, so >> + * to avoid mess lets simple search binary (since it is >> + * sort of "special" collation) ignoring case at all. >> + */ >=20 > 2. I am not sure if it should be so special. I think, we should > treat it just like any other collation. If tests have BINARY > uppercased, then they should be fixed, I guess. Thing is in SQLite binary can be used without quotes, like: =E2=80=A6 COLLATE BiNaRy =E2=80=A6 And since ids are uppercased, this thing comes as =E2=80=9CBINARY". If I forced only lower case =E2=80=9Cbinary=E2=80=9D, a lot (>80 = separate sub-tests, 12 separate file tests) of tests would fail. Moreover, I see that = similar things take place at Lua-land (schema.cc): local function update_index_parts(format, parts) ... elseif k =3D=3D 'collation' then -- find ID by name local coll =3D box.space._collation.index.name:get{v} if not coll then coll =3D box.space._collation.index.name:get{v:lower()} end If you still think that we should allow only =E2=80=9Cbinary=E2=80=9D = format, I will do it alongside with tests in a separate commit. >> + if (strcasecmp(name, "binary") =3D=3D 0) >> + p =3D coll_by_name("binary", strlen("binary")); >> + else >> + p =3D coll_by_name(name, strlen(name)); >> if (p =3D=3D NULL) { >> - *coll_id =3D COLL_NONE; >> + *coll_id =3D 0; >> sqlite3ErrorMsg(parser, "no such collation sequence: = %s", >> name); >> return NULL; >> diff --git a/src/box/sql/func.c b/src/box/sql/func.c >> index 8c34cbb3d..580cf1e60 100644 >> --- a/src/box/sql/func.c >> +++ b/src/box/sql/func.c >> @@ -506,7 +506,7 @@ case_type##ICUFunc(sqlite3_context *context, int = argc, sqlite3_value **argv) \ >> UErrorCode status =3D U_ZERO_ERROR; = \ >> struct coll *coll =3D sqlite3GetFuncCollSeq(context); = \ >> const char *locale =3D NULL; = \ >> - if (coll !=3D NULL) { = \ >> + if (coll !=3D NULL && coll->collator !=3D NULL) { = \ >=20 > 3. Why do you use coll->collator !=3D 0 instead of coll->type =3D=3D = COLL_TYPE_ICU? > I do not think it is safe since in future we can move collator into a = union, > which will not be NULL for a non-ICU collation. Yep, you are absolutely right, I should use type of collation, instead = of checking existence of collator. Fixed: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 580cf1e60..a2ac9ba2d 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -506,7 +506,7 @@ case_type##ICUFunc(sqlite3_context *context, int = argc, sqlite3_value **argv) \ UErrorCode status =3D U_ZERO_ERROR; = \ struct coll *coll =3D sqlite3GetFuncCollSeq(context); = \ const char *locale =3D NULL; = \ - if (coll !=3D NULL && coll->collator !=3D NULL) { = \ + if (coll !=3D NULL && coll->type =3D=3D COLL_TYPE_ICU) { = \ locale =3D ucol_getLocaleByType(coll->collator, = \ ULOC_VALID_LOCALE, = &status); \ } = \ >=20 >> locale =3D ucol_getLocaleByType(coll->collator, = \ >> ULOC_VALID_LOCALE, = &status); \ >> } > diff --git a/src/box/tuple_format.c = b/src/box/tuple_format.c >> index 5bdf102e7..8e5aea51b 100644 >> --- a/src/box/tuple_format.c >> +++ b/src/box/tuple_format.c >> @@ -40,7 +40,7 @@ static uint32_t formats_size =3D 0, = formats_capacity =3D 0; >> static const struct tuple_field tuple_field_default =3D { >> FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false, >> - ON_CONFLICT_ACTION_NONE, NULL, COLL_NONE, >> + ON_CONFLICT_ACTION_DEFAULT, NULL, 0, >> }; >=20 > 4. Are you sure that changing on_conflict_action belongs > to this patch? No, it doesn=E2=80=99t. I guess it is an artefact of rebasing. Glad that you caught it. Fixed. diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 8e5aea51b..badffc701 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -40,7 +40,7 @@ static uint32_t formats_size =3D 0, formats_capacity =3D= 0; =20 static const struct tuple_field tuple_field_default =3D { FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false, - ON_CONFLICT_ACTION_DEFAULT, NULL, 0, + ON_CONFLICT_ACTION_NONE, NULL, 0, >=20 > 5. How about tests like this one? >=20 > box.schema.create_space('test', {format =3D {{1, 'unsigned'}, {2, = 'string', collation =3D 'binary'}}}) >=20 > I mean specifying collations via Lua. And what happens if I use = collation =3D 'none' > in a space format or index parts? I think, it should throw an error, = but it is up > to Kostja or Kirill. Hmm, personally I see no reason why an error should be raised. If you specify =E2=80=9Cnone=E2=80=9D collation, then it will act as an = absence of collation. That is simple. Anyway, added tests on binary collation: diff --git a/test/sql/collation.result b/test/sql/collation.result index 3df4cfa70..b42d3e034 100644 --- a/test/sql/collation.result +++ b/test/sql/collation.result @@ -123,6 +123,49 @@ box.space.T:format()[3]['collation'] box.sql.execute("DROP TABLE t;") --- ... +-- BINARY collation works in the same way as there is no collation +-- at all. +-- +t =3D box.schema.create_space('test', {format =3D {{'id', 'unsigned'}, = {'a', 'string', collation =3D 'binary'}}}) +--- +... +t:format()[2]['collation'] +--- +- 3 +... +pk =3D t:create_index('primary', {parts =3D {1}}) +--- +... +i =3D t:create_index('secondary', {parts =3D {2, 'str', = collation=3D'binary'}}) +--- +... +t:insert({1, 'AsD'}) +--- +- [1, 'AsD'] +... +t:insert({2, 'ASD'}) +--- +- [2, 'ASD'] +... +t:insert({3, 'asd'}) +--- +- [3, 'asd'] +... +i:select('asd') +--- +- - [3, 'asd'] +... +i:select('ASD') +--- +- - [2, 'ASD'] +... +i:select('AsD') +--- +- - [1, 'AsD'] +... +t:drop() +--- +... -- Collation with id =3D=3D 0 is "none". It used to unify interaction -- with collation interface. It also can't be dropped. -- diff --git a/test/sql/collation.test.lua b/test/sql/collation.test.lua index 61df33a95..05f666f5e 100644 --- a/test/sql/collation.test.lua +++ b/test/sql/collation.test.lua @@ -50,6 +50,21 @@ box.space.T:format()[2]['collation'] box.space.T:format()[3]['collation'] box.sql.execute("DROP TABLE t;") =20 +-- BINARY collation works in the same way as there is no collation +-- at all. +-- +t =3D box.schema.create_space('test', {format =3D {{'id', 'unsigned'}, = {'a', 'string', collation =3D 'binary'}}}) +t:format()[2]['collation'] +pk =3D t:create_index('primary', {parts =3D {1}}) +i =3D t:create_index('secondary', {parts =3D {2, 'str', = collation=3D'binary'}}) +t:insert({1, 'AsD'}) +t:insert({2, 'ASD'}) +t:insert({3, 'asd'}) +i:select('asd') +i:select('ASD') +i:select('AsD') +t:drop() + I would like to reveal my main concern about this patch: pointer to =E2=80=9Cnone=E2=80=9D collation is not set in = key_def/field_def, even if =E2=80=9Cnone=E2=80=9D collation is forced. The way I wanted to implement this patch is like: @@ -174,17 +174,15 @@ key_def_new(const struct key_part_def *parts, = uint32_t part_count) for (uint32_t i =3D 0; i < part_count; i++) { const struct key_part_def *part =3D &parts[i]; struct coll *coll =3D NULL; - if (part->type =3D=3D FIELD_TYPE_SCALAR || - part->type =3D=3D FIELD_TYPE_STRING) { + if (part->coll_id !=3D COLL_NONE) { struct coll_id *coll_id =3D = coll_by_id(part->coll_id); - if (part->coll_id !=3D 0 && coll_id =3D=3D NULL) = { + if (coll_id =3D=3D NULL) { diag_set(ClientError, = ER_WRONG_INDEX_OPTIONS, i + 1, "collation was not found = by ID"); key_def_delete(def); return NULL; } - if (coll_id !=3D NULL) - coll =3D coll_id->coll; + coll =3D coll_id->coll; } key_def_set_part(def, i, part->fieldno, part->type, part->nullable_action, coll, = part->coll_id, @@ -321,8 +319,7 @@ key_def_sizeof_parts(const struct key_part_def = *parts, uint32_t part_count) for (uint32_t i =3D 0; i < part_count; i++) { const struct key_part_def *part =3D &parts[i]; int count =3D 2; - if (part->type =3D=3D FIELD_TYPE_STRING || - part->type =3D=3D FIELD_TYPE_SCALAR) + if (part->coll_id !=3D COLL_NONE) count++; if (part->is_nullable) count++; @@ -332,8 +329,7 @@ key_def_sizeof_parts(const struct key_part_def = *parts, uint32_t part_count) assert(part->type < field_type_MAX); size +=3D mp_sizeof_str(strlen(PART_OPT_TYPE)); size +=3D = mp_sizeof_str(strlen(field_type_strs[part->type])); - if (part->type =3D=3D FIELD_TYPE_STRING || - part->type =3D=3D FIELD_TYPE_SCALAR) { + if (part->coll_id !=3D COLL_NONE) { size +=3D = mp_sizeof_str(strlen(PART_OPT_COLLATION)); size +=3D mp_sizeof_uint(part->coll_id); } @@ -352,8 +348,7 @@ key_def_encode_parts(char *data, const struct = key_part_def *parts, for (uint32_t i =3D 0; i < part_count; i++) { const struct key_part_def *part =3D &parts[i]; int count =3D 2; - if (part->type =3D=3D FIELD_TYPE_STRING || - part->type =3D=3D FIELD_TYPE_SCALAR) + if (part->coll_id !=3D COLL_NONE) count++; if (part->is_nullable) count++; @@ -366,8 +361,7 @@ key_def_encode_parts(char *data, const struct = key_part_def *parts, assert(part->type < field_type_MAX); const char *type_str =3D field_type_strs[part->type]; data =3D mp_encode_str(data, type_str, = strlen(type_str)); - if (part->type =3D=3D FIELD_TYPE_STRING || - part->type =3D=3D FIELD_TYPE_SCALAR) { + if (part->coll_id !=3D COLL_NONE) { data =3D mp_encode_str(data, PART_OPT_COLLATION, = strlen(PART_OPT_COLLATION)); data =3D mp_encode_uint(data, part->coll_id); diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc index efaa10da0..7cae436f1 100644 --- a/src/box/lua/space.cc +++ b/src/box/lua/space.cc @@ -299,15 +299,12 @@ lbox_fillspace(struct lua_State *L, struct space = *space, int i) lua_pushboolean(L, key_part_is_nullable(part)); lua_setfield(L, -2, "is_nullable"); =20 - if (part->type =3D=3D FIELD_TYPE_STRING || - part->type =3D=3D FIELD_TYPE_SCALAR) { - if (! space_is_system(space)) { - struct coll_id *coll_id =3D - = coll_by_id(part->coll_id); - assert(coll_id !=3D NULL); - lua_pushstring(L, = coll_id->name); - lua_setfield(L, -2, = "collation"); - } + if (part->coll_id !=3D COLL_NONE) { + struct coll_id *coll_id =3D + coll_by_id(part->coll_id); + assert(coll_id !=3D NULL); + lua_pushstring(L, coll_id->name); + lua_setfield(L, -2, "collation"); } In other words, each field with type of STRING/SCALAR and mb ANY would have real pointer to collation. However, in turn such change would break too many tests, without significant refactoring of SQL query planner (at least I failed to do it in several hours). On the other hand, in this case we can remove COLL_NONE and unify collation interface: each field or key part MUST feature collation. And I mean not only valid ID (it is how it works now), but also valid (i.e. !=3D NULL) pointer to collation struct. So, it = seems that we would be able to remove any checks like: if (coll !=3D NULL) *use collation* else *process without involving collations* Would become: If (type_is_string_like) *always use collation* To be honest I was sure that you would notice this pickle=E2=80=A6 Or current approach is not dat bad?