From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounce@freelists.org>
Received: from localhost (localhost [127.0.0.1])
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 906AA2D414
	for <tarantool-patches@freelists.org>; Tue, 13 Nov 2018 17:37:49 -0500 (EST)
Received: from turing.freelists.org ([127.0.0.1])
	by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id Eu4Qylt3UhPB for <tarantool-patches@freelists.org>;
	Tue, 13 Nov 2018 17:37:49 -0500 (EST)
Received: from smtp37.i.mail.ru (smtp37.i.mail.ru [94.100.177.97])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 760DA2D1C7
	for <tarantool-patches@freelists.org>; Tue, 13 Nov 2018 17:37:46 -0500 (EST)
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 12.0 \(3445.100.39\))
Subject: [tarantool-patches] Re: [PATCH v2 2/3] Introduce "none" and "binary"
 collations
From: "n.pettik" <korablev@tarantool.org>
In-Reply-To: <e1e6f1ad-a2bc-58f3-f6ce-6cbacd14dce0@tarantool.org>
Date: Wed, 14 Nov 2018 01:37:37 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <03A13ECD-6BA4-448F-8952-EE8E5C753130@tarantool.org>
References: <cover.1542066357.git.korablev@tarantool.org>
 <8e9bca5d0f620a5cbd3bdffe521cecdfd1ad0ffd.1542066357.git.korablev@tarantool.org>
 <e1e6f1ad-a2bc-58f3-f6ce-6cbacd14dce0@tarantool.org>
Sender: tarantool-patches-bounce@freelists.org
Errors-to: tarantool-patches-bounce@freelists.org
Reply-To: tarantool-patches@freelists.org
List-help: <mailto:ecartis@freelists.org?Subject=help>
List-unsubscribe: <tarantool-patches-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: tarantool-patches <tarantool-patches.freelists.org>
List-subscribe: <tarantool-patches-request@freelists.org?Subject=subscribe>
List-owner: <mailto:>
List-post: <mailto:tarantool-patches@freelists.org>
List-archive: <http://www.freelists.org/archives/tarantool-patches>
To: tarantool-patches@freelists.org
Cc: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>


> On 13 Nov 2018, at 15:02, Vladislav Shpilevoy =
<v.shpilevoy@tarantool.org> wrote:
>=20
> Hi! Thanks for the fixes!
>=20
> Please, remember to put branch/issue link
> into a covering letter next time.

I=E2=80=99m sorry, it was late at night...

> See 5 comments below, my fixes at the end
> of the email and on the branch.
>=20
>> diff --git a/src/box/alter.cc b/src/box/alter.cc
>> index 6d2c59bbc..2eb9f53e8 100644
>> --- a/src/box/alter.cc
>> +++ b/src/box/alter.cc
>> @@ -403,7 +403,7 @@ field_def_decode(struct field_def *field, const =
char **data,
>>  				     "nullable action properties", =
fieldno +
>>  				     TUPLE_INDEX_BASE));
>>  	}
>> -	if (field->coll_id !=3D COLL_NONE &&
>> +	if (field->coll_id !=3D 0 &&
>=20
> 1. Why not to leave enum COLL_NONE, but change its value to 0?
> I guess, it is a purpose of define/enum features - define a name
> for a constant to be able to change the value under the hood. It
> could have reduced the diff.

For me it looks more convenient. But I don=E2=80=99t mind applying your =
fixes.
But there is one more point to avoid using COLL_NONE.
See my comment at the end of review.

>>  	    field->type !=3D FIELD_TYPE_STRING &&
>>  	    field->type !=3D FIELD_TYPE_SCALAR &&
>>  	    field->type !=3D FIELD_TYPE_ANY) {
>> diff --git a/src/box/sql/callback.c b/src/box/sql/callback.c
>> index 3cf3a835d..352745e0e 100644
>> --- a/src/box/sql/callback.c
>> +++ b/src/box/sql/callback.c
>> @@ -42,13 +42,22 @@
>>  struct coll *
>>  sql_get_coll_seq(Parse *parser, const char *name, uint32_t *coll_id)
>>  {
>> -	if (name =3D=3D NULL || strcasecmp(name, "binary") =3D=3D 0) {
>> -		*coll_id =3D COLL_NONE;
>> -		return NULL;
>> +	if (name =3D=3D NULL) {
>> +		*coll_id =3D 0;
>> +		return coll_by_id(0)->coll;
>>  	}
>> -	struct coll_id *p =3D coll_by_name(name, strlen(name));
>> +	struct coll_id *p;
>> +	/*
>> +	 * In SQL all identifiers should be uppercased, so
>> +	 * to avoid mess lets simple search binary (since it is
>> +	 * sort of "special" collation) ignoring case at all.
>> +	 */
>=20
> 2. I am not sure if it should be so special. I think, we should
> treat it just like any other collation. If tests have BINARY
> uppercased, then they should be fixed, I guess.

Thing is in SQLite binary can be used without quotes, like:

=E2=80=A6 COLLATE BiNaRy =E2=80=A6

And since ids are uppercased, this thing comes as =E2=80=9CBINARY".
If I forced only lower case =E2=80=9Cbinary=E2=80=9D, a lot (>80 =
separate sub-tests,
12 separate file tests) of tests would fail. Moreover, I see that =
similar
things take place at Lua-land (schema.cc):

local function update_index_parts(format, parts)
...
elseif k =3D=3D 'collation' then
    -- find ID by name
    local coll =3D box.space._collation.index.name:get{v}
    if not coll then
        coll =3D box.space._collation.index.name:get{v:lower()}
    end

If you still think that we should allow only =E2=80=9Cbinary=E2=80=9D =
format,
I will do it alongside with tests in a separate commit.

>> +	if (strcasecmp(name, "binary") =3D=3D 0)
>> +		p =3D coll_by_name("binary", strlen("binary"));
>> +	else
>> +		p =3D coll_by_name(name, strlen(name));
>>  	if (p =3D=3D NULL) {
>> -		*coll_id =3D COLL_NONE;
>> +		*coll_id =3D 0;
>>  		sqlite3ErrorMsg(parser, "no such collation sequence: =
%s",
>>  				name);
>>  		return NULL;
>> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
>> index 8c34cbb3d..580cf1e60 100644
>> --- a/src/box/sql/func.c
>> +++ b/src/box/sql/func.c
>> @@ -506,7 +506,7 @@ case_type##ICUFunc(sqlite3_context *context, int =
argc, sqlite3_value **argv)   \
>>  	UErrorCode status =3D U_ZERO_ERROR;                              =
        \
>>  	struct coll *coll =3D sqlite3GetFuncCollSeq(context);            =
        \
>>  	const char *locale =3D NULL;                                     =
        \
>> -	if (coll !=3D NULL) {                                            =
        \
>> +	if (coll !=3D NULL && coll->collator !=3D NULL) {                =
          \
>=20
> 3. Why do you use coll->collator !=3D 0 instead of coll->type =3D=3D =
COLL_TYPE_ICU?
> I do not think it is safe since in future we can move collator into a =
union,
> which will not be NULL for a non-ICU collation.

Yep, you are absolutely right, I should use type of collation, instead =
of
checking existence of collator. Fixed:

diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index 580cf1e60..a2ac9ba2d 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -506,7 +506,7 @@ case_type##ICUFunc(sqlite3_context *context, int =
argc, sqlite3_value **argv)   \
        UErrorCode status =3D U_ZERO_ERROR;                              =
        \
        struct coll *coll =3D sqlite3GetFuncCollSeq(context);            =
        \
        const char *locale =3D NULL;                                     =
        \
-       if (coll !=3D NULL && coll->collator !=3D NULL) {                =
          \
+       if (coll !=3D NULL && coll->type =3D=3D COLL_TYPE_ICU) {         =
            \
                locale =3D ucol_getLocaleByType(coll->collator,          =
        \
                                              ULOC_VALID_LOCALE, =
&status);     \
        }                                                                =
      \

>=20
>>  		locale =3D ucol_getLocaleByType(coll->collator,          =
        \
>>  					      ULOC_VALID_LOCALE, =
&status);     \
>>  	}        > diff --git a/src/box/tuple_format.c =
b/src/box/tuple_format.c
>> index 5bdf102e7..8e5aea51b 100644
>> --- a/src/box/tuple_format.c
>> +++ b/src/box/tuple_format.c
>> @@ -40,7 +40,7 @@ static uint32_t formats_size =3D 0, =
formats_capacity =3D 0;
>>  static const struct tuple_field tuple_field_default =3D {
>>        FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false,
>> -       ON_CONFLICT_ACTION_NONE, NULL, COLL_NONE,
>> +       ON_CONFLICT_ACTION_DEFAULT, NULL, 0,
>> };
>=20
> 4. Are you sure that changing on_conflict_action belongs
> to this patch?

No, it doesn=E2=80=99t. I guess it is an artefact of rebasing.
Glad that you caught it. Fixed.

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 8e5aea51b..badffc701 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -40,7 +40,7 @@ static uint32_t formats_size =3D 0, formats_capacity =3D=
 0;
=20
 static const struct tuple_field tuple_field_default =3D {
        FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false,
-       ON_CONFLICT_ACTION_DEFAULT, NULL, 0,
+       ON_CONFLICT_ACTION_NONE, NULL, 0,

>=20
> 5. How about tests like this one?
>=20
> box.schema.create_space('test', {format =3D {{1, 'unsigned'}, {2, =
'string', collation =3D 'binary'}}})
>=20
> I mean specifying collations via Lua. And what happens if I use =
collation =3D 'none'
> in a space format or index parts? I think, it should throw an error, =
but it is up
> to Kostja or Kirill.

Hmm, personally I see no reason why an error should be raised.
If you specify =E2=80=9Cnone=E2=80=9D collation, then it will act as an =
absence of collation.
That is simple. Anyway, added tests on binary collation:

diff --git a/test/sql/collation.result b/test/sql/collation.result
index 3df4cfa70..b42d3e034 100644
--- a/test/sql/collation.result
+++ b/test/sql/collation.result
@@ -123,6 +123,49 @@ box.space.T:format()[3]['collation']
 box.sql.execute("DROP TABLE t;")
 ---
 ...
+-- BINARY collation works in the same way as there is no collation
+-- at all.
+--
+t =3D box.schema.create_space('test', {format =3D {{'id', 'unsigned'}, =
{'a', 'string', collation =3D 'binary'}}})
+---
+...
+t:format()[2]['collation']
+---
+- 3
+...
+pk =3D t:create_index('primary', {parts =3D {1}})
+---
+...
+i =3D t:create_index('secondary', {parts =3D {2, 'str', =
collation=3D'binary'}})
+---
+...
+t:insert({1, 'AsD'})
+---
+- [1, 'AsD']
+...
+t:insert({2, 'ASD'})
+---
+- [2, 'ASD']
+...
+t:insert({3, 'asd'})
+---
+- [3, 'asd']
+...
+i:select('asd')
+---
+- - [3, 'asd']
+...
+i:select('ASD')
+---
+- - [2, 'ASD']
+...
+i:select('AsD')
+---
+- - [1, 'AsD']
+...
+t:drop()
+---
+...
 -- Collation with id =3D=3D 0 is "none". It used to unify interaction
 -- with collation interface. It also can't be dropped.
 --
diff --git a/test/sql/collation.test.lua b/test/sql/collation.test.lua
index 61df33a95..05f666f5e 100644
--- a/test/sql/collation.test.lua
+++ b/test/sql/collation.test.lua
@@ -50,6 +50,21 @@ box.space.T:format()[2]['collation']
 box.space.T:format()[3]['collation']
 box.sql.execute("DROP TABLE t;")
=20
+-- BINARY collation works in the same way as there is no collation
+-- at all.
+--
+t =3D box.schema.create_space('test', {format =3D {{'id', 'unsigned'}, =
{'a', 'string', collation =3D 'binary'}}})
+t:format()[2]['collation']
+pk =3D t:create_index('primary', {parts =3D {1}})
+i =3D t:create_index('secondary', {parts =3D {2, 'str', =
collation=3D'binary'}})
+t:insert({1, 'AsD'})
+t:insert({2, 'ASD'})
+t:insert({3, 'asd'})
+i:select('asd')
+i:select('ASD')
+i:select('AsD')
+t:drop()
+

I would like to reveal my main concern about this patch:
pointer to =E2=80=9Cnone=E2=80=9D collation is not set in =
key_def/field_def, even if =E2=80=9Cnone=E2=80=9D
collation is forced. The way I wanted to implement this patch is like:

@@ -174,17 +174,15 @@ key_def_new(const struct key_part_def *parts, =
uint32_t part_count)
        for (uint32_t i =3D 0; i < part_count; i++) {
                const struct key_part_def *part =3D &parts[i];
                struct coll *coll =3D NULL;
-               if (part->type =3D=3D FIELD_TYPE_SCALAR ||
-                   part->type =3D=3D FIELD_TYPE_STRING) {
+               if (part->coll_id !=3D COLL_NONE) {
                        struct coll_id *coll_id =3D =
coll_by_id(part->coll_id);
-                       if (part->coll_id !=3D 0 && coll_id =3D=3D NULL) =
{
+                       if (coll_id =3D=3D NULL) {
                                diag_set(ClientError, =
ER_WRONG_INDEX_OPTIONS,
                                         i + 1, "collation was not found =
by ID");
                                key_def_delete(def);
                                return NULL;
                        }
-                       if (coll_id !=3D NULL)
-                               coll =3D coll_id->coll;
+                       coll =3D coll_id->coll;
                }
                key_def_set_part(def, i, part->fieldno, part->type,
                                 part->nullable_action, coll, =
part->coll_id,

@@ -321,8 +319,7 @@ key_def_sizeof_parts(const struct key_part_def =
*parts, uint32_t part_count)
        for (uint32_t i =3D 0; i < part_count; i++) {
                const struct key_part_def *part =3D &parts[i];
                int count =3D 2;
-               if (part->type =3D=3D FIELD_TYPE_STRING ||
-                   part->type =3D=3D FIELD_TYPE_SCALAR)
+               if (part->coll_id !=3D COLL_NONE)
                        count++;
                if (part->is_nullable)
                        count++;
@@ -332,8 +329,7 @@ key_def_sizeof_parts(const struct key_part_def =
*parts, uint32_t part_count)
                assert(part->type < field_type_MAX);
                size +=3D mp_sizeof_str(strlen(PART_OPT_TYPE));
                size +=3D =
mp_sizeof_str(strlen(field_type_strs[part->type]));
-               if (part->type =3D=3D FIELD_TYPE_STRING ||
-                   part->type =3D=3D FIELD_TYPE_SCALAR) {
+               if (part->coll_id !=3D COLL_NONE) {
                        size +=3D =
mp_sizeof_str(strlen(PART_OPT_COLLATION));
                        size +=3D mp_sizeof_uint(part->coll_id);
                }
@@ -352,8 +348,7 @@ key_def_encode_parts(char *data, const struct =
key_part_def *parts,
        for (uint32_t i =3D 0; i < part_count; i++) {
                const struct key_part_def *part =3D &parts[i];
                int count =3D 2;
-               if (part->type =3D=3D FIELD_TYPE_STRING ||
-                   part->type =3D=3D FIELD_TYPE_SCALAR)
+               if (part->coll_id !=3D COLL_NONE)
                        count++;
                if (part->is_nullable)
                        count++;
@@ -366,8 +361,7 @@ key_def_encode_parts(char *data, const struct =
key_part_def *parts,
                assert(part->type < field_type_MAX);
                const char *type_str =3D field_type_strs[part->type];
                data =3D mp_encode_str(data, type_str, =
strlen(type_str));
-               if (part->type =3D=3D FIELD_TYPE_STRING ||
-                   part->type =3D=3D FIELD_TYPE_SCALAR) {
+               if (part->coll_id !=3D COLL_NONE) {
                        data =3D mp_encode_str(data, PART_OPT_COLLATION,
                                             =
strlen(PART_OPT_COLLATION));
                        data =3D mp_encode_uint(data, part->coll_id);

diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc
index efaa10da0..7cae436f1 100644
--- a/src/box/lua/space.cc
+++ b/src/box/lua/space.cc
@@ -299,15 +299,12 @@ lbox_fillspace(struct lua_State *L, struct space =
*space, int i)
                        lua_pushboolean(L, key_part_is_nullable(part));
                        lua_setfield(L, -2, "is_nullable");
=20
-                       if (part->type =3D=3D FIELD_TYPE_STRING ||
-                           part->type =3D=3D FIELD_TYPE_SCALAR) {
-                               if (! space_is_system(space)) {
-                                       struct coll_id *coll_id =3D
-                                               =
coll_by_id(part->coll_id);
-                                       assert(coll_id !=3D NULL);
-                                       lua_pushstring(L, =
coll_id->name);
-                                       lua_setfield(L, -2, =
"collation");
-                               }
+                       if (part->coll_id !=3D COLL_NONE) {
+                               struct coll_id *coll_id =3D
+                                       coll_by_id(part->coll_id);
+                               assert(coll_id !=3D NULL);
+                               lua_pushstring(L, coll_id->name);
+                               lua_setfield(L, -2, "collation");
                        }

In other words, each field with type of STRING/SCALAR and mb ANY
would have real pointer to collation. However, in turn such change
would break too many tests, without significant refactoring of SQL
query planner (at least I failed to do it in several hours).
On the other hand, in this case we can remove COLL_NONE
and unify collation interface: each field or key part MUST feature
collation. And I mean not only valid ID (it is how it works now),
but also valid (i.e. !=3D NULL) pointer to collation struct. So, it =
seems
that we would be able to remove any checks like:

if (coll !=3D NULL)
	*use collation*
else
	*process without involving collations*

Would become:

If (type_is_string_like)
	*always use collation*

To be honest I was sure that you would notice this pickle=E2=80=A6
Or current approach is not dat bad?