From: n.pettik
To: tarantool-patches@freelists.org
Cc: Vladislav Shpilevoy, Konstantin Osipov
Date: Thu, 1 Nov 2018 20:45:35 +0300
Subject: [tarantool-patches] Re: [PATCH 2/3] Add surrogate ID for BINARY collation

I accidentally sent this mail from the wrong address, so you might have missed it. My apologies; I am resending it from the right one.

> On 1 Nov 2018, at 19:31, Никита Петтик <kitnerh@gmail.com> wrote:
>
>> On 1 Nov 2018, at 18:39, Konstantin Osipov <kostja@tarantool.org> wrote:
>>
>> * n.pettik <korablev@tarantool.org> [18/11/01 16:11]:
>>>>> I guess, because
>>>>>
>>>>> 1) It is not a real collation and is not present in
>>>>> _collation. So for a user it would be strange to see
>>>>> a gap between 2 and 4 in _collation, which cannot be
>>>>> set.
>>>>
>>>> Let's insert it there.
>>>
>>> So, you insist on id == 3, right? Again, if a user runs select
>>> on the _collation space, they won't see an entry with id == 3.
>>> On the other hand, if a user attempts to insert id == 3,
>>> they will get an error.
>>
>> No, I don't insist yet. Why not insert a special row in there?
>
> Because inserting into _collation would result in the creation
> of collation objects. Meanwhile, in fact we need only an ID
> to distinguish BINARY from no-collation. The rest is the
> same for them. So, it makes sense to store only the ID within
> the space format. That is my point.
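A minimal C sketch of storing only the ID, under stated assumptions: the numeric values of COLL_NONE and COLL_BINARY are illustrative, and COLL_BINARY, struct coll, struct field_def, coll_by_id() and field_def_coll() are hypothetical stand-ins, not Tarantool's actual code. Neither surrogate ID has a _collation row or a collation object; both resolve to a NULL collation (plain byte-wise comparison), so only the stored ID tells them apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reserved IDs; the values are illustrative only. */
#define COLL_NONE   UINT32_MAX  /* the field has no collation at all */
#define COLL_BINARY 3u          /* surrogate ID for an explicit BINARY */

/* Stand-in for a real (ICU-backed) collation object. */
struct coll {
	const char *name;
};

/* The space format stores only an ID, never a collation object. */
struct field_def {
	const char *name;
	uint32_t coll_id;
};

static struct coll coll_unicode = {"unicode"};
static struct coll coll_unicode_ci = {"unicode_ci"};

/* IDs that never get a _collation row or a collation object. */
static bool
coll_id_is_surrogate(uint32_t id)
{
	return id == COLL_NONE || id == COLL_BINARY;
}

/* Hypothetical cache lookup, valid for real collations only. */
static struct coll *
coll_by_id(uint32_t id)
{
	assert(!coll_id_is_surrogate(id));
	switch (id) {
	case 1:
		return &coll_unicode;
	case 2:
		return &coll_unicode_ci;
	default:
		return NULL; /* unknown ID: a real lookup would raise an error */
	}
}

/*
 * Collation used to compare values of a field. NULL means plain
 * byte-wise comparison, which is what both surrogate IDs get.
 */
static struct coll *
field_def_coll(const struct field_def *def)
{
	if (coll_id_is_surrogate(def->coll_id))
		return NULL;
	return coll_by_id(def->coll_id);
}
```

Nothing is allocated for either surrogate ID, so adding BINARY costs one reserved number in the format rather than a new collation object.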

>>>>> is consistent to have its ID near COLL_NONE, in a "special
>>>>> range" of collation identifiers.
>>>>
>>>> Uhm, AFAIU we have two binary collations. One is "collation is not
>>>> set" and another is "collation binary". Which one did you mean
>>>> now?
>>>
>>> The first one is not a collation at all. It is rather the "absence" of any collation.
>>> The second one is a sort of "surrogate" and in terms of functionality
>>> means the same. However, its id will be stored in the space format in
>>> order to indicate that BINARY collation should be forced during
>>> comparisons.
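Since both IDs compare bytes the same way, the point of storing BINARY separately lies in how a comparison picks its collation. The sketch below encodes one plausible reading of "forced during comparisons" under simplified rules that are an assumption, loosely modelled on ANSI SQL collation precedence (an explicit COLLATE clause beats a column's collation), and not Tarantool's implementation; resolve_coll(), struct operand_coll and COLL_CONFLICT are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical IDs; the values are illustrative only. */
#define COLL_NONE     UINT32_MAX        /* operand has no collation */
#define COLL_BINARY   3u                /* surrogate ID for BINARY */
#define COLL_CONFLICT (UINT32_MAX - 1)  /* hypothetical "illegal mix" result */

/* Collation attached to one operand of a comparison. */
struct operand_coll {
	uint32_t id;
	bool is_explicit; /* true if it comes from a COLLATE clause */
};

/*
 * Picks the collation for comparing two operands under simplified,
 * assumed rules: explicit beats implicit, a missing collation yields
 * to the other side, and two different declared collations conflict.
 */
static uint32_t
resolve_coll(struct operand_coll lhs, struct operand_coll rhs)
{
	/* A COLLATE clause always names some collation. */
	assert(!(lhs.is_explicit && lhs.id == COLL_NONE));
	assert(!(rhs.is_explicit && rhs.id == COLL_NONE));
	if (lhs.is_explicit != rhs.is_explicit)
		return lhs.is_explicit ? lhs.id : rhs.id;
	if (lhs.id == COLL_NONE)
		return rhs.id;
	if (rhs.id == COLL_NONE)
		return lhs.id;
	/* BINARY counts as a declared collation, so it is not overridden. */
	return lhs.id == rhs.id ? lhs.id : COLL_CONFLICT;
}
```

Under these rules, a column without a collation compared with a unicode_ci column resolves to unicode_ci, while a column declared BINARY yields COLL_CONFLICT unless an explicit COLLATE clause decides.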
>>
>> I think we could use internal ids to reference both cases. For
>> both of these ids we could have surrogate rows in the _collation system space;
>> they won't do any harm. This will make things easier in the future.
>
> Ok, how do you suggest representing the "absence" of a collation? Like this:
>
> box.space._collation:select()
>
> ---
> - - [1, 'unicode', 1, 'ICU', '', {}]
>   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
>   - [3, 'none', 1, 'ICU', '', {}]
> ...
>
> It is nonsense, IMHO. No collation means "no collation at all":
> nothing represents it, especially nothing visible to the user. With a BINARY
> collation it would look even more suspicious:
>
> - - [1, 'unicode', 1, 'ICU', '', {}]
>   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
>   - [3, 'none', 1, 'ICU', '', {}]
>   - [4, 'binary', 1, 'ICU', '', {}]
>
> It would confuse users who don't use SQL: in Tarantool NoSQL
> there is no difference between "binary" and "no-collation".
> Moreover, to keep things consistent, we would have to make
> the default collation 'none' instead of the absence of a collation.
> It means that a field def without an explicitly set collation would
> have the 'none' collation in its format. For instance:
>
> *before*
>
> - [{'affinity': 66, 'type': 'string', 'nullable_action': 'abort', 'name': 'ID', 'is_nullable': false}]
>
> *after*
>
> - [{'collation': 3, 'affinity': 66, 'type': 'string', 'nullable_action': 'abort',
>    'name': 'ID', 'is_nullable': false}]
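The *before*/*after* difference boils down to when a 'collation' key is written into a field's format map. A minimal sketch with a hypothetical encoder: field_def_encode(), the reduced struct field_def and the value of COLL_NONE are assumptions for illustration, and the output mimics the Lua-style maps shown here rather than the real MsgPack encoding.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical marker for "no collation"; the value is illustrative. */
#define COLL_NONE UINT32_MAX

/* A reduced field definition: only the keys needed for the example. */
struct field_def {
	const char *name;
	const char *type;
	uint32_t coll_id;
};

/*
 * Writes a Lua-like map for one field into buf. The 'collation' key
 * is emitted only when the field references a collation ID.
 * Returns what snprintf() returns.
 */
static int
field_def_encode(const struct field_def *def, char *buf, size_t size)
{
	/* This sketch expects a named field. */
	assert(def->name != NULL && strlen(def->name) > 0);
	if (def->coll_id == COLL_NONE)
		return snprintf(buf, size, "{'type': '%s', 'name': '%s'}",
				def->type, def->name);
	return snprintf(buf, size,
			"{'collation': %" PRIu32 ", 'type': '%s', 'name': '%s'}",
			def->coll_id, def->type, def->name);
}
```

Encoding a field {"ID", "string", COLL_NONE} yields {'type': 'string', 'name': 'ID'}, whereas the same field with coll_id = 3 yields {'collation': 3, 'type': 'string', 'name': 'ID'}: the *after* shape every field without an explicit collation would get if 'none' became a real row.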

>> This is going to be the same mess as with NO ACTION and DEFAULT,
>> which are mostly the same, but not quite, so we'd better prepare.
>
> It is considered a mess due to SQLite legacy. On the other hand, all
> these manipulations with collations follow ANSI SQL.
>
> All points considered, I would prefer to introduce only one more ID
> (alongside the COLL_NONE ID) and prohibit creating collations with
> these ids. OR, add a surrogate "binary collation" to _collation with id == 3,
> but not both "binary" and "none".
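The first option in this proposal, reserving IDs that no user-defined collation may take, could be enforced with a check on every new _collation tuple. A minimal sketch: coll_id_is_reserved(), coll_id_check_new(), the error text and the ID values are hypothetical; in Tarantool such a check would presumably live in the _collation on_replace trigger.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reserved IDs; the values are illustrative only. */
#define COLL_NONE   UINT32_MAX
#define COLL_BINARY 3u

/* True for IDs that no user-defined collation may take. */
static bool
coll_id_is_reserved(uint32_t id)
{
	return id == COLL_NONE || id == COLL_BINARY;
}

/*
 * Validates the ID of a tuple being inserted into _collation.
 * Returns 0 if the ID is free for a user-defined collation, or -1
 * with a message in errbuf if it is reserved for a surrogate ID.
 */
static int
coll_id_check_new(uint32_t id, char *errbuf, size_t size)
{
	assert(errbuf != NULL && size > 0);
	if (coll_id_is_reserved(id)) {
		snprintf(errbuf, size, "collation id %" PRIu32 " is reserved", id);
		return -1;
	}
	errbuf[0] = '\0';
	return 0;
}
```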
