From: "n.pettik"
To: tarantool-patches@freelists.org
Cc: Konstantin Osipov, Vladislav Shpilevoy
Date: Thu, 1 Nov 2018 23:20:59 +0300
Subject: [tarantool-patches] Re: [PATCH 2/3] Add surrogate ID for BINARY collation

> On 1 Nov 2018, at 23:00, Konstantin Osipov wrote:
>
> * Никита Петтик [18/11/01 19:34]:
>>>>>> 1) It is not real collation and is not presented in
>>>>>> _collation. So for a user it would be strange to see
>>>>>> a gap between 2 and 4 in _collation, which can not be
>>>>>> set.
>>>>> Let's insert it there.
>>>> So, you insist on id == 3, right? Again, if user process select
>>>> from _collation space, one won't see entry with id == 3.
>>>> On the other hand, if user attempts at inserting id == 3,
>>>> one will get an error.
>>> No, I don't insist yet. Why not insert a special row in there?
>>
>> Because insertion to _collation would result in creation
>> of collation objects.
>
> Not necessarily. We can add a special treatment of these ids to
> insert triggers. Or we can set a different collation type for
> these - which is equivalent for special treatment. _coll system
> space already supports non-icu collations, so this is one such
> collation.

Not really. Look at coll.c:

struct coll *
coll_new(const struct coll_def *def)
{
	assert(def->type == COLL_TYPE_ICU);
	...

>
>> Meanwhile, in fact we need only ID to distinguish BINARY and
>> no-collation. The rest is the same for them. So, it makes sense
>> to store only ID within space format. That is my point.
>>
>>>>>> is consistent to have its ID near COLL_NONE, in a "special
>>>>>> range" of collation identifiers.
>>>>>
>>>>> Uhm, AFAIU we have two binary collations. One is "collation is not
>>>>> set" and another is "collation binary". Which one did you mean
>>>>> now?
>>>>
>>>> First one is not collation at all. It is rather "absence" of any collation.
>>>> The second one is sort of "surrogate" and in terms of functionality
>>>> means the same. However, its id will be stored in space format in
>>>> order to indicate that BINARY collation should be forced during
>>>> comparisons.
>>>
>>> I think we could use internal ids to reference both cases. For
>>> these both ids we could have surrogate rows in _coll system space,
>>> they won't harm. This will make things easier in the future.
>>
>> Ok, how do you suggest to call "absence" of collation? Like this:
>>
>> box.space._collation:select()
>>
>> ---
>> - - [1, 'unicode', 1, 'ICU', '', {}]
>>   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
>>   - [3, 'none', 1, 'ICU', '', {}]
>> ...
>>
>> It is nonsense, IMHO. No collation is like "no collation at all" -
>> nothing represents it, especially visible for user. With BINARY
>> collation it would look even more suspicious:
>>
>> - - [1, 'unicode', 1, 'ICU', '', {}]
>>   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
>>   - [3, 'none', 1, 'ICU', '', {}]
>>   - [4, 'binary', 1, 'ICU', '', {}]
>
> Yes, I believe this is the thing. Looks pretty good to me.
>>
>> It would confuse users who don't use SQL: in Tarantool NoSQL
>> there is no difference between "binary" and "no-collation".
>
> This is temporary. The deeper SQL penetrates box layer the more
> nosql will have the same semantics as SQL.
>
>> Moreover, to keep things consistent, we would have to make
>> default collation be 'none' instead of absence of collation.
>> It means that field def without explicitly set collation would
>> have 'none' collation in format. For instance:
>
> Then id of 'none' should be 0, not 3.
>>
>> *before*
>>
>> - [{'affinity': 66, 'type': 'string', 'nullable_action': 'abort', 'name': 'ID', 'is_nullable': false}]
>>
>> *after*
>>
>> - [{'collation': 3, 'affinity': 66, 'type': 'string', 'nullable_action': 'abort',
>>    'name': 'ID', 'is_nullable': false}]
>>
>>> This is going to be the same mess as with NO ACTION and DEFAULT,
>>> which are mostly the same, but not quite, so we'd better prepare.
>>
>> It is considered to be a mess due to SQLite legacy. On the other hand, all
>> these manipulations with collations follow SQL ANSI.
>
> No, it's a mess due to ANSI not SQLite. And the distinction
> between absence of collation and binary collation is also coming
> from ANSI.
>>
>> All points considered, I would prefer to introduce only another one ID
>> (alongside with COLL_NONE ID) and prohibit to create collations with
>> these ids. OR, add surrogate "binary collation" to _collation with id == 3,
>> but not both "binary" and "none".
>
> We could also ask what PeterG thinks.

I'd rather not: it is not a question of user-visible behaviour,
only of internal implementation. Anyway, I'd better follow your way
instead of arguing. To sum up, we are introducing two additional
collations to represent "none" (id == 0) and "binary" (id == 3), and
making "none" the default collation, which in turn means that it will
be present in the format even if the user didn't specify it explicitly.
That's my plan for now. The SQL part of this patch is obvious and
already done.
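A rough sketch of the plan summed up above, under stated assumptions: the ids 0 ("none") and 3 ("binary") come from this thread, while every name below (COLL_ID_NONE, COLL_ID_BINARY, struct field_def_sketch and the helper functions) is hypothetical and not a Tarantool identifier. It shows a field created without a collation defaulting to id 0, both ids comparing byte-wise, and the format always carrying a 'collation' key.

```c
/*
 * Hypothetical sketch of the agreed plan: "none" has id 0 and is the
 * default for every field, "binary" has id 3. Names are illustrative.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum {
	COLL_ID_NONE = 0,   /* "none": no collation requested, the default */
	COLL_ID_BINARY = 3, /* "binary": explicit COLLATE BINARY */
};

struct field_def_sketch {
	const char *name;
	uint32_t coll_id;
};

/* A field created without an explicit collation gets "none" (id 0). */
static struct field_def_sketch
field_def_sketch_new(const char *name)
{
	struct field_def_sketch def = { name, COLL_ID_NONE };
	return def;
}

/* Both surrogate ids compare byte-wise; they differ only by id. */
static bool
coll_id_is_bytewise(uint32_t coll_id)
{
	return coll_id == COLL_ID_NONE || coll_id == COLL_ID_BINARY;
}

/* SQL can still tell "not specified" from "COLLATE BINARY". */
static bool
coll_id_is_explicit(uint32_t coll_id)
{
	return coll_id != COLL_ID_NONE;
}

/*
 * Since "none" is an ordinary id now, the serialized format always
 * contains a 'collation' key, even if the user did not set one.
 */
static int
field_def_sketch_format(const struct field_def_sketch *def,
			char *buf, size_t size)
{
	return snprintf(buf, size, "{'collation': %u, 'name': '%s'}",
			(unsigned)def->coll_id, def->name);
}
```

With this layout the *before* example from the thread would gain 'collation': 0 rather than 3, matching the remark that "none" should take id 0.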