From: "n.pettik" <korablev@tarantool.org>
To: tarantool-patches@freelists.org
Cc: Konstantin Osipov <kostja@tarantool.org>, Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: [tarantool-patches] Re: [PATCH 2/3] Add surrogate ID for BINARY collation
Date: Thu, 1 Nov 2018 23:20:59 +0300
Message-ID: <D5C04604-FB7D-47BE-B52B-F776BBAAC014@tarantool.org>
In-Reply-To: <20181101200027.GA5887@chai>

> On 1 Nov 2018, at 23:00, Konstantin Osipov <kostja@tarantool.org> wrote:
>
> * Никита Петтик <kitnerh@gmail.com> [18/11/01 19:34]:
>>>>>> 1) It is not real collation and is not presented in
>>>>>> _collation. So for a user it would be strange to see
>>>>>> a gap between 2 and 4 in _collation, which can not be
>>>>>> set.
>>>>> Let's insert it there.
>>>> So, you insist on id == 3, right? Again, if user process select
>>>> from _collation space, one won’t see entry with id == 3.
>>>> On the other hand, if user attempts at inserting id == 3,
>>>> one will get an error.
>>> No, I don't insist yet. Why not insert a special row in there?
>>
>> Because insertion to _collation would result in creation
>> of collation objects.
>
> Not necessarily. We can add a special treatment of these ids to
> insert triggers. Or we can set a different collation type for
> these - which is equivalent for special treatment. _coll system
> space already supports non-icu collations, so this is one such
> collation.

Not really. I look at coll.c:

struct coll *
coll_new(const struct coll_def *def)
{
        assert(def->type == COLL_TYPE_ICU);
...

>
>> Meanwhile, in fact we need only ID to distinguish BINARY and
>> no-collation. The rest is the same for them. So, it makes sense
>> to store only ID within space format. That is my point.
>>
>>>>>> is consistent to has its ID near COLL_NONE, in a "special
>>>>>> range" of collation identifiers.
>>>>>
>>>>> Uhm, AFAIU we have two binary collations. One is "collation is not
>>>>> set" and another is "collation binary". Which one did you mean
>>>>> now?
>>>>
>>>> First one is not collation at all. It is rather “absence” of any collation.
>>>> The second one is sort of “surrogate” and in terms of functionality
>>>> means the same. However, its id will be stored in space format in
>>>> order to indicate that BINARY collation should be forced during
>>>> comparisons.
>>>
>>> I think we could use internal ids to reference both cases. For
>>> these both ids we could have surrogate rows in _coll system space,
>>> they won't harm. This will make things easier in the future.
>>
>> Ok, how do you suggest to call “absence” of collation? Like this:
>>
>> box.space._collation:select()
>>
>> ---
>> - - [1, 'unicode', 1, 'ICU', '', {}]
>>   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
>>   - [3, 'none', 1, 'ICU', '', {}]
>> ...
>>
>> It is nonsense, IMHO. No collation is like “no collation at all” -
>> nothing represents it, especially visible for user. With BINARY
>> collation it would look even more suspicious:
>>
>> - - [1, 'unicode', 1, 'ICU', '', {}]
>>   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
>>   - [3, 'none', 1, 'ICU', '', {}]
>>   - [4, 'binary', 1, 'ICU', '', {}]
>
> Yes, I believe this is the thing. Looks pretty good to me.
>>
>> It would confuse users who don’t use SQL: in Tarantool NoSQL
>> there is no difference between “binary” and “no-collation”.
>
> This is temporary. The deeper SQL penetrates box layer the more
> nosql will have the same semantics as SQL.
>
>> Moreover, to keep things consistent, we would have to make
>> default collation be 'none' instead of absence of collation.
>> It means that field def without explicitly set collation would
>> have 'none' collation in format. For instance:
>
> Then id of 'none' should be 0, not 3.
>>
>> *before*
>>
>> - [{'affinity': 66, 'type': 'string', 'nullable_action': 'abort', 'name': 'ID', 'is_nullable': false}]
>>
>> *after*
>>
>> - [{'collation': 3, 'affinity': 66, 'type': 'string', 'nullable_action': 'abort',
>>     'name': 'ID', 'is_nullable': false}]
>>
>>> This is going to be the same mess as with NO ACTION and DEFAULT,
>>> which are mostly the same, but not quite, so we'd better prepare.
>>
>> It is considered to be mess due to SQLite legacy. On the other hand, all
>> these manipulations with collations follow SQL ANSI.
>
> No, it's a mess due to ANSI not SQLite. And the distinction
> between absence of collation and binary collation is also coming
> from ANSI.
>>
>> All points considered, I would prefer to introduce only another one ID
>> (alongside with COLL_NONE ID) and prohibit to create collations with
>> these ids. OR, add surrogate “binary collation” to _collation with id == 3,
>> but not both “binary” and “none”.
>
> We could also ask what PeterG thinks.

I’d rather not - it is not a question concerning user-visible behaviour,
but only internal implementation. Anyway, I would rather follow your way
than argue.

To sum up, we are introducing two additional collations to represent
“none” (id == 0) and “binary” (id == 3), and making “none” the default
collation, which in turn means that it would be present in the format
even if the user didn’t specify it explicitly. That’s my plan for now.
The SQL part of this patch is obvious and already done.
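
For illustration only, the plan summed up above could be sketched roughly as follows. This is a minimal sketch, not the patch itself: every `sketch_`/`SKETCH_` name is hypothetical and does not exist in Tarantool, and the ids (0 for “none”, 3 for “binary”) are simply taken from the summary. It shows the point made earlier in the thread: both surrogate ids compare strings byte-wise, and only the id stored in the space format tells them apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical constants: the values follow the summary in the mail
 * ("none" has id == 0, "binary" has id == 3). They are not taken
 * from the Tarantool sources.
 */
enum {
	SKETCH_COLL_ID_NONE = 0,
	SKETCH_COLL_ID_BINARY = 3,
};

/* True for the two ids that carry no real ICU collation object. */
static bool
sketch_coll_id_is_surrogate(uint32_t id)
{
	return id == SKETCH_COLL_ID_NONE || id == SKETCH_COLL_ID_BINARY;
}

/*
 * True only when BINARY was requested explicitly, i.e. when it has
 * to be forced during comparisons. "none" is merely the default.
 */
static bool
sketch_coll_id_is_forced_binary(uint32_t id)
{
	return id == SKETCH_COLL_ID_BINARY;
}

/*
 * Byte-wise comparison shared by both surrogate ids.
 * Returns -1, 0 or 1; a shorter string that is a prefix sorts first.
 */
static int
sketch_binary_compare(const char *a, size_t a_len,
		      const char *b, size_t b_len)
{
	size_t min_len = a_len < b_len ? a_len : b_len;
	int rc = memcmp(a, b, min_len);
	if (rc != 0)
		return rc < 0 ? -1 : 1;
	if (a_len == b_len)
		return 0;
	return a_len < b_len ? -1 : 1;
}
```

Under these assumptions a field with id 0 and a field with id 3 would both go through `sketch_binary_compare()`; the difference only matters where the SQL layer has to know whether BINARY was set explicitly.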