Date: Thu, 1 Nov 2018 23:00:27 +0300
From: Konstantin Osipov
Subject: [tarantool-patches] Re: [PATCH 2/3] Add surrogate ID for BINARY collation
Message-ID: <20181101200027.GA5887@chai>
In-Reply-To: <95CB17D5-E3ED-4B05-A289-983E2FD0DE37@gmail.com>
To: Никита Петтик
Cc: tarantool-patches@freelists.org, Vladislav Shpilevoy

* Никита Петтик [18/11/01 19:34]:
> >>>> 1) It is not a real collation and is not present in
> >>>> _collation.
> >>>> So for a user it would be strange to see
> >>>> a gap between 2 and 4 in _collation, which cannot be
> >>>> set.
> >>> Let's insert it there.
> >> So, you insist on id == 3, right? Again, if a user selects
> >> from the _collation space, they won't see an entry with id == 3.
> >> On the other hand, if the user attempts to insert id == 3,
> >> they will get an error.
> > No, I don't insist yet. Why not insert a special row in there?
>
> Because insertion into _collation would result in the creation
> of collation objects.

Not necessarily. We can add special treatment of these ids to
the insert triggers. Or we can set a different collation type for
them, which is equivalent to special treatment. The _collation
system space already supports non-ICU collations, so this is one
such collation.

> Meanwhile, in fact we need only the ID to distinguish BINARY and
> no-collation. The rest is the same for them. So, it makes sense
> to store only the ID within the space format. That is my point.
>
> >>>> is consistent to have its ID near COLL_NONE, in a "special
> >>>> range" of collation identifiers.
> >>>
> >>> Uhm, AFAIU we have two binary collations. One is "collation is not
> >>> set" and another is "collation binary". Which one did you mean
> >>> now?
> >>
> >> The first one is not a collation at all. It is rather the “absence”
> >> of any collation. The second one is a sort of “surrogate” and in
> >> terms of functionality means the same. However, its id will be
> >> stored in the space format in order to indicate that BINARY
> >> collation should be forced during comparisons.
> >
> > I think we could use internal ids to reference both cases. For
> > both of these ids we could have surrogate rows in the _collation
> > system space, they won't do any harm. This will make things easier
> > in the future.
>
> Ok, how do you suggest we name the “absence” of collation?
> Like this:
>
> box.space._collation:select()
>
> ---
> - - [1, 'unicode', 1, 'ICU', '', {}]
>   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
>   - [3, 'none', 1, 'ICU', '', {}]
> ...
>
> It is nonsense, IMHO. No collation means “no collation at all”:
> nothing represents it, least of all something visible to the user.
> With the BINARY collation it would look even more suspicious:
>
> - - [1, 'unicode', 1, 'ICU', '', {}]
>   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
>   - [3, 'none', 1, 'ICU', '', {}]
>   - [4, 'binary', 1, 'ICU', '', {}]

Yes, I believe this is the thing. Looks pretty good to me.

>
> It would confuse users who don't use SQL: in Tarantool NoSQL
> there is no difference between “binary” and “no-collation”.

This is temporary. The deeper SQL penetrates the box layer, the
more NoSQL will have the same semantics as SQL.

> Moreover, to keep things consistent, we would have to make
> the default collation be 'none' instead of the absence of collation.
> It means that a field def without an explicitly set collation would
> have the 'none' collation in the format. For instance:

Then the id of 'none' should be 0, not 3.

>
> *before*
>
> - [{'affinity': 66, 'type': 'string', 'nullable_action': 'abort', 'name': 'ID', 'is_nullable': false}]
>
> *after*
>
> - [{'collation': 3, 'affinity': 66, 'type': 'string', 'nullable_action': 'abort',
>     'name': 'ID', 'is_nullable': false}]
>
> > This is going to be the same mess as with NO ACTION and DEFAULT,
> > which are mostly the same, but not quite, so we'd better prepare.
>
> It is considered a mess due to the SQLite legacy. On the other hand,
> all these manipulations with collations follow ANSI SQL.

No, it's a mess due to ANSI, not SQLite. And the distinction
between the absence of a collation and the binary collation also
comes from ANSI.

>
> All points considered, I would prefer to introduce only one more ID
> (alongside the COLL_NONE ID) and prohibit creating collations with
> these ids.
> OR, add a surrogate “binary collation” to _collation with id == 3,
> but not both “binary” and “none”.

We could also ask what PeterG thinks.

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov