[tarantool-patches] Re: [PATCH 2/3] Add surrogate ID for BINARY collation

Konstantin Osipov kostja at tarantool.org
Thu Nov 1 23:00:27 MSK 2018


* Никита Петтик <kitnerh at gmail.com> [18/11/01 19:34]:
> >>>> 1) It is not real collation and is not presented in
> >>>> _collation. So for a user it would be strange to see
> >>>> a gap between 2 and 4 in _collation, which can not be
> >>>> set.
> >>> Let's insert it there.
> >> So, you insist on id == 3, right? Again, if a user selects
> >> from the _collation space, they won’t see an entry with id == 3.
> >> On the other hand, if a user attempts to insert id == 3,
> >> they will get an error.
> > No, I don't insist yet. Why not insert a special row in there?
> 
> Because insertion into _collation would result in the creation
> of collation objects.

Not necessarily. We can add special treatment of these ids to the
insert triggers. Or we can set a different collation type for
them, which is equivalent to special treatment. The _coll system
space already supports non-ICU collations, so this would be one such
collation.
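
To make the second option concrete, here is a rough sketch in Python (not
Tarantool code; every name in it, including the 'BINARY' type string, is
made up for illustration). Rows of a surrogate type are stored like any
other row, but the trigger skips building a collation object for them:

```python
# Hypothetical model of an insert trigger on the collation space.
# None of these names come from Tarantool; they only illustrate the idea.

collation_rows = {}      # id -> full row, everything stored in the space
collation_objects = {}   # id -> object, only for rows that need a real one

# Assumed type names: 'ICU' rows build an object, 'BINARY' rows are surrogates.
SURROGATE_TYPES = {"BINARY"}


def on_insert(row):
    """Store the row; create a collation object only for non-surrogate types."""
    coll_id, name, coll_type = row["id"], row["name"], row["type"]
    if coll_id in collation_rows:
        raise ValueError(f"duplicate collation id {coll_id}")
    collation_rows[coll_id] = row
    if coll_type in SURROGATE_TYPES:
        return None  # special treatment: the row exists, no object is built
    collation_objects[coll_id] = f"{coll_type}:{name}"
    return collation_objects[coll_id]


on_insert({"id": 1, "name": "unicode", "type": "ICU"})
on_insert({"id": 3, "name": "binary", "type": "BINARY"})
```

After the two inserts both rows are visible in the space, but only id 1
has a collation object behind it.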

> Meanwhile, in fact we need only ID to distinguish BINARY and
> no-collation. The rest is the same for them. So, it makes sense
> to store only ID within space format. That is my point.
> 
> >>>> is consistent to have its ID near COLL_NONE, in a "special
> >>>> range" of collation identifiers.
> >>> 
> >>> Uhm, AFAIU we have two binary collations. One is "collation is not
> >>> set" and another is "collation binary". Which one did you mean
> >>> now?
> >> 
> >> The first one is not a collation at all. It is rather the “absence” of any collation.
> >> The second one is a sort of “surrogate” and in terms of functionality
> >> means the same. However, its id will be stored in the space format in
> >> order to indicate that BINARY collation should be forced during
> >> comparisons.
> > 
> > I think we could use internal ids to reference both cases. For
> > both of these ids we could have surrogate rows in the _coll system space;
> > they won't do any harm. This will make things easier in the future.
> 
> Ok, what do you suggest calling the “absence” of a collation? Like this:
> 
> box.space._collation:select()
> 
> ---
> - - [1, 'unicode', 1, 'ICU', '', {}]
>   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
>   - [3, 'none', 1, 'ICU', '', {}]
> ...
> 
> It is nonsense, IMHO. No collation means “no collation at all”:
> nothing should represent it, especially nothing visible to the user.
> With BINARY collation it would look even more suspicious:
> 
> - - [1, 'unicode', 1, 'ICU', '', {}]
>   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
>   - [3, 'none', 1, 'ICU', '', {}]
>   - [4, 'binary', 1, 'ICU', '', {}]

Yes, I believe this is the thing. Looks pretty good to me. 
> 
> It would confuse users who don’t use SQL: in Tarantool NoSQL
> there is no difference between “binary” and “no-collation”.

This is temporary. The deeper SQL penetrates the box layer, the more
NoSQL will have the same semantics as SQL.

> Moreover, to keep things consistent, we would have to make the
> default collation 'none' instead of the absence of a collation.
> It means that a field def without an explicitly set collation would
> have the 'none' collation in the format. For instance:

Then id of 'none' should be 0, not 3.
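
A small sketch of what that could mean in practice (Python, hypothetical
names, not the actual field definition code), assuming a missing
'collation' key is read as id 0:

```python
# Hypothetical id, following the suggestion that 'none' takes id 0.
COLL_ID_NONE = 0


def field_collation_id(field_def):
    """Return the explicit collation id, or the 'none' id when it is absent."""
    return field_def.get("collation", COLL_ID_NONE)


# A field def that omits the collation and one that spells it out.
implicit = {"type": "string", "name": "ID", "is_nullable": False}
explicit = {"collation": COLL_ID_NONE, "type": "string", "name": "ID",
            "is_nullable": False}
```

Under that assumption both field defs resolve to the same collation id.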
> 
> *before*
> 
> - [{'affinity': 66, 'type': 'string', 'nullable_action': 'abort', 'name': 'ID', 'is_nullable': false}]
> 
> *after*
> 
> - [{'collation': 3, 'affinity': 66, 'type': 'string', 'nullable_action': 'abort',
>     'name': 'ID', 'is_nullable': false}]
> 
> > This is going to be the same mess as with NO ACTION and DEFAULT,
> > which are mostly the same, but not quite, so we'd better prepare.
> 
> It is considered to be a mess due to the SQLite legacy. On the other hand, all
> these manipulations with collations follow ANSI SQL.

No, it's a mess due to ANSI, not SQLite. And the distinction
between the absence of a collation and binary collation also comes
from ANSI.
> 
> All points considered, I would prefer to introduce only one more ID
> (alongside the COLL_NONE ID) and prohibit creating collations with
> these ids. OR, add a surrogate “binary collation” to _collation with id == 3,
> but not both “binary” and “none”.

We could also ask what PeterG thinks.

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov



