[tarantool-patches] Re: [PATCH 2/3] Add surrogate ID for BINARY collation

n.pettik korablev at tarantool.org
Thu Nov 1 23:20:59 MSK 2018



> On 1 Nov 2018, at 23:00, Konstantin Osipov <kostja at tarantool.org> wrote:
> 
> * Никита Петтик <kitnerh at gmail.com> [18/11/01 19:34]:
>>>>>> 1) It is not a real collation and is not present in
>>>>>> _collation. So for a user it would be strange to see
>>>>>> a gap between ids 2 and 4 in _collation, i.e. an id which
>>>>>> cannot be set.
>>>>> Let's insert it there.
>>>> So, you insist on id == 3, right? Again, if a user runs a select
>>>> on the _collation space, they won't see an entry with id == 3.
>>>> On the other hand, if the user attempts to insert id == 3,
>>>> they will get an error.
>>> No, I don't insist yet. Why not insert a special row in there?
>> 
>> Because insertion into _collation would result in the creation
>> of collation objects.
> 
> Not necessarily. We can add special treatment of these ids to
> insert triggers. Or we can set a different collation type for
> these, which is equivalent to special treatment. The _coll system
> space already supports non-ICU collations, so this would be one such
> collation.

Not really. Look at coll.c:

struct coll *
coll_new(const struct coll_def *def)
{
       assert(def->type == COLL_TYPE_ICU);
...
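A minimal sketch of the "different collation type" idea, assuming a hypothetical COLL_TYPE_BINARY value next to COLL_TYPE_ICU and a hypothetical insert handler for _collation: the handler branches on the type before the constructor is reached, so an ICU-only assertion like the one in coll_new() would still hold. The struct and function names are illustrative, not Tarantool's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* COLL_TYPE_ICU mirrors coll.c; COLL_TYPE_BINARY is hypothetical. */
enum coll_type { COLL_TYPE_ICU, COLL_TYPE_BINARY };

/* Hypothetical, trimmed-down collation definition. */
struct coll_def_sketch {
	uint32_t id;
	enum coll_type type;
};

/* Counts how many ICU objects the stand-in constructor built. */
static int icu_objects_created = 0;

/* Stand-in for coll_new(): like the real one, accepts only ICU defs. */
static void *
coll_new_sketch(const struct coll_def_sketch *def)
{
	assert(def->type == COLL_TYPE_ICU);
	icu_objects_created++;
	return &icu_objects_created; /* placeholder non-NULL object */
}

/*
 * Hypothetical on-insert handler for _collation: surrogate rows get
 * no collation object (NULL means "compare raw bytes"), so the
 * ICU-only constructor is never called for them.
 */
static void *
collation_on_insert(const struct coll_def_sketch *def)
{
	if (def->type == COLL_TYPE_BINARY)
		return NULL;
	return coll_new_sketch(def);
}
```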

> 
>> Meanwhile, in fact we need only an ID to distinguish BINARY from
>> no-collation. Everything else is the same for them. So it makes sense
>> to store only the ID within the space format. That is my point.
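A minimal sketch of what "the rest is the same" means at comparison time, assuming hypothetical ids 0 (no collation) and 3 (BINARY): both ids select the same byte-wise comparator, so the only difference between them is the id stored in the space format. The names below are illustrative, not Tarantool's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ids: 0 = no collation, 3 = surrogate BINARY. */
enum { COLL_ID_NONE = 0, COLL_ID_BINARY = 3 };

typedef int (*str_cmp_f)(const char *a, size_t a_len,
			 const char *b, size_t b_len);

/* Byte-wise comparison: memcmp on the common prefix, then length. */
static int
cmp_bytes(const char *a, size_t a_len, const char *b, size_t b_len)
{
	size_t n = a_len < b_len ? a_len : b_len;
	int rc = memcmp(a, b, n);
	if (rc != 0)
		return rc;
	return a_len < b_len ? -1 : a_len > b_len;
}

/*
 * Both "no collation" and BINARY pick the same comparator; any other
 * id is assumed to be backed by an ICU comparator passed by the caller.
 */
static str_cmp_f
comparator_for(uint32_t coll_id, str_cmp_f icu_cmp)
{
	if (coll_id == COLL_ID_NONE || coll_id == COLL_ID_BINARY)
		return cmp_bytes;
	return icu_cmp;
}
```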
>> 
>>>>>> is consistent to have its ID near COLL_NONE, in a "special
>>>>>> range" of collation identifiers.
>>>>> 
>>>>> Uhm, AFAIU we have two binary collations. One is "collation is not
>>>>> set" and another is "collation binary". Which one did you mean
>>>>> now?
>>>> 
>>>> The first one is not a collation at all. It is rather the “absence” of any collation.
>>>> The second one is a sort of “surrogate” and in terms of functionality
>>>> means the same. However, its id will be stored in the space format in
>>>> order to indicate that BINARY collation should be forced during
>>>> comparisons.
>>> 
>>> I think we could use internal ids to reference both cases. For
>>> both of these ids we could have surrogate rows in the _coll system space;
>>> they won't do any harm. This will make things easier in the future.
>> 
>> OK, what do you suggest calling the “absence” of collation? Like this:
>> 
>> box.space._collation:select()
>> 
>> ---
>> - - [1, 'unicode', 1, 'ICU', '', {}]
>>   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
>>   - [3, 'none', 1, 'ICU', '', {}]
>> ...
>> 
>> It is nonsense, IMHO. “No collation” means no collation at all:
>> nothing represents it, least of all anything visible to the user.
>> With BINARY collation it would look even more suspicious:
>> 
>> - - [1, 'unicode', 1, 'ICU', '', {}]
>>   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
>>   - [3, 'none', 1, 'ICU', '', {}]
>>   - [4, 'binary', 1, 'ICU', '', {}]
> 
> Yes, I believe this is the thing. Looks pretty good to me. 
>> 
>> It would confuse users who don’t use SQL: in Tarantool NoSQL
>> there is no difference between “binary” and “no-collation”.
> 
> This is temporary. The deeper SQL penetrates the box layer, the more
> NoSQL will share the same semantics as SQL.
> 
>> Moreover, to keep things consistent, we would have to make the
>> default collation 'none' instead of the absence of a collation.
>> It means that a field def without an explicitly set collation would
>> have the 'none' collation in its format. For instance:
> 
> Then id of 'none' should be 0, not 3.
>> 
>> *before*
>> 
>> - [{'affinity': 66, 'type': 'string', 'nullable_action': 'abort', 'name': 'ID', 'is_nullable': false}]
>> 
>> *after*
>> 
>> - [{'collation': 3, 'affinity': 66, 'type': 'string', 'nullable_action': 'abort',
>>    'name': 'ID', 'is_nullable': false}]
>> 
>>> This is going to be the same mess as with NO ACTION and DEFAULT,
>>> which are mostly the same, but not quite, so we'd better prepare.
>> 
>> It is considered a mess due to the SQLite legacy. On the other hand, all
>> these manipulations with collations follow ANSI SQL.
> 
> No, it's a mess due to ANSI, not SQLite. And the distinction
> between the absence of collation and binary collation also comes
> from ANSI.
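To illustrate why the two ids cannot simply be merged, here is a deliberately simplified resolution rule for a comparison of two columns. It is a hypothetical illustration, not the exact ANSI derivation algorithm or Tarantool's implementation: a side without a collation yields to the other side, while two different collations conflict. Under such a rule, an explicit BINARY collation and no collation behave differently even though both compare raw bytes. The ids follow the listings in this thread and are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ids: 0 = no collation, 2 = unicode_ci, 3 = binary. */
enum { COLL_ID_NONE = 0, COLL_ID_UNICODE_CI = 2, COLL_ID_BINARY = 3 };

/*
 * Simplified resolution for "lhs = rhs" where each operand is a
 * column with a (possibly absent) collation id. Returns 0 and stores
 * the resulting id in *out, or -1 if the two collations conflict.
 */
static int
resolve_collation(uint32_t lhs, uint32_t rhs, uint32_t *out)
{
	assert(out != NULL);
	if (lhs != COLL_ID_NONE && rhs != COLL_ID_NONE && lhs != rhs)
		return -1; /* e.g. binary vs unicode_ci */
	/* "No collation" yields to whatever the other side has. */
	*out = lhs != COLL_ID_NONE ? lhs : rhs;
	return 0;
}
```

For example, comparing a BINARY column with a unicode_ci column is rejected, whereas a column with no collation simply adopts unicode_ci.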
>> 
>> All points considered, I would prefer to introduce only one more ID
>> (alongside the COLL_NONE ID) and prohibit creating collations with
>> these ids. OR, add a surrogate “binary” collation to _collation with id == 3,
>> but not both “binary” and “none”.
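A minimal sketch of the first option, assuming the reserved ids are 0 (COLL_NONE) and 3 (BINARY) with no rows behind them in _collation. The check function is hypothetical and would sit wherever new _collation tuples are validated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reserved ids: COLL_NONE and the surrogate BINARY id. */
enum { COLL_ID_NONE = 0, COLL_ID_BINARY = 3 };

/* True for ids that have no real collation object behind them. */
static bool
coll_id_is_reserved(uint32_t id)
{
	return id == COLL_ID_NONE || id == COLL_ID_BINARY;
}

/*
 * Hypothetical check for a user-created _collation tuple.
 * Returns 0 if the id may be used, -1 if it is reserved
 * (a real trigger would raise an error instead).
 */
static int
collation_check_new_id(uint32_t id)
{
	return coll_id_is_reserved(id) ? -1 : 0;
}
```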
> 
> We could also ask what PeterG thinks.

I’d rather not: it is not a question concerning user-visible behaviour,
but only the internal implementation. Anyway, I would rather follow your
way than keep arguing. To sum up, we are introducing two additional
collations to represent the “none” (id == 0) and “binary” (id == 3) collations
and making the “none” collation the default, which in turn means that it will
be present in the format even if the user didn’t specify it explicitly.
That’s my plan for now. The SQL part of this patch is obvious and already done.
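A minimal sketch of the "none is the default" part of this plan, assuming a hypothetical, trimmed-down field definition and a format parser that reports whether the 'collation' key was present: a missing key stores the 'none' id (0), so every field ends up with an explicit collation id in its format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ids from the plan above: 'none' == 0, 'binary' == 3. */
enum { COLL_ID_NONE = 0, COLL_ID_BINARY = 3 };

/* Hypothetical, minimal stand-in for a space format field definition. */
struct field_def_sketch {
	const char *name;
	uint32_t coll_id;
};

/*
 * Fill coll_id from a parsed format entry. has_coll tells whether
 * the 'collation' key was present; if it was not, the 'none' id is
 * stored, so the id is always written back into the format.
 */
static void
field_def_set_coll(struct field_def_sketch *def, bool has_coll,
		   uint32_t coll_id)
{
	def->coll_id = has_coll ? coll_id : COLL_ID_NONE;
}
```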




