Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
@ 2019-11-29 23:39 Chris Sosnin
  2019-11-30 20:34 ` Konstantin Osipov
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Sosnin @ 2019-11-29 23:39 UTC (permalink / raw)
  To: tarantool-patches

Unicode_ci collation breaks the general
rule for objects naming, so we remove it
in version 2.3.1

Closes #4561
---
branch: https://github.com/tarantool/tarantool/tree/ksosnin/gh-4561-drop-func-collation
issue: https://github.com/tarantool/tarantool/issues/4561

 src/box/bootstrap.snap       | Bin 5944 -> 5921 bytes
 src/box/lua/upgrade.lua      |  14 ++++++++++++++
 test/box-py/bootstrap.result |   4 ++--
 test/box/alter.result        |   2 +-
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua
index e71b7fb41..07f1e0317 100644
--- a/src/box/lua/upgrade.lua
+++ b/src/box/lua/upgrade.lua
@@ -942,6 +942,19 @@ local function upgrade_to_2_3_0()
     _ck_constraint:format(format)
 end

+--------------------------------------------------------------------------------
+-- Tarantool 2.3.1
+--------------------------------------------------------------------------------
+
+local function drop_func_collation()
+    local _func = box.space[box.schema.FUNC_ID]
+    _func.index.name:alter({parts = {{'name', 'string'}}})
+end
+
+local function upgrade_to_2_3_1()
+    drop_func_collation()
+end
+
 --------------------------------------------------------------------------------

 local function get_version()
@@ -977,6 +990,7 @@ local function upgrade(options)
         {version = mkversion(2, 1, 3), func = upgrade_to_2_1_3, auto = true},
         {version = mkversion(2, 2, 1), func = upgrade_to_2_2_1, auto = true},
         {version = mkversion(2, 3, 0), func = upgrade_to_2_3_0, auto = true},
+        {version = mkversion(2, 3, 1), func = upgrade_to_2_3_1, auto = true},
     }

     for _, handler in ipairs(handlers) do
diff --git a/test/box-py/bootstrap.result b/test/box-py/bootstrap.result
index 123aa2feb..938a7631e 100644
--- a/test/box-py/bootstrap.result
+++ b/test/box-py/bootstrap.result
@@ -4,7 +4,7 @@ box.internal.bootstrap()
 box.space._schema:select{}
 ---
 - - ['max_id', 511]
-  - ['version', 2, 3, 0]
+  - ['version', 2, 3, 1]
 ...
 box.space._cluster:select{}
 ---
@@ -123,7 +123,7 @@ box.space._index:select{}
   - [289, 2, 'name', 'tree', {'unique': true}, [[0, 'unsigned'], [2, 'string']]]
   - [296, 0, 'primary', 'tree', {'unique': true}, [[0, 'unsigned']]]
   - [296, 1, 'owner', 'tree', {'unique': false}, [[1, 'unsigned']]]
-  - [296, 2, 'name', 'tree', {'unique': true}, [{'field': 2, 'collation': 2, 'type': 'string'}]]
+  - [296, 2, 'name', 'tree', {'unique': true}, [[2, 'string']]]
   - [297, 0, 'primary', 'tree', {'unique': true}, [[0, 'unsigned']]]
   - [297, 1, 'owner', 'tree', {'unique': false}, [[1, 'unsigned']]]
   - [297, 2, 'name', 'tree', {'unique': true}, [[2, 'string']]]
diff --git a/test/box/alter.result b/test/box/alter.result
index 46ce8687b..9a2f9917e 100644
--- a/test/box/alter.result
+++ b/test/box/alter.result
@@ -190,7 +190,7 @@ _index:select{}
   - [289, 2, 'name', 'tree', {'unique': true}, [[0, 'unsigned'], [2, 'string']]]
   - [296, 0, 'primary', 'tree', {'unique': true}, [[0, 'unsigned']]]
   - [296, 1, 'owner', 'tree', {'unique': false}, [[1, 'unsigned']]]
-  - [296, 2, 'name', 'tree', {'unique': true}, [{'field': 2, 'collation': 2, 'type': 'string'}]]
+  - [296, 2, 'name', 'tree', {'unique': true}, [[2, 'string']]]
   - [297, 0, 'primary', 'tree', {'unique': true}, [[0, 'unsigned']]]
   - [297, 1, 'owner', 'tree', {'unique': false}, [[1, 'unsigned']]]
   - [297, 2, 'name', 'tree', {'unique': true}, [[2, 'string']]]
--
2.21.0 (Apple Git-122.2)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-11-29 23:39 [Tarantool-patches] [PATCH] box: remove unicode_ci for functions Chris Sosnin
@ 2019-11-30 20:34 ` Konstantin Osipov
  2019-12-01 14:12   ` k.sosnin
  2019-12-01 14:36   ` Vladislav Shpilevoy
  0 siblings, 2 replies; 17+ messages in thread
From: Konstantin Osipov @ 2019-11-30 20:34 UTC (permalink / raw)
  To: Chris Sosnin; +Cc: tarantool-patches

* Chris Sosnin <k.sosnin@tarantool.org> [19/11/30 07:03]:
> Unicode_ci collation breaks the general
> rule for objects naming, so we remove it
> in version 2.3.1

The code works according to RFC.

There is a justification for this behaviour in RFC.


-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-11-30 20:34 ` Konstantin Osipov
@ 2019-12-01 14:12   ` k.sosnin
  2019-12-01 14:36   ` Vladislav Shpilevoy
  1 sibling, 0 replies; 17+ messages in thread
From: k.sosnin @ 2019-12-01 14:12 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 483 bytes --]


> The code works according to RFC.
> 
> There is a justification for this behaviour in RFC.

Quote from RFC:

To avoid name clash, we will reserve 
these names by adding entries for them
in _func system space. _func.name
index collation will change
to use unicode_ci. All built-ins
will be added to the bootstrap snapshot.

It is not really mentioned why it is necessary
to change the collation. Also, there is no conflict
with SQL functions as long as every query gets upper-cased.

[-- Attachment #2: Type: text/html, Size: 2386 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-11-30 20:34 ` Konstantin Osipov
  2019-12-01 14:12   ` k.sosnin
@ 2019-12-01 14:36   ` Vladislav Shpilevoy
  2019-12-02  7:07     ` Konstantin Osipov
  1 sibling, 1 reply; 17+ messages in thread
From: Vladislav Shpilevoy @ 2019-12-01 14:36 UTC (permalink / raw)
  To: Konstantin Osipov, Chris Sosnin, tarantool-patches

Hi, thanks for the review!

On 30/11/2019 21:34, Konstantin Osipov wrote:
> * Chris Sosnin <k.sosnin@tarantool.org> [19/11/30 07:03]:
>> Unicode_ci collation breaks the general
>> rule for objects naming, so we remove it
>> in version 2.3.1
> 
> The code works according to RFC.
> 
> There is a justification for this behaviour in RFC.
> 
> 

Yeah, RFC is cool and all, but 'this is in an RFC' is not
a good justification, don't you think? Especially taking
into account, that this is our own RFC. It is not a
standard or something.

With Chris we did a little investigation, and here is what
we've found:

- The RFC does not explain why we need a case-insensitive
  index. This has nothing to do with conflicts. SQL always
  is and will be able to choose one name - the uppercased
  one;

- Despite the index is case insensitive, I still get an error
  when call a not uppercased function from SQL:

      box.cfg{}
      box.schema.func.create('func1', {language = 'Lua', is_deterministic = true,
                                       body = 'function(a) return a end',
                                       param_list = {'any'}, returns = 'string',
                                       exports = {'LUA', 'SQL'}})
      box.execute('SELECT func1(\'str\')')
      ---
      - null
      - Function 'FUNC1' does not exist
      ...

      box.execute('SELECT "func1"(\'str\')')
      ---
      - metadata:
        - name: '"func1"(''str'')'
          type: string
        rows:
        - ['str']
      ...

  As you can see, the index's case does not matter. It is
  insensitive, but I still need to write functions in a
  special register.

- Lua box.func.* is also case sensitive:

      tarantool> box.schema.func.create('func1')
      ---
      ...

      tarantool> box.schema.func.create('FUNC1')
      ---
      - error: Function 'FUNC1' already exists
      ...

      tarantool> box.func['FUNC1']
      ---
      - null
      ...

  The error message is really confusing. It says, that
  such a function already exists, but nonetheless I can't
  get it. This is ridiculous.

- As Peter said, _func behaves inconsistent with all the
  other system spaces.

- As Peter said, we did it just for SQL, but it affects
  non-SQL front ends. As you can see from the box.func.*
  example.

On the summary, this collation is clearly useless - its
case insensitivity does not help any of the listed problems.
Therefore, it is a bug.

I think the collation should be dropped.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-12-01 14:36   ` Vladislav Shpilevoy
@ 2019-12-02  7:07     ` Konstantin Osipov
  2019-12-02 14:36       ` Nikita Pettik
  2019-12-06 11:42       ` Kirill Yukhin
  0 siblings, 2 replies; 17+ messages in thread
From: Konstantin Osipov @ 2019-12-02  7:07 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/01 19:29]:
> >> Unicode_ci collation breaks the general
> >> rule for objects naming, so we remove it
> >> in version 2.3.1
> > 
> > The code works according to RFC.
> > 
> > There is a justification for this behaviour in RFC.

Please see my reply with an explanation. The RFC was  written
presuming https://github.com/tarantool/tarantool/issues/4467 
will be fixed. 

The current way of uppercasing is broken - it is hard to use,
and people hit the problem almost immediately, especially newbies.

Instead of fixing the "inconsistency", one should fix 4467


-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-12-02  7:07     ` Konstantin Osipov
@ 2019-12-02 14:36       ` Nikita Pettik
  2019-12-02 14:49         ` Konstantin Osipov
  2019-12-06 11:42       ` Kirill Yukhin
  1 sibling, 1 reply; 17+ messages in thread
From: Nikita Pettik @ 2019-12-02 14:36 UTC (permalink / raw)
  To: Konstantin Osipov, Vladislav Shpilevoy, Chris Sosnin, tarantool-patches

On 02 Dec 10:07, Konstantin Osipov wrote:
> * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/01 19:29]:
> > >> Unicode_ci collation breaks the general
> > >> rule for objects naming, so we remove it
> > >> in version 2.3.1
> > > 
> > > The code works according to RFC.
> > > 
> > > There is a justification for this behaviour in RFC.
> 
> Please see my reply with an explanation. The RFC was  written
> presuming https://github.com/tarantool/tarantool/issues/4467 
> will be fixed. 

According to milestone (which is 'feature'), it is not going to be
implemented soon. What is more, there's even no clearly stated proposal
or RFC without contradictions.

> The current way of uppercasing is broken - it is hard to use,
> and people hit the problem almost immediately, especially newbies.
> 
> Instead of fixing the "inconsistency", one should fix 4467

Just for the record, I have to say that I agree with Vlad and Chris.
Function's name featuring unicode_ci collation at least seem to be
strange. RFC says:

'''
To avoid name clash, we will reserve these names by adding entries for them in _func system space.
'''

That's all.

I can't figure out what did author really mean by 'name clash'.
We are able to create two different objects (of any kind: space,
trigger etc) with the same in terms of case-insensetive collation
(e.g. "t1" and "T1"). Why this rule should be violated for functions?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-12-02 14:36       ` Nikita Pettik
@ 2019-12-02 14:49         ` Konstantin Osipov
  0 siblings, 0 replies; 17+ messages in thread
From: Konstantin Osipov @ 2019-12-02 14:49 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-patches, Vladislav Shpilevoy

* Nikita Pettik <korablev@tarantool.org> [19/12/02 17:39]:
> On 02 Dec 10:07, Konstantin Osipov wrote:
> > * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/01 19:29]:
> > > >> Unicode_ci collation breaks the general
> > > >> rule for objects naming, so we remove it
> > > >> in version 2.3.1
> > > > 
> > > > The code works according to RFC.
> > > > 
> > > > There is a justification for this behaviour in RFC.
> > 
> > Please see my reply with an explanation. The RFC was  written
> > presuming https://github.com/tarantool/tarantool/issues/4467 
> > will be fixed. 
> 
> According to milestone (which is 'feature'), it is not going to be
> implemented soon. What is more, there's even no clearly stated proposal
> or RFC without contradictions.

Uhm, you could of course shift the burden on me for writing RFC
and do nothing on this premise. But come on, ask users how much
pain the uppercasing is making. It is not about me having my way,
it is about fixing a broken implementation.

As to contradictions of RFC, well, there was some ambiguity in
what I initially suggested, but it was later resolved in the
comments to the ticket.

> 
> '''
> To avoid name clash, we will reserve these names by adding entries for them in _func system space.
> '''
> 
> That's all.
> 
> I can't figure out what did author really mean by 'name clash'.
> We are able to create two different objects (of any kind: space,
> trigger etc) with the same in terms of case-insensetive collation
> (e.g. "t1" and "T1"). Why this rule should be violated for functions?

The idea of _ci is that SQL function lookup never returns a
non-builtin function for a built-in function.

I guess as long as SQL uppercases the name before lookup, this
won't happen even if the collation is case-sensitive - all the
uppercase names are reserved already.

But imagine we stop uppercasing, as 4467 suggests. Then, unless
the collation is _ci, another function will be returned, if it
exists, say, for a lowercase use of the name.

This is why _ci is in there - to prevent two functions with the
same name (but different casing) to ever exist.

This will also help with SQL/PSM name lookup, not just built-ins, when
SQL/PSM is in. Since SQL/PSM functions are also case-insensitive,
when I invoke a UDF from SQL/PSM I want to avoid ambiguity as well - and
I can't prevent it by "reserving" all UDFs used for SQL/PSM.

E.g. imagine I have box.schema.func.create("foo", ...) called from
Lua. Later I do CREATE FUNCTION foo ... using SQL. The name will
be uppercased and the function will be created. I would like to
avoid this. 

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-12-02  7:07     ` Konstantin Osipov
  2019-12-02 14:36       ` Nikita Pettik
@ 2019-12-06 11:42       ` Kirill Yukhin
  2019-12-06 20:17         ` Konstantin Osipov
  1 sibling, 1 reply; 17+ messages in thread
From: Kirill Yukhin @ 2019-12-06 11:42 UTC (permalink / raw)
  To: Konstantin Osipov, Vladislav Shpilevoy, Chris Sosnin, tarantool-patches

Hello,

On 02 дек 10:07, Konstantin Osipov wrote:
> * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/01 19:29]:
> > >> Unicode_ci collation breaks the general
> > >> rule for objects naming, so we remove it
> > >> in version 2.3.1
> > > 
> > > The code works according to RFC.
> > > 
> > > There is a justification for this behaviour in RFC.
> 
> Please see my reply with an explanation. The RFC was  written
> presuming https://github.com/tarantool/tarantool/issues/4467 
> will be fixed.

It was clearly pointed that proposal in #4467 is broken by
design. Please see [1] for details. Having that said I think
we should consider the proposal rejected and won't try to invent
any new workarounds.

[1] - https://github.com/tarantool/tarantool/issues/4467#issuecomment-527210486 and later.


Thanks a lot for your inputs!

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-12-06 11:42       ` Kirill Yukhin
@ 2019-12-06 20:17         ` Konstantin Osipov
  2019-12-09 11:06           ` Kirill Yukhin
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin Osipov @ 2019-12-06 20:17 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: tarantool-patches, Vladislav Shpilevoy

* Kirill Yukhin <kyukhin@tarantool.org> [19/12/06 23:09]:
> > > >> Unicode_ci collation breaks the general
> > > >> rule for objects naming, so we remove it
> > > >> in version 2.3.1
> > > > 
> > > > The code works according to RFC.
> > > > 
> > > > There is a justification for this behaviour in RFC.
> > 
> > Please see my reply with an explanation. The RFC was  written
> > presuming https://github.com/tarantool/tarantool/issues/4467 
> > will be fixed.
> 
> It was clearly pointed that proposal in #4467 is broken by
> design. Please see [1] for details. Having that said I think
> we should consider the proposal rejected and won't try to invent
> any new workarounds.
> 
> [1] - https://github.com/tarantool/tarantool/issues/4467#issuecomment-527210486 and later.

Even if you think the proposal is broken the problem is there
and needs resolution, not aggravation. 

Re initial proposal being broken I admitted it in the comment.
We'll have to do an incompatible change and violate ANSI - in
order to make the product usable. I suggested to add a
case-insensitive unique index to every system space already.

As to the suggestion being broken - it will allow one to get rid
of uppercase-before-store and be mostly ansi compatible.

It's less broken what we have now.

-- 
Konstantin Osipov, Moscow, Russia
https://scylladb.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-12-06 20:17         ` Konstantin Osipov
@ 2019-12-09 11:06           ` Kirill Yukhin
  2019-12-09 11:24             ` Konstantin Osipov
  0 siblings, 1 reply; 17+ messages in thread
From: Kirill Yukhin @ 2019-12-09 11:06 UTC (permalink / raw)
  To: Konstantin Osipov, Vladislav Shpilevoy, Chris Sosnin, tarantool-patches

Hello,

On 06 дек 23:17, Konstantin Osipov wrote:
> * Kirill Yukhin <kyukhin@tarantool.org> [19/12/06 23:09]:
> > > > >> Unicode_ci collation breaks the general
> > > > >> rule for objects naming, so we remove it
> > > > >> in version 2.3.1
> > > > > 
> > > > > The code works according to RFC.
> > > > > 
> > > > > There is a justification for this behaviour in RFC.
> > > 
> > > Please see my reply with an explanation. The RFC was  written
> > > presuming https://github.com/tarantool/tarantool/issues/4467 
> > > will be fixed.
> > 
> > It was clearly pointed that proposal in #4467 is broken by
> > design. Please see [1] for details. Having that said I think
> > we should consider the proposal rejected and won't try to invent
> > any new workarounds.
> > 
> > [1] - https://github.com/tarantool/tarantool/issues/4467#issuecomment-527210486 and later.
> 
> Even if you think the proposal is broken the problem is there
> and needs resolution, not aggravation. 
> 
> Re initial proposal being broken I admitted it in the comment.
> We'll have to do an incompatible change and violate ANSI - in
> order to make the product usable. I suggested to add a
> case-insensitive unique index to every system space already.

So, the proposal is to break backward compatibility and ANSI to
make visual basic programmers happy? No, we won't do that in
observable future.

> As to the suggestion being broken - it will allow one to get rid
> of uppercase-before-store and be mostly ansi compatible.

I do not see any issue here.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-12-09 11:06           ` Kirill Yukhin
@ 2019-12-09 11:24             ` Konstantin Osipov
  2019-12-09 13:25               ` Kirill Yukhin
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin Osipov @ 2019-12-09 11:24 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: tarantool-patches, Vladislav Shpilevoy

* Kirill Yukhin <kyukhin@tarantool.org> [19/12/09 14:11]:
> > > > > >> Unicode_ci collation breaks the general
> > > > > >> rule for objects naming, so we remove it
> > > > > >> in version 2.3.1
> > > > > > 
> > > > > > The code works according to RFC.
> > > > > > 
> > > > > > There is a justification for this behaviour in RFC.
> > > > 
> > > > Please see my reply with an explanation. The RFC was  written
> > > > presuming https://github.com/tarantool/tarantool/issues/4467 
> > > > will be fixed.
> > > 
> > > It was clearly pointed that proposal in #4467 is broken by
> > > design. Please see [1] for details. Having that said I think
> > > we should consider the proposal rejected and won't try to invent
> > > any new workarounds.
> > > 
> > > [1] - https://github.com/tarantool/tarantool/issues/4467#issuecomment-527210486 and later.
> > 
> > Even if you think the proposal is broken the problem is there
> > and needs resolution, not aggravation. 
> > 
> > Re initial proposal being broken I admitted it in the comment.
> > We'll have to do an incompatible change and violate ANSI - in
> > order to make the product usable. I suggested to add a
> > case-insensitive unique index to every system space already.
> 
> So, the proposal is to break backward compatibility and ANSI to
> make visual basic programmers happy? No, we won't do that in
> observable future.

Who is "we"? 

I don't see this has been discussed & rejected, so why are you
thinking you can make this decision?

Maybe "we" instead of making "decisions" goes to users and
customers and asks them what they expect?

> > As to the suggestion being broken - it will allow one to get rid
> > of uppercase-before-store and be mostly ansi compatible.
> 
> I do not see any issue here.

Yes, this is kind of obvious. 

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-12-09 11:24             ` Konstantin Osipov
@ 2019-12-09 13:25               ` Kirill Yukhin
  2019-12-09 13:39                 ` Konstantin Osipov
  0 siblings, 1 reply; 17+ messages in thread
From: Kirill Yukhin @ 2019-12-09 13:25 UTC (permalink / raw)
  To: Konstantin Osipov, Vladislav Shpilevoy, Chris Sosnin, tarantool-patches

Hello,

On 09 дек 14:24, Konstantin Osipov wrote:
> * Kirill Yukhin <kyukhin@tarantool.org> [19/12/09 14:11]:
> > > > > > >> Unicode_ci collation breaks the general
> > > > > > >> rule for objects naming, so we remove it
> > > > > > >> in version 2.3.1
> > > > > > > 
> > > > > > > The code works according to RFC.
> > > > > > > 
> > > > > > > There is a justification for this behaviour in RFC.
> > > > > 
> > > > > Please see my reply with an explanation. The RFC was  written
> > > > > presuming https://github.com/tarantool/tarantool/issues/4467 
> > > > > will be fixed.
> > > > 
> > > > It was clearly pointed that proposal in #4467 is broken by
> > > > design. Please see [1] for details. Having that said I think
> > > > we should consider the proposal rejected and won't try to invent
> > > > any new workarounds.
> > > > 
> > > > [1] - https://github.com/tarantool/tarantool/issues/4467#issuecomment-527210486 and later.
> > > 
> > > Even if you think the proposal is broken the problem is there
> > > and needs resolution, not aggravation. 
> > > 
> > > Re initial proposal being broken I admitted it in the comment.
> > > We'll have to do an incompatible change and violate ANSI - in
> > > order to make the product usable. I suggested to add a
> > > case-insensitive unique index to every system space already.
> > 
> > So, the proposal is to break backward compatibility and ANSI to
> > make visual basic programmers happy? No, we won't do that in
> > observable future.
> 
> Who is "we"?

We are team of Tarantool developers working for MailRU Group.
We had pretty much discussions internally on the matter and
came up with this decision: we won't break backward compatibility
in order to add some sugar.

> I don't see this has been discussed & rejected, so why are you
> thinking you can make this decision?

All I can see on the matter is issue #4467 which we agreed to
be broken by design. I see no solid proposals neither in discussions
nor in github issues on how it supposed to work. If it will occur -
we'll happily consider it.

> Maybe "we" instead of making "decisions" goes to users and
> customers and asks them what they expect?

Why not to ask if public demands free beer? Right now business
dictates us to work on other things.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-12-09 13:25               ` Kirill Yukhin
@ 2019-12-09 13:39                 ` Konstantin Osipov
  2019-12-09 14:07                   ` Nikita Pettik
  2019-12-09 23:09                   ` Vladislav Shpilevoy
  0 siblings, 2 replies; 17+ messages in thread
From: Konstantin Osipov @ 2019-12-09 13:39 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: tarantool-patches, Vladislav Shpilevoy

* Kirill Yukhin <kyukhin@tarantool.org> [19/12/09 16:28]:
> > Who is "we"?
> 
> We are team of Tarantool developers working for MailRU Group.

Actually this decision was made without any discussions - it was
quickly hacked in back in 2018. Maybe you and Nikita had some
discussion prompted by Peter's firm stance that we should
uppercase, but that was it. I protested several times while I
was still on board but had not anticipated how bad the solution
would turn out to be when it is implemented to follow through on
time. Now it is still not to late - bit it's getting more late
every day.

What we learned since then is that every single newbie trips over
it.


> We had pretty much discussions internally on the matter and
> came up with this decision: we won't break backward compatibility
> in order to add some sugar.

What backward compatibility do you think will be broken?


> 
> > I don't see this has been discussed & rejected, so why are you
> > thinking you can make this decision?
> 
> All I can see on the matter is issue #4467 which we agreed to
> be broken by design. I see no solid proposals neither in discussions
> nor in github issues on how it supposed to work. If it will occur -
> we'll happily consider it.
> 
> > Maybe "we" instead of making "decisions" goes to users and
> > customers and asks them what they expect?
> 
> Why not to ask if public demands free beer? Right now business
> dictates us to work on other things.

Like Chris patch?

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-12-09 13:39                 ` Konstantin Osipov
@ 2019-12-09 14:07                   ` Nikita Pettik
  2019-12-09 23:09                   ` Vladislav Shpilevoy
  1 sibling, 0 replies; 17+ messages in thread
From: Nikita Pettik @ 2019-12-09 14:07 UTC (permalink / raw)
  To: Konstantin Osipov, Kirill Yukhin, Vladislav Shpilevoy,
	Chris Sosnin, tarantool-patches

On 09 Dec 16:39, Konstantin Osipov wrote:
> * Kirill Yukhin <kyukhin@tarantool.org> [19/12/09 16:28]:
> > > Who is "we"?
> > 
> > We are team of Tarantool developers working for MailRU Group.
> 
> Actually this decision was made without any discussions - it was
> quickly hacked in back in 2018. Maybe you and Nikita had some
> discussion prompted by Peter's firm stance that we should
> uppercase, but that was it.

I didn't participate in any discussion at that time.
When I came in Tarantool decision had been already taken.
I do not know who is responsible for that.

IMHO rather than attempt to fix it now, I'd better improve tutorials
and docs by placing most common examples/mistakes at the front page.

> I protested several times while I
> was still on board but had not anticipated how bad the solution
> would turn out to be when it is implemented to follow through on
> time. Now it is still not to late - bit it's getting more late
> every day.
> 
> What we learned since then is that every single newbie trips over
> it.
> 
> 
> > We had pretty much discussions internally on the matter and
> > came up with this decision: we won't break backward compatibility
> > in order to add some sugar.
> 
> What backward compatibility do you think will be broken?
> 
> 
> > 
> > > I don't see this has been discussed & rejected, so why are you
> > > thinking you can make this decision?
> > 
> > All I can see on the matter is issue #4467 which we agreed to
> > be broken by design. I see no solid proposals neither in discussions
> > nor in github issues on how it supposed to work. If it will occur -
> > we'll happily consider it.
> > 
> > > Maybe "we" instead of making "decisions" goes to users and
> > > customers and asks them what they expect?
> > 
> > Why not to ask if public demands free beer? Right now business
> > dictates us to work on other things.
> 
> Like Chris patch?
> 
> -- 
> Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-12-09 13:39                 ` Konstantin Osipov
  2019-12-09 14:07                   ` Nikita Pettik
@ 2019-12-09 23:09                   ` Vladislav Shpilevoy
  2019-12-10  8:19                     ` Konstantin Osipov
  1 sibling, 1 reply; 17+ messages in thread
From: Vladislav Shpilevoy @ 2019-12-09 23:09 UTC (permalink / raw)
  To: Konstantin Osipov, Kirill Yukhin, Chris Sosnin, tarantool-patches

Hi! Thanks for the discussion!

On 09/12/2019 14:39, Konstantin Osipov wrote:
> * Kirill Yukhin <kyukhin@tarantool.org> [19/12/09 16:28]:
>>> Who is "we"?
>>
>> We are team of Tarantool developers working for MailRU Group.
> 
> Actually this decision was made without any discussions - it was
> quickly hacked in back in 2018. Maybe you and Nikita had some
> discussion prompted by Peter's firm stance that we should
> uppercase, but that was it. I protested several times while I
> was still on board but had not anticipated how bad the solution
> would turn out to be when it is implemented to follow through on
> time. Now it is still not to late - bit it's getting more late
> every day.
> 
> What we learned since then is that every single newbie trips over
> it.
> 
> 

Well, you was exactly the person who voted for uppercase of
everything. Peter was for it too, but you made the decision,
and you at that time had a right to forbid it.

Now talking of the 'Newbies stumbling into the uppercase'.
Newbies, who use SQL only, will never notice the uppercasing.
Because SQL creates in uppercase and searches by uppercase
by default. It is a matter of how to organize tutorial for
newbies. In a tutorial they should study SQL and Lua box
separately, and then learn about details of how to use them
both. Where they would be explained, that SQL standard
uppercases everything. And they need to use quotes to force
lowercase.

Talking of the case insensitivity. I see several problems
serious enough to forget about this forever:

- Compatibility. As you probably know (and actually you are
  the one, from who I learned it) - if an API is public, it
  *is* used by someone. You always should assume that. As
  well as here you should assume, that for someone the case
  matters.

- Standard. This just violates the SQL standard, when you can
  find lowercase names without quotes. We fought for the
  standard too much to just drop it now because of this.

- Consistency. If you want space names case insensitive, you
  should realize, that it involves making case insensitive
  index names, user names, trigger, fk, ck names. This also
  includes tuple field names, and .... case-insensitive JSON
  paths! That is a very scary thing if you think about it.
  Consider this example:

      t = box.tuple.new({
          {
              key = 100,
              KEY = 200
          }
      })
      t["[1].key"]

  What should be returned? Take into account, that this is
  not an impossible example. 'key' and 'KEY' might be a
  consequence of necessity to support a legacy system in a
  user application, which was changed some time ago, and
  they decided to store both cases for compatibility. Your
  proposal breaks our backward compatibility, and compatibility
  of that system.

- Performance. With your proposal we would need to replace
  *all* strcmp/memcmp() not related to indexes to strcasecmp().
  The latter is ~x100 slower. That will slowdown everything -
  from field name access to lookup in internal hash tables
  using names as a key. I don't think it is worth the syntax
  sugar. Especially taking into account how hard it becomes to
  fit everything into the single tx thread.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-12-09 23:09                   ` Vladislav Shpilevoy
@ 2019-12-10  8:19                     ` Konstantin Osipov
  2019-12-10 12:44                       ` Kirill Yukhin
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin Osipov @ 2019-12-10  8:19 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/10 10:22]:
> >> We are team of Tarantool developers working for MailRU Group.
> > 
> > Actually this decision was made without any discussions - it was
> > quickly hacked in back in 2018. Maybe you and Nikita had some
> > discussion prompted by Peter's firm stance that we should
> > uppercase, but that was it. I protested several times while I
> > was still on board but had not anticipated how bad the solution
> > would turn out to be when it is implemented to follow through on
> > time. Now it is still not to late - bit it's getting more late
> > every day.
> > 
> > What we learned since then is that every single newbie trips over
> > it.
> > 
> > 
> 
> Well, you was exactly the person who voted for uppercase of
> everything. Peter was for it too, but you made the decision,
> and you at that time had a right to forbid it.

I agree I am partly to blame because I had a chance to object 
and I didn't do it strongly enough at the time. 
In any case, I changed my mind seeing how it worked out.

In fact, I never liked uppercasing and tried to change the
implementation multiple times. But I also wanted you guys
to exercise broader freedom in making decisions, and now am 
paying for this :/

> Now talking of the 'Newbies stumbling into the uppercase'.
> Newbies, who use SQL only, will never notice the uppercasing.
> Because SQL creates in uppercase and searches by uppercase
> by default. It is a matter of how to organize tutorial for
> newbies. In a tutorial they should study SQL and Lua box
> separately, and then learn about details of how to use them
> both. Where they would be explained, that SQL standard
> uppercases everything. And they need to use quotes to force
> lowercase.

This is the same as Nikita says: explain carefully why the pain is
necessary. It is not. You can take a look at the ticket, no user
favours the current behaviour, it's only a few core engineers who
don't want to accept that a mistake was made.

> Talking of the case insensitivity. I see several problems
> serious enough to forget about this forever:
> 
> - Compatibility. As you probably know (and actually you are
>   the one, from who I learned it) - if an API is public, it
>   *is* used by someone. You always should assume that. As
>   well as here you should assume, that for someone the case
>   matters.

Both you and Kirill talk about it but nobody can actually imagine
a practically important situation when this would matter.

> - Standard. This just violates the SQL standard, when you can
>   find lowercase names without quotes. We fought for the
>   standard too much to just drop it now because of this.

box.cfg{ansi=true} and go uppercasing again if you care about the
standard.

But nobody does.

> - Consistency. If you want space names case insensitive, you
>   should realize, that it involves making case insensitive
>   index names, user names, trigger, fk, ck names. This also
>   includes tuple field names, and .... case-insensitive JSON
>   paths! That is a very scary thing if you think about it.
>   Consider this example:

All true, except JSON paths. JSON paths are part of JSON document
and are not part of relational model, so don't have to be governed
by ancient SQL rules.

> ll
>       t = box.tuple.new({
>           {
>               key = 100,
>               KEY = 200
>           }
>       })
>       t["[1].key"]
> 
>   What should be returned? Take into account, that this is
>   not an impossible example. 'key' and 'KEY' might be a
>   consequence of necessity to support a legacy system in a
>   user application, which was changed some time ago, and
>   they decided to store both cases for compatibility. Your
>   proposal breaks our backward compatibility, and compatibility
>   of that system.
> 
> - Performance. With your proposal we would need to replace
>   *all* strcmp/memcmp() not related to indexes to strcasecmp().
>   The latter is ~x100 slower. That will slowdown everything -
>   from field name access to lookup in internal hash tables
>   using names as a key. I don't think it is worth the syntax
>   sugar. Especially taking into account how hard it becomes to
>   fit everything into the single tx thread.

1) It's irrelevant, really. Most column and table names are short
   ascii sequences and are stored in hash tables in memory, not in 
   a binary search tree, so there is only 1 comparison per lookup. 
2) If you're really crazy about performance, you can have hints.

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
  2019-12-10  8:19                     ` Konstantin Osipov
@ 2019-12-10 12:44                       ` Kirill Yukhin
  0 siblings, 0 replies; 17+ messages in thread
From: Kirill Yukhin @ 2019-12-10 12:44 UTC (permalink / raw)
  To: Konstantin Osipov, Vladislav Shpilevoy, Chris Sosnin, tarantool-patches

Hello,

On 10 дек 11:19, Konstantin Osipov wrote:
> * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/10 10:22]:
> > >> We are team of Tarantool developers working for MailRU Group.
> > > 
> > > Actually this decision was made without any discussions - it was
> > > quickly hacked in back in 2018. Maybe you and Nikita had some
> > > discussion prompted by Peter's firm stance that we should
> > > uppercase, but that was it. I protested several times while I
> > > was still on board but had not anticipated how bad the solution
> > > would turn out to be when it is implemented to follow through on
> > > time. Now it is still not to late - bit it's getting more late
> > > every day.
> > > 
> > > What we learned since then is that every single newbie trips over
> > > it.
> > > 
> > > 
> > 
> > Well, you was exactly the person who voted for uppercase of
> > everything. Peter was for it too, but you made the decision,
> > and you at that time had a right to forbid it.
> 
> I agree I am partly to blame because I had a chance to object 
> and I didn't do it strongly enough at the time. 
> In any case, I changed my mind seeing how it worked out.

This patch is not about matter which is being discussed in this
thread. To discuss such major features (like `lets do everything
CI`) we have `discussions` mailing list and github issues.
I recommend you to move there with such proposals.

The patch LGTM. I've checked it into master.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-12-10 12:44 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-29 23:39 [Tarantool-patches] [PATCH] box: remove unicode_ci for functions Chris Sosnin
2019-11-30 20:34 ` Konstantin Osipov
2019-12-01 14:12   ` k.sosnin
2019-12-01 14:36   ` Vladislav Shpilevoy
2019-12-02  7:07     ` Konstantin Osipov
2019-12-02 14:36       ` Nikita Pettik
2019-12-02 14:49         ` Konstantin Osipov
2019-12-06 11:42       ` Kirill Yukhin
2019-12-06 20:17         ` Konstantin Osipov
2019-12-09 11:06           ` Kirill Yukhin
2019-12-09 11:24             ` Konstantin Osipov
2019-12-09 13:25               ` Kirill Yukhin
2019-12-09 13:39                 ` Konstantin Osipov
2019-12-09 14:07                   ` Nikita Pettik
2019-12-09 23:09                   ` Vladislav Shpilevoy
2019-12-10  8:19                     ` Konstantin Osipov
2019-12-10 12:44                       ` Kirill Yukhin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox