[Tarantool-patches] [PATCH] box: remove unicode_ci for functions

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Tue Dec 10 02:09:41 MSK 2019


Hi! Thanks for the discussion!

On 09/12/2019 14:39, Konstantin Osipov wrote:
> * Kirill Yukhin <kyukhin at tarantool.org> [19/12/09 16:28]:
>>> Who is "we"?
>>
>> We are the team of Tarantool developers working for MailRU Group.
> 
> Actually, this decision was made without any discussion: it was
> quickly hacked in back in 2018. Maybe you and Nikita had some
> discussion prompted by Peter's firm stance that we should
> uppercase, but that was it. I protested several times while I
> was still on board, but I had not anticipated how bad the
> solution would turn out to be once implemented, so I did not
> follow through in time. It is still not too late, but it gets
> later every day.
> 
> What we have learned since then is that every single newbie trips
> over it.
> 
> 

Well, you were exactly the person who voted for uppercasing
everything. Peter was for it too, but you made the decision,
and at that time you had the right to forbid it.

Now, about 'newbies stumbling over the uppercase'. Newbies
who use only SQL will never notice the uppercasing, because
SQL both creates and looks up unquoted names in uppercase by
default. It is a matter of how the tutorial for newbies is
organized. A tutorial should teach SQL and the Lua box API
separately, and then explain the details of using them
together. That is where it would be explained that the SQL
standard uppercases all unquoted identifiers, and that quotes
are needed to preserve lowercase.
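The naming rule described above can be sketched in C, the language Tarantool itself is written in. This is a minimal illustration under stated assumptions, not Tarantool's parser: the function name `sql_normalize_identifier` is hypothetical, and it folds only ASCII letters, whereas real Unicode identifiers need proper case mapping.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, NOT Tarantool's actual code: normalizes an SQL
 * identifier following the standard rule. A double-quoted (delimited)
 * identifier keeps its case, with "" unescaped to "; an unquoted
 * (regular) identifier is folded to uppercase. ASCII-only simplification.
 * Returns a malloc'ed string the caller must free, or NULL on OOM.
 */
char *
sql_normalize_identifier(const char *src)
{
	size_t len = strlen(src);
	char *dst = malloc(len + 1);
	if (dst == NULL)
		return NULL;
	if (len >= 2 && src[0] == '"' && src[len - 1] == '"') {
		/* Delimited: copy the body verbatim, collapsing "" into ". */
		size_t j = 0;
		for (size_t i = 1; i < len - 1; i++) {
			if (src[i] == '"' && src[i + 1] == '"')
				i++;
			dst[j++] = src[i];
		}
		dst[j] = '\0';
		return dst;
	}
	/* Regular: fold ASCII letters to uppercase. */
	for (size_t i = 0; i < len; i++)
		dst[i] = (char)toupper((unsigned char)src[i]);
	dst[len] = '\0';
	return dst;
}
```

Under this rule, `CREATE TABLE t1 ...` and `SELECT * FROM t1` both resolve to `T1`, so an SQL-only user never sees the difference; only `"t1"` keeps the lowercase name.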

As for case insensitivity, I see several problems serious
enough to drop this idea for good:

- Compatibility. As you probably know (actually, you are the
  one I learned it from), if an API is public, it *is* used
  by someone. You should always assume that. Likewise, here
  you should assume that for someone the case matters.

- Standard. Being able to find lowercase names without quotes
  simply violates the SQL standard. We fought too hard for
  standard compliance to drop it now because of this.

- Consistency. If you want case-insensitive space names, you
  should realize that it implies case-insensitive index names,
  user names, and trigger, foreign key, and check constraint
  names. It also extends to tuple field names and ...
  case-insensitive JSON paths! That is a very scary thing if
  you think about it. Consider this example:

      t = box.tuple.new({
          {
              key = 100,
              KEY = 200
          }
      })
      t["[1].key"] -- 100 today; with case-insensitive paths, 100 or 200?

  What should be returned? Keep in mind that this example is
  not far-fetched. 'key' and 'KEY' might both exist because a
  user application has to support a legacy system that was
  changed some time ago, and its developers decided to store
  both spellings for compatibility. Your proposal would break
  our backward compatibility, as well as the compatibility of
  that system.

- Performance. With your proposal we would need to replace
  *all* strcmp()/memcmp() calls not related to indexes with
  strcasecmp(). The latter is ~100x slower. That would slow
  down everything, from field name access to lookups in
  internal hash tables keyed by name (whose hashes would also
  have to be computed over case-folded names). I don't think
  that is worth the syntactic sugar, especially given how hard
  it is becoming to fit everything into the single tx thread.
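The JSON path ambiguity and the cost argument can both be illustrated with a toy name lookup in C. This is a sketch, not Tarantool code: `struct field` and `field_lookup()` are hypothetical names, and a linear scan stands in for the real tuple format and hash tables. Swapping the exact comparator for the POSIX `strcasecmp()` turns one match for `"key"` into two, and every comparison now has to fold each character's case instead of comparing raw bytes.

```c
#define _POSIX_C_SOURCE 200809L /* expose strcasecmp() in strict C modes */
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strcasecmp() is POSIX, not ISO C */

/* Hypothetical toy map entry, standing in for a map inside a tuple field. */
struct field {
	const char *name;
	int value;
};

/* Name comparator: strcmp() for exact names, strcasecmp() otherwise. */
typedef int (*name_cmp_f)(const char *, const char *);

/*
 * Hypothetical linear lookup: returns how many entries match `name`
 * under `cmp`, and stores the first match (or NULL) in *out.
 */
size_t
field_lookup(const struct field *fields, size_t count, const char *name,
	     name_cmp_f cmp, const struct field **out)
{
	size_t matches = 0;
	*out = NULL;
	for (size_t i = 0; i < count; i++) {
		if (cmp(fields[i].name, name) == 0) {
			/* Remember only the first match. */
			if (matches++ == 0)
				*out = &fields[i];
		}
	}
	return matches;
}
```

With `strcmp`, looking up `"key"` in `{key = 100, KEY = 200}` finds exactly one entry; with `strcasecmp`, it finds two, and the caller has to either pick one arbitrarily or report an error.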

