Tarantool development patches archive
 help / color / mirror / Atom feed
From: Konstantin Osipov <kostja.osipov@gmail.com>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
Date: Tue, 10 Dec 2019 11:19:58 +0300	[thread overview]
Message-ID: <20191210081958.GB21413@atlas> (raw)
In-Reply-To: <cc7f1523-eb71-5312-695d-6fc82b343f65@tarantool.org>

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/12/10 10:22]:
> >> We are team of Tarantool developers working for MailRU Group.
> > 
> > Actually this decision was made without any discussions - it was
> > quickly hacked in back in 2018. Maybe you and Nikita had some
> > discussion prompted by Peter's firm stance that we should
> > uppercase, but that was it. I protested several times while I
> > was still on board but had not anticipated how bad the solution
> > would turn out to be when it is implemented to follow through on
> > time. Now it is still not to late - bit it's getting more late
> > every day.
> > 
> > What we learned since then is that every single newbie trips over
> > it.
> > 
> > 
> 
> Well, you was exactly the person who voted for uppercase of
> everything. Peter was for it too, but you made the decision,
> and you at that time had a right to forbid it.

I agree I am partly to blame because I had a chance to object 
and I didn't do it strongly enough at the time. 
In any case, I changed my mind seeing how it worked out.

In fact, I never liked uppercasing and tried to change the
implementation multiple times. But I also wanted you guys
to exercise broader freedom in making decisions, and now am 
paying for this :/

> Now talking of the 'Newbies stumbling into the uppercase'.
> Newbies, who use SQL only, will never notice the uppercasing.
> Because SQL creates in uppercase and searches by uppercase
> by default. It is a matter of how to organize tutorial for
> newbies. In a tutorial they should study SQL and Lua box
> separately, and then learn about details of how to use them
> both. Where they would be explained, that SQL standard
> uppercases everything. And they need to use quotes to force
> lowercase.

This is the same as Nikita says: explain carefully why the pain is
necessary. It is not. You can take a look at the ticket, no user
favours the current behaviour, it's only a few core engineers who
don't want to accept that a mistake was made.

> Talking of the case insensitivity. I see several problems
> serious enough to forget about this forever:
> 
> - Compatibility. As you probably know (and actually you are
>   the one, from who I learned it) - if an API is public, it
>   *is* used by someone. You always should assume that. As
>   well as here you should assume, that for someone the case
>   matters.

Both you and Kirill talk about it but nobody can actually imagine
a practically important situation when this would matter.

> - Standard. This just violates the SQL standard, when you can
>   find lowercase names without quotes. We fought for the
>   standard too much to just drop it now because of this.

box.cfg{ansi=true} and go uppercasing again if you care about the
standard.

But nobody does.

> - Consistency. If you want space names case insensitive, you
>   should realize, that it involves making case insensitive
>   index names, user names, trigger, fk, ck names. This also
>   includes tuple field names, and .... case-insensitive JSON
>   paths! That is a very scary thing if you think about it.
>   Consider this example:

All true, except JSON paths. JSON paths are part of JSON document
and are not part of relational model, so don't have to be governed
by ancient SQL rules.

> ll
>       t = box.tuple.new({
>           {
>               key = 100,
>               KEY = 200
>           }
>       })
>       t["[1].key"]
> 
>   What should be returned? Take into account, that this is
>   not an impossible example. 'key' and 'KEY' might be a
>   consequence of necessity to support a legacy system in a
>   user application, which was changed some time ago, and
>   they decided to store both cases for compatibility. Your
>   proposal breaks our backward compatibility, and compatibility
>   of that system.
> 
> - Performance. With your proposal we would need to replace
>   *all* strcmp/memcmp() not related to indexes to strcasecmp().
>   The latter is ~x100 slower. That will slowdown everything -
>   from field name access to lookup in internal hash tables
>   using names as a key. I don't think it is worth the syntax
>   sugar. Especially taking into account how hard it becomes to
>   fit everything into the single tx thread.

1) It's irrelevant, really. Most column and table names are short
   ascii sequences and are stored in hash tables in memory, not in 
   a binary search tree, so there is only 1 comparison per lookup. 
2) If you're really crazy about performance, you can have hints.

-- 
Konstantin Osipov, Moscow, Russia

  reply	other threads:[~2019-12-10  8:20 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-29 23:39 Chris Sosnin
2019-11-30 20:34 ` Konstantin Osipov
2019-12-01 14:12   ` k.sosnin
2019-12-01 14:36   ` Vladislav Shpilevoy
2019-12-02  7:07     ` Konstantin Osipov
2019-12-02 14:36       ` Nikita Pettik
2019-12-02 14:49         ` Konstantin Osipov
2019-12-06 11:42       ` Kirill Yukhin
2019-12-06 20:17         ` Konstantin Osipov
2019-12-09 11:06           ` Kirill Yukhin
2019-12-09 11:24             ` Konstantin Osipov
2019-12-09 13:25               ` Kirill Yukhin
2019-12-09 13:39                 ` Konstantin Osipov
2019-12-09 14:07                   ` Nikita Pettik
2019-12-09 23:09                   ` Vladislav Shpilevoy
2019-12-10  8:19                     ` Konstantin Osipov [this message]
2019-12-10 12:44                       ` Kirill Yukhin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191210081958.GB21413@atlas \
    --to=kostja.osipov@gmail.com \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox