[Tarantool-patches] [PATCH] box: remove unicode_ci for functions
Konstantin Osipov
kostja.osipov at gmail.com
Tue Dec 10 11:19:58 MSK 2019
* Vladislav Shpilevoy <v.shpilevoy at tarantool.org> [19/12/10 10:22]:
> >> We are team of Tarantool developers working for MailRU Group.
> >
> > Actually this decision was made without any discussions - it was
> > quickly hacked in back in 2018. Maybe you and Nikita had some
> > discussion prompted by Peter's firm stance that we should
> > uppercase, but that was it. I protested several times while I
> > was still on board but had not anticipated how bad the solution
> > would turn out to be when it is implemented to follow through on
> > time. Now it is still not to late - bit it's getting more late
> > every day.
> >
> > What we learned since then is that every single newbie trips over
> > it.
> >
> >
>
> Well, you was exactly the person who voted for uppercase of
> everything. Peter was for it too, but you made the decision,
> and you at that time had a right to forbid it.
I agree I am partly to blame because I had a chance to object
and I didn't do it strongly enough at the time.
In any case, I changed my mind seeing how it worked out.
In fact, I never liked uppercasing and tried to change the
implementation multiple times. But I also wanted you guys
to exercise broader freedom in making decisions, and now am
paying for this :/
> Now talking of the 'Newbies stumbling into the uppercase'.
> Newbies, who use SQL only, will never notice the uppercasing.
> Because SQL creates in uppercase and searches by uppercase
> by default. It is a matter of how to organize tutorial for
> newbies. In a tutorial they should study SQL and Lua box
> separately, and then learn about details of how to use them
> both. Where they would be explained, that SQL standard
> uppercases everything. And they need to use quotes to force
> lowercase.
This is the same as Nikita says: explain carefully why the pain is
necessary. It is not. You can take a look at the ticket, no user
favours the current behaviour, it's only a few core engineers who
don't want to accept that a mistake was made.
> Talking of the case insensitivity. I see several problems
> serious enough to forget about this forever:
>
> - Compatibility. As you probably know (and actually you are
> the one, from who I learned it) - if an API is public, it
> *is* used by someone. You always should assume that. As
> well as here you should assume, that for someone the case
> matters.
Both you and Kirill talk about it but nobody can actually imagine
a practically important situation when this would matter.
> - Standard. This just violates the SQL standard, when you can
> find lowercase names without quotes. We fought for the
> standard too much to just drop it now because of this.
box.cfg{ansi=true} and go uppercasing again if you care about the
standard.
But nobody does.
> - Consistency. If you want space names case insensitive, you
> should realize, that it involves making case insensitive
> index names, user names, trigger, fk, ck names. This also
> includes tuple field names, and .... case-insensitive JSON
> paths! That is a very scary thing if you think about it.
> Consider this example:
All true, except JSON paths. JSON paths are part of JSON document
and are not part of relational model, so don't have to be governed
by ancient SQL rules.
> ll
> t = box.tuple.new({
> {
> key = 100,
> KEY = 200
> }
> })
> t["[1].key"]
>
> What should be returned? Take into account, that this is
> not an impossible example. 'key' and 'KEY' might be a
> consequence of necessity to support a legacy system in a
> user application, which was changed some time ago, and
> they decided to store both cases for compatibility. Your
> proposal breaks our backward compatibility, and compatibility
> of that system.
>
> - Performance. With your proposal we would need to replace
> *all* strcmp/memcmp() not related to indexes to strcasecmp().
> The latter is ~x100 slower. That will slowdown everything -
> from field name access to lookup in internal hash tables
> using names as a key. I don't think it is worth the syntax
> sugar. Especially taking into account how hard it becomes to
> fit everything into the single tx thread.
1) It's irrelevant, really. Most column and table names are short
ascii sequences and are stored in hash tables in memory, not in
a binary search tree, so there is only 1 comparison per lookup.
2) If you're really crazy about performance, you can have hints.
--
Konstantin Osipov, Moscow, Russia
More information about the Tarantool-patches
mailing list