Date: Tue, 10 Dec 2019 11:19:58 +0300
From: Konstantin Osipov
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
Message-ID: <20191210081958.GB21413@atlas>

* Vladislav Shpilevoy [19/12/10 10:22]:
> >> We are a team of Tarantool developers working for MailRU Group.
> >
> > Actually this decision was made without any discussions - it was
> > quickly hacked in back in 2018. Maybe you and Nikita had some
> > discussion prompted by Peter's firm stance that we should
> > uppercase, but that was it. I protested several times while I
> > was still on board but had not anticipated how bad the solution
> > would turn out to be when it is implemented to follow through on
> > time. Now it is still not too late - but it's getting later
> > every day.
> >
> > What we learned since then is that every single newbie trips over
> > it.
> >
>
> Well, you were exactly the person who voted for uppercase of
> everything. Peter was for it too, but you made the decision,
> and you at that time had a right to forbid it.

I agree I am partly to blame because I had a chance to object and
I didn't do it strongly enough at the time. In any case, I changed
my mind seeing how it worked out. In fact, I never liked uppercasing
and tried to change the implementation multiple times. But I also
wanted you guys to exercise broader freedom in making decisions,
and now I am paying for this :/

> Now talking of the 'Newbies stumbling into the uppercase'.
> Newbies, who use SQL only, will never notice the uppercasing.
> Because SQL creates in uppercase and searches by uppercase
> by default. It is a matter of how to organize a tutorial for
> newbies. In a tutorial they should study SQL and Lua box
> separately, and then learn about details of how to use them
> both. There it would be explained to them that the SQL standard
> uppercases everything. And they need to use quotes to force
> lowercase.

This is the same as Nikita says: explain carefully why the pain is
necessary. It is not. You can take a look at the ticket: no user
favours the current behaviour; it's only a few core engineers who
don't want to accept that a mistake was made.

> Talking of the case insensitivity. I see several problems
> serious enough to forget about this forever:
>
> - Compatibility. As you probably know (and actually you are
>   the one from whom I learned it) - if an API is public, it
>   *is* used by someone. You always should assume that. As
>   well as here you should assume that for someone the case
>   matters.

Both you and Kirill talk about it but nobody can actually imagine
a practically important situation when this would matter.

> - Standard. This just violates the SQL standard, when you can
>   find lowercase names without quotes. We fought for the
>   standard too much to just drop it now because of this.

box.cfg{ansi=true} and go uppercasing again if you care about the
standard. But nobody does.

> - Consistency. If you want space names case insensitive, you
>   should realize that it involves making case insensitive
>   index names, user names, trigger, fk, ck names. This also
>   includes tuple field names, and .... case-insensitive JSON
>   paths! That is a very scary thing if you think about it.
>   Consider this example:

All true, except JSON paths. JSON paths are part of a JSON document
and are not part of the relational model, so they don't have to be
governed by ancient SQL rules.

>       t = box.tuple.new({
>           {
>               key = 100,
>               KEY = 200
>           }
>       })
>       t["[1].key"]
>
>   What should be returned? Take into account that this is
>   not an impossible example. 'key' and 'KEY' might be a
>   consequence of necessity to support a legacy system in a
>   user application, which was changed some time ago, and
>   they decided to store both cases for compatibility. Your
>   proposal breaks our backward compatibility, and compatibility
>   of that system.
>
> - Performance. With your proposal we would need to replace
>   *all* strcmp/memcmp() not related to indexes with strcasecmp().
>   The latter is ~x100 slower. That will slow down everything -
>   from field name access to lookup in internal hash tables
>   using names as a key. I don't think it is worth the syntax
>   sugar. Especially taking into account how hard it becomes to
>   fit everything into the single tx thread.

1) It's irrelevant, really. Most column and table names are short
   ASCII sequences and are stored in hash tables in memory, not in
   a binary search tree, so there is only 1 comparison per lookup.
2) If you're really crazy about performance, you can have hints.

-- 
Konstantin Osipov, Moscow, Russia
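
To make point 1) concrete, here is a minimal sketch of a
case-insensitive name lookup through a hash table. It is not
Tarantool's actual code: name_table, name_hash_ci(), name_table_put()
and name_table_get() are hypothetical names invented for this
illustration, the table is fixed-size with linear probing, and names
are assumed to be plain ASCII. The hash is computed over lowercased
bytes and cached in each entry, so strcasecmp() runs only when the
folded hashes already match, which for a typical hit is once per
lookup.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <strings.h> /* strcasecmp() is POSIX and declared here */

/* Hypothetical capacity; must be a power of two for the mask below. */
enum { NAME_TABLE_SIZE = 64 };

struct name_entry {
	const char *name; /* spelling as stored by the caller */
	uint32_t hash;    /* cached case-folded hash */
	int id;           /* payload, e.g. a space or field id */
	int used;         /* 0 means the slot is empty */
};

/* Must be zero-initialized before use, e.g. `struct name_table t = {0};`. */
struct name_table {
	struct name_entry slots[NAME_TABLE_SIZE];
};

/* FNV-1a over lowercased bytes: "users" and "USERS" hash the same. */
uint32_t
name_hash_ci(const char *name)
{
	uint32_t h = 2166136261u;
	for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
		h ^= (uint32_t)tolower(*p);
		h *= 16777619u;
	}
	return h;
}

/* Insert a name; returns 0 on success, -1 on duplicate or full table. */
int
name_table_put(struct name_table *t, const char *name, int id)
{
	assert(t != NULL && name != NULL);
	uint32_t h = name_hash_ci(name);
	for (size_t i = 0; i < NAME_TABLE_SIZE; i++) {
		struct name_entry *e =
			&t->slots[(h + i) & (NAME_TABLE_SIZE - 1)];
		if (!e->used) {
			e->name = name;
			e->hash = h;
			e->id = id;
			e->used = 1;
			return 0;
		}
		/* Same name in a different case counts as a duplicate. */
		if (e->hash == h && strcasecmp(e->name, name) == 0)
			return -1;
	}
	return -1;
}

/* Look a name up ignoring case; returns its id, or -1 if absent. */
int
name_table_get(const struct name_table *t, const char *name)
{
	assert(t != NULL && name != NULL);
	uint32_t h = name_hash_ci(name);
	for (size_t i = 0; i < NAME_TABLE_SIZE; i++) {
		const struct name_entry *e =
			&t->slots[(h + i) & (NAME_TABLE_SIZE - 1)];
		if (!e->used)
			return -1;
		/* The cheap integer check filters out nearly all mismatches. */
		if (e->hash == h && strcasecmp(e->name, name) == 0)
			return e->id;
	}
	return -1;
}
```

With this layout, after name_table_put(&t, "users", 1) a lookup of
"USERS" or "Users" finds the same entry, and a second insert of
"USERS" is rejected as a duplicate. The cached hash is only one
possible reading of the "hints" in 2); the original message does not
say what form those hints would take.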