From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from smtp49.i.mail.ru (smtp49.i.mail.ru [94.100.177.109]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 2753346971A for ; Tue, 10 Dec 2019 02:09:45 +0300 (MSK)
References: <20191129233922.36600-1-k.sosnin@tarantool.org> <20191130203439.GA23478@atlas> <13437800-f8ec-1964-f7d7-a01581e242ad@tarantool.org> <20191202070715.GA27802@atlas> <20191206114244.umbeo556b2atuhjm@tarantool.org> <20191206201718.GA7299@atlas> <20191209110631.srknhyc3zpn5cjsy@tarantool.org> <20191209112428.GB25729@atlas> <20191209132555.qn25wsxoa5lr4252@tarantool.org> <20191209133955.GA8196@atlas>
From: Vladislav Shpilevoy
Message-ID:
Date: Tue, 10 Dec 2019 00:09:41 +0100
MIME-Version: 1.0
In-Reply-To: <20191209133955.GA8196@atlas>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [Tarantool-patches] [PATCH] box: remove unicode_ci for functions
List-Id: Tarantool development patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Konstantin Osipov , Kirill Yukhin , Chris Sosnin , tarantool-patches@dev.tarantool.org

Hi! Thanks for the discussion!

On 09/12/2019 14:39, Konstantin Osipov wrote:
> * Kirill Yukhin [19/12/09 16:28]:
>>> Who is "we"?
>>
>> We are team of Tarantool developers working for MailRU Group.
>
> Actually this decision was made without any discussions - it was
> quickly hacked in back in 2018. Maybe you and Nikita had some
> discussion prompted by Peter's firm stance that we should
> uppercase, but that was it. I protested several times while I
> was still on board but had not anticipated how bad the solution
> would turn out to be when it is implemented to follow through on
> time. Now it is still not to late - bit it's getting more late
> every day.
>
> What we learned since then is that every single newbie trips over
> it.
>
>

Well, you were exactly the person who voted for uppercasing everything.
Peter was for it too, but you made the decision, and at that time you
had the right to forbid it.

Now, about 'newbies stumbling into the uppercase'. Newbies who use SQL
only will never notice the uppercasing, because by default SQL creates
names in uppercase and searches by uppercase. It is a matter of how to
organize a tutorial for newbies. In a tutorial they should study SQL and
Lua box separately, and then learn the details of how to use them
together. There it would be explained to them that the SQL standard
uppercases everything, and that they need to use quotes to force
lowercase.

Now, about case insensitivity. I see several problems serious enough to
forget about this forever:

- Compatibility. As you probably know (and actually you are the one
  from whom I learned it), if an API is public, it *is* used by
  someone. You should always assume that. Likewise, here you should
  assume that for someone the case matters.

- Standard. Being able to find lowercase names without quotes simply
  violates the SQL standard. We fought for the standard too much to
  drop it now because of this.

- Consistency. If you want space names to be case insensitive, you
  should realize that it involves making index names, user names, and
  trigger, fk, and ck names case insensitive too. This also includes
  tuple field names, and ... case-insensitive JSON paths! That is a
  very scary thing if you think about it. Consider this example:

      t = box.tuple.new({ { key = 100, KEY = 200 } })
      t["[1].key"]

  What should be returned? Take into account that this is not an
  impossible example. 'key' and 'KEY' might be a consequence of the
  need to support a legacy system in a user application, which was
  changed some time ago, and they decided to store both cases for
  compatibility. Your proposal breaks our backward compatibility, and
  the compatibility of that system.

- Performance.
  With your proposal we would need to replace *all* strcmp()/memcmp()
  calls not related to indexes with strcasecmp(). The latter is ~100x
  slower. That will slow down everything, from field name access to
  lookups in internal hash tables that use names as keys. I don't think
  it is worth the syntactic sugar, especially taking into account how
  hard it is becoming to fit everything into the single tx thread.
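
To make the 'key' vs 'KEY' ambiguity from the Consistency point
concrete, here is a minimal C sketch. It is not Tarantool code: struct
field and count_matches() are hypothetical names invented for the
illustration, and a linear scan stands in for whatever lookup is really
used. It only shows that an exact comparison resolves a name to one
field, while a case-insensitive one can match several.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strcasecmp() is POSIX, not ISO C */

/* Hypothetical stand-in for a named field; not a Tarantool structure. */
struct field {
	const char *name;
	int value;
};

/* Both strcmp() and strcasecmp() have this signature. */
typedef int (*name_cmp_f)(const char *, const char *);

/* Count the fields whose name equals 'name' under the comparator 'cmp'. */
int
count_matches(const struct field *fields, int count, const char *name,
	      name_cmp_f cmp)
{
	assert(fields != NULL && name != NULL);
	int matches = 0;
	for (int i = 0; i < count; i++) {
		if (cmp(fields[i].name, name) == 0)
			matches++;
	}
	return matches;
}
```

With the fields { "key", 100 } and { "KEY", 200 }, looking up "key"
through strcmp() matches exactly one field, while the same lookup
through strcasecmp() matches both, so the caller would have to pick one
by some extra rule.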