From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Date: Tue, 26 Feb 2019 16:52:17 +0300
From: Vladimir Davydov
Subject: Re: [PATCH v2] Feature request for a new collation
Message-ID: <20190226135217.a4ptsoxgkk35b7m2@esperanza>
References: <20190226104042.28149-1-szudin@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20190226104042.28149-1-szudin@tarantool.org>
To: Stanislav Zudin
Cc: tarantool-patches@freelists.org, kostja@tarantool.org
List-ID:

IMO the subject line would be more descriptive if it read:

  box: add new unicode_s2 collation

On Tue, Feb 26, 2019 at 01:40:42PM +0300, Stanislav Zudin wrote:
> Adds a new default collation 'unicode_s2' to support the difference
> between Cyrillic letters 'Е' and 'Ё'. The standard case insensitive
> collation ('unicode_ci') doesn't distinguish these letters.
>
> Closes #4007
> ---
> Branch: https://github.com/tarantool/tarantool/tree/stanztt/gh-4007-new-default-collation-2.1
> Issue: https://github.com/tarantool/tarantool/issues/4007
>
>  src/box/bootstrap.snap          | Bin 1831 -> 1864 bytes
>  src/box/lua/upgrade.lua         |   1 +
>  test/sql-tap/collation.test.lua |   7 +-
>  test/sql/collation.result       | 111 ++++++++++++++++++++++++++++++++
>  test/sql/collation.test.lua     |  41 ++++++++++++
>  5 files changed, 157 insertions(+), 3 deletions(-)

Tests still don't pass:

https://travis-ci.org/tarantool/tarantool/builds/498654717?utm_source=github_status&utm_medium=notification

Please fix.

> diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua
> index 70cfb4f2e..84c559dac 100644
> --- a/src/box/lua/upgrade.lua
> +++ b/src/box/lua/upgrade.lua
> @@ -610,6 +610,7 @@ local function upgrade_to_2_1_0()

As I said, we're about to release 2.1.2, so you should add
upgrade_to_2_1_2 and add the collation there.
>
>      box.space._collation:replace{0, "none", ADMIN, "BINARY", "", setmap{}}
>      box.space._collation:replace{3, "binary", ADMIN, "BINARY", "", setmap{}}
> +    box.space._collation:replace{4, "unicode_s2", ADMIN, "ICU", "ru_RU", {strength='secondary'}}

As I mentioned earlier, the name unicode_s2 looks way too generic for the
ru locale. IMO we should use either unicode_ru_s2 or unicode_ru. Also, we
don't currently use s1/s2/s3 suffixes; we use cs/ci instead. Maybe we
should use a suffix like that in this case, too.

I looked through Kostja's discussion with Mr. Gulutzan and didn't see that
they had agreed to name this new collation unicode_s2. Please get their
approval for the name.
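For illustration, here is a rough sketch of what a separate upgrade_to_2_1_2
step could look like. It rests on several assumptions: box.space._collation
is replaced by a hypothetical in-memory stub so the snippet runs under plain
Lua; ADMIN = 1 is assumed to match the constant the patch uses; and
NEW_COLLATION_NAME is only a placeholder, since the name is still
undecided. The real function would presumably also need to be registered
next to the existing upgrade steps in upgrade.lua, which this sketch omits.

```lua
-- Hypothetical in-memory stand-in for box.space._collation so the sketch
-- runs under plain Lua; inside upgrade.lua this stub would not exist.
local collation_rows = {}
local box = {
    space = {
        _collation = {
            -- Mimics space:replace(): insert, or overwrite the tuple
            -- with the same primary key (field 1).
            replace = function(self, tuple)
                collation_rows[tuple[1]] = tuple
                return tuple
            end,
        },
    },
}

-- Assumption: id of the 'admin' user, mirroring ADMIN in the patch.
local ADMIN = 1

-- Placeholder only: the final collation name has not been agreed on.
local NEW_COLLATION_NAME = "unicode_ru_s2"

-- Separate upgrade step instead of editing upgrade_to_2_1_0(), so that
-- instances whose schema is already at 2.1.0 also get the collation.
local function upgrade_to_2_1_2()
    -- Secondary strength ignores case but, as the patch intends,
    -- still tells 'Е' from 'Ё'.
    box.space._collation:replace{4, NEW_COLLATION_NAME, ADMIN, "ICU",
                                 "ru_RU", {strength = 'secondary'}}
end

upgrade_to_2_1_2()
```

Since replace overwrites the tuple with the same primary key, running the
step more than once leaves a single row with id 4.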