[PATCH v2] Feature request for a new collation

Vladimir Davydov vdavydov.dev at gmail.com
Tue Feb 26 16:52:17 MSK 2019


IMO the subject line would be more descriptive if it read

  box: add new unicode_s2 collation

On Tue, Feb 26, 2019 at 01:40:42PM +0300, Stanislav Zudin wrote:
> Adds a new default collation 'unicode_s2' to support the difference
> between Cyrillic letters 'Е' and 'Ё'. The standard case insensitive
> collation ('unicode_ci') doesn't distinguish these letters.
> 
> Closes #4007
> ---
> Branch: https://github.com/tarantool/tarantool/tree/stanztt/gh-4007-new-default-collation-2.1
> Issue: https://github.com/tarantool/tarantool/issues/4007
> 
>  src/box/bootstrap.snap          | Bin 1831 -> 1864 bytes
>  src/box/lua/upgrade.lua         |   1 +
>  test/sql-tap/collation.test.lua |   7 +-
>  test/sql/collation.result       | 111 ++++++++++++++++++++++++++++++++
>  test/sql/collation.test.lua     |  41 ++++++++++++
>  5 files changed, 157 insertions(+), 3 deletions(-)

Tests still don't pass:

https://travis-ci.org/tarantool/tarantool/builds/498654717?utm_source=github_status&utm_medium=notification

Please fix.

> diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua
> index 70cfb4f2e..84c559dac 100644
> --- a/src/box/lua/upgrade.lua
> +++ b/src/box/lua/upgrade.lua
> @@ -610,6 +610,7 @@ local function upgrade_to_2_1_0()

As I said, we're about to release 2.1.2, so you should add an
upgrade_to_2_1_2 function and add the collation there.

>  
>      box.space._collation:replace{0, "none", ADMIN, "BINARY", "", setmap{}}
>      box.space._collation:replace{3, "binary", ADMIN, "BINARY", "", setmap{}}
> +    box.space._collation:replace{4, "unicode_s2", ADMIN, "ICU", "ru_RU", {strength='secondary'}}

As I mentioned earlier, the name unicode_s2 looks way too generic for
the ru locale. IMO we should use either unicode_ru_s2 or unicode_ru.
Also, we don't currently use s1/s2/s3 suffixes; instead, we use cs/ci.
Maybe we should use a suffix like that in this case too.

I looked through Kostja's discussion with Mr. Gulutzan and didn't
see that they had agreed to name this new collation unicode_s2.
Please get their approval of the name.



More information about the Tarantool-patches mailing list