From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Subject: Re: [tarantool-patches] Re: [PATCH v2] Feature request for a new collation
References: <20190226104042.28149-1-szudin@tarantool.org> <20190226135217.a4ptsoxgkk35b7m2@esperanza>
From: Stanislav Zudin
Message-ID: <230d10ab-6b3c-f5c0-853c-d48624a8cf4f@tarantool.org>
Date: Thu, 28 Feb 2019 15:14:51 +0300
MIME-Version: 1.0
In-Reply-To: <20190226135217.a4ptsoxgkk35b7m2@esperanza>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Content-Language: en-US
To: tarantool-patches@freelists.org
Cc: Vladimir Davydov
List-ID:

The patch below includes the updated tests and a new name for the collation.

On 26.02.2019 16:52, Vladimir Davydov wrote:
> IMO the subject line would be more descriptive if it read
>
> box: add new unicode_s2 collation
>
> On Tue, Feb 26, 2019 at 01:40:42PM +0300, Stanislav Zudin wrote:
>> Adds a new default collation 'unicode_s2' to support the difference
>> between Cyrillic letters 'Е' and 'Ё'. The standard case insensitive
>> collation ('unicode_ci') doesn't distinguish these letters.
>>
>> Closes #4007
>> ---
>> Branch: https://github.com/tarantool/tarantool/tree/stanztt/gh-4007-new-default-collation-2.1
>> Issue: https://github.com/tarantool/tarantool/issues/4007
>>
>>  src/box/bootstrap.snap          | Bin 1831 -> 1864 bytes
>>  src/box/lua/upgrade.lua         |   1 +
>>  test/sql-tap/collation.test.lua |   7 +-
>>  test/sql/collation.result       | 111 ++++++++++++++++++++++++++++++++
>>  test/sql/collation.test.lua     |  41 ++++++++++++
>>  5 files changed, 157 insertions(+), 3 deletions(-)
> Tests still don't pass:
>
> https://travis-ci.org/tarantool/tarantool/builds/498654717?utm_source=github_status&utm_medium=notification
>
> Please fix.

Fixed.

>
>> diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua
>> index 70cfb4f2e..84c559dac 100644
>> --- a/src/box/lua/upgrade.lua
>> +++ b/src/box/lua/upgrade.lua
>> @@ -610,6 +610,7 @@ local function upgrade_to_2_1_0()
> As I said, we're about to release 2.1.2 so you should add
> upgrade_to_2_1_2 and add the collation there.

Done.

>>
>>      box.space._collation:replace{0, "none", ADMIN, "BINARY", "", setmap{}}
>>      box.space._collation:replace{3, "binary", ADMIN, "BINARY", "", setmap{}}
>> +    box.space._collation:replace{4, "unicode_s2", ADMIN, "ICU", "ru_RU", {strength='secondary'}}
> As I mentioned earlier, name unicode_s2 looks way too generic for
> ru locale. IMO we should either use unicode_ru_s2 or unicode_ru.
> Also, currently we don't use s1/s2/s3 suffixes. Instead we used cs/ci.
> May be, we should use a suffix like that in this case too.
>
> I looked through Kostja's discussion with Mr. Gulutzan and I didn't
> see that they had come to an agreement to name this new collation
> unicode_s2. Please solicit their approval on the name.

Renamed to unicode_ru_s2.
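
To make the s1/s2/s3 naming and the expected row counts in the new tests easier to check, here is a toy Lua model of collation strength. It is a sketch under stated assumptions, not ICU's real algorithm: the per-letter weights and the helpers `compare` and `insert_all` are hypothetical. The model assumes three levels, primary (base letter, shared by 'Е' and 'Ё'), secondary (the diaeresis that sets 'Ё' apart from 'Е') and tertiary (case). Real ICU sort keys come from the CLDR collation data for the chosen locale and are compared level by level across the whole string, not letter by letter.

```lua
-- Toy model of multi-level collation strength (NOT ICU's real algorithm).
-- Each letter maps to three hypothetical weights:
--   [1] primary:   base letter (Е and Ё share it)
--   [2] secondary: diacritic   (Ё/ё carry a diaeresis)
--   [3] tertiary:  case        (upper vs lower)
local keys = {
    ["е"] = {1, 0, 0},
    ["Е"] = {1, 0, 1},
    ["ё"] = {1, 1, 0},
    ["Ё"] = {1, 1, 1},
}

-- Compare two single letters using only the first `strength` levels.
-- Returns -1, 0 or 1.
local function compare(a, b, strength)
    local ka, kb = keys[a], keys[b]
    for level = 1, strength do
        if ka[level] < kb[level] then return -1 end
        if ka[level] > kb[level] then return 1 end
    end
    return 0
end

-- Mimic a unique index: a value is rejected (duplicate key) when it
-- compares equal to a value that is already stored.
local function insert_all(values, strength)
    local stored = {}
    for _, v in ipairs(values) do
        local duplicate = false
        for _, s in ipairs(stored) do
            if compare(v, s, strength) == 0 then
                duplicate = true
                break
            end
        end
        if not duplicate then
            table.insert(stored, v)
        end
    end
    return stored
end

-- Insert order used by the unicode_ci and unicode_ru_s2 tests.
local order = {"Ё", "е", "Е", "ё"}
print(#insert_all(order, 1)) --> 1 (primary, like unicode_ci)
print(#insert_all(order, 2)) --> 2 (secondary, like unicode_ru_s2)
print(#insert_all(order, 3)) --> 4 (tertiary, like the default unicode)
```

With strength 1 every letter collapses into one key, which matches the three duplicate-key errors in the unicode_ci test. With strength 2 only the case difference is ignored, so 'Ё' and 'е' both survive, as in the unicode_ru_s2 test.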

The updated patch is below:

Branch: https://github.com/tarantool/tarantool/tree/stanztt/gh-4007-new-default-collation-2.1
Issue: https://github.com/tarantool/tarantool/issues/4007

 src/box/bootstrap.snap          | Bin 1831 -> 1867 bytes
 src/box/lua/upgrade.lua         |   7 +-
 test/box/ddl.result             |  14 ++--
 test/box/net.box.result         |   2 +-
 test/sql-tap/collation.test.lua |   7 +-
 test/sql/collation.result       | 111 ++++++++++++++++++++++++++++++++
 test/sql/collation.test.lua     |  41 ++++++++++++
 7 files changed, 171 insertions(+), 11 deletions(-)

diff --git a/src/box/bootstrap.snap b/src/box/bootstrap.snap
index 0bb446fb6903ac3ef630c419b909f7db3df0372a..a0c436d0dac9593d88e87fefe5084e833d8bed2f 100644
GIT binary patch
delta 1863
[base85-encoded data omitted]

delta 1827
[base85-encoded data omitted]

diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua
index 70cfb4f2e..09af2e20d 100644
--- a/src/box/lua/upgrade.lua
+++ b/src/box/lua/upgrade.lua
@@ -614,6 +614,10 @@ local function upgrade_to_2_1_0()
     upgrade_priv_to_2_1_0()
 end
 
+local function upgrade_to_2_1_2()
+    box.space._collation:replace{4, "unicode_ru_s2", ADMIN, "ICU", "ru_RU", {strength='secondary'}}
+end
+
 local function get_version()
     local version = box.space._schema:get{'version'}
     if version == nil then
@@ -641,7 +645,8 @@ local function upgrade(options)
         {version = mkversion(1, 7, 7), func = upgrade_to_1_7_7, auto = true},
         {version = mkversion(1, 10, 0), func = upgrade_to_1_10_0, auto = true},
         {version = mkversion(1, 10, 2), func = upgrade_to_1_10_2, auto = true},
-        {version = mkversion(2, 1, 0), func = upgrade_to_2_1_0, auto = true}
+        {version = mkversion(2, 1, 0), func = upgrade_to_2_1_0, auto = true},
+        {version = mkversion(2, 1, 2), func = upgrade_to_2_1_2, auto = true}
     }
 
     for _, handler in ipairs(handlers) do
diff --git a/test/box/ddl.result b/test/box/ddl.result
index 3d6d07f43..bffa19a8c 100644
--- a/test/box/ddl.result
+++ b/test/box/ddl.result
@@ -350,7 +350,7 @@ box.space._collation:auto_increment{'test', 0, 'ICU', 42}
 ...
 box.space._collation:auto_increment{'test', 0, 'ICU', 'ru_RU', setmap{}} --ok
 ---
-- [4, 'test', 0, 'ICU', 'ru_RU', {}]
+- [5, 'test', 0, 'ICU', 'ru_RU', {}]
 ...
 box.space._collation:auto_increment{'test', 0, 'ICU', 'ru_RU', setmap{}}
 ---
@@ -358,7 +358,7 @@ box.space._collation:auto_increment{'test', 0, 'ICU', 'ru_RU', setmap{}}
 ...
 box.space._collation.index.name:delete{'test'} -- ok
 ---
-- [4, 'test', 0, 'ICU', 'ru_RU', {}]
+- [5, 'test', 0, 'ICU', 'ru_RU', {}]
 ...
 box.space._collation.index.name:delete{'nothing'} -- allowed
 ---
@@ -474,7 +474,7 @@ _ = box.space._collation.index.name:delete{'test'} -- ok
 ...
 box.space._collation:auto_increment{'test', 0, 'ICU', 'ru_RU', setmap{}}
 ---
-- [4, 'test', 0, 'ICU', 'ru_RU', {}]
+- [5, 'test', 0, 'ICU', 'ru_RU', {}]
 ...
 box.space._collation:select{}
 ---
@@ -482,7 +482,8 @@ box.space._collation:select{}
   - [1, 'unicode', 1, 'ICU', '', {}]
   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
   - [3, 'binary', 1, 'BINARY', '', {}]
-  - [4, 'test', 0, 'ICU', 'ru_RU', {}]
+  - [4, 'unicode_ru_s2', 1, 'ICU', 'ru_RU', {'strength': 'secondary'}]
+  - [5, 'test', 0, 'ICU', 'ru_RU', {}]
 ...
 test_run:cmd('restart server default')
 box.space._collation:select{}
@@ -491,11 +492,12 @@ box.space._collation:select{}
   - [1, 'unicode', 1, 'ICU', '', {}]
   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
   - [3, 'binary', 1, 'BINARY', '', {}]
-  - [4, 'test', 0, 'ICU', 'ru_RU', {}]
+  - [4, 'unicode_ru_s2', 1, 'ICU', 'ru_RU', {'strength': 'secondary'}]
+  - [5, 'test', 0, 'ICU', 'ru_RU', {}]
 ...
 box.space._collation.index.name:delete{'test'}
 ---
-- [4, 'test', 0, 'ICU', 'ru_RU', {}]
+- [5, 'test', 0, 'ICU', 'ru_RU', {}]
 ...
 --
 -- gh-3290: expose ICU into Lua. It uses built-in collations, that
diff --git a/test/box/net.box.result b/test/box/net.box.result
index b800531b4..67bb29a39 100644
--- a/test/box/net.box.result
+++ b/test/box/net.box.result
@@ -2666,7 +2666,7 @@ c.space.test.index.sk.parts
 ---
 - - type: string
     is_nullable: false
-    collation_id: 4
+    collation_id: 5
     fieldno: 1
 ...
 c:close()
diff --git a/test/sql-tap/collation.test.lua b/test/sql-tap/collation.test.lua
index 1e55b0092..cec980423 100755
--- a/test/sql-tap/collation.test.lua
+++ b/test/sql-tap/collation.test.lua
@@ -21,9 +21,10 @@ test:do_execsql_test(
         1,"unicode",
         2,"unicode_ci",
         3,"binary",
-        4,"unicode_numeric",
-        5,"unicode_numeric_s2",
-        6,"unicode_tur_s2"
+        4,"unicode_ru_s2",
+        5,"unicode_numeric",
+        6,"unicode_numeric_s2",
+        7,"unicode_tur_s2"
     }
 )
 
diff --git a/test/sql/collation.result b/test/sql/collation.result
index daea35543..8337dde0e 100644
--- a/test/sql/collation.result
+++ b/test/sql/collation.result
@@ -427,3 +427,114 @@ box.space.T4A:drop()
 box.space.T4B:drop()
 ---
 ...
+--
+-- gh-4007 Feature request for a new collation
+--
+-- Default unicode collation deals with russian letters
+s = box.schema.space.create('t1')
+---
+...
+s:format({{name='s1', type='string', collation = 'unicode'}})
+---
+...
+idx = s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode'}}})
+---
+...
+s:insert{'Ё'}
+---
+- ['Ё']
+...
+s:insert{'Е'}
+---
+- ['Е']
+...
+s:insert{'ё'}
+---
+- ['ё']
+...
+s:insert{'е'}
+---
+- ['е']
+...
+-- all 4 letters are in the table
+s:select{}
+---
+- - ['е']
+  - ['Е']
+  - ['ё']
+  - ['Ё']
+...
+s:drop()
+---
+...
+-- unicode_ci collation doesn't distinguish russian letters 'Е' and 'Ё'
+s = box.schema.space.create('t1')
+---
+...
+s:format({{name='s1', type='string', collation = 'unicode_ci'}})
+---
+...
+idx = s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode_ci'}}})
+---
+...
+s:insert{'Ё'}
+---
+- ['Ё']
+...
+-- the following calls should fail
+s:insert{'е'}
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+s:insert{'Е'}
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+s:insert{'ё'}
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+-- return single 'Ё'
+s:select{}
+---
+- - ['Ё']
+...
+s:drop()
+---
+...
+-- unicode_ru_s2 collation does distinguish russian letters 'Е' and 'Ё'
+s = box.schema.space.create('t1')
+---
+...
+s:format({{name='s1', type='string', collation = 'unicode_ru_s2'}})
+---
+...
+idx = s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode_ru_s2'}}})
+---
+...
+s:insert{'Ё'}
+---
+- ['Ё']
+...
+s:insert{'е'}
+---
+- ['е']
+...
+-- the following calls should fail
+s:insert{'Е'}
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+s:insert{'ё'}
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+-- return two: 'Ё' and 'е'
+s:select{}
+---
+- - ['е']
+  - ['Ё']
+...
+s:drop()
+---
+...
diff --git a/test/sql/collation.test.lua b/test/sql/collation.test.lua
index 713a9bd89..6f4ef08f3 100644
--- a/test/sql/collation.test.lua
+++ b/test/sql/collation.test.lua
@@ -172,3 +172,44 @@ box.sql.execute("SELECT a FROM t4b ORDER BY a || b")
 
 box.space.T4A:drop()
 box.space.T4B:drop()
+
+--
+-- gh-4007 Feature request for a new collation
+--
+-- Default unicode collation deals with russian letters
+s = box.schema.space.create('t1')
+s:format({{name='s1', type='string', collation = 'unicode'}})
+idx = s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode'}}})
+s:insert{'Ё'}
+s:insert{'Е'}
+s:insert{'ё'}
+s:insert{'е'}
+-- all 4 letters are in the table
+s:select{}
+s:drop()
+
+-- unicode_ci collation doesn't distinguish russian letters 'Е' and 'Ё'
+s = box.schema.space.create('t1')
+s:format({{name='s1', type='string', collation = 'unicode_ci'}})
+idx = s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode_ci'}}})
+s:insert{'Ё'}
+-- the following calls should fail
+s:insert{'е'}
+s:insert{'Е'}
+s:insert{'ё'}
+-- return single 'Ё'
+s:select{}
+s:drop()
+
+-- unicode_ru_s2 collation does distinguish russian letters 'Е' and 'Ё'
+s = box.schema.space.create('t1')
+s:format({{name='s1', type='string', collation = 'unicode_ru_s2'}})
+idx = s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode_ru_s2'}}})
+s:insert{'Ё'}
+s:insert{'е'}
+-- the following calls should fail
+s:insert{'Е'}
+s:insert{'ё'}
+-- return two: 'Ё' and 'е'
+s:select{}
+s:drop()
-- 
2.17.1