From: Stanislav Zudin <szudin@tarantool.org> To: tarantool-patches@freelists.org, kostja@tarantool.org, vdavydov.dev@gmail.com Cc: Stanislav Zudin <szudin@tarantool.org> Subject: [PATCH v2] Feature request for a new collation Date: Tue, 26 Feb 2019 13:40:42 +0300 [thread overview] Message-ID: <20190226104042.28149-1-szudin@tarantool.org> (raw) Adds a new default collation 'unicode_s2' to support the difference between Cyrillic letters 'Е' and 'Ё'. The standard case insensitive collation ('unicode_ci') doesn't distinguish these letters. Closes #4007 --- Branch: https://github.com/tarantool/tarantool/tree/stanztt/gh-4007-new-default-collation-2.1 Issue: https://github.com/tarantool/tarantool/issues/4007 src/box/bootstrap.snap | Bin 1831 -> 1864 bytes src/box/lua/upgrade.lua | 1 + test/sql-tap/collation.test.lua | 7 +- test/sql/collation.result | 111 ++++++++++++++++++++++++++++++++ test/sql/collation.test.lua | 41 ++++++++++++ 5 files changed, 157 insertions(+), 3 deletions(-) diff --git a/src/box/bootstrap.snap b/src/box/bootstrap.snap index 0bb446fb6903ac3ef630c419b909f7db3df0372a..1b590939f1a9ca95cc93745b81a0271643055b21 100644 GIT binary patch delta 1860 zcmV-K2fO&E4#*CW8Gkb|I4x&pGBIUjVP#}63Q2BrbYX5|WjY`@I5=cDVl-wgIbvZk zEi^Y`W-Vc6IW#S1IWaIcH#K88V=`t6RzqxWV{1AfdmtbnARr)p3JTS_3%bn)$N<h` z@9{dN0000004TLD{Qyw?H2|7Oj1NiR7^eU*%rIlf7&At2IDeps#-Ld**&@FN<E?;T zjEO{1U^gODl9iIaG?(lz^0bn%T^oR1jGiyH=bLhRt!YE*qWbbWGg_Ha3f}<Q0O<fP z=KCUH`p!n!ADOW<-kq01F(ebHyWT}!*S9=wYxHHn-TRWCL$Q;Z=h^p~g}D}OAHq8A z!}h%!a@)JErhf?$(GS~D1UJhTG(Ur_Bi_xb+xO4^+3)xM)qgg_G=5nrlu{%mok_r_ zLGSskCx;a``GH#_mjJMRHCB}z0Itpvwyz8Cu-57<{kwBp5caH5b?IRdV@Cf2J*>qV zE$T`fSLcX?$&Va5HDatN>1$BaW>Ht->KrGn8pM46tACb+a&?YL6zRhj=(@^S_d@5q z@*=cF3z954GFWu1xYQ{W*W=~QcJGa-z@vY#>$_RC7v7;IY080YKNnFYuFkR5#Vgy- zMN86CgCw37S1(&q8wOmRqu8G(@wB+IB&}Ic^s+VDz3dy3cvb}TpT@SNGzylytP!0_ z(C=rTP=6T&T%Du7pFvQSJ=nLr>ydrm&PkxFv1_)U^<Cpy_?^?1=ofvS+Z&BJ@bmb^ zVZ8RTH40;(U)Kg#CF|xD*|i9I_REh~M;f#Amr86&O$E3*$MCvlyoVs|dSEU?Sl_RI 
zK(a}Z264YRgPt{_B?4TXW0lXUo3p4b$43D+BY%c$MAU?+0m0Qd4%K|<P_gbr;~_?K zGt*&)!_0;jzbzRJF*Y$ZF&SVmz+8Z_P^+l|jSW$uCJLsG(h?0<=UAFBV?$M5^ubyz zgZ_7?%@>MP)X~gvb&f|F^o;HoiW<DNynbCI$4C124R2q@EpJKJ`0ma%=-DGH`-FZn zTz{RT>+<r`w5(@7uV+OehQxGE1N!hRtNIMpZ|64qaBD<`Ncg_r{rXv!V<$&MQ0R^b zT%98s`+hww?rv|C2ZpP2Bx14f?$$<kOkAC#NQ-qYX3yGa4ulupA<hG?&XM&Qzwi!k zHwP0us><PR<tc(=QABKpbV6o=P#7RwoquC3Q6#edpc=&8)8fijsjN%+NJ>h&LVIv^ zj;8s1{|jr;#)GSKbg7fdx>V2pP~;??u*1VS{Cz|@9iPEX^Ke#)VM(j3JLS;W^<9!u z99*3v3f8I>!g$t^@xt4H1IYitYV=bm<L6(k*A&!k@ebi_dAFebT%>jWB8#L`)_<LD zaCMG4Ju_w$Txoo>pNls)K%goLrLwNZ2K{Pqb&ekm05PH=Gh#wy*eK7A7(6?42gc3| zn&Tp;1r7_G6*xMvvZiuoN(C1qN~J)dK%GFDIF&fzoKiL<0)cU;;ap^>Mn({T00lr0 z0MP{pG3bg05@4`Ej-xP+ffxuw5PymS8wWrD3_=Pth*~g0>jrfT|AK-%Fwi^??L~cP zFRJeV2h3G7q|)?g1ltBdbA9g{k3fJTifkx)20uq8G4^{(SsZ(Gnp9X;6y_zYPz{;9 z+2QQ9dU_K$iZf%@jB7X8^NgkWg#HYZ?}9K{vvYnc<gkTve6-tGQ&d9GF@LX%DvmDT zLdd}HsIc-7z0uo)RUzt_IuKj^J*u8)9)UaY=BYEbU<eFaLTJQx)ok8^3yF0PtvyNf zI{K`r+kQtw*aLb)zOgsVyZQ*5Juuj*`!o>HW#Gwh({vi*9x@nT51bR*BsZazTL%mT zY9MG~C*UDoHFPbHZ++7cwts-$kZ<e_^Oi3@K%3?L`bzN9MVN-r4O=6a`vFR@c-Lp- zd`kqWq|crheFPM+#PxC@i}2K{q9y;A+iE^tb}1Gw)4<oCX)a@;jfpZ2K})|FrOc{y zAwn()`AS>SPv>D<5Ps?DTHW?JP<tK}yV@qJ);&K*7RUT3vOaNgD1Xy1(y~46#?%7X zpCqNC`nU`~UIN$%*XeQ3xmzzp4qOd#;<bC<(hf_Gr^KlSWP*Lkq+0~FjM)soz(U(P z6(ug@oG0kkiI;@Qf7(KDo<c@qBGGT%%L>X|L0CEZyU<GH$ly`vSKX1SYk3=WYko61 zq}rxnjOk#W`uhr;G=C)nr(&FTY{Ot@I&#i<$AF?|;5#8YG30KoT%SY+oMEU_0GGvw zV!_gndReldjKY9^IQRvceH3K<B!|1~#JJE%@2xbKf!*ZQR%HLjSwSqKkLV?yBnDNK zK-xRGo3g|6`H8BMw8_|t2p(*RIa=AM8-`>kNnU+NJY&t(k3C*s4KCZ?&pU)5m~c9` y8Ijm@$x<vOpc~cTd((}Qsc`RP7$KBCnYE(>A_cy5E1a9;kYKJ_ata>R5UuSHt$oD+ delta 1827 zcmV+;2i*9`4yO)~8GkZ0H7#c}H)3WoGdVK~Np5p=VQyn(Iv`^*Gd5&pFkvk=W@R@m zG%zzYEjT!4F)d?aGh{VnH8Wx{HDL-?Lu_wjYdRo%eF_TIx(m9^2Ce|k)s!0pr2qf` z001bpFZ}>eEj0jYM??=v;25U>Fw8KA5aR@KMG#m>L*QW(A%7yj2IH-OV2p`GQD8SB zQ<9aEzBH}eU*u^eqq>gJE=JE6+w)C1z1FnIIjXms%tRqmN&(pb)d1!IBbLvpo3p4b z$3!pY+ag{1#zxmGiG3RH&dZ<IlG)Q)&mu4DR~@%A`ZD0|eaX+E*h$TE%zOP|u0`61 
zt}gSi{qBaG_J6FaWj{gG!!{Jb{n32pXK<yYyE#?+{`o!Y{ob#7&t{j#En9_BTGWIi z32^PJajM(`Y;}&VeOY*ivsM4}@6KsHxU)kQfDaclV)Q+*!&;nCA#Sv7b&gA(2-$&i zV@8XTz6P~y3UQ;Y&T-PJLCp8JW=!Z-=h%f7dD!+`R)6{8Ug(@!UVgZAS1>>+ZNbR` zpw6Eex9|>YH{XtzHQT#Sk^+hT!LINAW-Yu!OH$Mbwx5e92V0%vs*4x4pNp2HrUpzp zEvj0!q%;h+I!CcTXVPg=Wl1{oBGj@o$}QM8Bymir=RJ*WNo5o`Y1ttflb~PEN}&`G zY;}(Met!nRRPJ2A>TXB&eLE+6uEwm`{_nfSv+z5YIngifJhwFpbAab@i@W%&WoNWx zpkLMoRpocHitJhhJL~1gDk6<P{bdtdQc{7f&M~~K81Eq{yB(Oz(AD>A9*~40q(R(o z&R}PU=!n2pXC@OgC1^+h(Ttc8F%x2|a|D_>aet;42STRiCG#Q1LrjMl4l90F6lh>( zU^c*LC}k*NC|#(PWP!ebu22$3lA&}|VXJc-#mpBFmKS%h7E7Q1-DUHI771}O6t+4? zq6~IJ*9%1r)>d7=ERtg){riSBFXL6`q-%V4=Njy+5|x!gJ(#V|k#%|LX<62>o!2p; z5r0M|bS~@ha4e^K?9^}PHt%pdM2kr9zTf@&Kg)3xqM#|XM`^2bBwycer^VfEjp~rL zI>(^z!@FA=&55wpIf}AZ=VI<Gjov_L;T_sM*y<d=4;51M-fT#!j<q>*GD1^?BN8)& zmV|6|j&($f$Qp!d5cf`tDq5v54h5u3mwzr==#Q<=ku;z0Z(%Lcd~9`&0P3VL4%K6S zC~}fk;L+h6{wAW^j?dthbvUQQu%s2nnRIB(@-9hfj;+p7#iv?@E}n5{#OONc81g)> z8{HJr;Q7|7W#x2N{6lzGUCn1d7iFEl_>pwNI1`So&Jo8)jTi-08rSUSVogmCn17-| zDU6%BK|LE=ougMHz$?fIjaLyJ8lBN2M@I(DiID??-n_VRVbj6}$L8i$RuqegTCu%? 
zP%2I-PAE<%Og1J|fo-VqTx6&OMi77i1war0(FF%F=!ymsV6Z@pqcDzv7zjfUiXs~a zKmZIv3N(mXFhZ*abuTo5t}r0bJb#VtMSW;5s(;`EhOL>ZG@V_Dw&p-8+e?Qb5ZH(U z8w&cNkRT-mlY5E*j$JyRR7_U;;{~ix4OzC?q3X4Ia+*1cr!ninwax4~%hK_LeTB&v zLjVBRs^4;QWQeYPv}dd-Dk13b>Y^0K&07c!{KzY;+^{zwba*L5k*TZ4R)2A5)rMV5 z1W#@}b)GGRnL+;xjWVv<D$TnPrhAM5C5b{upNYCzXQ?ndZ_JlrcixyU53||%^AMcJ zXn-!gCygMQX9hfEOg&wHPV7l;!Y8+w%G6)<X<;+(Ard$AAApCZBTbsYJgW!(nR&|J zKW8&<KWHVG1Y*2nb*s||dVfD!iR^d$j8txkAl2j9AER4<f-G5IreqOrTQ9UL|8iT+ z`jed*i{oi{_|G(#F*%Eg@(e*se+*q_EnS!*7j%53&FC-ksx8Q~^uVod^Bkx>ON?E8 zB&%*d-&__)6H<((ZgQw$7zx{+W69LE*&h?7B7a<sAC~}Z0mJEG=YKq-7ghsTx}5mh zm2YW>?Z;EnR68??zGSv7f?8y1z%Mw_woZ)_#d6LLT2tbG!rMPqA;^9qOVp6)w)Uk3 zWj-M@9sOHqC33{?D9#~wr0H7TN8Lc)Ob&^*Dez)Cm?wW<i_>jj;1n9Cw{1vsrpxC{ z>={t>jCkioCWhRtm4EX|*239%3IO;oK41!#Ues%p1qCUL?4u+<m}YW<L(kyhjyf?G zTJAly-7>J7ywZd0|9Dm~is&PHOHUGW6(y5)oZ3z2@GO5Ksw8hRS@Hb?aZr+0cGNIK z43t=2$s?w>#u23tKrzBb1(Hfa^nV<p45fgF?oxj_`)Ep_UK{`xaiyL&vvbKES-e0i R9JX=Dom4Gm<qp*lt?kA^T~Gi3 diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua index 70cfb4f2e..84c559dac 100644 --- a/src/box/lua/upgrade.lua +++ b/src/box/lua/upgrade.lua @@ -610,6 +610,7 @@ local function upgrade_to_2_1_0() box.space._collation:replace{0, "none", ADMIN, "BINARY", "", setmap{}} box.space._collation:replace{3, "binary", ADMIN, "BINARY", "", setmap{}} + box.space._collation:replace{4, "unicode_s2", ADMIN, "ICU", "ru_RU", {strength='secondary'}} upgrade_priv_to_2_1_0() end diff --git a/test/sql-tap/collation.test.lua b/test/sql-tap/collation.test.lua index 1e55b0092..b8bc02317 100755 --- a/test/sql-tap/collation.test.lua +++ b/test/sql-tap/collation.test.lua @@ -21,9 +21,10 @@ test:do_execsql_test( 1,"unicode", 2,"unicode_ci", 3,"binary", - 4,"unicode_numeric", - 5,"unicode_numeric_s2", - 6,"unicode_tur_s2" + 4,"unicode_s2", + 5,"unicode_numeric", + 6,"unicode_numeric_s2", + 7,"unicode_tur_s2" } ) diff --git a/test/sql/collation.result b/test/sql/collation.result index daea35543..7697f03d4 100644 --- 
a/test/sql/collation.result
+++ b/test/sql/collation.result
@@ -427,3 +427,114 @@ box.space.T4A:drop()
 box.space.T4B:drop()
 ---
 ...
+--
+-- gh-4007 Feature request for a new collation
+--
+-- Default unicode collation deals with russian letters
+s = box.schema.space.create('t1')
+---
+...
+s:format({{name='s1', type='string', collation = 'unicode'}})
+---
+...
+idx = s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode'}}})
+---
+...
+s:insert{'Ё'}
+---
+- ['Ё']
+...
+s:insert{'Е'}
+---
+- ['Е']
+...
+s:insert{'ё'}
+---
+- ['ё']
+...
+s:insert{'е'}
+---
+- ['е']
+...
+-- all 4 letters are in the table
+s:select{}
+---
+- - ['е']
+  - ['Е']
+  - ['ё']
+  - ['Ё']
+...
+s:drop()
+---
+...
+-- unicode_ci collation doesn't distinguish russian letters 'Е' and 'Ё'
+s = box.schema.space.create('t1')
+---
+...
+s:format({{name='s1', type='string', collation = 'unicode_ci'}})
+---
+...
+idx = s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode_ci'}}})
+---
+...
+s:insert{'Ё'}
+---
+- ['Ё']
+...
+-- the following calls should fail
+s:insert{'е'}
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+s:insert{'Е'}
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+s:insert{'ё'}
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+-- return single 'Ё'
+s:select{}
+---
+- - ['Ё']
+...
+s:drop()
+---
+...
+-- unicode_s2 collation does distinguish russian letters 'Е' and 'Ё'
+s = box.schema.space.create('t1')
+---
+...
+s:format({{name='s1', type='string', collation = 'unicode_s2'}})
+---
+...
+idx = s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode_s2'}}})
+---
+...
+s:insert{'Ё'}
+---
+- ['Ё']
+...
+s:insert{'е'}
+---
+- ['е']
+...
+-- the following calls should fail
+s:insert{'Е'}
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+s:insert{'ё'}
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+-- return two: 'Ё' and 'е'
+s:select{}
+---
+- - ['е']
+  - ['Ё']
+...
+s:drop()
+---
+...
diff --git a/test/sql/collation.test.lua b/test/sql/collation.test.lua
index 713a9bd89..e125274ef 100644
--- a/test/sql/collation.test.lua
+++ b/test/sql/collation.test.lua
@@ -172,3 +172,44 @@ box.sql.execute("SELECT a FROM t4b ORDER BY a || b")
 
 box.space.T4A:drop()
 box.space.T4B:drop()
+
+--
+-- gh-4007 Feature request for a new collation
+--
+-- Default unicode collation deals with russian letters
+s = box.schema.space.create('t1')
+s:format({{name='s1', type='string', collation = 'unicode'}})
+idx = s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode'}}})
+s:insert{'Ё'}
+s:insert{'Е'}
+s:insert{'ё'}
+s:insert{'е'}
+-- all 4 letters are in the table
+s:select{}
+s:drop()
+
+-- unicode_ci collation doesn't distinguish russian letters 'Е' and 'Ё'
+s = box.schema.space.create('t1')
+s:format({{name='s1', type='string', collation = 'unicode_ci'}})
+idx = s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode_ci'}}})
+s:insert{'Ё'}
+-- the following calls should fail
+s:insert{'е'}
+s:insert{'Е'}
+s:insert{'ё'}
+-- return single 'Ё'
+s:select{}
+s:drop()
+
+-- unicode_s2 collation does distinguish russian letters 'Е' and 'Ё'
+s = box.schema.space.create('t1')
+s:format({{name='s1', type='string', collation = 'unicode_s2'}})
+idx = s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode_s2'}}})
+s:insert{'Ё'}
+s:insert{'е'}
+-- the following calls should fail
+s:insert{'Е'}
+s:insert{'ё'}
+-- return two: 'Ё' and 'е'
+s:select{}
+s:drop()
-- 
2.17.1
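
The three test blocks in this patch correspond to three comparison strengths. As a rough illustration only, here is a toy model in plain Lua; it is not Tarantool or ICU code. Each of the four letters is described by a base letter, a diaeresis flag and a case flag, and a key keeps only the levels that a given strength compares. The `letters` table and the helpers `sort_key` and `insert_all` are hypothetical names introduced for this sketch. Treating 'unicode_ci' as primary strength and 'unicode' as tertiary strength is an assumption inferred from the test results; only the 'secondary' strength of 'unicode_s2' is stated in the patch itself.

```lua
-- Toy model of collation strength levels; NOT Tarantool or ICU code.
-- Hypothetical table: base letter (primary level), diaeresis (secondary
-- level) and case (tertiary level) for the four letters used in the test.
local letters = {
    ['е'] = {base = 'е', diaeresis = false, upper = false},
    ['Е'] = {base = 'е', diaeresis = false, upper = true},
    ['ё'] = {base = 'е', diaeresis = true,  upper = false},
    ['Ё'] = {base = 'е', diaeresis = true,  upper = true},
}

local PRIMARY, SECONDARY, TERTIARY = 1, 2, 3

-- Build a comparison key that keeps only the levels up to `strength`.
local function sort_key(ch, strength)
    local l = assert(letters[ch], 'letter is not in the toy table')
    local key = l.base
    if strength >= SECONDARY then
        key = key .. (l.diaeresis and '+d' or '-d')
    end
    if strength >= TERTIARY then
        key = key .. (l.upper and '+u' or '-u')
    end
    return key
end

-- Emulate inserts into a unique index: a letter is skipped when its key
-- is already present (the real index reports a duplicate key error).
local function insert_all(order, strength)
    local seen, stored = {}, {}
    for _, ch in ipairs(order) do
        local key = sort_key(ch, strength)
        if not seen[key] then
            seen[key] = true
            table.insert(stored, ch)
        end
    end
    return stored
end

-- Same insertion orders as in test/sql/collation.test.lua.
local ci = insert_all({'Ё', 'е', 'Е', 'ё'}, PRIMARY)    -- models unicode_ci
local s2 = insert_all({'Ё', 'е', 'Е', 'ё'}, SECONDARY)  -- models unicode_s2
local full = insert_all({'Ё', 'Е', 'ё', 'е'}, TERTIARY) -- models unicode
print(#ci, #s2, #full)
```

In this model the primary key keeps only 'Ё', the secondary key keeps 'Ё' and 'е', and the tertiary key keeps all four letters, which matches the counts in test/sql/collation.result. The model does not reproduce the ordering of the select output.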