From: Stanislav Zudin
Subject: [tarantool-patches] [PATCH] Feature request for a new collation
Date: Fri, 22 Feb 2019 14:49:39 +0300
Message-Id: <20190222114939.21764-1-szudin@tarantool.org>
To: tarantool-patches@freelists.org, kostja@tarantool.org
Cc: Stanislav Zudin
Reply-To: tarantool-patches@freelists.org

Adds a new predefined collation 'unicode_s2' that distinguishes the
Cyrillic letters 'Е' and 'Ё' while still ignoring case. The standard
case-insensitive collation ('unicode_ci') treats these letters as equal.

Closes #4007
---
Branch: https://github.com/tarantool/tarantool/tree/stanztt/gh-4007-new-default-collation
Issue: https://github.com/tarantool/tarantool/issues/4007

 src/box/bootstrap.snap      | Bin 1527 -> 1561 bytes
 src/box/lua/upgrade.lua     |   6 ++
 test/box/collation.result   | 160 ++++++++++++++++++++++++++++++++++++
 test/box/collation.test.lua |  49 +++++++++++
 4 files changed, 215 insertions(+)
 create mode 100644 test/box/collation.result
 create mode 100644 test/box/collation.test.lua

diff --git a/src/box/bootstrap.snap b/src/box/bootstrap.snap
index ba2af079571bab8f073d56223a4028e34068c3f6..190eb63d57990a95e0d1d8b3b588c02ad98ff1bc 100644
GIT binary patch
delta 1524
[base85 payload not reproduced: unrecoverable in this archived copy]

delta 1517
[base85 payload not reproduced: unrecoverable in this archived copy]

diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua
index ab705e978..a28b93ada 100644
--- a/src/box/lua/upgrade.lua
+++ b/src/box/lua/upgrade.lua
@@ -998,9 +998,15 @@ local function create_vinyl_deferred_delete_space()
         'blackhole', 0, {group_id = 1}, format}
 end
 
+local function create_default_collation_s2()
+    log.info("create predefined collation")
+    box.space._collation:replace{3, "unicode_s2", ADMIN, "ICU", "ru_RU", {strength='secondary'}}
+end
+
 local function upgrade_to_1_10_2()
     upgrade_priv_to_1_10_2()
     create_vinyl_deferred_delete_space()
+    create_default_collation_s2()
 end
 
 local function get_version()
diff --git a/test/box/collation.result b/test/box/collation.result
new file mode 100644
index 000000000..2dbb43c31
--- /dev/null
+++ b/test/box/collation.result
@@ -0,0 +1,160 @@
+env = require('test_run')
+---
+...
+test_run = env.new()
+---
+...
+--
+-- gh-4007 Feature request for a new collation
+--
+-- Ensure all default collations exist
+box.space._collation.index.name:get{'unicode'};
+---
+- [1, 'unicode', 1, 'ICU', '', {}]
+...
+box.space._collation.index.name:get{'unicode_ci'};
+---
+- [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
+...
+box.space._collation.index.name:get{'unicode_s2'};
+---
+- [3, 'unicode_s2', 1, 'ICU', 'ru_RU', {'strength': 'secondary'}]
+...
+-- Default unicode collation deals with russian letters
+s = box.schema.space.create('t1');
+---
+...
+s:format({{name='s1', type='string', collation = 'unicode'}});
+---
+...
+s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode'}}});
+---
+- unique: true
+  parts:
+  - type: string
+    is_nullable: false
+    collation: unicode
+    fieldno: 1
+  id: 0
+  space_id: 512
+  name: pk
+  type: TREE
+...
+s:insert{'Ё'};
+---
+- ['Ё']
+...
+s:insert{'Е'};
+---
+- ['Е']
+...
+s:insert{'ё'};
+---
+- ['ё']
+...
+s:insert{'е'};
+---
+- ['е']
+...
+-- all 4 letters are in the table
+s:select{};
+---
+- - ['е']
+  - ['Е']
+  - ['ё']
+  - ['Ё']
+...
+s:drop();
+---
+...
+-- unicode_ci collation doesn't distinguish russian letters 'Е' and 'Ё'
+s = box.schema.space.create('t1');
+---
+...
+s:format({{name='s1', type='string', collation = 'unicode_ci'}});
+---
+...
+s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode_ci'}}});
+---
+- unique: true
+  parts:
+  - type: string
+    is_nullable: false
+    collation: unicode_ci
+    fieldno: 1
+  id: 0
+  space_id: 513
+  name: pk
+  type: TREE
+...
+s:insert{'Ё'};
+---
+- ['Ё']
+...
+-- the following calls should fail
+s:insert{'е'};
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+s:insert{'Е'};
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+s:insert{'ё'};
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+-- return single 'Ё'
+s:select{};
+---
+- - ['Ё']
+...
+s:drop();
+---
+...
+-- unicode_s2 collation does distinguish russian letters 'Е' and 'Ё'
+s = box.schema.space.create('t1');
+---
+...
+s:format({{name='s1', type='string', collation = 'unicode_s2'}});
+---
+...
+s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode_s2'}}});
+---
+- unique: true
+  parts:
+  - type: string
+    is_nullable: false
+    collation: unicode_s2
+    fieldno: 1
+  id: 0
+  space_id: 514
+  name: pk
+  type: TREE
+...
+s:insert{'Ё'};
+---
+- ['Ё']
+...
+s:insert{'е'};
+---
+- ['е']
+...
+-- the following calls should fail
+s:insert{'Е'};
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+s:insert{'ё'};
+---
+- error: Duplicate key exists in unique index 'pk' in space 't1'
+...
+-- return two: 'Ё' and 'е'
+s:select{};
+---
+- - ['е']
+  - ['Ё']
+...
+s:drop();
+---
+...
diff --git a/test/box/collation.test.lua b/test/box/collation.test.lua
new file mode 100644
index 000000000..4cd24e64c
--- /dev/null
+++ b/test/box/collation.test.lua
@@ -0,0 +1,49 @@
+env = require('test_run')
+test_run = env.new()
+
+--
+-- gh-4007 Feature request for a new collation
+--
+-- Ensure all default collations exist
+box.space._collation.index.name:get{'unicode'};
+box.space._collation.index.name:get{'unicode_ci'};
+box.space._collation.index.name:get{'unicode_s2'};
+
+-- Default unicode collation deals with russian letters
+s = box.schema.space.create('t1');
+s:format({{name='s1', type='string', collation = 'unicode'}});
+s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode'}}});
+s:insert{'Ё'};
+s:insert{'Е'};
+s:insert{'ё'};
+s:insert{'е'};
+-- all 4 letters are in the table
+s:select{};
+s:drop();
+
+-- unicode_ci collation doesn't distinguish russian letters 'Е' and 'Ё'
+s = box.schema.space.create('t1');
+s:format({{name='s1', type='string', collation = 'unicode_ci'}});
+s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode_ci'}}});
+s:insert{'Ё'};
+-- the following calls should fail
+s:insert{'е'};
+s:insert{'Е'};
+s:insert{'ё'};
+-- return single 'Ё'
+s:select{};
+s:drop();
+
+-- unicode_s2 collation does distinguish russian letters 'Е' and 'Ё'
+s = box.schema.space.create('t1');
+s:format({{name='s1', type='string', collation = 'unicode_s2'}});
+s:create_index('pk', {unique = true, type='tree', parts={{'s1', collation = 'unicode_s2'}}});
+s:insert{'Ё'};
+s:insert{'е'};
+-- the following calls should fail
+s:insert{'Е'};
+s:insert{'ё'};
+-- return two: 'Ё' and 'е'
+s:select{};
+s:drop();
+
-- 
2.17.1
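
For readers unfamiliar with collation strength, the behaviour the tests rely on can be modelled in plain Lua. This is only a toy sketch, not how ICU or Tarantool implements comparison: the WEIGHTS table, sort_key and insert_all are hypothetical names, and the weights are made up for illustration (they are not real ICU/CLDR values). It assumes the usual multi-level model in which the primary level compares base letters, the secondary level adds diacritics, and the tertiary level adds case.

```lua
-- Hypothetical three-level weights for the four letters used in the tests.
-- Illustrative values only; these are not real ICU/CLDR collation weights.
local WEIGHTS = {
    ["е"] = {base = "E", accent = 0, case = 0},
    ["Е"] = {base = "E", accent = 0, case = 1},
    ["ё"] = {base = "E", accent = 1, case = 0},
    ["Ё"] = {base = "E", accent = 1, case = 1},
}

-- How many levels each strength takes into account.
local LEVELS = {primary = 1, secondary = 2, tertiary = 3}

-- Build a comparison key that keeps only the levels allowed by `strength`.
local function sort_key(letter, strength)
    local w = assert(WEIGHTS[letter], "letter not in the toy table")
    local depth = assert(LEVELS[strength], "unknown strength")
    local parts = {w.base, w.accent, w.case}
    return table.concat(parts, ":", 1, depth)
end

-- Emulate a unique index: keep the first letter seen for each key and
-- silently skip later letters that collide with it.
local function insert_all(letters, strength)
    local seen, kept = {}, {}
    for _, letter in ipairs(letters) do
        local key = sort_key(letter, strength)
        if not seen[key] then
            seen[key] = true
            kept[#kept + 1] = letter
        end
    end
    return kept
end

local order = {"Ё", "е", "Е", "ё"}
print(#insert_all(order, "primary"))   --> 1 (only 'Ё' is kept)
print(#insert_all(order, "secondary")) --> 2 ('Ё' and 'е' are kept)
print(#insert_all(order, "tertiary"))  --> 4 (all letters are distinct)
```

In this model the primary case corresponds to the 'unicode_ci' test, the secondary case to 'unicode_s2', and the tertiary case to plain 'unicode', whose empty options are assumed to fall back to ICU's default tertiary strength.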