[tarantool-patches] [PATCH v2 0/3] Change collation compatibility rules according to ANSI SQL

Tarantool development patches archive
 help / color / mirror / Atom feed

* [tarantool-patches] [PATCH v2 0/3] Change collation compatibility rules according to ANSI SQL
@ 2018-11-13  0:07 Nikita Pettik
  2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 1/3] sql: do not add explicit <COLLATE "BINARY"> clause Nikita Pettik
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Nikita Pettik @ 2018-11-13  0:07 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Nikita Pettik

This patch-set introduces rules for collation compatibility in order to
restrict usage of different collations within one comparison.
Originally, SQLite uses doubtful approach: if collations of LHS and RHS
can not be used together, then collation of LHS is chosen.  Such
behaviour quite misleading and results in the fact that swap of LHS and
RHS values may change result of operation.  We've decided to add
restrictions on collations of LHS and RHS to disallow messing different
collations as it is stated in ANSI SQL.

Changes in v2:

Instead of using surrogate ids for "binary" and "none" collations,
now real ones are added to _collation space. They also come with
new collation type - binary. It simply uses memcmp as a collation
rule. For details see second patch in series.

Minor fixes:

- Make collations_are_compatible() be collations_check_compatibility()
  and process diag_set() right there;
- Clean-up debug asserts.

Nikita Pettik (3):
  sql: do not add explicit <COLLATE "BINARY"> clause
  Introduce "none" and "binary" collations
  sql: change collation compatibility rules

 src/box/alter.cc                  |  13 ++++-
 src/box/bootstrap.snap            | Bin 1888 -> 1911 bytes
 src/box/coll_id.c                 |   1 -
 src/box/errcode.h                 |   2 +
 src/box/field_def.c               |   2 +-
 src/box/key_def.c                 |  18 +++---
 src/box/key_def.h                 |   6 --
 src/box/lua/space.cc              |   2 +-
 src/box/lua/upgrade.lua           |   2 +
 src/box/sql.c                     |  10 ++--
 src/box/sql/build.c               |   2 +-
 src/box/sql/callback.c            |  19 +++++--
 src/box/sql/expr.c                |  73 ++++++++++++++++++------
 src/box/sql/fkey.c                |   2 -
 src/box/sql/func.c                |   2 +-
 src/box/sql/select.c              | 117 ++++++++++++++++++++------------------
 src/box/sql/sqliteInt.h           |  28 ++++++++-
 src/box/sql/where.c               |  15 +++--
 src/box/sql/whereexpr.c           |  38 ++++++-------
 src/box/tuple_format.c            |   4 +-
 src/coll.c                        |  52 ++++++++++++++---
 src/coll_def.c                    |   2 +-
 src/coll_def.h                    |   1 +
 test/box/ddl.result               |  20 ++++---
 test/box/misc.result              |   2 +
 test/box/net.box.result           |   2 +-
 test/sql-tap/collation.test.lua   |  12 ++--
 test/sql-tap/distinct.test.lua    |   2 +-
 test/sql-tap/e_select1.test.lua   |  12 ++--
 test/sql-tap/in3.test.lua         |   2 +-
 test/sql-tap/in4.test.lua         |   2 +-
 test/sql-tap/index3.test.lua      |   2 +-
 test/sql-tap/join.test.lua        |   2 +-
 test/sql-tap/tkt3493.test.lua     |   4 +-
 test/sql-tap/transitive1.test.lua |   4 +-
 test/sql-tap/where2.test.lua      |   2 +-
 test/sql-tap/with1.test.lua       |   2 +-
 test/sql/collation.result         |  90 +++++++++++++++++++++++++++++
 test/sql/collation.test.lua       |  36 ++++++++++++
 39 files changed, 427 insertions(+), 180 deletions(-)

-- 
2.15.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] [PATCH v2 1/3] sql: do not add explicit <COLLATE "BINARY"> clause
  2018-11-13  0:07 [tarantool-patches] [PATCH v2 0/3] Change collation compatibility rules according to ANSI SQL Nikita Pettik
@ 2018-11-13  0:07 ` Nikita Pettik
  2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 2/3] Introduce "none" and "binary" collations Nikita Pettik
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Nikita Pettik @ 2018-11-13  0:07 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Nikita Pettik

We don't need to add explicit COLLATE "BINARY" clause since binary
collation is anyway default. On the other hand, it may confuse
due to ambiugty when comparing two terms.

Needed for #3185
---
 src/box/sql/fkey.c      |  2 --
 src/box/sql/select.c    |  4 ----
 src/box/sql/whereexpr.c | 25 ++++++++++++-------------
 3 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/src/box/sql/fkey.c b/src/box/sql/fkey.c
index fc39c879e..4e3270f0c 100644
--- a/src/box/sql/fkey.c
+++ b/src/box/sql/fkey.c
@@ -313,8 +313,6 @@ exprTableRegister(Parse * pParse,	/* Parsing and code generating context */
 			pExpr->iTable = regBase + iCol + 1;
 			char affinity = pTab->def->fields[iCol].affinity;
 			pExpr->affinity = affinity;
-			pExpr = sqlite3ExprAddCollateString(pParse, pExpr,
-							    "binary");
 		} else {
 			pExpr->iTable = regBase;
 			pExpr->affinity = AFFINITY_INTEGER;
diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index 16b1044d6..dfa6ed8e0 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -2238,10 +2238,6 @@ sql_multiselect_orderby_to_key_info(struct Parse *parse, struct Select *s,
 				order_by->a[i].pExpr =
 					sqlite3ExprAddCollateString(parse, term,
 								    name);
-			} else {
-				order_by->a[i].pExpr =
-					sqlite3ExprAddCollateString(parse, term,
-								    "BINARY");
 			}
 		}
 		part->coll_id = id;
diff --git a/src/box/sql/whereexpr.c b/src/box/sql/whereexpr.c
index 612868695..9fa6ce15d 100644
--- a/src/box/sql/whereexpr.c
+++ b/src/box/sql/whereexpr.c
@@ -1133,7 +1133,6 @@ exprAnalyze(SrcList * pSrc,	/* the FROM clause */
 		Expr *pNewExpr2;
 		int idxNew1;
 		int idxNew2;
-		const char *zCollSeqName;	/* Name of collating sequence */
 		const u16 wtFlags = TERM_LIKEOPT | TERM_VIRTUAL | TERM_DYNAMIC;
 
 		pLeft = pExpr->x.pList->a[1].pExpr;
@@ -1171,24 +1170,24 @@ exprAnalyze(SrcList * pSrc,	/* the FROM clause */
 			}
 			*pC = c + 1;
 		}
-		/* Support only for unicode_ci indexes by now */
-		zCollSeqName = noCase ? "unicode_ci" : "BINARY";
 		pNewExpr1 = sqlite3ExprDup(db, pLeft, 0);
-		pNewExpr1 = sqlite3PExpr(pParse, TK_GE,
-					 sqlite3ExprAddCollateString(pParse,
-								     pNewExpr1,
-								     zCollSeqName),
-					 pStr1);
+		if (noCase) {
+			pNewExpr1 =
+				sqlite3ExprAddCollateString(pParse, pNewExpr1,
+							    "unicode_ci");
+		}
+		pNewExpr1 = sqlite3PExpr(pParse, TK_GE, pNewExpr1, pStr1);
 		transferJoinMarkings(pNewExpr1, pExpr);
 		idxNew1 = whereClauseInsert(pWC, pNewExpr1, wtFlags);
 		testcase(idxNew1 == 0);
 		exprAnalyze(pSrc, pWC, idxNew1);
 		pNewExpr2 = sqlite3ExprDup(db, pLeft, 0);
-		pNewExpr2 = sqlite3PExpr(pParse, TK_LT,
-					 sqlite3ExprAddCollateString(pParse,
-								     pNewExpr2,
-								     zCollSeqName),
-					 pStr2);
+		if (noCase) {
+			pNewExpr2 =
+				sqlite3ExprAddCollateString(pParse, pNewExpr2,
+							    "unicode_ci");
+		}
+		pNewExpr2 = sqlite3PExpr(pParse, TK_LT, pNewExpr2, pStr2);
 		transferJoinMarkings(pNewExpr2, pExpr);
 		idxNew2 = whereClauseInsert(pWC, pNewExpr2, wtFlags);
 		testcase(idxNew2 == 0);
-- 
2.15.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] [PATCH v2 2/3] Introduce "none" and "binary" collations
  2018-11-13  0:07 [tarantool-patches] [PATCH v2 0/3] Change collation compatibility rules according to ANSI SQL Nikita Pettik
  2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 1/3] sql: do not add explicit <COLLATE "BINARY"> clause Nikita Pettik
@ 2018-11-13  0:07 ` Nikita Pettik
  2018-11-13 12:02   ` [tarantool-patches] " Vladislav Shpilevoy
  2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 3/3] sql: change collation compatibility rules Nikita Pettik
  2018-11-15 11:24 ` [tarantool-patches] Re: [PATCH v2 0/3] Change collation compatibility rules according to ANSI SQL Kirill Yukhin
  3 siblings, 1 reply; 12+ messages in thread
From: Nikita Pettik @ 2018-11-13  0:07 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Nikita Pettik

This patch introduces two new collation sequences: "none" and "binary".
Despite the fact that they use the same comparing algorithm (simple
byte-by-byte comparison), they don't mean the same. "binary" collation
get to the format if user explicitly points it: either specifies this
collation in space format manually or adds <COLLATE BINARY> clause to
column definition within CREATE TABLE statement. "none" collation is
used when user doesn't specify any collation at all. "none" collation always
comes with id == 0 and it can't be changed (since its id vastly used
under the hood as an indicator of absence of collation).
Difference between these collations is vital for ANSI SQL: mixing
"binary" with other collations is prohibited, meanwhile "none" collation
can be used alongside with others.
In this respect current patch extends list of available collations: now
not only ICU collations are allowed, but also BINARY.

Note, that in SQL some queries have changed their query plan. That
occurred due to the fact that our parser allows using <COLLATE> clause
with numeric fields:

CREATE TABLE (id INT PRIMARY KEY);
SELECT id COLLATE "binary" ...

In the example collation of LHS (id column) is NULL, but collation of
RHS is "binary". Before this patch both collations were NULL. Hence,
usage of certain indexes may not be allowed by query planner.
On the other hand, this feature is obviously broken, so that doesn't
seem to be big deal.

Needed for #3185
---
 src/box/alter.cc                |  13 ++++++++--
 src/box/bootstrap.snap          | Bin 1888 -> 1911 bytes
 src/box/coll_id.c               |   1 -
 src/box/errcode.h               |   1 +
 src/box/field_def.c             |   2 +-
 src/box/key_def.c               |  18 +++++++-------
 src/box/key_def.h               |   6 -----
 src/box/lua/space.cc            |   2 +-
 src/box/lua/upgrade.lua         |   2 ++
 src/box/sql.c                   |  10 ++++----
 src/box/sql/build.c             |   2 +-
 src/box/sql/callback.c          |  19 +++++++++++----
 src/box/sql/expr.c              |   2 +-
 src/box/sql/func.c              |   2 +-
 src/box/sql/select.c            |   8 +++----
 src/box/sql/where.c             |   2 +-
 src/box/tuple_format.c          |   4 ++--
 src/coll.c                      |  52 +++++++++++++++++++++++++++++++++-------
 src/coll_def.c                  |   2 +-
 src/coll_def.h                  |   1 +
 test/box/ddl.result             |  20 +++++++++-------
 test/box/misc.result            |   1 +
 test/box/net.box.result         |   2 +-
 test/sql-tap/collation.test.lua |  12 ++++++----
 test/sql-tap/distinct.test.lua  |   2 +-
 test/sql-tap/in3.test.lua       |   2 +-
 test/sql-tap/index3.test.lua    |   2 +-
 test/sql-tap/where2.test.lua    |   2 +-
 test/sql/collation.result       |  27 +++++++++++++++++++++
 test/sql/collation.test.lua     |  14 +++++++++++
 30 files changed, 166 insertions(+), 67 deletions(-)

diff --git a/src/box/alter.cc b/src/box/alter.cc
index 6d2c59bbc..2eb9f53e8 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -403,7 +403,7 @@ field_def_decode(struct field_def *field, const char **data,
 				     "nullable action properties", fieldno +
 				     TUPLE_INDEX_BASE));
 	}
-	if (field->coll_id != COLL_NONE &&
+	if (field->coll_id != 0 &&
 	    field->type != FIELD_TYPE_STRING &&
 	    field->type != FIELD_TYPE_SCALAR &&
 	    field->type != FIELD_TYPE_ANY) {
@@ -2560,7 +2560,6 @@ coll_id_def_new_from_tuple(const struct tuple *tuple, struct coll_id_def *def)
 		tuple_field_with_type_xc(tuple, BOX_COLLATION_FIELD_OPTIONS,
 					 MP_MAP);
 
-	assert(base->type == COLL_TYPE_ICU);
 	if (opts_decode(&base->icu, coll_icu_opts_reg, &options,
 			ER_WRONG_COLLATION_OPTIONS,
 			BOX_COLLATION_FIELD_OPTIONS, NULL) != 0)
@@ -2671,6 +2670,16 @@ on_replace_dd_collation(struct trigger * /* trigger */, void *event)
 		 */
 		int32_t old_id = tuple_field_u32_xc(old_tuple,
 						    BOX_COLLATION_FIELD_ID);
+		/*
+		 * Don't allow user to drop "none" collation
+		 * since it is very special and vastly used
+		 * under the hood. Hence, we can rely on the
+		 * fact that "none" collation features id == 0.
+		 */
+		if (old_id == 0) {
+			tnt_raise(ClientError, ER_DROP_COLLATION, "none",
+				  "system collation");
+		}
 		struct coll_id *old_coll_id = coll_by_id(old_id);
 		assert(old_coll_id != NULL);
 		access_check_ddl(old_coll_id->name, old_coll_id->id,
diff --git a/src/box/bootstrap.snap b/src/box/bootstrap.snap
index 65739384a66d6ba4a538553ccf4677536ba15280..5e136d147036760cb67df68e81aab5744be783e4 100644
GIT binary patch
delta 1909
zcmV-*2a5RM4)+d_7k@P^GdL}0GiGFFH!?UkWC}@cb97;DV`VxZIb=CFWHw=8Eip7@
zHZ3$|Wn?X3VP<75WjSRrW;tVGG-6_A3RXjGZ)0mZAbWiZ3e~y`y3Gdt0M5D=y5Xe&
z00000D77#B08phj0J=<Y5J}KjO#vWYm=|2a2zw054KzTf;(v`no5cMPX0Ik5Fk}sC
zZHY`tO-kA|Nr?QCPAdtQm=p%^DbL*=mX%*pI<rd`svcw`Pi0Ce?*Qij`T#~|EuDAg
zwISJ&8Ps3@D(~xEm|GitA#nG;<%8nbOU=QV_qvt29&azkI?c=W@fG>)UsuzHil~>Z
zIEK4*6Pgc(E`KBVf-I!(558x;-}}G!EEv-nW}{F_my&cb17iE?%&Kt$xH`$$zOTGR
zo2$3<@6K;R8QY{9z-I-F8GSFrS&udffUA={vcw0D9U3rH)adI_)8<hJ!*O+zlU5yO
zz8_{o!qrK3p-W!2LHAXoT%9CJby|h+KGnq+gy~DASbwG!%1fD8DG=9`laqtOp9*8Z
zvcK~Y#5P&?Lg%#dGL&UV%N8FlS9YoZsB<Vt7rdC)kZi|agcU0Nhh5*@vcd8eZAn!R
zjP~;p1;f=zy83vH_Vdw}^yB~u*5e1WHG0#4tCJl2b0k=gpDn2kgCm%=(e7nmvCOlh
zp!amPHGf*OKnctyQ5psPde#Z0kbtX`)c3(KWPv#LE_`v>_wBq4x;npR`&-|2wu#?4
zO^N>T=eb3qGYEh_!+4C_VAe)u4)pul;a6*2@GH9>Lu@_p^QWY9OMfkit<e|*T%DwN
zUo+mtkalqh*2GxfuYtmn$&n6ozaSW5n?zv>aDR1@R32nq5FEATS?Ndc!pfMwGcxu8
zRi!gkDL7e?3I*x}$^@#!iNtAjQr(QW5kYhl!X{;t!_`R=*?{O!vhGy#K~{r9W1{gu
z)4|2Bivow4n;Dy#4Kf;JGRR=4HRb}95lx{n7GgR|VQRQKNtg;#MnvVupRC84(EskV
z`F}!}j=CBeu1-=33~|!+#8HPxF5K^{<XK7oz9NzbW*1h{b-ufE9b#+bvrec-fvb~z
zU0(Av4b-!r*R!LPkeSYDL0+DLtUg2a+qngQxJ{x#CV1cPe*G=Wa}%YaCsc<7u1>O)
zeZN?byIUmMBg5567PVS=cYC8eF<hPGD1WmzsuKa0x4449)k(G<%&@%0AB&Y@;nj!w
z)tmmi*W~@DOd>voGTke}Q?0xe1}s$8>39_|W$AFY_7GWF)Q}q?T@V@|ROSa)Cs|?W
zQdxaa9p;Yp_}wZs>Qp_tbm?M+^5E(uW9IYyu&l@14z5lTK%La6Q~B%{M_$qjI)88m
z#otPl%kf~iX*h$dQLNEw)SYnX{PJ!|=?$(<@<i)%i7^iD#BhN--1zZ8F&h0O#sGSl
z3nm41UECyI7e*4=&qrG4KekFbjk=Q!u1<2tN5+hXpUyb@`G|uP6tbXDYSh`(pdJmb
zPSTqZKvpC~hAc>o8|~2nqlbs`yno<vL3LW_uvo2FsaWmEY5{?9sPSB6s02n30003<
z0MP{pG3bg05@4`MjH4)yffxuw5R4)lM?e4yLJBm9S}-&d%%{4wpNvvCz%hP|9;3(T
zF?zHEA24jqw9<6O0<kp)YS~^M3_}1zWP+j450X47k(ks|7U0<bm|XF(>VJwCyFxW&
zvSx>h*Xrupa#T-amV|4!+4G#G@C5t>lh2DV*jpCAP2@0w?0v*%tf^HhsNvQ{7l)g>
z5CZc%DXe^PZ}cV*RfytKSHf2D@T#r3MhHgUaO#{b0JA~=3XQU^`W(%7AyoI+1(QS-
zqc0cr3D2Ph*<hTN1NY20D}Nub*|X;doDFCMT{2G^FEqytSV)}uyY`&Glh#2|ZZU<a
zzUb1zX4XR_Zpg*&3z9osX4aT3gYK*`P1`?bGjG4U5)1@z%(1$?)d+h(Wr^9R`$E!f
ziSYdK3?HKdKvkBAFH^DzhpkH3f`7TKX8p--!lH5-?E5pZWK6YXqJPOLZ|Q}nl-W`j
zILKukUul5)<NRY6Bwu<4t3$j7+MW_(S6|7ho030)#c`U7rV*MP3NVaBZBM^cYTN9O
ziBi!&?!b?8tb@U(@15Tay+#-~n&rgTr+ia&_}^81l(p#5`)@np*g}UzaEkyp_}YhV
z!BtU?NRF(bYDJt+V}JRtEd<3cBuX5T-PT>QLz$n5K}W9^T8SL-I*Q|CJJQu!#!+|M
zH=_fq+5ko|oy_gO$HeItV&D`Qr?+iTcBTvGOe_s3$|U5@f+vRDt(EhWRKkg}6aa8j
ze83bezNpuT3z|n5)<;Wzkj^-TyWZmAiaL=OI__QSp=Dq<d2Y3b*#G#fQWSAX^v0ef
zxhhH|_1n4|(%~2VL`z9-(zl|(2RqRZtyz>JLomv9uig;>T;qt+2cQ^PqXJ1aA^JZK
vL55PXLzk)lO8W>&phrR$u~5(Z*LkIYMVy@l-;OZk>+zl>-;fT~5UuSBltZC&

delta 1886
zcmV-k2ch`)4&V-u7k@M@I5#mZXEQS}VKFskVq;}xWC}@cb97;DV`VxZGc`3ZV>UB6
zEi*7;Vl6Z_W-~1~FgZ0XI5lH1H#s$9F*ac{3RXjGZ)0mZAbWiZ3e~y`y3GdT0L~dp
z)=i}V00000D77#B08rI706I%F5K6#UQ2{85q9}?YD$b&eu79{-3N}O7l2mpR;KZ{2
zn+{@4)QCi;q$nk^F5UY}lDbGJHYRv&HT-RauFbizhI29aQ_<+yD7;K51?d3d0QUge
z@{IH&cw?nZUm7X<AvBlHyK|e+jK~D)u!ofw_Akusjj{;1d*AXwaqOk$AnkkY%3P1Q
z7iGQnW&8Mw9Dnz)t7}3;^vhNp!|lQa%?HER5PU&a?)!uP+3)xM)_)e1=^V38D5X=3
zcrF3DI!W2Su)M|FtGo2?&T&B*+oKAgtCJ*x<i-gWH2PnNvmS5M$ZP4kI>||^4l~~$
zyA|o`B$McrFI%7sv(c_j^4i?AS~#iHcgndb*|2O@rGL-H=B9Kq-J4QQ3x#n|IW`!q
z?C*R8@ona@&^f8F2r)^*5u~qF*^vUEPN7|$r0c;P%UjG{@M3Pu*^XZbD@6JayT03X
zi{&lalBygO?dRjEL02dF>f<fi&qrHQ6N4gHkFA)sQ5u%6PIBzei(oyrY)NMp7{%<3
zdJ6Ux%YRHW0{Ty9Yojs=j>PN{jX}`wXO~b42f8{*eIE=%7Kmm4!WWl)-_A>*t22zY
z-}PPRoA{mAj_4nKp4%LSG4S&_#%25#vp4!upkLSyTQ=7PTV>Z{i0ub{ZXxOH(r+_j
zZPbN8S0^c67>#!^tX&*}Hc{61>z}YpVx+^|FMkMz*dEcA0bQMBlm}TC1jj3#Rf(Cc
zsz^-Lh;Bxt5^)M~0&!ZIRHhsg4o!;YW{3tgV>5H=>LiC|Ox}l?S0huJ$#{tA5W^vc
z=Ed)e!VJs?7!5EPUNF2|c(G8ci$`ZeP^gQHO9e_>V!ArX77BAV6y-;stVdhW|L(QP
zLVu?ic``6vog@+%;-mYCqYiT~+%K%;8A<=XV(tU;3m@q^-`%+mu^rN~OXxSDtCNIX
zZu@i%)HI*hG^3jz5<0H|eVGQbI_=bN=N9zg_K5m;`M%%%`dyZ1B}ha}s128{PO_YR
zzgUmE+Z^qQ(A7y2xm<a7i=#Xcx;n{mW`A*1=RquQan%y2i*v%MRgMX#RHf9)q$;6W
zI5#C0%Y-VeP&n9>N}(`Vp)*g1RpHVgI-RXOKvq9aK4^M)JYsgx7ad)lWNo2S%9?}f
zFn6rS7Ov7FPlY2XDTxZb(bY-1%;)=KS&ujyU7aL=I%$!os<B@jc}XkKz!?;OBY)9u
z#)ILm;S91y(MD^LXOf{a?7JnUF}ga*5w6xtlyPvUg$mqZ29N)d!RRKj#m~oHu_ved
z;wACEa4u*+A8Vce=ql+f@=P$gI?0_IE@(7t>0Gm)k2N(wA*%_cMc&K{`nl-pB)=8`
zW<)=FW<FTpD36U5Jt`<q3LX?x=YNEbiPcVlYpC&HWT*s05C8xHNC43V2Qlc11`=Sf
zK#Zd>j)4#eLlBBW8%IC@3_=Pth*~go48;SwHJFH2Fd%F?2kk;VXcww=-~-00nO2(4
z7#_C9Kn&JPXB`m05Sd^o^n)x9k}%ou6eKwIauTj!EYZT3wXZg0T(d*RYkzg?LUB|#
zW0r(#8`yKErQ-?y^2rxLfPl3%zvXk}o-2LCX{@RGC#bmUq8G=dT?n=Lxht&Ltv5XM
zKq^G>sVm1;VNq3wzfvSZ&zbt0ErQxX{}LL1Ts3TVav_oK;iE_r^^ZOib+@*&8Jb6T
z{24#RZ0XOiX`b+=wiC=jmw(uke4NxWLj+RR*0wq)@TAqaja#AVtR8b(h~aq%?uI^M
z`prRRPT(uuXYNcJ+VJy!Hk<p!R)T;J@vKg_H;wT38<xm^m(NJmmI#uUo&7P|2}oqg
z_tGMZ@S$}GTlD{LtGR))1F&c@4JrSb3^FEkG11VJx0FKTGHdEW3V*pE$5+~-{vxmL
zf*4EBeRas!LD@4G?DCbYS}*y)SR8><QJU1qp&G+T()NPsQhQ*3fG8CW<4FAYY>n6@
z`rP@=q}MzHN3)#x+LLco2l`9ZcTS5Eseh}}gCB-3f?K0?yRSvk7El%CEOMj@g~-HV
zYP|oo3c>daiHd?`w|_NfFDEk*ne8ai;u4WnDMz7QaYtIM<u>XL{bqEKk~Rq!Oegci
z?|E^eRScX$<Mg@>shsKmIpci>6g3mx`LKW?cWdQ*5}9z8djSBq#m8;I;y?8oaX|qI
z!}@5+->Dg+0MUDNxS*Xd7KijMg=l18w|Yfa#Q*iIdK8gK^jFHBq=_p2AoZQJo5JB~
z{)9_u-DI}n!v|vG5UoYj4MPyhwO7m|j<QA`KTd1ZBMB3t<;Mk&Gr6RupO)4s0af}3
YejQs<I4l@DLYl8<_9QWfbkz{8?dNiiMF0Q*

diff --git a/src/box/coll_id.c b/src/box/coll_id.c
index 2d5f8a09a..b56c74961 100644
--- a/src/box/coll_id.c
+++ b/src/box/coll_id.c
@@ -37,7 +37,6 @@
 struct coll_id *
 coll_id_new(const struct coll_id_def *def)
 {
-	assert(def->base.type == COLL_TYPE_ICU);
 	size_t total_len = sizeof(struct coll_id) + def->name_len + 1;
 	struct coll_id *coll_id = (struct coll_id *) malloc(total_len);
 	if (coll_id == NULL) {
diff --git a/src/box/errcode.h b/src/box/errcode.h
index 4eb7fced5..18ffdf3d5 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -224,6 +224,7 @@ struct errcode_record {
 	/*169 */_(ER_NO_SUCH_CONSTRAINT,	"Constraint %s does not exist") \
 	/*170 */_(ER_CONSTRAINT_EXISTS,		"Constraint %s already exists") \
 	/*171 */_(ER_SQL_TYPE_MISMATCH,		"Type mismatch: can not convert %s to %s") \
+	/*172 */_(ER_DROP_COLLATION,		"Can't drop collation %s : %s") \
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/field_def.c b/src/box/field_def.c
index 3a9ff3703..3e63e12a3 100644
--- a/src/box/field_def.c
+++ b/src/box/field_def.c
@@ -123,7 +123,7 @@ const struct field_def field_def_default = {
 	.name = NULL,
 	.is_nullable = false,
 	.nullable_action = ON_CONFLICT_ACTION_DEFAULT,
-	.coll_id = COLL_NONE,
+	.coll_id = 0,
 	.default_value = NULL,
 	.default_value_expr = NULL
 };
diff --git a/src/box/key_def.c b/src/box/key_def.c
index 3a560bb06..6802489f1 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -41,7 +41,7 @@ const char *sort_order_strs[] = { "asc", "desc", "undef" };
 const struct key_part_def key_part_def_default = {
 	0,
 	field_type_MAX,
-	COLL_NONE,
+	0,
 	false,
 	ON_CONFLICT_ACTION_DEFAULT,
 	SORT_ORDER_ASC
@@ -174,7 +174,7 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
 	for (uint32_t i = 0; i < part_count; i++) {
 		const struct key_part_def *part = &parts[i];
 		struct coll *coll = NULL;
-		if (part->coll_id != COLL_NONE) {
+		if (part->coll_id != 0) {
 			struct coll_id *coll_id = coll_by_id(part->coll_id);
 			if (coll_id == NULL) {
 				diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
@@ -223,7 +223,7 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
 		key_def_set_part(key_def, item, fields[item],
 				 (enum field_type)types[item],
 				 ON_CONFLICT_ACTION_DEFAULT,
-				 NULL, COLL_NONE, SORT_ORDER_ASC);
+				 NULL, 0, SORT_ORDER_ASC);
 	}
 	key_def_set_cmp(key_def);
 	return key_def;
@@ -319,7 +319,7 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
 	for (uint32_t i = 0; i < part_count; i++) {
 		const struct key_part_def *part = &parts[i];
 		int count = 2;
-		if (part->coll_id != COLL_NONE)
+		if (part->coll_id != 0)
 			count++;
 		if (part->is_nullable)
 			count++;
@@ -329,7 +329,7 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
 		assert(part->type < field_type_MAX);
 		size += mp_sizeof_str(strlen(PART_OPT_TYPE));
 		size += mp_sizeof_str(strlen(field_type_strs[part->type]));
-		if (part->coll_id != COLL_NONE) {
+		if (part->coll_id != 0) {
 			size += mp_sizeof_str(strlen(PART_OPT_COLLATION));
 			size += mp_sizeof_uint(part->coll_id);
 		}
@@ -348,7 +348,7 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
 	for (uint32_t i = 0; i < part_count; i++) {
 		const struct key_part_def *part = &parts[i];
 		int count = 2;
-		if (part->coll_id != COLL_NONE)
+		if (part->coll_id != 0)
 			count++;
 		if (part->is_nullable)
 			count++;
@@ -361,7 +361,7 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
 		assert(part->type < field_type_MAX);
 		const char *type_str = field_type_strs[part->type];
 		data = mp_encode_str(data, type_str, strlen(type_str));
-		if (part->coll_id != COLL_NONE) {
+		if (part->coll_id != 0) {
 			data = mp_encode_str(data, PART_OPT_COLLATION,
 					     strlen(PART_OPT_COLLATION));
 			data = mp_encode_uint(data, part->coll_id);
@@ -431,7 +431,7 @@ key_def_decode_parts_166(struct key_part_def *parts, uint32_t part_count,
 		part->is_nullable = (part->fieldno < field_count ?
 				     fields[part->fieldno].is_nullable :
 				     key_part_def_default.is_nullable);
-		part->coll_id = COLL_NONE;
+		part->coll_id = 0;
 	}
 	return 0;
 }
@@ -488,7 +488,7 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
 				 "index part: unknown field type");
 			return -1;
 		}
-		if (part->coll_id != COLL_NONE &&
+		if (part->coll_id != 0 &&
 		    part->type != FIELD_TYPE_STRING &&
 		    part->type != FIELD_TYPE_SCALAR) {
 			diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
diff --git a/src/box/key_def.h b/src/box/key_def.h
index 20e79f9fe..684e9cf75 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -72,12 +72,6 @@ struct key_part_def {
 
 extern const struct key_part_def key_part_def_default;
 
-/**
- * Set key_part_def.coll_id to COLL_NONE if
- * the field does not have a collation.
- */
-#define COLL_NONE UINT32_MAX
-
 /** Descriptor of a single part in a multipart key. */
 struct key_part {
 	/** Tuple field index for this part */
diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc
index 7cae436f1..e5e09b042 100644
--- a/src/box/lua/space.cc
+++ b/src/box/lua/space.cc
@@ -299,7 +299,7 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i)
 			lua_pushboolean(L, key_part_is_nullable(part));
 			lua_setfield(L, -2, "is_nullable");
 
-			if (part->coll_id != COLL_NONE) {
+			if (part->coll_id != 0) {
 				struct coll_id *coll_id =
 					coll_by_id(part->coll_id);
 				assert(coll_id != NULL);
diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua
index 64f74f9d3..a9525058b 100644
--- a/src/box/lua/upgrade.lua
+++ b/src/box/lua/upgrade.lua
@@ -396,8 +396,10 @@ local function create_collation_space()
     box.space._index:insert{_collation.id, 1, 'name', 'tree', {unique = true}, {{1, 'string'}}}
 
     log.info("create predefined collations")
+    box.space._collation:replace{0, "none", ADMIN, "BINARY", "", setmap{}}
     box.space._collation:replace{1, "unicode", ADMIN, "ICU", "", setmap{}}
     box.space._collation:replace{2, "unicode_ci", ADMIN, "ICU", "", {strength='primary'}}
+    box.space._collation:replace{3, "binary", ADMIN, "BINARY", "", setmap{}}
 
     local _priv = box.space[box.schema.PRIV_ID]
     _priv:insert{ADMIN, PUBLIC, 'space', _collation.id, box.priv.W}
diff --git a/src/box/sql.c b/src/box/sql.c
index caa66144f..686d32335 100644
--- a/src/box/sql.c
+++ b/src/box/sql.c
@@ -379,7 +379,7 @@ sql_ephemeral_space_create(uint32_t field_count, struct sql_key_info *key_info)
 		if (def != NULL && i < def->part_count)
 			part->coll_id = def->parts[i].coll_id;
 		else
-			part->coll_id = COLL_NONE;
+			part->coll_id = 0;
 	}
 	struct key_def *ephemer_key_def = key_def_new(ephemer_key_parts,
 						      field_count);
@@ -1139,7 +1139,7 @@ sql_encode_table(struct region *region, struct Table *table, uint32_t *size)
 		struct field_def *field = &def->fields[i];
 		const char *default_str = field->default_value;
 		int base_len = 5;
-		if (cid != COLL_NONE)
+		if (cid != 0)
 			base_len += 1;
 		if (default_str != NULL)
 			base_len += 1;
@@ -1160,7 +1160,7 @@ sql_encode_table(struct region *region, struct Table *table, uint32_t *size)
 		const char *action =
 			on_conflict_action_strs[def->fields[i].nullable_action];
 		mpstream_encode_str(&stream, action);
-		if (cid != COLL_NONE) {
+		if (cid != 0) {
 			mpstream_encode_str(&stream, "collation");
 			mpstream_encode_uint(&stream, cid);
 		}
@@ -1285,13 +1285,13 @@ sql_encode_index_parts(struct region *region, const struct field_def *fields,
 		       action_is_nullable(fields[col].nullable_action));
 		/* Do not decode default collation. */
 		uint32_t cid = part->coll_id;
-		mpstream_encode_map(&stream, 5 + (cid != COLL_NONE));
+		mpstream_encode_map(&stream, 5 + (cid != 0));
 		mpstream_encode_str(&stream, "type");
 		mpstream_encode_str(&stream, field_type_strs[fields[col].type]);
 		mpstream_encode_str(&stream, "field");
 		mpstream_encode_uint(&stream, col);
 
-		if (cid != COLL_NONE) {
+		if (cid != 0) {
 			mpstream_encode_str(&stream, "collation");
 			mpstream_encode_uint(&stream, cid);
 		}
diff --git a/src/box/sql/build.c b/src/box/sql/build.c
index 5b3348bd2..929d20dbf 100644
--- a/src/box/sql/build.c
+++ b/src/box/sql/build.c
@@ -2399,7 +2399,7 @@ index_fill_def(struct Parse *parse, struct index *index,
 		uint32_t coll_id;
 		if (expr->op == TK_COLLATE) {
 			sql_get_coll_seq(parse, expr->u.zToken, &coll_id);
-			if (coll_id == COLL_NONE &&
+			if (coll_id == 0 &&
 			    strcasecmp(expr->u.zToken, "binary") != 0) {
 				diag_set(ClientError, ER_NO_SUCH_COLLATION,
 					 expr->u.zToken);
diff --git a/src/box/sql/callback.c b/src/box/sql/callback.c
index 3cf3a835d..352745e0e 100644
--- a/src/box/sql/callback.c
+++ b/src/box/sql/callback.c
@@ -42,13 +42,22 @@
 struct coll *
 sql_get_coll_seq(Parse *parser, const char *name, uint32_t *coll_id)
 {
-	if (name == NULL || strcasecmp(name, "binary") == 0) {
-		*coll_id = COLL_NONE;
-		return NULL;
+	if (name == NULL) {
+		*coll_id = 0;
+		return coll_by_id(0)->coll;
 	}
-	struct coll_id *p = coll_by_name(name, strlen(name));
+	struct coll_id *p;
+	/*
+	 * In SQL all identifiers should be uppercased, so
+	 * to avoid mess lets simple search binary (since it is
+	 * sort of "special" collation) ignoring case at all.
+	 */
+	if (strcasecmp(name, "binary") == 0)
+		p = coll_by_name("binary", strlen("binary"));
+	else
+		p = coll_by_name(name, strlen(name));
 	if (p == NULL) {
-		*coll_id = COLL_NONE;
+		*coll_id = 0;
 		sqlite3ErrorMsg(parser, "no such collation sequence: %s",
 				name);
 		return NULL;
diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
index 4d1c1a634..e52cd6407 100644
--- a/src/box/sql/expr.c
+++ b/src/box/sql/expr.c
@@ -194,7 +194,7 @@ sql_expr_coll(Parse *parse, Expr *p, bool *is_found, uint32_t *coll_id)
 {
 	struct coll *coll = NULL;
 	*is_found = false;
-	*coll_id = COLL_NONE;
+	*coll_id = 0;
 	while (p != NULL) {
 		int op = p->op;
 		if (p->flags & EP_Generic)
diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index 8c34cbb3d..580cf1e60 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -506,7 +506,7 @@ case_type##ICUFunc(sqlite3_context *context, int argc, sqlite3_value **argv)   \
 	UErrorCode status = U_ZERO_ERROR;                                      \
 	struct coll *coll = sqlite3GetFuncCollSeq(context);                    \
 	const char *locale = NULL;                                             \
-	if (coll != NULL) {                                                    \
+	if (coll != NULL && coll->collator != NULL) {                          \
 		locale = ucol_getLocaleByType(coll->collator,                  \
 					      ULOC_VALID_LOCALE, &status);     \
 	}                                                                      \
diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index dfa6ed8e0..cea453f08 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -1345,7 +1345,7 @@ sql_key_info_new(sqlite3 *db, uint32_t part_count)
 		struct key_part_def *part = &key_info->parts[i];
 		part->fieldno = i;
 		part->type = FIELD_TYPE_SCALAR;
-		part->coll_id = COLL_NONE;
+		part->coll_id = 0;
 		part->is_nullable = false;
 		part->nullable_action = ON_CONFLICT_ACTION_ABORT;
 		part->sort_order = SORT_ORDER_ASC;
@@ -1961,7 +1961,7 @@ sqlite3SelectAddColumnTypeAndCollation(Parse * pParse,		/* Parsing contexts */
 		pTab->def->fields[i].type = sql_affinity_to_field_type(affinity);
 		bool is_found;
 		uint32_t coll_id;
-		if (pTab->def->fields[i].coll_id == COLL_NONE &&
+		if (pTab->def->fields[i].coll_id == 0 &&
 		    sql_expr_coll(pParse, p, &is_found, &coll_id) && is_found)
 			pTab->def->fields[i].coll_id = coll_id;
 	}
@@ -2160,7 +2160,7 @@ multi_select_coll_seq_r(Parse *parser, Select *p, int n, bool *is_found,
 					       coll_id);
 	} else {
 		coll = NULL;
-		*coll_id = COLL_NONE;
+		*coll_id = 0;
 	}
 	assert(n >= 0);
 	/* iCol must be less than p->pEList->nExpr.  Otherwise an error would
@@ -2233,7 +2233,7 @@ sql_multiselect_orderby_to_key_info(struct Parse *parse, struct Select *s,
 		} else {
 			multi_select_coll_seq(parse, s,
 					      item->u.x.iOrderByCol - 1, &id);
-			if (id != COLL_NONE) {
+			if (id != 0) {
 				const char *name = coll_by_id(id)->name;
 				order_by->a[i].pExpr =
 					sqlite3ExprAddCollateString(parse, term,
diff --git a/src/box/sql/where.c b/src/box/sql/where.c
index 8c78c0c9b..1db4db874 100644
--- a/src/box/sql/where.c
+++ b/src/box/sql/where.c
@@ -2806,7 +2806,7 @@ whereLoopAddBtree(WhereLoopBuilder * pBuilder,	/* WHERE clause information */
 		part.nullable_action = ON_CONFLICT_ACTION_ABORT;
 		part.is_nullable = false;
 		part.sort_order = SORT_ORDER_ASC;
-		part.coll_id = COLL_NONE;
+		part.coll_id = 0;
 
 		struct key_def *key_def = key_def_new(&part, 1);
 		if (key_def == NULL) {
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 5bdf102e7..8e5aea51b 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -40,7 +40,7 @@ static uint32_t formats_size = 0, formats_capacity = 0;
 
 static const struct tuple_field tuple_field_default = {
 	FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false,
-	ON_CONFLICT_ACTION_NONE, NULL, COLL_NONE,
+	ON_CONFLICT_ACTION_DEFAULT, NULL, 0,
 };
 
 /**
@@ -67,7 +67,7 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 		format->fields[i].nullable_action = fields[i].nullable_action;
 		struct coll *coll = NULL;
 		uint32_t cid = fields[i].coll_id;
-		if (cid != COLL_NONE) {
+		if (cid != 0) {
 			struct coll_id *coll_id = coll_by_id(cid);
 			if (coll_id == NULL) {
 				diag_set(ClientError,ER_WRONG_COLLATION_OPTIONS,
diff --git a/src/coll.c b/src/coll.c
index 6a76f1f0b..6d9c44dbf 100644
--- a/src/coll.c
+++ b/src/coll.c
@@ -92,6 +92,17 @@ coll_icu_cmp(const char *s, size_t slen, const char *t, size_t tlen,
 	return (int)result;
 }
 
+static int
+coll_bin_cmp(const char *s, size_t slen, const char *t, size_t tlen,
+	     const struct coll *coll)
+{
+	(void) coll;
+	int res = memcmp(s, t, slen < tlen ? slen : tlen);
+	if (res == 0)
+		res = slen - tlen;
+	return res;
+}
+
 /** Get a hash of a string using ICU collation. */
 static uint32_t
 coll_icu_hash(const char *s, size_t s_len, uint32_t *ph, uint32_t *pcarry,
@@ -113,6 +124,15 @@ coll_icu_hash(const char *s, size_t s_len, uint32_t *ph, uint32_t *pcarry,
 	return total_size;
 }
 
+static uint32_t
+coll_bin_hash(const char *s, size_t s_len, uint32_t *ph, uint32_t *pcarry,
+	      struct coll *coll)
+{
+	(void) coll;
+	PMurHash32_Process(ph, pcarry, s, s_len);
+	return s_len;
+}
+
 /**
  * Set up ICU collator and init cmp and hash members of collation.
  * @param coll Collation to set up.
@@ -262,17 +282,22 @@ static int
 coll_def_snfingerprint(char *buffer, int size, const struct coll_def *def)
 {
 	int total = 0;
-	SNPRINT(total, snprintf, buffer, size, "{locale: %s, type = %d, "\
-	        "icu: ", def->locale, (int) def->type);
-	SNPRINT(total, coll_icu_def_snfingerprint, buffer, size, &def->icu);
-	SNPRINT(total, snprintf, buffer, size, "}");
+	if (def->type == COLL_TYPE_ICU) {
+		SNPRINT(total, snprintf, buffer, size, "{locale: %s,"\
+			"type = %d, icu: ", def->locale, (int) def->type);
+		SNPRINT(total, coll_icu_def_snfingerprint, buffer,
+			size, &def->icu);
+		SNPRINT(total, snprintf, buffer, size, "}");
+	} else {
+		assert(def->type == COLL_TYPE_BINARY);
+		SNPRINT(total, snprintf, buffer, size, "{type = binary}");
+	}
 	return total;
 }
 
 struct coll *
 coll_new(const struct coll_def *def)
 {
-	assert(def->type == COLL_TYPE_ICU);
 	int fingerprint_len = coll_def_snfingerprint(NULL, 0, def);
 	assert(fingerprint_len <= TT_STATIC_BUF_LEN);
 	char *fingerprint = tt_static_buf();
@@ -296,9 +321,20 @@ coll_new(const struct coll_def *def)
 	memcpy((char *) coll->fingerprint, fingerprint, fingerprint_len + 1);
 	coll->refs = 1;
 	coll->type = def->type;
-	if (coll_icu_init_cmp(coll, def) != 0) {
-		free(coll);
-		return NULL;
+	switch (coll->type) {
+	case COLL_TYPE_ICU:
+		if (coll_icu_init_cmp(coll, def) != 0) {
+			free(coll);
+			return NULL;
+		}
+		break;
+	case COLL_TYPE_BINARY:
+		coll->collator = NULL;
+		coll->cmp = coll_bin_cmp;
+		coll->hash = coll_bin_hash;
+		break;
+	default:
+		unreachable();
 	}
 
 	struct mh_coll_node_t node = { fingerprint_len, hash, coll };
diff --git a/src/coll_def.c b/src/coll_def.c
index df58caca8..3a4fc2b87 100644
--- a/src/coll_def.c
+++ b/src/coll_def.c
@@ -31,7 +31,7 @@
 #include "coll_def.h"
 
 const char *coll_type_strs[] = {
-	"ICU"
+	"ICU", "BINARY"
 };
 
 const char *coll_icu_on_off_strs[] = {
diff --git a/src/coll_def.h b/src/coll_def.h
index 7c20abf66..d3af89802 100644
--- a/src/coll_def.h
+++ b/src/coll_def.h
@@ -36,6 +36,7 @@
 /** The supported collation types */
 enum coll_type {
 	COLL_TYPE_ICU = 0,
+	COLL_TYPE_BINARY = 1,
 	coll_type_MAX,
 };
 
diff --git a/test/box/ddl.result b/test/box/ddl.result
index c9a8e96ae..d3b0d1e0e 100644
--- a/test/box/ddl.result
+++ b/test/box/ddl.result
@@ -356,7 +356,7 @@ box.space._collation:auto_increment{'test', 0, 'ICU', 42}
 ...
 box.space._collation:auto_increment{'test', 0, 'ICU', 'ru_RU', setmap{}} --ok
 ---
-- [3, 'test', 0, 'ICU', 'ru_RU', {}]
+- [4, 'test', 0, 'ICU', 'ru_RU', {}]
 ...
 box.space._collation:auto_increment{'test', 0, 'ICU', 'ru_RU', setmap{}}
 ---
@@ -364,7 +364,7 @@ box.space._collation:auto_increment{'test', 0, 'ICU', 'ru_RU', setmap{}}
 ...
 box.space._collation.index.name:delete{'test'} -- ok
 ---
-- [3, 'test', 0, 'ICU', 'ru_RU', {}]
+- [4, 'test', 0, 'ICU', 'ru_RU', {}]
 ...
 box.space._collation.index.name:delete{'nothing'} -- allowed
 ---
@@ -480,24 +480,28 @@ _ = box.space._collation.index.name:delete{'test'} -- ok
 ...
 box.space._collation:auto_increment{'test', 0, 'ICU', 'ru_RU', setmap{}}
 ---
-- [3, 'test', 0, 'ICU', 'ru_RU', {}]
+- [4, 'test', 0, 'ICU', 'ru_RU', {}]
 ...
 box.space._collation:select{}
 ---
-- - [1, 'unicode', 1, 'ICU', '', {}]
+- - [0, 'none', 1, 'BINARY', '', {}]
+  - [1, 'unicode', 1, 'ICU', '', {}]
   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
-  - [3, 'test', 0, 'ICU', 'ru_RU', {}]
+  - [3, 'binary', 1, 'BINARY', '', {}]
+  - [4, 'test', 0, 'ICU', 'ru_RU', {}]
 ...
 test_run:cmd('restart server default')
 box.space._collation:select{}
 ---
-- - [1, 'unicode', 1, 'ICU', '', {}]
+- - [0, 'none', 1, 'BINARY', '', {}]
+  - [1, 'unicode', 1, 'ICU', '', {}]
   - [2, 'unicode_ci', 1, 'ICU', '', {'strength': 'primary'}]
-  - [3, 'test', 0, 'ICU', 'ru_RU', {}]
+  - [3, 'binary', 1, 'BINARY', '', {}]
+  - [4, 'test', 0, 'ICU', 'ru_RU', {}]
 ...
 box.space._collation.index.name:delete{'test'}
 ---
-- [3, 'test', 0, 'ICU', 'ru_RU', {}]
+- [4, 'test', 0, 'ICU', 'ru_RU', {}]
 ...
 --
 -- gh-3290: expose ICU into Lua. It uses built-in collations, that
diff --git a/test/box/misc.result b/test/box/misc.result
index 3ada82fb7..ea3cd8805 100644
--- a/test/box/misc.result
+++ b/test/box/misc.result
@@ -498,6 +498,7 @@ t;
   169: box.error.NO_SUCH_CONSTRAINT
   170: box.error.CONSTRAINT_EXISTS
   171: box.error.SQL_TYPE_MISMATCH
+  172: box.error.DROP_COLLATION
 ...
 test_run:cmd("setopt delimiter ''");
 ---
diff --git a/test/box/net.box.result b/test/box/net.box.result
index 57eebfbfc..4f6979d5b 100644
--- a/test/box/net.box.result
+++ b/test/box/net.box.result
@@ -2654,7 +2654,7 @@ c.space.test.index.sk.parts
 ---
 - - type: string
     is_nullable: false
-    collation_id: 3
+    collation_id: 4
     fieldno: 1
 ...
 c:close()
diff --git a/test/sql-tap/collation.test.lua b/test/sql-tap/collation.test.lua
index eb4f43a90..a4684b956 100755
--- a/test/sql-tap/collation.test.lua
+++ b/test/sql-tap/collation.test.lua
@@ -17,11 +17,13 @@ test:do_execsql_test(
     prefix.."0.2",
     "pragma collation_list",
     {
-        0,"unicode",
-        1,"unicode_ci",
-        2,"unicode_numeric",
-        3,"unicode_numeric_s2",
-        4,"unicode_tur_s2"
+        0,"none",
+        1,"unicode",
+        2,"unicode_ci",
+        3,"binary",
+        4,"unicode_numeric",
+        5,"unicode_numeric_s2",
+        6,"unicode_tur_s2"
     }
 )
 
diff --git a/test/sql-tap/distinct.test.lua b/test/sql-tap/distinct.test.lua
index 26a4aace2..a83b37829 100755
--- a/test/sql-tap/distinct.test.lua
+++ b/test/sql-tap/distinct.test.lua
@@ -126,7 +126,7 @@ local data = {
     {"14.2", 1, "SELECT DISTINCT a, d COLLATE \"unicode_ci\" FROM t4"},
     {"15 ", 0, "SELECT DISTINCT a, d COLLATE binary FROM t1"},
     {"16.1", 0, "SELECT DISTINCT a, b, c COLLATE binary FROM t1"},
-    {"16.2", 1, "SELECT DISTINCT a, b, c COLLATE binary FROM t4"},
+    {"16.2", 0, "SELECT DISTINCT a, b, c COLLATE binary FROM t4"},
     {"17",  0,   --{ \/* Technically, it would be possible to detect that DISTINCT\n            ** is a no-op in cases like the following. But SQLite does not\n            ** do so. *\/\n
     "SELECT DISTINCT t1.id FROM t1, t2 WHERE t1.id=t2.x" },
     {"18 ", 1, "SELECT DISTINCT c1, c2 FROM t3"},
diff --git a/test/sql-tap/in3.test.lua b/test/sql-tap/in3.test.lua
index 83139a3e4..ad4f4fc2c 100755
--- a/test/sql-tap/in3.test.lua
+++ b/test/sql-tap/in3.test.lua
@@ -209,7 +209,7 @@ test:do_test(
         return exec_neph(" SELECT a FROM t1 WHERE a COLLATE binary IN (SELECT a FROM t1) ")
     end, {
         -- <in3-1.15>
-        0, 1, 3, 5
+        1, 1, 3, 5
         -- </in3-1.15>
     })
 
diff --git a/test/sql-tap/index3.test.lua b/test/sql-tap/index3.test.lua
index 4f950be09..6a9911ada 100755
--- a/test/sql-tap/index3.test.lua
+++ b/test/sql-tap/index3.test.lua
@@ -59,7 +59,7 @@ test:do_execsql_test(
         CREATE TABLE t1(a INT , b TEXT , c INT , d INT , e INT ,
                         PRIMARY KEY(a), UNIQUE(b COLLATE "unicode_ci" DESC));
         CREATE INDEX t1c ON t1(c);
-        CREATE INDEX t1d ON t1(d COLLATE binary ASC);
+        CREATE INDEX t1d ON t1(d);
         WITH RECURSIVE c(x) AS (VALUES(1) UNION SELECT x+1 FROM c WHERE x<30)
           INSERT INTO t1(a,b,c,d,e) 
             SELECT x, printf('ab%03xxy',x), x, x, x FROM c;
diff --git a/test/sql-tap/where2.test.lua b/test/sql-tap/where2.test.lua
index a2b60e347..9089c97f9 100755
--- a/test/sql-tap/where2.test.lua
+++ b/test/sql-tap/where2.test.lua
@@ -183,7 +183,7 @@ test:do_test(
   ]])
     end, {
         -- <where2-2.1>
-        85, 6, 7396, 7402, "nosort", "T1", "*"
+        85, 6, 7396, 7402, "sort", "T1", "*"
         -- </where2-2.1>
     })
 
diff --git a/test/sql/collation.result b/test/sql/collation.result
index 419e469f7..3df4cfa70 100644
--- a/test/sql/collation.result
+++ b/test/sql/collation.result
@@ -107,6 +107,33 @@ cn:execute('select 1 limit ? collate not_exist', {1})
 cn:close()
 ---
 ...
+-- Explicitly set BINARY collation is predifined and has ID.
+--
+box.sql.execute("CREATE TABLE t (id INT PRIMARY KEY, a TEXT, b TEXT COLLATE \"binary\");")
+---
+...
+box.space.T:format()[2]['collation']
+---
+- null
+...
+box.space.T:format()[3]['collation']
+---
+- 3
+...
+box.sql.execute("DROP TABLE t;")
+---
+...
+-- Collation with id == 0 is "none". It used to unify interaction
+-- with collation interface. It also can't be dropped.
+--
+box.space._collation:select{0}
+---
+- - [0, 'none', 1, 'BINARY', '', {}]
+...
+box.space._collation:delete{0}
+---
+- error: 'Can''t drop collation none : system collation'
+...
 box.schema.user.revoke('guest', 'read,write,execute', 'universe')
 ---
 ...
diff --git a/test/sql/collation.test.lua b/test/sql/collation.test.lua
index da577c910..61df33a95 100644
--- a/test/sql/collation.test.lua
+++ b/test/sql/collation.test.lua
@@ -42,4 +42,18 @@ cn = remote.connect(box.cfg.listen)
 cn:execute('select 1 limit ? collate not_exist', {1})
 
 cn:close()
+
+-- Explicitly set BINARY collation is predifined and has ID.
+--
+box.sql.execute("CREATE TABLE t (id INT PRIMARY KEY, a TEXT, b TEXT COLLATE \"binary\");")
+box.space.T:format()[2]['collation']
+box.space.T:format()[3]['collation']
+box.sql.execute("DROP TABLE t;")
+
+-- Collation with id == 0 is "none". It used to unify interaction
+-- with collation interface. It also can't be dropped.
+--
+box.space._collation:select{0}
+box.space._collation:delete{0}
+
 box.schema.user.revoke('guest', 'read,write,execute', 'universe')
-- 
2.15.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] [PATCH v2 3/3] sql: change collation compatibility rules
  2018-11-13  0:07 [tarantool-patches] [PATCH v2 0/3] Change collation compatibility rules according to ANSI SQL Nikita Pettik
  2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 1/3] sql: do not add explicit <COLLATE "BINARY"> clause Nikita Pettik
  2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 2/3] Introduce "none" and "binary" collations Nikita Pettik
@ 2018-11-13  0:07 ` Nikita Pettik
  2018-11-13 12:02   ` [tarantool-patches] " Vladislav Shpilevoy
  2018-11-15 11:24 ` [tarantool-patches] Re: [PATCH v2 0/3] Change collation compatibility rules according to ANSI SQL Kirill Yukhin
  3 siblings, 1 reply; 12+ messages in thread
From: Nikita Pettik @ 2018-11-13  0:07 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Nikita Pettik

Before this patch our SQL implementation relied on strange rules
when comparing strings with different collations:

- if either operand has an explicit collating function assignment using
  the postfix COLLATE operator, then the explicit collating function is
  used for comparison, with precedence to the collating function of the
  left operand;

- if either operand is a column, then the collating function of that
  column is used with precedence to the left operand.

The main concern in this implementation is that collation of the left
operand is forced over right one (even if both operands come with
explicit COLLATE clause). This contradicts ANSI SQL and seems to be
quite misleading, since if user simple swap LHS and RHS - result of
query may change.

Lets introduce restrictions concerning collations compatibilities.
Collations of LHS and RHS are compatible (i.e. no "Illegal mix of
collations" is thrown) if:

- one of collations is mentioned alongside with explicit COLLATE clause,
  which forces this collation over another one. It is allowed to have
  the same forced collations;

- both collations are derived from table columns and they are the same;

- one collation is derived from table column and another one is not
  specified (i.e. COLL_NONE).

The compound SELECT operators UNION, INTERSECT and EXCEPT perform implicit
comparisons between values. Hence, all new rules stated above are
applied to parts of compound SELECT. Otherwise, an error is raised.
In other words, before current patch queries like below were allowed:

SELECT 'abc' COLLATE binary UNION SELECT 'ABC' COLLATE "unicode_ci";
---
- - ['ABC']
  - ['abc']
...

If we swap collations, we will get another result:

SELECT 'ABC' COLLATE "unicode_ci" UNION SELECT 'abc' COLLATE BINARY
---
- - ['abc']
...

Now such queries are illegal.

Closes #3185
---
 src/box/errcode.h                 |   1 +
 src/box/sql/expr.c                |  71 +++++++++++++++++++------
 src/box/sql/select.c              | 107 +++++++++++++++++++++-----------------
 src/box/sql/sqliteInt.h           |  28 +++++++++-
 src/box/sql/where.c               |  13 +++--
 src/box/sql/whereexpr.c           |  13 ++---
 test/box/misc.result              |   1 +
 test/sql-tap/e_select1.test.lua   |  12 ++---
 test/sql-tap/in4.test.lua         |   2 +-
 test/sql-tap/join.test.lua        |   2 +-
 test/sql-tap/tkt3493.test.lua     |   4 +-
 test/sql-tap/transitive1.test.lua |   4 +-
 test/sql-tap/with1.test.lua       |   2 +-
 test/sql/collation.result         |  63 ++++++++++++++++++++++
 test/sql/collation.test.lua       |  22 ++++++++
 15 files changed, 250 insertions(+), 95 deletions(-)

diff --git a/src/box/errcode.h b/src/box/errcode.h
index 18ffdf3d5..3b4f35c58 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -225,6 +225,7 @@ struct errcode_record {
 	/*170 */_(ER_CONSTRAINT_EXISTS,		"Constraint %s already exists") \
 	/*171 */_(ER_SQL_TYPE_MISMATCH,		"Type mismatch: can not convert %s to %s") \
 	/*172 */_(ER_DROP_COLLATION,		"Can't drop collation %s : %s") \
+	/*173 */_(ER_ILLEGAL_COLLATION_MIX,	"Illegal mix of collations") \
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
index e52cd6407..50de9ba6d 100644
--- a/src/box/sql/expr.c
+++ b/src/box/sql/expr.c
@@ -190,10 +190,10 @@ sqlite3ExprSkipCollate(Expr * pExpr)
 }
 
 struct coll *
-sql_expr_coll(Parse *parse, Expr *p, bool *is_found, uint32_t *coll_id)
+sql_expr_coll(Parse *parse, Expr *p, bool *is_explicit_coll, uint32_t *coll_id)
 {
 	struct coll *coll = NULL;
-	*is_found = false;
+	*is_explicit_coll = false;
 	*coll_id = 0;
 	while (p != NULL) {
 		int op = p->op;
@@ -206,7 +206,7 @@ sql_expr_coll(Parse *parse, Expr *p, bool *is_found, uint32_t *coll_id)
 		if (op == TK_COLLATE ||
 		    (op == TK_REGISTER && p->op2 == TK_COLLATE)) {
 			coll = sql_get_coll_seq(parse, p->u.zToken, coll_id);
-			*is_found = true;
+			*is_explicit_coll = true;
 			break;
 		}
 		if ((op == TK_AGG_COLUMN || op == TK_COLUMN ||
@@ -219,7 +219,6 @@ sql_expr_coll(Parse *parse, Expr *p, bool *is_found, uint32_t *coll_id)
 			if (j >= 0) {
 				coll = sql_column_collation(p->space_def, j,
 							    coll_id);
-				*is_found = true;
 			}
 			break;
 		}
@@ -342,25 +341,63 @@ binaryCompareP5(Expr * pExpr1, Expr * pExpr2, int jumpIfNull)
 	return aff;
 }
 
+int
+collations_check_compatibility(uint32_t lhs_id, bool is_lhs_forced,
+			       uint32_t rhs_id, bool is_rhs_forced,
+			       uint32_t *res_id)
+{
+	assert(res_id != NULL);
+	if (is_lhs_forced && is_rhs_forced) {
+		if (lhs_id != rhs_id)
+			goto illegal_collation_mix;
+	}
+	if (is_lhs_forced) {
+		*res_id = lhs_id;
+		return 0;
+	}
+	if (is_rhs_forced) {
+		*res_id = rhs_id;
+		return 0;
+	}
+	if (lhs_id != rhs_id) {
+		if (lhs_id == 0) {
+			*res_id = rhs_id;
+			return 0;
+		}
+		if (rhs_id == 0) {
+			*res_id = lhs_id;
+			return 0;
+		}
+		goto illegal_collation_mix;
+	}
+	*res_id = lhs_id;
+	return 0;
+illegal_collation_mix:
+	diag_set(ClientError, ER_ILLEGAL_COLLATION_MIX);
+	return -1;
+}
+
 struct coll *
 sql_binary_compare_coll_seq(Parse *parser, Expr *left, Expr *right,
 			    uint32_t *coll_id)
 {
-	struct coll *coll;
-	bool is_found;
 	assert(left != NULL);
-	if ((left->flags & EP_Collate) != 0) {
-		coll = sql_expr_coll(parser, left, &is_found, coll_id);
-	} else if (right != NULL && (right->flags & EP_Collate) != 0) {
-		coll = sql_expr_coll(parser, right, &is_found, coll_id);
-	} else {
-		coll = sql_expr_coll(parser, left, &is_found, coll_id);
-		if (! is_found) {
-			coll = sql_expr_coll(parser, right, &is_found,
-					     coll_id);
-		}
+	bool is_lhs_forced;
+	bool is_rhs_forced;
+	uint32_t lhs_coll_id;
+	uint32_t rhs_coll_id;
+	struct coll *lhs_coll = sql_expr_coll(parser, left, &is_lhs_forced,
+					      &lhs_coll_id);
+	struct coll *rhs_coll = sql_expr_coll(parser, right, &is_rhs_forced,
+					      &rhs_coll_id);
+	if (collations_check_compatibility(lhs_coll_id, is_lhs_forced,
+					   rhs_coll_id, is_rhs_forced,
+					   coll_id) != 0) {
+		parser->rc = SQL_TARANTOOL_ERROR;
+		parser->nErr++;
+		return NULL;
 	}
-	return coll;
+	return *coll_id == rhs_coll_id ? rhs_coll : lhs_coll;;
 }
 
 /*
diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index cea453f08..ab0376a1d 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -1048,7 +1048,7 @@ selectInnerLoop(Parse * pParse,		/* The parser context */
 								  regPrev + i);
 						VdbeCoverage(v);
 					}
-					if (is_found) {
+					if (coll != NULL) {
 						sqlite3VdbeChangeP4(v, -1,
 								    (const char *)coll,
 								    P4_COLLSEQ);
@@ -1961,8 +1961,9 @@ sqlite3SelectAddColumnTypeAndCollation(Parse * pParse,		/* Parsing contexts */
 		pTab->def->fields[i].type = sql_affinity_to_field_type(affinity);
 		bool is_found;
 		uint32_t coll_id;
+
 		if (pTab->def->fields[i].coll_id == 0 &&
-		    sql_expr_coll(pParse, p, &is_found, &coll_id) && is_found)
+		    sql_expr_coll(pParse, p, &is_found, &coll_id) && coll_id != 0)
 			pTab->def->fields[i].coll_id = coll_id;
 	}
 }
@@ -2150,47 +2151,51 @@ computeLimitRegisters(Parse * pParse, Select * p, int iBreak)
 }
 
 #ifndef SQLITE_OMIT_COMPOUND_SELECT
-static struct coll *
-multi_select_coll_seq_r(Parse *parser, Select *p, int n, bool *is_found,
-			uint32_t *coll_id)
+/**
+ * This function determines resulting collation sequence for
+ * @n-th column of the result set for the compound SELECT
+ * statement. Since compound SELECT performs implicit comparisons
+ * between values, all parts of compound queries must use
+ * the same collation. Otherwise, an error is raised.
+ *
+ * @param parser Parse context.
+ * @param p Select meta-information.
+ * @param n Column number of the result set.
+ * @param is_forced_coll Used if we fall into recursion.
+ *        For most-outer call it is unused. Used to indicate that
+ *        explicit COLLATE clause is used.
+ * @retval Id of collation to be used during string comparison.
+ */
+static uint32_t
+multi_select_coll_seq(struct Parse *parser, struct Select *p, int n,
+		      bool *is_forced_coll)
 {
-	struct coll *coll;
+	bool is_prior_forced = false;
+	bool is_current_forced;
+	uint32_t prior_coll_id = 0;
+	uint32_t current_coll_id;
 	if (p->pPrior != NULL) {
-		coll = multi_select_coll_seq_r(parser, p->pPrior, n, is_found,
-					       coll_id);
-	} else {
-		coll = NULL;
-		*coll_id = 0;
+		prior_coll_id = multi_select_coll_seq(parser, p->pPrior, n,
+						      &is_prior_forced);
 	}
-	assert(n >= 0);
-	/* iCol must be less than p->pEList->nExpr.  Otherwise an error would
-	 * have been thrown during name resolution and we would not have gotten
-	 * this far
+	/*
+	 * Column number must be less than p->pEList->nExpr.
+	 * Otherwise an error would have been thrown during name
+	 * resolution and we would not have got this far.
 	 */
-	if (!*is_found && ALWAYS(n < p->pEList->nExpr)) {
-		coll = sql_expr_coll(parser, p->pEList->a[n].pExpr, is_found,
-				     coll_id);
+	assert(n >= 0 && n < p->pEList->nExpr);
+	sql_expr_coll(parser, p->pEList->a[n].pExpr, &is_current_forced,
+		      &current_coll_id);
+	uint32_t res_coll_id;
+	if (collations_check_compatibility(prior_coll_id, is_prior_forced,
+					   current_coll_id, is_current_forced,
+					   &res_coll_id) != 0) {
+		parser->rc = SQL_TARANTOOL_ERROR;
+		parser->nErr++;
+		return 0;
 	}
-	return coll;
-}
-
-/**
- * The collating sequence for the compound select is taken from the
- * left-most term of the select that has a collating sequence.
- * @param parser Parser.
- * @param p Select.
- * @param n Column number.
- * @param[out] coll_id Collation identifer.
- * @retval The appropriate collating sequence for the n-th column
- *         of the result set for the compound-select statement
- *         "p".
- * @retval NULL The column has no default collating sequence.
- */
-static inline struct coll *
-multi_select_coll_seq(Parse *parser, Select *p, int n, uint32_t *coll_id)
-{
-	bool is_found = false;
-	return multi_select_coll_seq_r(parser, p, n, &is_found, coll_id);
+	*is_forced_coll = (is_prior_forced || is_current_forced);
+	return res_coll_id;
 }
 
 /**
@@ -2227,12 +2232,13 @@ sql_multiselect_orderby_to_key_info(struct Parse *parse, struct Select *s,
 		struct ExprList_item *item = &order_by->a[i];
 		struct Expr *term = item->pExpr;
 		uint32_t id;
+		bool unused;
 		if ((term->flags & EP_Collate) != 0) {
-			bool is_found = false;
-			sql_expr_coll(parse, term, &is_found, &id);
+			sql_expr_coll(parse, term, &unused, &id);
 		} else {
-			multi_select_coll_seq(parse, s,
-					      item->u.x.iOrderByCol - 1, &id);
+			id = multi_select_coll_seq(parse, s,
+						   item->u.x.iOrderByCol - 1,
+						   &unused);
 			if (id != 0) {
 				const char *name = coll_by_id(id)->name;
 				order_by->a[i].pExpr =
@@ -2894,9 +2900,10 @@ multiSelect(Parse * pParse,	/* Parsing context */
 		struct sql_key_info *key_info = sql_key_info_new(db, nCol);
 		if (key_info == NULL)
 			goto multi_select_end;
+		bool unused;
 		for (int i = 0; i < nCol; i++) {
-			multi_select_coll_seq(pParse, p, i,
-					      &key_info->parts[i].coll_id);
+			key_info->parts[i].coll_id =
+				multi_select_coll_seq(pParse, p, i, &unused);
 		}
 
 		for (struct Select *pLoop = p; pLoop; pLoop = pLoop->pPrior) {
@@ -3321,10 +3328,12 @@ multiSelectOrderBy(Parse * pParse,	/* Parsing context */
 		pParse->nMem += expr_count + 1;
 		sqlite3VdbeAddOp2(v, OP_Integer, 0, regPrev);
 		key_info_dup = sql_key_info_new(db, expr_count);
+		bool unused;
 		if (key_info_dup != NULL) {
 			for (int i = 0; i < expr_count; i++) {
-				multi_select_coll_seq(pParse, p, i,
-					&key_info_dup->parts[i].coll_id);
+				key_info_dup->parts[i].coll_id =
+					multi_select_coll_seq(pParse, p, i,
+							      &unused);
 			}
 		}
 	}
@@ -5300,12 +5309,12 @@ updateAccumulator(Parse * pParse, AggInfo * pAggInfo)
 			struct ExprList_item *pItem;
 			int j;
 			assert(pList != 0);	/* pList!=0 if pF->pFunc has NEEDCOLL */
-			bool is_found = false;
+			bool unused;
 			uint32_t id;
-			for (j = 0, pItem = pList->a; !is_found && j < nArg;
+			for (j = 0, pItem = pList->a; coll == NULL && j < nArg;
 			     j++, pItem++) {
 				coll = sql_expr_coll(pParse, pItem->pExpr,
-						     &is_found, &id);
+						     &unused, &id);
 			}
 			if (regHit == 0 && pAggInfo->nAccumulator)
 				regHit = ++pParse->nMem;
diff --git a/src/box/sql/sqliteInt.h b/src/box/sql/sqliteInt.h
index 1999ff568..cfe87c62a 100644
--- a/src/box/sql/sqliteInt.h
+++ b/src/box/sql/sqliteInt.h
@@ -4330,13 +4330,14 @@ const char *sqlite3ErrStr(int);
  *
  * @param parse Parsing context.
  * @param expr Expression to scan.
- * @param[out] is_found Flag set if collation was found.
+ * @param[out] is_found Flag set if explicit COLLATE clause is used.
  * @param[out] coll_id Collation identifier.
  *
  * @retval Pointer to collation.
  */
 struct coll *
-sql_expr_coll(Parse * pParse, Expr * pExpr, bool *is_found, uint32_t *coll_id);
+sql_expr_coll(Parse * pParse, Expr * pExpr, bool *is_explicit_coll,
+	      uint32_t *coll_id);
 
 Expr *sqlite3ExprAddCollateToken(Parse * pParse, Expr *, const Token *, int);
 Expr *sqlite3ExprAddCollateString(Parse *, Expr *, const char *);
@@ -4631,6 +4632,29 @@ int sqlite3VdbeParameterIndex(Vdbe *, const char *, int);
 int sqlite3TransferBindings(sqlite3_stmt *, sqlite3_stmt *);
 int sqlite3Reprepare(Vdbe *);
 void sqlite3ExprListCheckLength(Parse *, ExprList *, const char *);
+
+/**
+ * This function verifies that two collations (to be more precise
+ * their ids) are compatible. In terms of SQL ANSI they are
+ * compatible if:
+ *  - one of collations is mentioned alongside with explicit
+ *    COLLATE clause, which forces this collation over another
+ *    one. It is allowed to have the same forced collations;
+ * - both collations are derived from table columns and they
+ *   are the same;
+ * - one collation is derived from table column and another
+ *   one is not specified (i.e. COLL_NONE);
+ * In all other cases they are not accounted to be compatible
+ * and error should be raised.
+ * Collation to be used in comparison operator is returned
+ * via @res_id: in case one of collations is absent, then
+ * the second one is utilized.
+ */
+int
+collations_check_compatibility(uint32_t lhs_id, bool is_lhs_forced,
+			       uint32_t rhs_id, bool is_rhs_forced,
+			       uint32_t *res_id);
+
 /**
  * Return a pointer to the collation sequence that should be used
  * by a binary comparison operator comparing left and right.
diff --git a/src/box/sql/where.c b/src/box/sql/where.c
index 1db4db874..44c52d67f 100644
--- a/src/box/sql/where.c
+++ b/src/box/sql/where.c
@@ -304,11 +304,11 @@ whereScanNext(WhereScan * pScan)
 								Parse *pParse =
 									pWC->pWInfo->pParse;
 								assert(pX->pLeft);
-								uint32_t id;
+								uint32_t unused;
 								struct coll *coll =
 									sql_binary_compare_coll_seq(
 										pParse, pX->pLeft,
-										pX->pRight, &id);
+										pX->pRight, &unused);
 								if (coll != pScan->coll)
 									continue;
 							}
@@ -559,7 +559,7 @@ findIndexCol(Parse * pParse,	/* Parse context */
 			struct coll *coll = sql_expr_coll(pParse,
 							  pList->a[i].pExpr,
 							  &is_found, &id);
-			if (is_found && coll == part_to_match->coll)
+			if (coll == part_to_match->coll)
 				return i;
 		}
 	}
@@ -2288,8 +2288,8 @@ whereRangeVectorLen(Parse * pParse,	/* Parsing context */
 		    sqlite3TableColumnAffinity(space->def, pLhs->iColumn);
 		if (aff != idxaff)
 			break;
-		uint32_t id;
-		pColl = sql_binary_compare_coll_seq(pParse, pLhs, pRhs, &id);
+		uint32_t unused;
+		pColl = sql_binary_compare_coll_seq(pParse, pLhs, pRhs, &unused);
 		if (pColl == 0)
 			break;
 		if (idx_def->key_def->parts[i + nEq].coll != pColl)
@@ -3408,8 +3408,7 @@ wherePathSatisfiesOrderBy(WhereInfo * pWInfo,	/* The WHERE clause */
 								      &is_found, &id);
 						struct coll *idx_coll =
 							idx_def->key_def->parts[j].coll;
-						if (is_found &&
-						    coll != idx_coll)
+						if (coll != idx_coll)
 							continue;
 					}
 					isMatch = 1;
diff --git a/src/box/sql/whereexpr.c b/src/box/sql/whereexpr.c
index 9fa6ce15d..e05967554 100644
--- a/src/box/sql/whereexpr.c
+++ b/src/box/sql/whereexpr.c
@@ -168,7 +168,7 @@ exprCommute(Parse * pParse, Expr * pExpr)
 			bool is_found;
 			uint32_t id;
 			sql_expr_coll(pParse, pExpr->pLeft, &is_found, &id);
-			if (is_found) {
+			if (id != 0) {
 				/*
 				 * Neither X nor Y have COLLATE
 				 * operators, but X has a
@@ -837,15 +837,16 @@ termIsEquivalence(Parse * pParse, Expr * pExpr)
 	    ) {
 		return 0;
 	}
-	uint32_t id;
+	uint32_t unused;
 	struct coll *coll1 =
 	    sql_binary_compare_coll_seq(pParse, pExpr->pLeft, pExpr->pRight,
-					&id);
+					&unused);
 	if (coll1 == NULL)
 		return 1;
-	bool unused;
-	coll1 = sql_expr_coll(pParse, pExpr->pLeft, &unused, &id);
-	struct coll *coll2 = sql_expr_coll(pParse, pExpr->pRight, &unused, &id);
+	bool unused1;
+	coll1 = sql_expr_coll(pParse, pExpr->pLeft, &unused1, &unused);
+	struct coll *coll2 = sql_expr_coll(pParse, pExpr->pRight, &unused1,
+					   &unused);
 	return coll1 != NULL && coll1 == coll2;
 }
 
diff --git a/test/box/misc.result b/test/box/misc.result
index ea3cd8805..4401fc1b6 100644
--- a/test/box/misc.result
+++ b/test/box/misc.result
@@ -499,6 +499,7 @@ t;
   170: box.error.CONSTRAINT_EXISTS
   171: box.error.SQL_TYPE_MISMATCH
   172: box.error.DROP_COLLATION
+  173: box.error.ILLEGAL_COLLATION_MIX
 ...
 test_run:cmd("setopt delimiter ''");
 ---
diff --git a/test/sql-tap/e_select1.test.lua b/test/sql-tap/e_select1.test.lua
index 47fb7a809..e9f29465c 100755
--- a/test/sql-tap/e_select1.test.lua
+++ b/test/sql-tap/e_select1.test.lua
@@ -1,6 +1,6 @@
 #!/usr/bin/env tarantool
 test = require("sqltester")
-test:plan(525)
+test:plan(523)
 
 --!./tcltestrunner.lua
 -- 2010 July 16
@@ -1588,12 +1588,10 @@ test:do_select_tests(
         {"1", "SELECT 'abc'                UNION SELECT 'ABC'", {"ABC",  "abc"}},
         {"2", "SELECT 'abc' COLLATE \"unicode_ci\" UNION SELECT 'ABC'", {"ABC" }},
         {"3", "SELECT 'abc'                UNION SELECT 'ABC' COLLATE \"unicode_ci\"", {"ABC" }},
-        {"4", "SELECT 'abc' COLLATE binary UNION SELECT 'ABC' COLLATE \"unicode_ci\"", {"ABC",  "abc"}},
-        {"5", "SELECT 'abc' COLLATE \"unicode_ci\" UNION SELECT 'ABC' COLLATE binary", {"ABC" }},
-        {"6", "SELECT a FROM y1 UNION SELECT b FROM y1", {"abc" }},
-        {"7", "SELECT b FROM y1 UNION SELECT a FROM y1", {"Abc",  "abc"}},
-        {"8", "SELECT a FROM y1 UNION SELECT c FROM y1", {"aBC" }},
-        {"9", "SELECT a FROM y1 UNION SELECT c COLLATE binary FROM y1", {"aBC" }},
+        {"4", "SELECT 'abc' COLLATE binary UNION SELECT 'ABC'", {"ABC",  "abc"}},
+        {"5", "SELECT 'abc' COLLATE \"unicode_ci\" UNION SELECT 'ABC'", {"ABC" }},
+        {"6", "SELECT a FROM y1 UNION SELECT c FROM y1", {"aBC" }},
+        {"7", "SELECT a FROM y1 UNION SELECT c COLLATE binary FROM y1", {"Abc", "aBC" }},
     })
 
 -- EVIDENCE-OF: R-32706-07403 No affinity transformations are applied to
diff --git a/test/sql-tap/in4.test.lua b/test/sql-tap/in4.test.lua
index ef426b092..b029535af 100755
--- a/test/sql-tap/in4.test.lua
+++ b/test/sql-tap/in4.test.lua
@@ -531,7 +531,7 @@ test:do_execsql_test(
         SELECT c FROM t4a WHERE a=b ORDER BY c;
     ]], {
         -- <in4-4.1>
-        3
+        1, 3
         -- </in4-4.1>
     })
 
diff --git a/test/sql-tap/join.test.lua b/test/sql-tap/join.test.lua
index 4e4ec6422..d8a4f2e7f 100755
--- a/test/sql-tap/join.test.lua
+++ b/test/sql-tap/join.test.lua
@@ -1003,7 +1003,7 @@ test:do_execsql_test(
         SELECT a,b FROM t2 NATURAL JOIN t1 
     ]], {
         -- <join-11.7>
-        "two", 2
+        "one", 1, "two", 2
         -- </join-11.7>
     })
 
diff --git a/test/sql-tap/tkt3493.test.lua b/test/sql-tap/tkt3493.test.lua
index 26ca2271b..7eb4fd726 100755
--- a/test/sql-tap/tkt3493.test.lua
+++ b/test/sql-tap/tkt3493.test.lua
@@ -246,7 +246,7 @@ test:do_execsql_test(
 test:do_execsql_test(
     "tkt3493-3.1",
     [[
-        CREATE TABLE t2(a  TEXT COLLATE "unicode_ci" PRIMARY KEY, b  TEXT COLLATE BINARY);
+        CREATE TABLE t2(a  TEXT COLLATE "unicode_ci" PRIMARY KEY, b  TEXT);
         INSERT INTO t2 VALUES('aBc', 'DeF');
     ]], {
         -- <tkt3493-3.1>
@@ -304,7 +304,7 @@ test:do_execsql_test(
         SELECT b>a FROM t2 GROUP BY a, b
     ]], {
         -- <tkt3493-3.3.3>
-        0
+        1
         -- </tkt3493-3.3.3>
     })
 
diff --git a/test/sql-tap/transitive1.test.lua b/test/sql-tap/transitive1.test.lua
index 178fd9da6..8baad49c2 100755
--- a/test/sql-tap/transitive1.test.lua
+++ b/test/sql-tap/transitive1.test.lua
@@ -503,7 +503,7 @@ test:do_execsql_test(
         SELECT * FROM c1 WHERE x=y AND z=y AND z='abc';
     ]], {
         -- <transitive1-570>
-
+        1, "ABC", "ABC", "abc"
         -- </transitive1-570>
     })
 
@@ -516,7 +516,7 @@ test:do_execsql_test(
         SELECT * FROM c1 WHERE x=y AND z=y AND z='abc';
     ]], {
         -- <transitive1-570eqp>
-        "/SEARCH TABLE C1 USING COVERING INDEX C1X/"
+        "/SCAN TABLE C1/"
         -- </transitive1-570eqp>
     })
 
diff --git a/test/sql-tap/with1.test.lua b/test/sql-tap/with1.test.lua
index faa99811c..43c913bbd 100755
--- a/test/sql-tap/with1.test.lua
+++ b/test/sql-tap/with1.test.lua
@@ -880,7 +880,7 @@ test:do_execsql_test("10.8.4.2", [[
   SELECT a FROM tst UNION ALL SELECT b COLLATE "unicode_ci" FROM tst ORDER BY 1;
 ]], {
   -- <10.8.4.2>
-  "A", "B", "C", "a", "b", "c"
+  "a", "A", "b", "B", "c", "C"
   -- </10.8.4.2>
 })
 
diff --git a/test/sql/collation.result b/test/sql/collation.result
index 3df4cfa70..c6bea382b 100644
--- a/test/sql/collation.result
+++ b/test/sql/collation.result
@@ -134,6 +134,69 @@ box.space._collation:delete{0}
 ---
 - error: 'Can''t drop collation none : system collation'
 ...
+-- gh-3185: collations of LHS and RHS must be compatible.
+--
+box.sql.execute("CREATE TABLE t (id INT PRIMARY KEY, a TEXT, b TEXT COLLATE BINARY, c TEXT COLLATE \"unicode_ci\");")
+---
+...
+box.sql.execute("SELECT * FROM t WHERE a = b;")
+---
+- []
+...
+box.sql.execute("SELECT * FROM t WHERE a COLLATE BINARY = b;")
+---
+- []
+...
+box.sql.execute("SELECT * FROM t WHERE b = c;")
+---
+- error: Illegal mix of collations
+...
+box.sql.execute("SELECT * FROM t WHERE b COLLATE BINARY = c;")
+---
+- []
+...
+box.sql.execute("SELECT * FROM t WHERE a = c;")
+---
+- []
+...
+box.sql.execute("SELECT * FROM t WHERE a COLLATE BINARY = c COLLATE \"unicode\";")
+---
+- error: Illegal mix of collations
+...
+-- Compound queries perform implicit comparisons between values.
+-- Hence, rules for collations compatibilities are the same.
+--
+box.sql.execute("SELECT 'abc' COLLATE binary UNION SELECT 'ABC' COLLATE \"unicode_ci\"")
+---
+- error: Illegal mix of collations
+...
+box.sql.execute("SELECT 'abc' COLLATE \"unicode_ci\" UNION SELECT 'ABC' COLLATE binary")
+---
+- error: Illegal mix of collations
+...
+box.sql.execute("SELECT c FROM t UNION SELECT b FROM t;")
+---
+- error: Illegal mix of collations
+...
+box.sql.execute("SELECT b FROM t UNION SELECT a FROM t;")
+---
+- []
+...
+box.sql.execute("SELECT a FROM t UNION SELECT c FROM t;")
+---
+- []
+...
+box.sql.execute("SELECT c COLLATE BINARY FROM t UNION SELECT a FROM t;")
+---
+- []
+...
+box.sql.execute("SELECT b COLLATE \"unicode\" FROM t UNION SELECT a FROM t;")
+---
+- []
+...
+box.sql.execute("DROP TABLE t;")
+---
+...
 box.schema.user.revoke('guest', 'read,write,execute', 'universe')
 ---
 ...
diff --git a/test/sql/collation.test.lua b/test/sql/collation.test.lua
index 61df33a95..00e5113b1 100644
--- a/test/sql/collation.test.lua
+++ b/test/sql/collation.test.lua
@@ -56,4 +56,26 @@ box.sql.execute("DROP TABLE t;")
 box.space._collation:select{0}
 box.space._collation:delete{0}
 
+-- gh-3185: collations of LHS and RHS must be compatible.
+--
+box.sql.execute("CREATE TABLE t (id INT PRIMARY KEY, a TEXT, b TEXT COLLATE BINARY, c TEXT COLLATE \"unicode_ci\");")
+box.sql.execute("SELECT * FROM t WHERE a = b;")
+box.sql.execute("SELECT * FROM t WHERE a COLLATE BINARY = b;")
+box.sql.execute("SELECT * FROM t WHERE b = c;")
+box.sql.execute("SELECT * FROM t WHERE b COLLATE BINARY = c;")
+box.sql.execute("SELECT * FROM t WHERE a = c;")
+box.sql.execute("SELECT * FROM t WHERE a COLLATE BINARY = c COLLATE \"unicode\";")
+
+-- Compound queries perform implicit comparisons between values.
+-- Hence, rules for collations compatibilities are the same.
+--
+box.sql.execute("SELECT 'abc' COLLATE binary UNION SELECT 'ABC' COLLATE \"unicode_ci\"")
+box.sql.execute("SELECT 'abc' COLLATE \"unicode_ci\" UNION SELECT 'ABC' COLLATE binary")
+box.sql.execute("SELECT c FROM t UNION SELECT b FROM t;")
+box.sql.execute("SELECT b FROM t UNION SELECT a FROM t;")
+box.sql.execute("SELECT a FROM t UNION SELECT c FROM t;")
+box.sql.execute("SELECT c COLLATE BINARY FROM t UNION SELECT a FROM t;")
+box.sql.execute("SELECT b COLLATE \"unicode\" FROM t UNION SELECT a FROM t;")
+
+box.sql.execute("DROP TABLE t;")
 box.schema.user.revoke('guest', 'read,write,execute', 'universe')
-- 
2.15.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH v2 2/3] Introduce "none" and "binary" collations
  2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 2/3] Introduce "none" and "binary" collations Nikita Pettik
@ 2018-11-13 12:02   ` Vladislav Shpilevoy
  2018-11-13 22:37     ` n.pettik
  0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2018-11-13 12:02 UTC (permalink / raw)
  To: tarantool-patches, Nikita Pettik

Hi! Thanks for the fixes!

Please, remember to put branch/issue link
into a covering letter next time.

See 5 comments below, my fixes at the end
of the email and on the branch.

> diff --git a/src/box/alter.cc b/src/box/alter.cc
> index 6d2c59bbc..2eb9f53e8 100644
> --- a/src/box/alter.cc
> +++ b/src/box/alter.cc
> @@ -403,7 +403,7 @@ field_def_decode(struct field_def *field, const char **data,
>   				     "nullable action properties", fieldno +
>   				     TUPLE_INDEX_BASE));
>   	}
> -	if (field->coll_id != COLL_NONE &&
> +	if (field->coll_id != 0 &&

1. Why not to leave enum COLL_NONE, but change its value to 0?
I guess, it is a purpose of define/enum features - define a name
for a constant to be able to change the value under the hood. It
could have reduced the diff.

>   	    field->type != FIELD_TYPE_STRING &&
>   	    field->type != FIELD_TYPE_SCALAR &&
>   	    field->type != FIELD_TYPE_ANY) {
> diff --git a/src/box/sql/callback.c b/src/box/sql/callback.c
> index 3cf3a835d..352745e0e 100644
> --- a/src/box/sql/callback.c
> +++ b/src/box/sql/callback.c
> @@ -42,13 +42,22 @@
>   struct coll *
>   sql_get_coll_seq(Parse *parser, const char *name, uint32_t *coll_id)
>   {
> -	if (name == NULL || strcasecmp(name, "binary") == 0) {
> -		*coll_id = COLL_NONE;
> -		return NULL;
> +	if (name == NULL) {
> +		*coll_id = 0;
> +		return coll_by_id(0)->coll;
>   	}
> -	struct coll_id *p = coll_by_name(name, strlen(name));
> +	struct coll_id *p;
> +	/*
> +	 * In SQL all identifiers should be uppercased, so
> +	 * to avoid mess lets simple search binary (since it is
> +	 * sort of "special" collation) ignoring case at all.
> +	 */

2. I am not sure if it should be so special. I think, we should
treat it just like any other collation. If tests have BINARY
uppercased, then they should be fixed, I guess.

> +	if (strcasecmp(name, "binary") == 0)
> +		p = coll_by_name("binary", strlen("binary"));
> +	else
> +		p = coll_by_name(name, strlen(name));
>   	if (p == NULL) {
> -		*coll_id = COLL_NONE;
> +		*coll_id = 0;
>   		sqlite3ErrorMsg(parser, "no such collation sequence: %s",
>   				name);
>   		return NULL;
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index 8c34cbb3d..580cf1e60 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -506,7 +506,7 @@ case_type##ICUFunc(sqlite3_context *context, int argc, sqlite3_value **argv)   \
>   	UErrorCode status = U_ZERO_ERROR;                                      \
>   	struct coll *coll = sqlite3GetFuncCollSeq(context);                    \
>   	const char *locale = NULL;                                             \
> -	if (coll != NULL) {                                                    \
> +	if (coll != NULL && coll->collator != NULL) {                          \

3. Why do you use coll->collator != 0 instead of coll->type == COLL_TYPE_ICU?
I do not think it is safe since in future we can move collator into a union,
which will not be NULL for a non-ICU collation.

>   		locale = ucol_getLocaleByType(coll->collator,                  \
>   					      ULOC_VALID_LOCALE, &status);     \
>   	}        > diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
> index 5bdf102e7..8e5aea51b 100644
> --- a/src/box/tuple_format.c
> +++ b/src/box/tuple_format.c
> @@ -40,7 +40,7 @@ static uint32_t formats_size = 0, formats_capacity = 0;
>  
>  static const struct tuple_field tuple_field_default = {
>         FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false,
> -       ON_CONFLICT_ACTION_NONE, NULL, COLL_NONE,
> +       ON_CONFLICT_ACTION_DEFAULT, NULL, 0,
>  };

4. Are you sure that changing on_conflict_action belongs
to this patch?

5. How about tests like this one?

box.schema.create_space('test', {format = {{1, 'unsigned'}, {2, 'string', collation = 'binary'}}})

I mean specifying collations via Lua. And what happens if I use collation = 'none'
in a space format or index parts? I think, it should throw an error, but it is up
to Kostja or Kirill.

==================================================

commit 58691e9f2f593771865264c6aaed93b36cc13c08
Author: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Date:   Tue Nov 13 14:56:17 2018 +0300

     Speculative review fixes

diff --git a/src/box/alter.cc b/src/box/alter.cc
index 2eb9f53e8..474ef791e 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -403,7 +403,7 @@ field_def_decode(struct field_def *field, const char **data,
  				     "nullable action properties", fieldno +
  				     TUPLE_INDEX_BASE));
  	}
-	if (field->coll_id != 0 &&
+	if (field->coll_id != COLL_NONE &&
  	    field->type != FIELD_TYPE_STRING &&
  	    field->type != FIELD_TYPE_SCALAR &&
  	    field->type != FIELD_TYPE_ANY) {
@@ -2676,7 +2676,7 @@ on_replace_dd_collation(struct trigger * /* trigger */, void *event)
  		 * under the hood. Hence, we can rely on the
  		 * fact that "none" collation features id == 0.
  		 */
-		if (old_id == 0) {
+		if (old_id == COLL_NONE) {
  			tnt_raise(ClientError, ER_DROP_COLLATION, "none",
  				  "system collation");
  		}
diff --git a/src/box/coll_id.h b/src/box/coll_id.h
index 1b67a3f86..82e50d11b 100644
--- a/src/box/coll_id.h
+++ b/src/box/coll_id.h
@@ -37,6 +37,10 @@
  extern "C" {
  #endif /* defined(__cplusplus) */
  
+enum {
+	COLL_NONE = 0,
+};
+
  struct coll_id_def;
  struct coll;
  
diff --git a/src/box/field_def.c b/src/box/field_def.c
index 3e63e12a3..3a9ff3703 100644
--- a/src/box/field_def.c
+++ b/src/box/field_def.c
@@ -123,7 +123,7 @@ const struct field_def field_def_default = {
  	.name = NULL,
  	.is_nullable = false,
  	.nullable_action = ON_CONFLICT_ACTION_DEFAULT,
-	.coll_id = 0,
+	.coll_id = COLL_NONE,
  	.default_value = NULL,
  	.default_value_expr = NULL
  };
diff --git a/src/box/key_def.c b/src/box/key_def.c
index 6802489f1..3a560bb06 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -41,7 +41,7 @@ const char *sort_order_strs[] = { "asc", "desc", "undef" };
  const struct key_part_def key_part_def_default = {
  	0,
  	field_type_MAX,
-	0,
+	COLL_NONE,
  	false,
  	ON_CONFLICT_ACTION_DEFAULT,
  	SORT_ORDER_ASC
@@ -174,7 +174,7 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
  	for (uint32_t i = 0; i < part_count; i++) {
  		const struct key_part_def *part = &parts[i];
  		struct coll *coll = NULL;
-		if (part->coll_id != 0) {
+		if (part->coll_id != COLL_NONE) {
  			struct coll_id *coll_id = coll_by_id(part->coll_id);
  			if (coll_id == NULL) {
  				diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
@@ -223,7 +223,7 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
  		key_def_set_part(key_def, item, fields[item],
  				 (enum field_type)types[item],
  				 ON_CONFLICT_ACTION_DEFAULT,
-				 NULL, 0, SORT_ORDER_ASC);
+				 NULL, COLL_NONE, SORT_ORDER_ASC);
  	}
  	key_def_set_cmp(key_def);
  	return key_def;
@@ -319,7 +319,7 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
  	for (uint32_t i = 0; i < part_count; i++) {
  		const struct key_part_def *part = &parts[i];
  		int count = 2;
-		if (part->coll_id != 0)
+		if (part->coll_id != COLL_NONE)
  			count++;
  		if (part->is_nullable)
  			count++;
@@ -329,7 +329,7 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
  		assert(part->type < field_type_MAX);
  		size += mp_sizeof_str(strlen(PART_OPT_TYPE));
  		size += mp_sizeof_str(strlen(field_type_strs[part->type]));
-		if (part->coll_id != 0) {
+		if (part->coll_id != COLL_NONE) {
  			size += mp_sizeof_str(strlen(PART_OPT_COLLATION));
  			size += mp_sizeof_uint(part->coll_id);
  		}
@@ -348,7 +348,7 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
  	for (uint32_t i = 0; i < part_count; i++) {
  		const struct key_part_def *part = &parts[i];
  		int count = 2;
-		if (part->coll_id != 0)
+		if (part->coll_id != COLL_NONE)
  			count++;
  		if (part->is_nullable)
  			count++;
@@ -361,7 +361,7 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
  		assert(part->type < field_type_MAX);
  		const char *type_str = field_type_strs[part->type];
  		data = mp_encode_str(data, type_str, strlen(type_str));
-		if (part->coll_id != 0) {
+		if (part->coll_id != COLL_NONE) {
  			data = mp_encode_str(data, PART_OPT_COLLATION,
  					     strlen(PART_OPT_COLLATION));
  			data = mp_encode_uint(data, part->coll_id);
@@ -431,7 +431,7 @@ key_def_decode_parts_166(struct key_part_def *parts, uint32_t part_count,
  		part->is_nullable = (part->fieldno < field_count ?
  				     fields[part->fieldno].is_nullable :
  				     key_part_def_default.is_nullable);
-		part->coll_id = 0;
+		part->coll_id = COLL_NONE;
  	}
  	return 0;
  }
@@ -488,7 +488,7 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
  				 "index part: unknown field type");
  			return -1;
  		}
-		if (part->coll_id != 0 &&
+		if (part->coll_id != COLL_NONE &&
  		    part->type != FIELD_TYPE_STRING &&
  		    part->type != FIELD_TYPE_SCALAR) {
  			diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc
index e5e09b042..7cae436f1 100644
--- a/src/box/lua/space.cc
+++ b/src/box/lua/space.cc
@@ -299,7 +299,7 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i)
  			lua_pushboolean(L, key_part_is_nullable(part));
  			lua_setfield(L, -2, "is_nullable");
  
-			if (part->coll_id != 0) {
+			if (part->coll_id != COLL_NONE) {
  				struct coll_id *coll_id =
  					coll_by_id(part->coll_id);
  				assert(coll_id != NULL);
diff --git a/src/box/sql.c b/src/box/sql.c
index 686d32335..caa66144f 100644
--- a/src/box/sql.c
+++ b/src/box/sql.c
@@ -379,7 +379,7 @@ sql_ephemeral_space_create(uint32_t field_count, struct sql_key_info *key_info)
  		if (def != NULL && i < def->part_count)
  			part->coll_id = def->parts[i].coll_id;
  		else
-			part->coll_id = 0;
+			part->coll_id = COLL_NONE;
  	}
  	struct key_def *ephemer_key_def = key_def_new(ephemer_key_parts,
  						      field_count);
@@ -1139,7 +1139,7 @@ sql_encode_table(struct region *region, struct Table *table, uint32_t *size)
  		struct field_def *field = &def->fields[i];
  		const char *default_str = field->default_value;
  		int base_len = 5;
-		if (cid != 0)
+		if (cid != COLL_NONE)
  			base_len += 1;
  		if (default_str != NULL)
  			base_len += 1;
@@ -1160,7 +1160,7 @@ sql_encode_table(struct region *region, struct Table *table, uint32_t *size)
  		const char *action =
  			on_conflict_action_strs[def->fields[i].nullable_action];
  		mpstream_encode_str(&stream, action);
-		if (cid != 0) {
+		if (cid != COLL_NONE) {
  			mpstream_encode_str(&stream, "collation");
  			mpstream_encode_uint(&stream, cid);
  		}
@@ -1285,13 +1285,13 @@ sql_encode_index_parts(struct region *region, const struct field_def *fields,
  		       action_is_nullable(fields[col].nullable_action));
  		/* Do not decode default collation. */
  		uint32_t cid = part->coll_id;
-		mpstream_encode_map(&stream, 5 + (cid != 0));
+		mpstream_encode_map(&stream, 5 + (cid != COLL_NONE));
  		mpstream_encode_str(&stream, "type");
  		mpstream_encode_str(&stream, field_type_strs[fields[col].type]);
  		mpstream_encode_str(&stream, "field");
  		mpstream_encode_uint(&stream, col);
  
-		if (cid != 0) {
+		if (cid != COLL_NONE) {
  			mpstream_encode_str(&stream, "collation");
  			mpstream_encode_uint(&stream, cid);
  		}
diff --git a/src/box/sql/build.c b/src/box/sql/build.c
index 929d20dbf..5b3348bd2 100644
--- a/src/box/sql/build.c
+++ b/src/box/sql/build.c
@@ -2399,7 +2399,7 @@ index_fill_def(struct Parse *parse, struct index *index,
  		uint32_t coll_id;
  		if (expr->op == TK_COLLATE) {
  			sql_get_coll_seq(parse, expr->u.zToken, &coll_id);
-			if (coll_id == 0 &&
+			if (coll_id == COLL_NONE &&
  			    strcasecmp(expr->u.zToken, "binary") != 0) {
  				diag_set(ClientError, ER_NO_SUCH_COLLATION,
  					 expr->u.zToken);
diff --git a/src/box/sql/callback.c b/src/box/sql/callback.c
index 352745e0e..927eb2643 100644
--- a/src/box/sql/callback.c
+++ b/src/box/sql/callback.c
@@ -43,8 +43,8 @@ struct coll *
  sql_get_coll_seq(Parse *parser, const char *name, uint32_t *coll_id)
  {
  	if (name == NULL) {
-		*coll_id = 0;
-		return coll_by_id(0)->coll;
+		*coll_id = COLL_NONE;
+		return coll_by_id(COLL_NONE)->coll;
  	}
  	struct coll_id *p;
  	/*
@@ -57,7 +57,7 @@ sql_get_coll_seq(Parse *parser, const char *name, uint32_t *coll_id)
  	else
  		p = coll_by_name(name, strlen(name));
  	if (p == NULL) {
-		*coll_id = 0;
+		*coll_id = COLL_NONE;
  		sqlite3ErrorMsg(parser, "no such collation sequence: %s",
  				name);
  		return NULL;
diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
index e52cd6407..4d1c1a634 100644
--- a/src/box/sql/expr.c
+++ b/src/box/sql/expr.c
@@ -194,7 +194,7 @@ sql_expr_coll(Parse *parse, Expr *p, bool *is_found, uint32_t *coll_id)
  {
  	struct coll *coll = NULL;
  	*is_found = false;
-	*coll_id = 0;
+	*coll_id = COLL_NONE;
  	while (p != NULL) {
  		int op = p->op;
  		if (p->flags & EP_Generic)
diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index cea453f08..dfa6ed8e0 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -1345,7 +1345,7 @@ sql_key_info_new(sqlite3 *db, uint32_t part_count)
  		struct key_part_def *part = &key_info->parts[i];
  		part->fieldno = i;
  		part->type = FIELD_TYPE_SCALAR;
-		part->coll_id = 0;
+		part->coll_id = COLL_NONE;
  		part->is_nullable = false;
  		part->nullable_action = ON_CONFLICT_ACTION_ABORT;
  		part->sort_order = SORT_ORDER_ASC;
@@ -1961,7 +1961,7 @@ sqlite3SelectAddColumnTypeAndCollation(Parse * pParse,		/* Parsing contexts */
  		pTab->def->fields[i].type = sql_affinity_to_field_type(affinity);
  		bool is_found;
  		uint32_t coll_id;
-		if (pTab->def->fields[i].coll_id == 0 &&
+		if (pTab->def->fields[i].coll_id == COLL_NONE &&
  		    sql_expr_coll(pParse, p, &is_found, &coll_id) && is_found)
  			pTab->def->fields[i].coll_id = coll_id;
  	}
@@ -2160,7 +2160,7 @@ multi_select_coll_seq_r(Parse *parser, Select *p, int n, bool *is_found,
  					       coll_id);
  	} else {
  		coll = NULL;
-		*coll_id = 0;
+		*coll_id = COLL_NONE;
  	}
  	assert(n >= 0);
  	/* iCol must be less than p->pEList->nExpr.  Otherwise an error would
@@ -2233,7 +2233,7 @@ sql_multiselect_orderby_to_key_info(struct Parse *parse, struct Select *s,
  		} else {
  			multi_select_coll_seq(parse, s,
  					      item->u.x.iOrderByCol - 1, &id);
-			if (id != 0) {
+			if (id != COLL_NONE) {
  				const char *name = coll_by_id(id)->name;
  				order_by->a[i].pExpr =
  					sqlite3ExprAddCollateString(parse, term,
diff --git a/src/box/sql/where.c b/src/box/sql/where.c
index 1db4db874..8c78c0c9b 100644
--- a/src/box/sql/where.c
+++ b/src/box/sql/where.c
@@ -2806,7 +2806,7 @@ whereLoopAddBtree(WhereLoopBuilder * pBuilder,	/* WHERE clause information */
  		part.nullable_action = ON_CONFLICT_ACTION_ABORT;
  		part.is_nullable = false;
  		part.sort_order = SORT_ORDER_ASC;
-		part.coll_id = 0;
+		part.coll_id = COLL_NONE;
  
  		struct key_def *key_def = key_def_new(&part, 1);
  		if (key_def == NULL) {
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 8e5aea51b..4a8905262 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -40,7 +40,7 @@ static uint32_t formats_size = 0, formats_capacity = 0;
  
  static const struct tuple_field tuple_field_default = {
  	FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false,
-	ON_CONFLICT_ACTION_DEFAULT, NULL, 0,
+	ON_CONFLICT_ACTION_DEFAULT, NULL, COLL_NONE,
  };
  
  /**
@@ -67,7 +67,7 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
  		format->fields[i].nullable_action = fields[i].nullable_action;
  		struct coll *coll = NULL;
  		uint32_t cid = fields[i].coll_id;
-		if (cid != 0) {
+		if (cid != COLL_NONE) {
  			struct coll_id *coll_id = coll_by_id(cid);
  			if (coll_id == NULL) {
  				diag_set(ClientError,ER_WRONG_COLLATION_OPTIONS,

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH v2 3/3] sql: change collation compatibility rules
  2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 3/3] sql: change collation compatibility rules Nikita Pettik
@ 2018-11-13 12:02   ` Vladislav Shpilevoy
  2018-11-13 22:37     ` n.pettik
  0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2018-11-13 12:02 UTC (permalink / raw)
  To: Nikita Pettik, tarantool-patches

Thanks for the fixes!

See 2 comments below, my fixes at the end
of the email and on the branch.

> diff --git a/src/box/sql/select.c b/src/box/sql/select.c
> index cea453f08..ab0376a1d 100644
> --- a/src/box/sql/select.c
> +++ b/src/box/sql/select.c
> @@ -2150,47 +2151,51 @@ computeLimitRegisters(Parse * pParse, Select * p, int iBreak)
>   }
>   
>   #ifndef SQLITE_OMIT_COMPOUND_SELECT
> -static struct coll *
> -multi_select_coll_seq_r(Parse *parser, Select *p, int n, bool *is_found,
> -			uint32_t *coll_id)
> +/**
> + * This function determines resulting collation sequence for
> + * @n-th column of the result set for the compound SELECT
> + * statement. Since compound SELECT performs implicit comparisons
> + * between values, all parts of compound queries must use
> + * the same collation. Otherwise, an error is raised.
> + *
> + * @param parser Parse context.
> + * @param p Select meta-information.
> + * @param n Column number of the result set.
> + * @param is_forced_coll Used if we fall into recursion.
> + *        For most-outer call it is unused. Used to indicate that
> + *        explicit COLLATE clause is used.
> + * @retval Id of collation to be used during string comparison.
> + */
> +static uint32_t
> +multi_select_coll_seq(struct Parse *parser, struct Select *p, int n,
> +		      bool *is_forced_coll)

1. I renamed this function into multi_select_coll_seq_r and
created a wrapper without is_forced_coll to remove some of
'bool unused' things. Apply or drop the fix, up to you.

> diff --git a/src/box/sql/whereexpr.c b/src/box/sql/whereexpr.c
> index 9fa6ce15d..e05967554 100644
> --- a/src/box/sql/whereexpr.c
> +++ b/src/box/sql/whereexpr.c
> @@ -168,7 +168,7 @@ exprCommute(Parse * pParse, Expr * pExpr)
>   			bool is_found;
>   			uint32_t id;
>   			sql_expr_coll(pParse, pExpr->pLeft, &is_found, &id);
> -			if (is_found) {
> +			if (id != 0) {

2. If you fix a comment about COLL_NONE constant return,
do not forget to pick this hunk up as well, please. Maybe
this patch contains more, I am not sure.

>   				/*
>   				 * Neither X nor Y have COLLATE
>   				 * operators, but X has a

====================================================================

commit b138616a1851dbe905a2e86b981f36793a7eb117
Author: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Date:   Tue Nov 13 14:39:45 2018 +0300

     Speculative review fixes

diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index ffbe83aa5..2db8e1c7f 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -2167,16 +2167,16 @@ computeLimitRegisters(Parse * pParse, Select * p, int iBreak)
   * @retval Id of collation to be used during string comparison.
   */
  static uint32_t
-multi_select_coll_seq(struct Parse *parser, struct Select *p, int n,
-		      bool *is_forced_coll)
+multi_select_coll_seq_r(struct Parse *parser, struct Select *p, int n,
+			bool *is_forced_coll)
  {
  	bool is_prior_forced = false;
  	bool is_current_forced;
  	uint32_t prior_coll_id = 0;
  	uint32_t current_coll_id;
  	if (p->pPrior != NULL) {
-		prior_coll_id = multi_select_coll_seq(parser, p->pPrior, n,
-						      &is_prior_forced);
+		prior_coll_id = multi_select_coll_seq_r(parser, p->pPrior, n,
+							&is_prior_forced);
  	}
  	/*
  	 * Column number must be less than p->pEList->nExpr.
@@ -2198,6 +2198,13 @@ multi_select_coll_seq(struct Parse *parser, struct Select *p, int n,
  	return res_coll_id;
  }
  
+static inline uint32_t
+multi_select_coll_seq(struct Parse *parser, struct Select *p, int n)
+{
+	bool unused;
+	return multi_select_coll_seq_r(parser, p, n, &unused);
+}
+
  /**
   * The select statement passed as the second parameter is a
   * compound SELECT with an ORDER BY clause. This function
@@ -2237,8 +2244,7 @@ sql_multiselect_orderby_to_key_info(struct Parse *parse, struct Select *s,
  			sql_expr_coll(parse, term, &unused, &id);
  		} else {
  			id = multi_select_coll_seq(parse, s,
-						   item->u.x.iOrderByCol - 1,
-						   &unused);
+						   item->u.x.iOrderByCol - 1);
  			if (id != COLL_NONE) {
  				const char *name = coll_by_id(id)->name;
  				order_by->a[i].pExpr =
@@ -2900,10 +2906,9 @@ multiSelect(Parse * pParse,	/* Parsing context */
  		struct sql_key_info *key_info = sql_key_info_new(db, nCol);
  		if (key_info == NULL)
  			goto multi_select_end;
-		bool unused;
  		for (int i = 0; i < nCol; i++) {
  			key_info->parts[i].coll_id =
-				multi_select_coll_seq(pParse, p, i, &unused);
+				multi_select_coll_seq(pParse, p, i);
  		}
  
  		for (struct Select *pLoop = p; pLoop; pLoop = pLoop->pPrior) {
@@ -3328,12 +3333,10 @@ multiSelectOrderBy(Parse * pParse,	/* Parsing context */
  		pParse->nMem += expr_count + 1;
  		sqlite3VdbeAddOp2(v, OP_Integer, 0, regPrev);
  		key_info_dup = sql_key_info_new(db, expr_count);
-		bool unused;
  		if (key_info_dup != NULL) {
  			for (int i = 0; i < expr_count; i++) {
  				key_info_dup->parts[i].coll_id =
-					multi_select_coll_seq(pParse, p, i,
-							      &unused);
+					multi_select_coll_seq(pParse, p, i);
  			}
  		}
  	}
diff --git a/src/box/sql/sqliteInt.h b/src/box/sql/sqliteInt.h
index cfe87c62a..470320e8d 100644
--- a/src/box/sql/sqliteInt.h
+++ b/src/box/sql/sqliteInt.h
@@ -4330,7 +4330,8 @@ const char *sqlite3ErrStr(int);
   *
   * @param parse Parsing context.
   * @param expr Expression to scan.
- * @param[out] is_found Flag set if explicit COLLATE clause is used.
+ * @param[out] is_explicit_coll Flag set if explicit COLLATE
+ *             clause is used.
   * @param[out] coll_id Collation identifier.
   *
   * @retval Pointer to collation.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH v2 2/3] Introduce "none" and "binary" collations
  2018-11-13 12:02   ` [tarantool-patches] " Vladislav Shpilevoy
@ 2018-11-13 22:37     ` n.pettik
  2018-11-14 11:52       ` Vladislav Shpilevoy
  0 siblings, 1 reply; 12+ messages in thread
From: n.pettik @ 2018-11-13 22:37 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Vladislav Shpilevoy



> On 13 Nov 2018, at 15:02, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
> Hi! Thanks for the fixes!
> 
> Please, remember to put branch/issue link
> into a covering letter next time.

I’m sorry, it was late at night...

> See 5 comments below, my fixes at the end
> of the email and on the branch.
> 
>> diff --git a/src/box/alter.cc b/src/box/alter.cc
>> index 6d2c59bbc..2eb9f53e8 100644
>> --- a/src/box/alter.cc
>> +++ b/src/box/alter.cc
>> @@ -403,7 +403,7 @@ field_def_decode(struct field_def *field, const char **data,
>>  				     "nullable action properties", fieldno +
>>  				     TUPLE_INDEX_BASE));
>>  	}
>> -	if (field->coll_id != COLL_NONE &&
>> +	if (field->coll_id != 0 &&
> 
> 1. Why not to leave enum COLL_NONE, but change its value to 0?
> I guess, it is a purpose of define/enum features - define a name
> for a constant to be able to change the value under the hood. It
> could have reduced the diff.

For me it looks more convenient. But I don’t mind applying your fixes.
But there is one more point to avoid using COLL_NONE.
See my comment at the end of review.

>>  	    field->type != FIELD_TYPE_STRING &&
>>  	    field->type != FIELD_TYPE_SCALAR &&
>>  	    field->type != FIELD_TYPE_ANY) {
>> diff --git a/src/box/sql/callback.c b/src/box/sql/callback.c
>> index 3cf3a835d..352745e0e 100644
>> --- a/src/box/sql/callback.c
>> +++ b/src/box/sql/callback.c
>> @@ -42,13 +42,22 @@
>>  struct coll *
>>  sql_get_coll_seq(Parse *parser, const char *name, uint32_t *coll_id)
>>  {
>> -	if (name == NULL || strcasecmp(name, "binary") == 0) {
>> -		*coll_id = COLL_NONE;
>> -		return NULL;
>> +	if (name == NULL) {
>> +		*coll_id = 0;
>> +		return coll_by_id(0)->coll;
>>  	}
>> -	struct coll_id *p = coll_by_name(name, strlen(name));
>> +	struct coll_id *p;
>> +	/*
>> +	 * In SQL all identifiers should be uppercased, so
>> +	 * to avoid mess lets simple search binary (since it is
>> +	 * sort of "special" collation) ignoring case at all.
>> +	 */
> 
> 2. I am not sure if it should be so special. I think, we should
> treat it just like any other collation. If tests have BINARY
> uppercased, then they should be fixed, I guess.

Thing is in SQLite binary can be used without quotes, like:

… COLLATE BiNaRy …

And since ids are uppercased, this thing comes as “BINARY".
If I forced only lower case “binary”, a lot (>80 separate sub-tests,
12 separate file tests) of tests would fail. Moreover, I see that similar
things take place at Lua-land (schema.cc):

local function update_index_parts(format, parts)
...
elseif k == 'collation' then
    -- find ID by name
    local coll = box.space._collation.index.name:get{v}
    if not coll then
        coll = box.space._collation.index.name:get{v:lower()}
    end

If you still think that we should allow only “binary” format,
I will do it alongside with tests in a separate commit.

>> +	if (strcasecmp(name, "binary") == 0)
>> +		p = coll_by_name("binary", strlen("binary"));
>> +	else
>> +		p = coll_by_name(name, strlen(name));
>>  	if (p == NULL) {
>> -		*coll_id = COLL_NONE;
>> +		*coll_id = 0;
>>  		sqlite3ErrorMsg(parser, "no such collation sequence: %s",
>>  				name);
>>  		return NULL;
>> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
>> index 8c34cbb3d..580cf1e60 100644
>> --- a/src/box/sql/func.c
>> +++ b/src/box/sql/func.c
>> @@ -506,7 +506,7 @@ case_type##ICUFunc(sqlite3_context *context, int argc, sqlite3_value **argv)   \
>>  	UErrorCode status = U_ZERO_ERROR;                                      \
>>  	struct coll *coll = sqlite3GetFuncCollSeq(context);                    \
>>  	const char *locale = NULL;                                             \
>> -	if (coll != NULL) {                                                    \
>> +	if (coll != NULL && coll->collator != NULL) {                          \
> 
> 3. Why do you use coll->collator != 0 instead of coll->type == COLL_TYPE_ICU?
> I do not think it is safe since in future we can move collator into a union,
> which will not be NULL for a non-ICU collation.

Yep, you are absolutely right, I should use type of collation, instead of
checking existence of collator. Fixed:

diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index 580cf1e60..a2ac9ba2d 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -506,7 +506,7 @@ case_type##ICUFunc(sqlite3_context *context, int argc, sqlite3_value **argv)   \
        UErrorCode status = U_ZERO_ERROR;                                      \
        struct coll *coll = sqlite3GetFuncCollSeq(context);                    \
        const char *locale = NULL;                                             \
-       if (coll != NULL && coll->collator != NULL) {                          \
+       if (coll != NULL && coll->type == COLL_TYPE_ICU) {                     \
                locale = ucol_getLocaleByType(coll->collator,                  \
                                              ULOC_VALID_LOCALE, &status);     \
        }                                                                      \

> 
>>  		locale = ucol_getLocaleByType(coll->collator,                  \
>>  					      ULOC_VALID_LOCALE, &status);     \
>>  	}        > diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
>> index 5bdf102e7..8e5aea51b 100644
>> --- a/src/box/tuple_format.c
>> +++ b/src/box/tuple_format.c
>> @@ -40,7 +40,7 @@ static uint32_t formats_size = 0, formats_capacity = 0;
>>  static const struct tuple_field tuple_field_default = {
>>        FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false,
>> -       ON_CONFLICT_ACTION_NONE, NULL, COLL_NONE,
>> +       ON_CONFLICT_ACTION_DEFAULT, NULL, 0,
>> };
> 
> 4. Are you sure that changing on_conflict_action belongs
> to this patch?

No, it doesn’t. I guess it is an artefact of rebasing.
Glad that you caught it. Fixed.

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 8e5aea51b..badffc701 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -40,7 +40,7 @@ static uint32_t formats_size = 0, formats_capacity = 0;
 
 static const struct tuple_field tuple_field_default = {
        FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false,
-       ON_CONFLICT_ACTION_DEFAULT, NULL, 0,
+       ON_CONFLICT_ACTION_NONE, NULL, 0,

> 
> 5. How about tests like this one?
> 
> box.schema.create_space('test', {format = {{1, 'unsigned'}, {2, 'string', collation = 'binary'}}})
> 
> I mean specifying collations via Lua. And what happens if I use collation = 'none'
> in a space format or index parts? I think, it should throw an error, but it is up
> to Kostja or Kirill.

Hmm, personally I see no reason why an error should be raised.
If you specify “none” collation, then it will act as an absence of collation.
That is simple. Anyway, added tests on binary collation:

diff --git a/test/sql/collation.result b/test/sql/collation.result
index 3df4cfa70..b42d3e034 100644
--- a/test/sql/collation.result
+++ b/test/sql/collation.result
@@ -123,6 +123,49 @@ box.space.T:format()[3]['collation']
 box.sql.execute("DROP TABLE t;")
 ---
 ...
+-- BINARY collation works in the same way as there is no collation
+-- at all.
+--
+t = box.schema.create_space('test', {format = {{'id', 'unsigned'}, {'a', 'string', collation = 'binary'}}})
+---
+...
+t:format()[2]['collation']
+---
+- 3
+...
+pk = t:create_index('primary', {parts = {1}})
+---
+...
+i = t:create_index('secondary', {parts = {2, 'str', collation='binary'}})
+---
+...
+t:insert({1, 'AsD'})
+---
+- [1, 'AsD']
+...
+t:insert({2, 'ASD'})
+---
+- [2, 'ASD']
+...
+t:insert({3, 'asd'})
+---
+- [3, 'asd']
+...
+i:select('asd')
+---
+- - [3, 'asd']
+...
+i:select('ASD')
+---
+- - [2, 'ASD']
+...
+i:select('AsD')
+---
+- - [1, 'AsD']
+...
+t:drop()
+---
+...
 -- Collation with id == 0 is "none". It used to unify interaction
 -- with collation interface. It also can't be dropped.
 --
diff --git a/test/sql/collation.test.lua b/test/sql/collation.test.lua
index 61df33a95..05f666f5e 100644
--- a/test/sql/collation.test.lua
+++ b/test/sql/collation.test.lua
@@ -50,6 +50,21 @@ box.space.T:format()[2]['collation']
 box.space.T:format()[3]['collation']
 box.sql.execute("DROP TABLE t;")
 
+-- BINARY collation works in the same way as there is no collation
+-- at all.
+--
+t = box.schema.create_space('test', {format = {{'id', 'unsigned'}, {'a', 'string', collation = 'binary'}}})
+t:format()[2]['collation']
+pk = t:create_index('primary', {parts = {1}})
+i = t:create_index('secondary', {parts = {2, 'str', collation='binary'}})
+t:insert({1, 'AsD'})
+t:insert({2, 'ASD'})
+t:insert({3, 'asd'})
+i:select('asd')
+i:select('ASD')
+i:select('AsD')
+t:drop()
+

I would like to reveal my main concern about this patch:
pointer to “none” collation is not set in key_def/field_def, even if “none”
collation is forced. The way I wanted to implement this patch is like:

@@ -174,17 +174,15 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
        for (uint32_t i = 0; i < part_count; i++) {
                const struct key_part_def *part = &parts[i];
                struct coll *coll = NULL;
-               if (part->type == FIELD_TYPE_SCALAR ||
-                   part->type == FIELD_TYPE_STRING) {
+               if (part->coll_id != COLL_NONE) {
                        struct coll_id *coll_id = coll_by_id(part->coll_id);
-                       if (part->coll_id != 0 && coll_id == NULL) {
+                       if (coll_id == NULL) {
                                diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
                                         i + 1, "collation was not found by ID");
                                key_def_delete(def);
                                return NULL;
                        }
-                       if (coll_id != NULL)
-                               coll = coll_id->coll;
+                       coll = coll_id->coll;
                }
                key_def_set_part(def, i, part->fieldno, part->type,
                                 part->nullable_action, coll, part->coll_id,

@@ -321,8 +319,7 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
        for (uint32_t i = 0; i < part_count; i++) {
                const struct key_part_def *part = &parts[i];
                int count = 2;
-               if (part->type == FIELD_TYPE_STRING ||
-                   part->type == FIELD_TYPE_SCALAR)
+               if (part->coll_id != COLL_NONE)
                        count++;
                if (part->is_nullable)
                        count++;
@@ -332,8 +329,7 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
                assert(part->type < field_type_MAX);
                size += mp_sizeof_str(strlen(PART_OPT_TYPE));
                size += mp_sizeof_str(strlen(field_type_strs[part->type]));
-               if (part->type == FIELD_TYPE_STRING ||
-                   part->type == FIELD_TYPE_SCALAR) {
+               if (part->coll_id != COLL_NONE) {
                        size += mp_sizeof_str(strlen(PART_OPT_COLLATION));
                        size += mp_sizeof_uint(part->coll_id);
                }
@@ -352,8 +348,7 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
        for (uint32_t i = 0; i < part_count; i++) {
                const struct key_part_def *part = &parts[i];
                int count = 2;
-               if (part->type == FIELD_TYPE_STRING ||
-                   part->type == FIELD_TYPE_SCALAR)
+               if (part->coll_id != COLL_NONE)
                        count++;
                if (part->is_nullable)
                        count++;
@@ -366,8 +361,7 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
                assert(part->type < field_type_MAX);
                const char *type_str = field_type_strs[part->type];
                data = mp_encode_str(data, type_str, strlen(type_str));
-               if (part->type == FIELD_TYPE_STRING ||
-                   part->type == FIELD_TYPE_SCALAR) {
+               if (part->coll_id != COLL_NONE) {
                        data = mp_encode_str(data, PART_OPT_COLLATION,
                                             strlen(PART_OPT_COLLATION));
                        data = mp_encode_uint(data, part->coll_id);

diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc
index efaa10da0..7cae436f1 100644
--- a/src/box/lua/space.cc
+++ b/src/box/lua/space.cc
@@ -299,15 +299,12 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i)
                        lua_pushboolean(L, key_part_is_nullable(part));
                        lua_setfield(L, -2, "is_nullable");
 
-                       if (part->type == FIELD_TYPE_STRING ||
-                           part->type == FIELD_TYPE_SCALAR) {
-                               if (! space_is_system(space)) {
-                                       struct coll_id *coll_id =
-                                               coll_by_id(part->coll_id);
-                                       assert(coll_id != NULL);
-                                       lua_pushstring(L, coll_id->name);
-                                       lua_setfield(L, -2, "collation");
-                               }
+                       if (part->coll_id != COLL_NONE) {
+                               struct coll_id *coll_id =
+                                       coll_by_id(part->coll_id);
+                               assert(coll_id != NULL);
+                               lua_pushstring(L, coll_id->name);
+                               lua_setfield(L, -2, "collation");
                        }

In other words, each field with type of STRING/SCALAR and mb ANY
would have real pointer to collation. However, in turn such change
would break too many tests, without significant refactoring of SQL
query planner (at least I failed to do it in several hours).
On the other hand, in this case we can remove COLL_NONE
and unify collation interface: each field or key part MUST feature
collation. And I mean not only valid ID (it is how it works now),
but also valid (i.e. != NULL) pointer to collation struct. So, it seems
that we would be able to remove any checks like:

if (coll != NULL)
	*use collation*
else
	*process without involving collations*

Would become:

If (type_is_string_like)
	*always use collation*

To be honest I was sure that you would notice this pickle…
Or current approach is not dat bad?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH v2 3/3] sql: change collation compatibility rules
  2018-11-13 12:02   ` [tarantool-patches] " Vladislav Shpilevoy
@ 2018-11-13 22:37     ` n.pettik
  0 siblings, 0 replies; 12+ messages in thread
From: n.pettik @ 2018-11-13 22:37 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Vladislav Shpilevoy


>> diff --git a/src/box/sql/select.c b/src/box/sql/select.c
>> index cea453f08..ab0376a1d 100644
>> --- a/src/box/sql/select.c
>> +++ b/src/box/sql/select.c
>> @@ -2150,47 +2151,51 @@ computeLimitRegisters(Parse * pParse, Select * p, int iBreak)
>>  }
>>    #ifndef SQLITE_OMIT_COMPOUND_SELECT
>> -static struct coll *
>> -multi_select_coll_seq_r(Parse *parser, Select *p, int n, bool *is_found,
>> -			uint32_t *coll_id)
>> +/**
>> + * This function determines resulting collation sequence for
>> + * @n-th column of the result set for the compound SELECT
>> + * statement. Since compound SELECT performs implicit comparisons
>> + * between values, all parts of compound queries must use
>> + * the same collation. Otherwise, an error is raised.
>> + *
>> + * @param parser Parse context.
>> + * @param p Select meta-information.
>> + * @param n Column number of the result set.
>> + * @param is_forced_coll Used if we fall into recursion.
>> + *        For most-outer call it is unused. Used to indicate that
>> + *        explicit COLLATE clause is used.
>> + * @retval Id of collation to be used during string comparison.
>> + */
>> +static uint32_t
>> +multi_select_coll_seq(struct Parse *parser, struct Select *p, int n,
>> +		      bool *is_forced_coll)
> 
> 1. I renamed this function into multi_select_coll_seq_r and
> created a wrapper without is_forced_coll to remove some of
> 'bool unused' things. Apply or drop the fix, up to you.

Ok, I don’t mind your fixes. Applied.

>> diff --git a/src/box/sql/whereexpr.c b/src/box/sql/whereexpr.c
>> index 9fa6ce15d..e05967554 100644
>> --- a/src/box/sql/whereexpr.c
>> +++ b/src/box/sql/whereexpr.c
>> @@ -168,7 +168,7 @@ exprCommute(Parse * pParse, Expr * pExpr)
>>  			bool is_found;
>>  			uint32_t id;
>>  			sql_expr_coll(pParse, pExpr->pLeft, &is_found, &id);
>> -			if (is_found) {
>> +			if (id != 0) {
> 
> 2. If you fix a comment about COLL_NONE constant return,
> do not forget to pick this hunk up as well, please. Maybe
> this patch contains more, I am not sure.

Ok, I’ve returned COLL_NONE usages:

diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
index 09c248154..b67b22c23 100644
--- a/src/box/sql/expr.c
+++ b/src/box/sql/expr.c
@@ -360,11 +360,11 @@ collations_check_compatibility(uint32_t lhs_id, bool is_lhs_forced,
                return 0;
        }
        if (lhs_id != rhs_id) {
-               if (lhs_id == 0) {
+               if (lhs_id == COLL_NONE) {
                        *res_id = rhs_id;
                        return 0;
                }
-               if (rhs_id == 0) {
+               if (rhs_id == COLL_NONE) {
                        *res_id = lhs_id;
                        return 0;
                }
diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index 2db8e1c7f..efdac9dba 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -1963,7 +1963,8 @@ sqlite3SelectAddColumnTypeAndCollation(Parse * pParse,            /* Parsing contexts */
                uint32_t coll_id;
 
                if (pTab->def->fields[i].coll_id == COLL_NONE &&
-                   sql_expr_coll(pParse, p, &is_found, &coll_id) && coll_id != 0)
+                   sql_expr_coll(pParse, p, &is_found, &coll_id) &&
+                   coll_id != COLL_NONE)
                        pTab->def->fields[i].coll_id = coll_id;
        }
 }
@@ -2172,7 +2173,7 @@ multi_select_coll_seq_r(struct Parse *parser, struct Select *p, int n,
 {
        bool is_prior_forced = false;
        bool is_current_forced;
-       uint32_t prior_coll_id = 0;
+       uint32_t prior_coll_id = COLL_NONE;
        uint32_t current_coll_id;
        if (p->pPrior != NULL) {
                prior_coll_id = multi_select_coll_seq_r(parser, p->pPrior, n,
diff --git a/src/box/sql/whereexpr.c b/src/box/sql/whereexpr.c
index e05967554..971f428b7 100644
--- a/src/box/sql/whereexpr.c
+++ b/src/box/sql/whereexpr.c
@@ -168,7 +168,7 @@ exprCommute(Parse * pParse, Expr * pExpr)
                        bool is_found;
                        uint32_t id;
                        sql_expr_coll(pParse, pExpr->pLeft, &is_found, &id);
-                       if (id != 0) {
+                       if (id != COLL_NONE) {
                                /*
                                 * Neither X nor Y have COLLATE
                                 * operators, but X has ag

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH v2 2/3] Introduce "none" and "binary" collations
  2018-11-13 22:37     ` n.pettik
@ 2018-11-14 11:52       ` Vladislav Shpilevoy
  2018-11-14 20:59         ` n.pettik
  0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2018-11-14 11:52 UTC (permalink / raw)
  To: n.pettik, tarantool-patches


>>> +	/*
>>> +	 * In SQL all identifiers should be uppercased, so
>>> +	 * to avoid mess lets simple search binary (since it is
>>> +	 * sort of "special" collation) ignoring case at all.
>>> +	 */
>>
>> 2. I am not sure if it should be so special. I think, we should
>> treat it just like any other collation. If tests have BINARY
>> uppercased, then they should be fixed, I guess.
> 
> Thing is in SQLite binary can be used without quotes, like:
> 
> … COLLATE BiNaRy …
> 
> And since ids are uppercased, this thing comes as “BINARY".
> If I forced only lower case “binary”, a lot (>80 separate sub-tests,
> 12 separate file tests) of tests would fail. Moreover, I see that similar
> things take place at Lua-land (schema.cc):
> 
> local function update_index_parts(format, parts)
> ...
> elseif k == 'collation' then
>      -- find ID by name
>      local coll = box.space._collation.index.name:get{v}
>      if not coll then
>          coll = box.space._collation.index.name:get{v:lower()}
>      end
> 
> If you still think that we should allow only “binary” format,
> I will do it alongside with tests in a separate commit.

I still think that since 'binary' is even in _collation space
now, it should be just a regular collation. If you do not want
to change tests, then you can try to search any collation firstly
as is, and if not found, then lowerify and search again, regardless
of is it binary or not.

My point is that we should not treat binary differently from
others. Please, ask in the server team chat or consult Peter.
Especially, if you like the idea of lowering any collation name
if it is not found (just like we do in Lua as you decently
highlighted).

By the way, what if I write SQL like this? :

     SELECT ... WHERE  ... COLLATE NONE ...

Looks like this code:

> +       if (strcasecmp(name, "binary") == 0)
> +               p = coll_by_name("binary", strlen("binary"));
> +       else
> +               p = coll_by_name(name, strlen(name));

will find it ok, but maybe should not, is it?

> diff --git a/test/sql/collation.test.lua b/test/sql/collation.test.lua
> index 61df33a95..05f666f5e 100644
> --- a/test/sql/collation.test.lua
> +++ b/test/sql/collation.test.lua
> @@ -321,8 +319,7 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
>          for (uint32_t i = 0; i < part_count; i++) {
>                  const struct key_part_def *part = &parts[i];
>                  int count = 2;
> -               if (part->type == FIELD_TYPE_STRING ||
> -                   part->type == FIELD_TYPE_SCALAR)
> +               if (part->coll_id != COLL_NONE)
>                          count++;
>                  if (part->is_nullable)
>                          count++;
> @@ -332,8 +329,7 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
>                  assert(part->type < field_type_MAX);
>                  size += mp_sizeof_str(strlen(PART_OPT_TYPE));
>                  size += mp_sizeof_str(strlen(field_type_strs[part->type]));
> -               if (part->type == FIELD_TYPE_STRING ||
> -                   part->type == FIELD_TYPE_SCALAR) {
> +               if (part->coll_id != COLL_NONE) {
>                          size += mp_sizeof_str(strlen(PART_OPT_COLLATION));
>                          size += mp_sizeof_uint(part->coll_id);
>                  }
> @@ -352,8 +348,7 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
>          for (uint32_t i = 0; i < part_count; i++) {
>                  const struct key_part_def *part = &parts[i];
>                  int count = 2;
> -               if (part->type == FIELD_TYPE_STRING ||
> -                   part->type == FIELD_TYPE_SCALAR)
> +               if (part->coll_id != COLL_NONE)
>                          count++;
>                  if (part->is_nullable)
>                          count++;
> @@ -366,8 +361,7 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
>                  assert(part->type < field_type_MAX);
>                  const char *type_str = field_type_strs[part->type];
>                  data = mp_encode_str(data, type_str, strlen(type_str));
> -               if (part->type == FIELD_TYPE_STRING ||
> -                   part->type == FIELD_TYPE_SCALAR) {
> +               if (part->coll_id != COLL_NONE) {
>                          data = mp_encode_str(data, PART_OPT_COLLATION,
>                                               strlen(PART_OPT_COLLATION));
>                          data = mp_encode_uint(data, part->coll_id);
> 
> diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc
> index efaa10da0..7cae436f1 100644
> --- a/src/box/lua/space.cc
> +++ b/src/box/lua/space.cc
> @@ -299,15 +299,12 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i)
>                          lua_pushboolean(L, key_part_is_nullable(part));
>                          lua_setfield(L, -2, "is_nullable");
>   
> -                       if (part->type == FIELD_TYPE_STRING ||
> -                           part->type == FIELD_TYPE_SCALAR) {
> -                               if (! space_is_system(space)) {
> -                                       struct coll_id *coll_id =
> -                                               coll_by_id(part->coll_id);
> -                                       assert(coll_id != NULL);
> -                                       lua_pushstring(L, coll_id->name);
> -                                       lua_setfield(L, -2, "collation");
> -                               }
> +                       if (part->coll_id != COLL_NONE) {
> +                               struct coll_id *coll_id =
> +                                       coll_by_id(part->coll_id);
> +                               assert(coll_id != NULL);
> +                               lua_pushstring(L, coll_id->name);
> +                               lua_setfield(L, -2, "collation");
>                          }
> 
> In other words, each field with type of STRING/SCALAR and mb ANY
> would have real pointer to collation. However, in turn such change
> would break too many tests, without significant refactoring of SQL
> query planner (at least I failed to do it in several hours).
> On the other hand, in this case we can remove COLL_NONE
> and unify collation interface: each field or key part MUST feature
> collation. And I mean not only valid ID (it is how it works now),
> but also valid (i.e. != NULL) pointer to collation struct. So, it seems
> that we would be able to remove any checks like:
> 
> if (coll != NULL)
> 	*use collation*
> else
> 	*process without involving collations*
> 
> Would become:
> 
> If (type_is_string_like)
> 	*always use collation*
> 
> To be honest I was sure that you would notice this pickle…
> Or current approach is not dat bad?
> 

Yeah, I understand your aspiration, but I have some remarks
about the draft patch above and about the concept as it is:

* you said that you want to get rid of COLL_NONE, but in the draft
   above you still use it. Even if you had removed COLL_NONE
   usage from key_def builders, it would have stayed in alter.cc
   as a check that you can not drop 'none' collation. I think that
   COLL_NONE as a constant equal to 0 should not be removed from
   source code until it is not used at all. But it is a minor
   comment rather about coding style: I prefer code like
   'id != UNDERSTANDABLE_NAME' to 'id != <strange_constant>'.

* the main flaw in storing a collation for each string is that it
   will dramatically slow down comparators. The awful problem is in
   virtual functions calling - it is much slower than regular
   functions. Now we have two levels of avoiding coll->cmp() calling.
   First in *_compare_create functions. If a key has no collations,
   we can use highly specified template recursive comparators
   (struct TupleCompare, struct TupleCompareWithKey) which call
   memcmp. Second, we are trying to do not call coll->cmp() in other
   comparators (see mp_compare_str).

   Yes, we still can follow you, store a collation object in each
   key_part, tuple_field, field_def, but in tuple_compare.cc we
   still will need a way how to check that a collation comparator
   can be inlined into memcmp. This way is check coll->id == COLL_NONE.
   Now it is like

       use_fast_cmp = coll == NULL

   Will be like

       use_fast_cmp = coll->id == COLL_NONE

   As a summary: we can store a collation in each *_field/part object
   to simplify SQL source code, but it does not affect necessity of
   a special COLL_NONE constant.


Also, I see now a new minor problem:

> diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua
> index 64f74f9d3..a9525058b 100644
> --- a/src/box/lua/upgrade.lua
> +++ b/src/box/lua/upgrade.lua
> @@ -396,8 +396,10 @@ local function create_collation_space()
>      box.space._index:insert{_collation.id, 1, 'name', 'tree', {unique = true}, {{1, 'string'}}}
>  
>      log.info("create predefined collations")
> +    box.space._collation:replace{0, "none", ADMIN, "BINARY", "", setmap{}}
>      box.space._collation:replace{1, "unicode", ADMIN, "ICU", "", setmap{}}
>      box.space._collation:replace{2, "unicode_ci", ADMIN, "ICU", "", {strength='primary'}}
> +    box.space._collation:replace{3, "binary", ADMIN, "BINARY", "", setmap{}}
>  
>      local _priv = box.space[box.schema.PRIV_ID]
>      _priv:insert{ADMIN, PUBLIC, 'space', _collation.id, box.priv.W}

Unfortunately, we can not change old update() functions. So the only way to add
these new collations is to add a new code to upgrade_to_2_1_0() with these two
new lines.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH v2 2/3] Introduce "none" and "binary" collations
  2018-11-14 11:52       ` Vladislav Shpilevoy
@ 2018-11-14 20:59         ` n.pettik
  2018-11-15 11:04           ` Vladislav Shpilevoy
  0 siblings, 1 reply; 12+ messages in thread
From: n.pettik @ 2018-11-14 20:59 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Vladislav Shpilevoy



> On 14 Nov 2018, at 14:52, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
>>>> 
>>>> +	/*
>>>> +	 * In SQL all identifiers should be uppercased, so
>>>> +	 * to avoid mess lets simple search binary (since it is
>>>> +	 * sort of "special" collation) ignoring case at all.
>>>> +	 */
>>> 
>>> 2. I am not sure if it should be so special. I think, we should
>>> treat it just like any other collation. If tests have BINARY
>>> uppercased, then they should be fixed, I guess.
>> Thing is in SQLite binary can be used without quotes, like:
>> … COLLATE BiNaRy …
>> And since ids are uppercased, this thing comes as “BINARY".
>> If I forced only lower case “binary”, a lot (>80 separate sub-tests,
>> 12 separate file tests) of tests would fail. Moreover, I see that similar
>> things take place at Lua-land (schema.cc):
>> local function update_index_parts(format, parts)
>> ...
>> elseif k == 'collation' then
>>     -- find ID by name
>>     local coll = box.space._collation.index.name:get{v}
>>     if not coll then
>>         coll = box.space._collation.index.name:get{v:lower()}
>>     end
>> If you still think that we should allow only “binary” format,
>> I will do it alongside with tests in a separate commit.
> 
> I still think that since 'binary' is even in _collation space
> now, it should be just a regular collation. If you do not want
> to change tests, then you can try to search any collation firstly
> as is, and if not found, then lowerify and search again, regardless
> of is it binary or not.

We can’t lower each collation’s name - I’m sure that Peter will
be strictly against this change (to be honest I am too), because
it contradicts our rules concerning identifications’ names.
Moreover, what if user create two collations with names like
“custom_collation” and “CuStom_COLLATION” and execute query like

SELECT … COLLATE “custom_COLLATION"

Which collation will be used in this case? Without lowering no one
is found, but after lowering both would be available.
I would better fix tests to make them use “binary” instead of
BINARY. I guess this is absolutely right solution.
Commit at the end of letter.

> My point is that we should not treat binary differently from
> others.

Yep, now I fully agree with you. I am going to add one more
commit to patchset.

> By the way, what if I write SQL like this? :
> 
>    SELECT ... WHERE  ... COLLATE NONE …
> 
> Looks like this code:
> 
>> +       if (strcasecmp(name, "binary") == 0)
>> +               p = coll_by_name("binary", strlen("binary"));
>> +       else
>> +               p = coll_by_name(name, strlen(name));
> 
> will find it ok, but maybe should not, is it?

Why? It exists in _collation and uses memcmp, so what is wrong with it? 
I guess we simply should document its behaviour.
Also, in your example parser would fail to find it. Correct usage is like:

> SELECT ... WHERE  ... COLLATE “none” …

:)

On the contrary, I've added tests on this case:

diff --git a/test/sql-tap/identifier_case.test.lua b/test/sql-tap/identifier_case.test.lua
index 4878ced65..9e8107aef 100755
--- a/test/sql-tap/identifier_case.test.lua
+++ b/test/sql-tap/identifier_case.test.lua
@@ -1,6 +1,6 @@
 #!/usr/bin/env tarantool
 test = require("sqltester")
-test:plan(67)
+test:plan(71)
 
 local test_prefix = "identifier_case-"
 
@@ -207,7 +207,9 @@ data = {
     { 4,  [["bInaRy"]], {1, "Collation 'bInaRy' does not exist"}},
     { 5,  [["unicode"]], {0}},
     { 6,  [[ unicode ]], {1,"Collation 'UNICODE' does not exist"}},
-    { 7,  [["UNICODE"]], {1,"Collation 'UNICODE' does not exist"}}
+    { 7,  [["UNICODE"]], {1,"Collation 'UNICODE' does not exist"}},
+    { 8,  [[NONE]], {1,"Collation 'NONE' does not exist"}},
+    { 9,  [["none"]], {0}},
 }
 
 test:do_catchsql_test(
@@ -241,6 +243,8 @@ data = {
     { 5,  [[ 5 < 'b' collate "unicode" ]], {0, {1}}},
     { 6,  [[ 5 < 'b' collate unicode ]], {1,"Collation 'UNICODE' does not exist"}},
     { 7,  [[ 5 < 'b' collate "unicode_ci" ]], {0, {1}}},
+    { 8,  [[ 5 < 'b' collate NONE ]], {1, "Collation 'NONE' does not exist"}},
+    { 9,  [[ 5 < 'b' collate "none" ]], {0, {1}}},

>> In other words, each field with type of STRING/SCALAR and mb ANY
>> would have real pointer to collation. However, in turn such change
>> would break too many tests, without significant refactoring of SQL
>> query planner (at least I failed to do it in several hours).
>> On the other hand, in this case we can remove COLL_NONE
>> and unify collation interface: each field or key part MUST feature
>> collation. And I mean not only valid ID (it is how it works now),
>> but also valid (i.e. != NULL) pointer to collation struct. So, it seems
>> that we would be able to remove any checks like:
>> if (coll != NULL)
>> 	*use collation*
>> else
>> 	*process without involving collations*
>> Would become:
>> If (type_is_string_like)
>> 	*always use collation*
>> To be honest I was sure that you would notice this pickle…
>> Or current approach is not dat bad?
> 
> Yeah, I understand your aspiration, but I have some remarks
> about the draft patch above and about the concept as it is:
> 
> * you said that you want to get rid of COLL_NONE, but in the draft
>  above you still use it. Even if you had removed COLL_NONE
>  usage from key_def builders, it would have stayed in alter.cc
>  as a check that you can not drop 'none' collation. I think that
>  COLL_NONE as a constant equal to 0 should not be removed from
>  source code until it is not used at all. But it is a minor
>  comment rather about coding style: I prefer code like
>  'id != UNDERSTANDABLE_NAME' to 'id != <strange_constant>'.

And still we use iid != 0 to check whether index primary or not…
But I like your argument.

> * the main flaw in storing a collation for each string is that it
>  will dramatically slow down comparators. The awful problem is in
>  virtual functions calling - it is much slower than regular
>  functions.

Yep, I understand this point and I told the same to Kostja,
but he insisted on this “none” collation. To be honest, I thought
that he wanted to implement second variant, where pointers
to “none” collation != NULL.

> Now we have two levels of avoiding coll->cmp() calling.
>  First in *_compare_create functions. If a key has no collations,
>  we can use highly specified template recursive comparators
>  (struct TupleCompare, struct TupleCompareWithKey) which call
>  memcmp. Second, we are trying to do not call coll->cmp() in other
>  comparators (see mp_compare_str).
> 
>  Yes, we still can follow you, store a collation object in each
>  key_part, tuple_field, field_def, but in tuple_compare.cc we
>  still will need a way how to check that a collation comparator
>  can be inlined into memcmp. This way is check coll->id == COLL_NONE.
>  Now it is like
> 
>      use_fast_cmp = coll == NULL
> 
>  Will be like
> 
>      use_fast_cmp = coll->id == COLL_NONE
> 
>  As a summary: we can store a collation in each *_field/part object
>  to simplify SQL source code, but it does not affect necessity of
>  a special COLL_NONE constant.

Thanks for your comments.

> Also, I see now a new minor problem:
> 
>> diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua
>> index 64f74f9d3..a9525058b 100644
>> --- a/src/box/lua/upgrade.lua
>> +++ b/src/box/lua/upgrade.lua
>> @@ -396,8 +396,10 @@ local function create_collation_space()
>>     box.space._index:insert{_collation.id, 1, 'name', 'tree', {unique = true}, {{1, 'string'}}}
>>      log.info("create predefined collations")
>> +    box.space._collation:replace{0, "none", ADMIN, "BINARY", "", setmap{}}
>>     box.space._collation:replace{1, "unicode", ADMIN, "ICU", "", setmap{}}
>>     box.space._collation:replace{2, "unicode_ci", ADMIN, "ICU", "", {strength='primary'}}
>> +    box.space._collation:replace{3, "binary", ADMIN, "BINARY", "", setmap{}}
>>      local _priv = box.space[box.schema.PRIV_ID]
>>     _priv:insert{ADMIN, PUBLIC, 'space', _collation.id, box.priv.W}
> 
> Unfortunately, we can not change old update() functions. So the only way to add
> these new collations is to add a new code to upgrade_to_2_1_0() with these two
> new lines.

Ops, indeed. Moved new collations to upgrade function and
updated bootstrap file:

diff --git a/src/box/lua/upgrade.lua b/src/box/lua/upgrade.lua
index a9525058b..4b0fd0345 100644
--- a/src/box/lua/upgrade.lua
+++ b/src/box/lua/upgrade.lua
@@ -396,10 +396,8 @@ local function create_collation_space()
     box.space._index:insert{_collation.id, 1, 'name', 'tree', {unique = true}, {{1, 'string'}}}
 
     log.info("create predefined collations")
-    box.space._collation:replace{0, "none", ADMIN, "BINARY", "", setmap{}}
     box.space._collation:replace{1, "unicode", ADMIN, "ICU", "", setmap{}}
     box.space._collation:replace{2, "unicode_ci", ADMIN, "ICU", "", {strength='primary'}}
-    box.space._collation:replace{3, "binary", ADMIN, "BINARY", "", setmap{}}
 
     local _priv = box.space[box.schema.PRIV_ID]
     _priv:insert{ADMIN, PUBLIC, 'space', _collation.id, box.priv.W}
@@ -608,6 +606,9 @@ local function upgrade_to_2_1_0()
     format[2] = {type='any', name='value', is_nullable=true}
     box.space._schema:format(format)
 
+    box.space._collation:replace{0, "none", ADMIN, "BINARY", "", setmap{}}
+    box.space._collation:replace{3, "binary", ADMIN, "BINARY", "", setmap{}}
+
     upgrade_priv_to_2_1_0()
 end

New commit in patch-set:

commit 854792f46d4ec87c3c2218c7645fe398e007c888
Author: Nikita Pettik <korablev@tarantool.org>
Date:   Wed Nov 14 21:07:58 2018 +0300

    sql: don't uppercase name of binary collation
    
    Since now we don't have real binary collation, it was allowed to use its
    name in different cases. For instance: BiNaRY, "bInArY", "BINARY" etc.
    All these names were valid and accounted for binary collation.
    However, we are going to introduce real entry in _collation space to
    represent binary collation. Thus, for now we allow using only standard
    "binary" name.
    
    Needed for #3185

diff --git a/src/box/sql/build.c b/src/box/sql/build.c
index 17c20480e..191eb0308 100644
--- a/src/box/sql/build.c
+++ b/src/box/sql/build.c
@@ -2400,11 +2400,8 @@ index_fill_def(struct Parse *parse, struct index *index,
                if (expr->op == TK_COLLATE) {
                        sql_get_coll_seq(parse, expr->u.zToken, &coll_id);
                        if (coll_id == COLL_NONE &&
-                           strcmp(expr->u.zToken, "binary") != 0) {
-                               diag_set(ClientError, ER_NO_SUCH_COLLATION,
-                                        expr->u.zToken);
+                           strcmp(expr->u.zToken, "binary") != 0)
                                goto tnt_error;
-                       }

diff --git a/src/box/sql/callback.c b/src/box/sql/callback.c
index 3cf3a835d..62904fd20 100644
--- a/src/box/sql/callback.c
+++ b/src/box/sql/callback.c
@@ -42,15 +42,16 @@
 struct coll *
 sql_get_coll_seq(Parse *parser, const char *name, uint32_t *coll_id)
 {
-       if (name == NULL || strcasecmp(name, "binary") == 0) {
+       if (name == NULL || strcmp(name, "binary") == 0) {
                *coll_id = COLL_NONE;
                return NULL;
        }
        struct coll_id *p = coll_by_name(name, strlen(name));
        if (p == NULL) {
                *coll_id = COLL_NONE;
-               sqlite3ErrorMsg(parser, "no such collation sequence: %s",
-                               name);
+               diag_set(ClientError, ER_NO_SUCH_COLLATION, name);
+               parser->rc = SQL_TARANTOOL_ERROR;
+               parser->nErr++;
                return NULL;
        } else {
                *coll_id = p->id;
diff --git a/test/sql-tap/collation.test.lua b/test/sql-tap/collation.test.lua
index eb4f43a90..9483632b2 100755
--- a/test/sql-tap/collation.test.lua
+++ b/test/sql-tap/collation.test.lua
@@ -155,7 +155,7 @@ local data_test_unicode_ci = {
 local data_collations = {
     -- default collation = binary
     {"/*COLLATE DEFAULT*/", data_test_binary_1},
-    {"COLLATE BINARY", data_test_binary_1},
+    {"COLLATE \"binary\"", data_test_binary_1},
     {"COLLATE \"unicode\"", data_test_unicode},
     {"COLLATE \"unicode_ci\"", data_test_unicode_ci},
 }
diff --git a/test/sql-tap/distinct.test.lua b/test/sql-tap/distinct.test.lua
index 26a4aace2..8f6ca8ed9 100755
--- a/test/sql-tap/distinct.test.lua
+++ b/test/sql-tap/distinct.test.lua
@@ -124,9 +124,9 @@ local data = {
     {"13.2", 0, "SELECT DISTINCT a, b, c COLLATE \"unicode_ci\" FROM t4"},
     {"14.1", 0, "SELECT DISTINCT a, d COLLATE \"unicode_ci\" FROM t1"},
     {"14.2", 1, "SELECT DISTINCT a, d COLLATE \"unicode_ci\" FROM t4"},
-    {"15 ", 0, "SELECT DISTINCT a, d COLLATE binary FROM t1"},
-    {"16.1", 0, "SELECT DISTINCT a, b, c COLLATE binary FROM t1"},
-    {"16.2", 1, "SELECT DISTINCT a, b, c COLLATE binary FROM t4"},
+    {"15 ", 0, "SELECT DISTINCT a, d COLLATE \"binary\" FROM t1"},
+    {"16.1", 0, "SELECT DISTINCT a, b, c COLLATE \"binary\" FROM t1"},
+    {"16.2", 1, "SELECT DISTINCT a, b, c COLLATE \"binary\" FROM t4"},
     {"17",  0,   --{ \/* Technically, it would be possible to detect that DISTINCT\n            ** is a no-op in cases like the following. But SQLite does not\n            ** do so. *\/\n
     "SELECT DISTINCT t1.id FROM t1, t2 WHERE t1.id=t2.x" },
     {"18 ", 1, "SELECT DISTINCT c1, c2 FROM t3"},
@@ -173,7 +173,7 @@ data = {
     {"a, b, c FROM t1", {"btree"}, {"A", "B", "C", "a", "b", "c"}},
     {"a, b, c FROM t1 ORDER BY a, b, c", {"btree"}, {"A", "B", "C", "a", "b", "c"}},
     {"b FROM t1 WHERE a = 'a'", {}, {"b"}},
-    {"b FROM t1 ORDER BY +b COLLATE binary", {"btree", "btree"}, {"B", "b"}},
+    {"b FROM t1 ORDER BY +b COLLATE \"binary\"", {"btree", "btree"}, {"B", "b"}},
     {"a FROM t1", {}, {"A", "a"}},
     {"b COLLATE \"unicode_ci\" FROM t1", {}, {"b"}},
     {"b COLLATE \"unicode_ci\" FROM t1 ORDER BY b COLLATE \"unicode_ci\"", {}, {"b"}},
diff --git a/test/sql-tap/e_select1.test.lua b/test/sql-tap/e_select1.test.lua
index 47fb7a809..1e8f096e1 100755
--- a/test/sql-tap/e_select1.test.lua
+++ b/test/sql-tap/e_select1.test.lua
@@ -508,11 +508,11 @@ if (0 > 0)
     test:do_execsql_test(
         "e_select-1.6.0",
         [[
-            CREATE TABLE t5(a  TEXT COLLATE "unicode_ci", b  TEXT COLLATE binary);
+            CREATE TABLE t5(a  TEXT COLLATE "unicode_ci", b  TEXT COLLATE "binary");
             INSERT INTO t5 VALUES('AA', 'cc');
             INSERT INTO t5 VALUES('BB', 'dd');
             INSERT INTO t5 VALUES(NULL, NULL);
-            CREATE TABLE t6(a  TEXT COLLATE binary, b  TEXT COLLATE "unicode_ci");
+            CREATE TABLE t6(a  TEXT COLLATE "c", b  TEXT COLLATE "unicode_ci");
             INSERT INTO t6 VALUES('aa', 'cc');
             INSERT INTO t6 VALUES('bb', 'DD');
             INSERT INTO t6 VALUES(NULL, NULL);
@@ -997,7 +997,7 @@ test:do_execsql_test(
         INSERT INTO b2 VALUES('abc', 3);
         INSERT INTO b2 VALUES('xyz', 4);
 
-        CREATE TABLE b3(id  INT PRIMARY KEY, a  TEXT COLLATE "unicode_ci", b  TEXT COLLATE binary);
+        CREATE TABLE b3(id  INT PRIMARY KEY, a  TEXT COLLATE "unicode_ci", b  TEXT COLLATE "binary");
         INSERT INTO b3 VALUES(1, 'abc', 'abc');
         INSERT INTO b3 VALUES(2, 'aBC', 'aBC');
         INSERT INTO b3 VALUES(3, 'Def', 'Def');
@@ -1289,7 +1289,7 @@ test:do_select_tests(
         {"1", "SELECT DISTINCT b FROM h1", {"one", "I", "i", "four", "IV", "iv"}},
         {"2", "SELECT DISTINCT b COLLATE \"unicode_ci\" FROM h1", {"one", "I", "four", "IV"}},
         {"3", "SELECT DISTINCT x FROM h2", {"One", "Two", "Three", "Four"}},
-        {"4", "SELECT DISTINCT x COLLATE binary FROM h2", {
+        {"4", "SELECT DISTINCT x COLLATE \"binary\" FROM h2", {
             "One", "Two", "Three", "Four", "one", "two", "three", "four"
         }},
     })
@@ -1574,7 +1574,7 @@ test:drop_all_tables()
 test:do_execsql_test(
     "e_select-7.10.0",
     [[
-        CREATE TABLE y1(a  TEXT COLLATE "unicode_ci" PRIMARY KEY, b  TEXT COLLATE binary, c TEXT );
+        CREATE TABLE y1(a  TEXT COLLATE "unicode_ci" PRIMARY KEY, b  TEXT COLLATE "binary", c TEXT );
         INSERT INTO y1 VALUES('Abc', 'abc', 'aBC');
     ]], {
         -- <e_select-7.10.0>
@@ -1588,12 +1588,12 @@ test:do_select_tests(
         {"1", "SELECT 'abc'                UNION SELECT 'ABC'", {"ABC",  "abc"}},
         {"2", "SELECT 'abc' COLLATE \"unicode_ci\" UNION SELECT 'ABC'", {"ABC" }},
         {"3", "SELECT 'abc'                UNION SELECT 'ABC' COLLATE \"unicode_ci\"", {"ABC" }},
-        {"4", "SELECT 'abc' COLLATE binary UNION SELECT 'ABC' COLLATE \"unicode_ci\"", {"ABC",  "abc"}},
-        {"5", "SELECT 'abc' COLLATE \"unicode_ci\" UNION SELECT 'ABC' COLLATE binary", {"ABC" }},
+        {"4", "SELECT 'abc' COLLATE \"binary\" UNION SELECT 'ABC' COLLATE \"unicode_ci\"", {"ABC",  "abc"}},
+        {"5", "SELECT 'abc' COLLATE \"unicode_ci\" UNION SELECT 'ABC' COLLATE \"binary\"", {"ABC" }},
         {"6", "SELECT a FROM y1 UNION SELECT b FROM y1", {"abc" }},
         {"7", "SELECT b FROM y1 UNION SELECT a FROM y1", {"Abc",  "abc"}},
         {"8", "SELECT a FROM y1 UNION SELECT c FROM y1", {"aBC" }},
-        {"9", "SELECT a FROM y1 UNION SELECT c COLLATE binary FROM y1", {"aBC" }},
+        {"9", "SELECT a FROM y1 UNION SELECT c COLLATE \"binary\" FROM y1", {"aBC" }},
     })
 
 -- EVIDENCE-OF: R-32706-07403 No affinity transformations are applied to
@@ -1932,7 +1932,7 @@ test:do_execsql_test(
 test:do_execsql_test(
     "e_select-8.9.1",
     [[
-        SELECT x FROM d4 ORDER BY 1 COLLATE binary
+        SELECT x FROM d4 ORDER BY 1 COLLATE "binary"
     ]], {
         -- <e_select-8.9.1>
         "DEF", "JKL", "abc", "ghi"
@@ -1942,7 +1942,7 @@ test:do_execsql_test(
 test:do_execsql_test(
     "e_select-8.9.2",
     [[
-        SELECT x COLLATE binary FROM d4 ORDER BY 1 COLLATE "unicode_ci"
+        SELECT x COLLATE "binary" FROM d4 ORDER BY 1 COLLATE "unicode_ci"
     ]], {
         -- <e_select-8.9.2>
         "abc", "DEF", "ghi", "JKL"
@@ -1963,7 +1963,7 @@ test:do_execsql_test(
 test:do_execsql_test(
     "e_select-8.10.1",
     [[
-        SELECT x COLLATE binary FROM d4 ORDER BY 1
+        SELECT x COLLATE "binary" FROM d4 ORDER BY 1
     ]], {
         -- <e_select-8.10.1>
         "DEF", "JKL", "abc", "ghi"
@@ -1973,7 +1973,7 @@ test:do_execsql_test(
 test:do_execsql_test(
     "e_select-8.10.2",
     [[
-        SELECT x COLLATE binary FROM d4 ORDER BY x
+        SELECT x COLLATE "binary" FROM d4 ORDER BY x
     ]], {
         -- <e_select-8.10.2>
         "abc", "DEF", "ghi", "JKL"
@@ -1983,7 +1983,7 @@ test:do_execsql_test(
 test:do_execsql_test(
     "e_select-8.10.3",
     [[
-        SELECT x COLLATE binary AS x FROM d4 ORDER BY x
+        SELECT x COLLATE "binary" AS x FROM d4 ORDER BY x
     ]], {
         -- <e_select-8.10.3>
         "DEF", "JKL", "abc", "ghi"
diff --git a/test/sql-tap/identifier_case.test.lua b/test/sql-tap/identifier_case.test.lua
index 096130a52..4878ced65 100755
--- a/test/sql-tap/identifier_case.test.lua
+++ b/test/sql-tap/identifier_case.test.lua
@@ -201,10 +201,10 @@ end
 
 -- Check that collaiton names work as identifiers
 data = {
-    { 1,  [[ binary ]], {0}},
-    { 2,  [[ BINARY ]], {0}},
+    { 1,  [[ binary ]], {1, "Collation 'BINARY' does not exist"}},
+    { 2,  [[ BINARY ]], {1, "Collation 'BINARY' does not exist"}},
     { 3,  [["binary"]], {0}},
-    { 4,  [["bInaRy"]], {0}},
+    { 4,  [["bInaRy"]], {1, "Collation 'bInaRy' does not exist"}},
     { 5,  [["unicode"]], {0}},
     { 6,  [[ unicode ]], {1,"Collation 'UNICODE' does not exist"}},
     { 7,  [["UNICODE"]], {1,"Collation 'UNICODE' does not exist"}}
@@ -234,12 +234,12 @@ test:do_catchsql_test(
     {0})
 
 data = {
-    { 1,  [[ 'a' < 'b' collate binary ]], {0, {1}}},
+    { 1,  [[ 'a' < 'b' collate binary ]], {1, "Collation 'BINARY' does not exist"}},
     { 2,  [[ 'a' < 'b' collate "binary" ]], {0, {1}}},
     { 3,  [[ 'a' < 'b' collate 'binary' ]], {1, [[near "'binary'": syntax error]]}},
     { 4,  [[ 'a' < 'b' collate "unicode" ]], {0, {1}}},
     { 5,  [[ 5 < 'b' collate "unicode" ]], {0, {1}}},
-    { 6,  [[ 5 < 'b' collate unicode ]], {1,"no such collation sequence: UNICODE"}},
+    { 6,  [[ 5 < 'b' collate unicode ]], {1,"Collation 'UNICODE' does not exist"}},
     { 7,  [[ 5 < 'b' collate "unicode_ci" ]], {0, {1}}},
 }
 
diff --git a/test/sql-tap/in3.test.lua b/test/sql-tap/in3.test.lua
index 83139a3e4..e6f0d415d 100755
--- a/test/sql-tap/in3.test.lua
+++ b/test/sql-tap/in3.test.lua
@@ -206,7 +206,7 @@ test:do_test(
 test:do_test(
     "in3-1.15",
     function()
-        return exec_neph(" SELECT a FROM t1 WHERE a COLLATE binary IN (SELECT a FROM t1) ")
+        return exec_neph(" SELECT a FROM t1 WHERE a COLLATE \"binary\" IN (SELECT a FROM t1) ")
     end, {
         -- <in3-1.15>
         0, 1, 3, 5
diff --git a/test/sql-tap/in5.test.lua b/test/sql-tap/in5.test.lua
index 4e2cdcd24..ba21170ee 100755
--- a/test/sql-tap/in5.test.lua
+++ b/test/sql-tap/in5.test.lua
@@ -277,7 +277,7 @@ test:do_execsql_test(
 test:do_execsql_test(
     "6.1.2",
     [[
-        SELECT count(*) FROM t1 WHERE a COLLATE BINARY IN (SELECT DISTINCT a FROM t1)
+        SELECT count(*) FROM t1 WHERE a COLLATE "binary" IN (SELECT DISTINCT a FROM t1)
     ]], {
         -- <6.1.2>
         1
diff --git a/test/sql-tap/index3.test.lua b/test/sql-tap/index3.test.lua
index 4f950be09..222b895aa 100755
--- a/test/sql-tap/index3.test.lua
+++ b/test/sql-tap/index3.test.lua
@@ -59,7 +59,7 @@ test:do_execsql_test(
         CREATE TABLE t1(a INT , b TEXT , c INT , d INT , e INT ,
                         PRIMARY KEY(a), UNIQUE(b COLLATE "unicode_ci" DESC));
         CREATE INDEX t1c ON t1(c);
-        CREATE INDEX t1d ON t1(d COLLATE binary ASC);
+        CREATE INDEX t1d ON t1(d COLLATE "binary" ASC);
         WITH RECURSIVE c(x) AS (VALUES(1) UNION SELECT x+1 FROM c WHERE x<30)
           INSERT INTO t1(a,b,c,d,e) 
             SELECT x, printf('ab%03xxy',x), x, x, x FROM c;
diff --git a/test/sql-tap/minmax3.test.lua b/test/sql-tap/minmax3.test.lua
index 1ddf39ff5..348275ffa 100755
--- a/test/sql-tap/minmax3.test.lua
+++ b/test/sql-tap/minmax3.test.lua
@@ -502,7 +502,7 @@ test:do_execsql_test(
 test:do_execsql_test(
     "minmax3-4.4",
     [[
-        SELECT max(x COLLATE binary), max(x COLLATE "unicode_ci") FROM t4;
+        SELECT max(x COLLATE "binary"), max(x COLLATE "unicode_ci") FROM t4;
     ]], {
         -- <minmax3-4.4>
         "abc", "BCD"
@@ -564,7 +564,7 @@ test:do_execsql_test(
 test:do_execsql_test(
     "minmax3-4.13",
     [[
-        SELECT min(x COLLATE binary), min(x COLLATE "unicode_ci") FROM t4;
+        SELECT min(x COLLATE "binary"), min(x COLLATE "unicode_ci") FROM t4;
     ]], {
         -- <minmax3-4.13>
         "BCD", "abc"
diff --git a/test/sql-tap/resolver01.test.lua b/test/sql-tap/resolver01.test.lua
index d08f95bf6..7be2184e5 100755
--- a/test/sql-tap/resolver01.test.lua
+++ b/test/sql-tap/resolver01.test.lua
@@ -236,7 +236,7 @@ test:do_test(
         ]])
         local r = {}
         table.insert(r, test:execsql("SELECT '1', substr(m,2) AS m FROM t4 ORDER BY m;"))
-        table.insert(r, test:execsql("SELECT '2', substr(m,2) AS m FROM t4 ORDER BY m COLLATE binary;"))
+        table.insert(r, test:execsql("SELECT '2', substr(m,2) AS m FROM t4 ORDER BY m COLLATE \"binary\";"))
         table.insert(r, test:execsql("SELECT '3', substr(m,2) AS m FROM t4 ORDER BY lower(m);"))
         return r
     end, {
diff --git a/test/sql-tap/selectE.test.lua b/test/sql-tap/selectE.test.lua
index 11b84711e..b379f5aa8 100755
--- a/test/sql-tap/selectE.test.lua
+++ b/test/sql-tap/selectE.test.lua
@@ -84,7 +84,7 @@ test:do_test(
     function()
         return test:execsql [[
             SELECT a FROM t2 EXCEPT SELECT a FROM t3
-             ORDER BY a COLLATE binary;
+             ORDER BY a COLLATE "binary";
         ]]
     end, {
         -- <selectE-1.2>
@@ -127,7 +127,7 @@ test:do_test(
     function()
         return test:execsql [[
             SELECT a COLLATE "unicode_ci" FROM t2 EXCEPT SELECT a FROM t3
-             ORDER BY 1 COLLATE binary
+             ORDER BY 1 COLLATE "binary"
         ]]
     end, {
         -- <selectE-2.2>
diff --git a/test/sql-tap/tkt2391.test.lua b/test/sql-tap/tkt2391.test.lua
index 7fa5e1634..b48998c53 100755
--- a/test/sql-tap/tkt2391.test.lua
+++ b/test/sql-tap/tkt2391.test.lua
@@ -20,7 +20,7 @@ test:plan(4)
 test:do_execsql_test(
     "tkt2391.1",
     [[
-        CREATE TABLE folders(folderid INT , parentid INT , foldername TEXT COLLATE binary primary key);
+        CREATE TABLE folders(folderid INT , parentid INT , foldername TEXT COLLATE "binary" primary key);
         INSERT INTO folders VALUES(1, 3, 'FolderA');
         INSERT INTO folders VALUES(1, 3, 'folderB');
         INSERT INTO folders VALUES(4, 0, 'FolderC');
diff --git a/test/sql-tap/tkt3493.test.lua b/test/sql-tap/tkt3493.test.lua
index 26ca2271b..7e093b217 100755
--- a/test/sql-tap/tkt3493.test.lua
+++ b/test/sql-tap/tkt3493.test.lua
@@ -246,7 +246,7 @@ test:do_execsql_test(
 test:do_execsql_test(
     "tkt3493-3.1",
     [[
-        CREATE TABLE t2(a  TEXT COLLATE "unicode_ci" PRIMARY KEY, b  TEXT COLLATE BINARY);
+        CREATE TABLE t2(a  TEXT COLLATE "unicode_ci" PRIMARY KEY, b  TEXT COLLATE "binary");
         INSERT INTO t2 VALUES('aBc', 'DeF');
     ]], {
         -- <tkt3493-3.1>
@@ -291,7 +291,7 @@ test:do_execsql_test(
 test:do_execsql_test(
     "tkt3493-3.3.2",
     [[
-        SELECT a>b COLLATE BINARY FROM t2 GROUP BY a, b
+        SELECT a>b COLLATE "binary" FROM t2 GROUP BY a, b
     ]], {
         -- <tkt3493-3.3.2>
         1
diff --git a/test/sql-tap/with1.test.lua b/test/sql-tap/with1.test.lua
index faa99811c..8200d25f8 100755
--- a/test/sql-tap/with1.test.lua
+++ b/test/sql-tap/with1.test.lua
@@ -1056,7 +1056,7 @@ test:do_catchsql_test(13.3, [[
 -- 2015-04-12
 --
 test:do_execsql_test(14.1, [[
-  WITH x AS (SELECT * FROM t) SELECT 0 EXCEPT SELECT 0 ORDER BY 1 COLLATE binary;
+  WITH x AS (SELECT * FROM t) SELECT 0 EXCEPT SELECT 0 ORDER BY 1 COLLATE "binary";
 ]], {
   -- <14.1>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH v2 2/3] Introduce "none" and "binary" collations
  2018-11-14 20:59         ` n.pettik
@ 2018-11-15 11:04           ` Vladislav Shpilevoy
  0 siblings, 0 replies; 12+ messages in thread
From: Vladislav Shpilevoy @ 2018-11-15 11:04 UTC (permalink / raw)
  To: n.pettik, tarantool-patches

Hi! Thanks for the fixes!

>>> In other words, each field with type of STRING/SCALAR and mb ANY
>>> would have real pointer to collation. However, in turn such change
>>> would break too many tests, without significant refactoring of SQL
>>> query planner (at least I failed to do it in several hours).
>>> On the other hand, in this case we can remove COLL_NONE
>>> and unify collation interface: each field or key part MUST feature
>>> collation. And I mean not only valid ID (it is how it works now),
>>> but also valid (i.e. != NULL) pointer to collation struct. So, it seems
>>> that we would be able to remove any checks like:
>>> if (coll != NULL)
>>> 	*use collation*
>>> else
>>> 	*process without involving collations*
>>> Would become:
>>> If (type_is_string_like)
>>> 	*always use collation*
>>> To be honest I was sure that you would notice this pickle…
>>> Or current approach is not dat bad?
>>
>> Yeah, I understand your aspiration, but I have some remarks
>> about the draft patch above and about the concept as it is:
>>
>> * you said that you want to get rid of COLL_NONE, but in the draft
>>   above you still use it. Even if you had removed COLL_NONE
>>   usage from key_def builders, it would have stayed in alter.cc
>>   as a check that you can not drop 'none' collation. I think that
>>   COLL_NONE as a constant equal to 0 should not be removed from
>>   source code until it is not used at all. But it is a minor
>>   comment rather about coding style: I prefer code like
>>   'id != UNDERSTANDABLE_NAME' to 'id != <strange_constant>'.
> 
> And still we use iid != 0 to check whether index primary or not…

Decent, but I do not like it too. I tried to convince Kostja that it
is a bad pattern, but as usually he did not listen.

> diff --git a/test/sql-tap/e_select1.test.lua b/test/sql-tap/e_select1.test.lua
> index 47fb7a809..1e8f096e1 100755
> --- a/test/sql-tap/e_select1.test.lua
> +++ b/test/sql-tap/e_select1.test.lua
> @@ -508,11 +508,11 @@ if (0 > 0)
>       test:do_execsql_test(
>           "e_select-1.6.0",
>           [[
> -            CREATE TABLE t5(a  TEXT COLLATE "unicode_ci", b  TEXT COLLATE binary);
> +            CREATE TABLE t5(a  TEXT COLLATE "unicode_ci", b  TEXT COLLATE "binary");
>               INSERT INTO t5 VALUES('AA', 'cc');
>               INSERT INTO t5 VALUES('BB', 'dd');
>               INSERT INTO t5 VALUES(NULL, NULL);
> -            CREATE TABLE t6(a  TEXT COLLATE binary, b  TEXT COLLATE "unicode_ci");
> +            CREATE TABLE t6(a  TEXT COLLATE "c", b  TEXT COLLATE "unicode_ci");

Typo - binary turned into "c". I've force pushed a fix so as to allow
Kirill Y. merge the branch as soon as possible. Nevertheless, I've put
the fix at the end of the email, according to SOP.

>               INSERT INTO t6 VALUES('aa', 'cc');
>               INSERT INTO t6 VALUES('bb', 'DD');
>               INSERT INTO t6 VALUES(NULL, NULL);

Also in the patch 'Introduce "none" and "binary" collations' I see one
of COLL_NONE substituted back to 0. I have force fixed it too, and put
at the end of the email.

==================================================================

Fix for "sql: don't uppercase name of binary collation".

diff --git a/test/sql-tap/e_select1.test.lua b/test/sql-tap/e_select1.test.lua
index 1e8f096e1..e1d814f47 100755
--- a/test/sql-tap/e_select1.test.lua
+++ b/test/sql-tap/e_select1.test.lua
@@ -512,7 +512,7 @@ if (0 > 0)
              INSERT INTO t5 VALUES('AA', 'cc');
              INSERT INTO t5 VALUES('BB', 'dd');
              INSERT INTO t5 VALUES(NULL, NULL);
-            CREATE TABLE t6(a  TEXT COLLATE "c", b  TEXT COLLATE "unicode_ci");
+            CREATE TABLE t6(a  TEXT COLLATE "binary", b  TEXT COLLATE "unicode_ci");
              INSERT INTO t6 VALUES('aa', 'cc');
              INSERT INTO t6 VALUES('bb', 'DD');
              INSERT INTO t6 VALUES(NULL, NULL);

===================================================================

Fix for "Introduce "none" and "binary" collations".

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 8fc217f68..5bdf102e7 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -40,7 +40,7 @@ static uint32_t formats_size = 0, formats_capacity = 0;
  
  static const struct tuple_field tuple_field_default = {
  	FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false,
-	ON_CONFLICT_ACTION_NONE, NULL, 0,
+	ON_CONFLICT_ACTION_NONE, NULL, COLL_NONE,
  };
  
  /**

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH v2 0/3] Change collation compatibility rules according to ANSI SQL
  2018-11-13  0:07 [tarantool-patches] [PATCH v2 0/3] Change collation compatibility rules according to ANSI SQL Nikita Pettik
                   ` (2 preceding siblings ...)
  2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 3/3] sql: change collation compatibility rules Nikita Pettik
@ 2018-11-15 11:24 ` Kirill Yukhin
  3 siblings, 0 replies; 12+ messages in thread
From: Kirill Yukhin @ 2018-11-15 11:24 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Nikita Pettik

Hello,
On 13 Nov 03:07, Nikita Pettik wrote:
> This patch-set introduces rules for collation compatibility in order to
> restrict usage of different collations within one comparison.
> Originally, SQLite uses doubtful approach: if collations of LHS and RHS
> can not be used together, then collation of LHS is chosen.  Such
> behaviour quite misleading and results in the fact that swap of LHS and
> RHS values may change result of operation.  We've decided to add
> restrictions on collations of LHS and RHS to disallow messing different
> collations as it is stated in ANSI SQL.

I've rebased and checked your patch set into 2.1 branch.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-11-15 11:24 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-13  0:07 [tarantool-patches] [PATCH v2 0/3] Change collation compatibility rules according to ANSI SQL Nikita Pettik
2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 1/3] sql: do not add explicit <COLLATE "BINARY"> clause Nikita Pettik
2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 2/3] Introduce "none" and "binary" collations Nikita Pettik
2018-11-13 12:02   ` [tarantool-patches] " Vladislav Shpilevoy
2018-11-13 22:37     ` n.pettik
2018-11-14 11:52       ` Vladislav Shpilevoy
2018-11-14 20:59         ` n.pettik
2018-11-15 11:04           ` Vladislav Shpilevoy
2018-11-13  0:07 ` [tarantool-patches] [PATCH v2 3/3] sql: change collation compatibility rules Nikita Pettik
2018-11-13 12:02   ` [tarantool-patches] " Vladislav Shpilevoy
2018-11-13 22:37     ` n.pettik
2018-11-15 11:24 ` [tarantool-patches] Re: [PATCH v2 0/3] Change collation compatibility rules according to ANSI SQL Kirill Yukhin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox