Subject: [tarantool-patches] Re: [PATCH 1/2] sql: add better collation determination in functions
From: "i.koptelov"
To: "n.pettik"
Cc: tarantool-patches@freelists.org
Date: Wed, 27 Mar 2019 16:38:47 +0300
Message-Id: <73268E75-2978-465E-8E82-1EEC0DCE92CA@tarantool.org>
In-Reply-To: <6472870C-8952-43AC-9B86-8BB2E006502A@tarantool.org>

> On 25 Mar 2019, at 22:26, n.pettik wrote:
>
>> On 20 Mar 2019, at 14:11, Ivan Koptelov wrote:
>>
>> Before the patch, the determination of collation in SQL functions
>> (used only in instr()) was too narrow: the arguments were scanned
>> from left to right until an argument with a collation was
>> encountered, and then its collation was used.
>> Now every argument's collation is considered.
>> The collation that the function will use is determined using ANSI
>> compatibility rules ("type combination" rules).
>>
>> Note: currently only instr() a.k.a. position() uses the mechanism
>> described above; other functions (except aggregates) simply
>> ignore collations.
>
> That’s not true: I see that min-max aggregates also feature this flag:
>
> FUNCTION(min, -1, 0, 1, minmaxFunc, FIELD_TYPE_SCALAR),
>
> The fourth param indicates whether SQL_FUNC_NEEDCOLL is set or not.

Oh, sorry for that. Fixed the commit message.

>> ---
>>  src/box/sql/expr.c   | 69 ++++++++++++++++++++++++++++++++++++++++----
>>  src/box/sql/sqlInt.h |  8 ++++-
>>  2 files changed, 70 insertions(+), 7 deletions(-)
>>
>> diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
>> index a2c70935e..2f48d90c6 100644
>> --- a/src/box/sql/expr.c
>> +++ b/src/box/sql/expr.c
>> @@ -4093,16 +4093,73 @@ sqlExprCodeTarget(Parse * pParse, Expr * pExpr, int target)
>>  					testcase(i == 31);
>>  					constMask |= MASKBIT32(i);
>>  				}
>> -				if ((pDef->funcFlags & SQL_FUNC_NEEDCOLL) !=
>> -				    0 && coll == NULL) {
>> -					bool unused;
>> -					uint32_t id;
>> +			}
>> +			/*
>> +			 * Function arguments may have different
>> +			 * collations. The following code
>> +			 * checks if they are compatible and
>> +			 * finds the collation to be used. This
>> +			 * is done using ANSI rules from
>> +			 * collations_check_compatibility().
>> +			 */
>> +			if ((pDef->funcFlags & SQL_FUNC_NEEDCOLL) != 0 &&
>> +			    coll == NULL) {
>> +				struct coll *unused = NULL;
>> +				uint32_t curr_id = COLL_NONE;
>> +				bool is_curr_forced = false;
>> +
>> +				uint32_t temp_id = COLL_NONE;
>> +				bool is_temp_forced = false;
>> +
>> +				uint32_t lhs_id = COLL_NONE;
>> +				bool is_lhs_forced = false;
>> +
>> +				uint32_t rhs_id = COLL_NONE;
>> +				bool is_rhs_forced = false;
>> +
>> +				for (int i = 0; i < nFarg; i++) {
>>  					if (sql_expr_coll(pParse,
>>  							  pFarg->a[i].pExpr,
>> -							  &unused, &id,
>> -							  &coll) != 0)
>> +							  &is_lhs_forced,
>> +							  &lhs_id,
>> +							  &unused) != 0)
>>  						return 0;
>> +
>> +					for (int j = i + 1; j < nFarg; j++) {
>> +						if (sql_expr_coll(pParse,
>> +								  pFarg->a[j].pExpr,
>> +								  &is_rhs_forced,
>> +								  &rhs_id,
>> +								  &unused) != 0)
>> +							return 0;
>
> Seems like you need only one pass, saving the resulting collation.
> The resulting collation shouldn’t depend on the order in which the
> arguments are traversed. And the second call of
> collations_check_compatibility() is redundant as well. Now you are
> using O(n^2) calls of this function, but n is enough: you compare
> the collation of the first argument with the second one and save
> the result in tmp. Then compare tmp with the third argument, etc.

Thank you for noticing, fixed now:

diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
index 8c1889d8a..34abb9665 100644
--- a/src/box/sql/expr.c
+++ b/src/box/sql/expr.c
@@ -4102,65 +4102,34 @@ sqlExprCodeTarget(Parse * pParse, Expr * pExpr, int target)
 			 * is done using ANSI rules from
 			 * collations_check_compatibility().
 			 */
-			if (nFarg == 1) {
-				bool unused;
-				uint32_t id;
-				if (sql_expr_coll(pParse,
-						  pFarg->a[0].pExpr, &unused,
-						  &id, &coll) != 0)
-					return 0;
-			}
 			if ((pDef->funcFlags & SQL_FUNC_NEEDCOLL) != 0 &&
-			    coll == NULL && nFarg > 1) {
+			    coll == NULL) {
 				struct coll *unused = NULL;
 				uint32_t curr_id = COLL_NONE;
 				bool is_curr_forced = false;
 
-				uint32_t temp_id = COLL_NONE;
-				bool is_temp_forced = false;
-
-				uint32_t lhs_id = COLL_NONE;
-				bool is_lhs_forced = false;
+				uint32_t next_id = COLL_NONE;
+				bool is_next_forced = false;
 
-				uint32_t rhs_id = COLL_NONE;
-				bool is_rhs_forced = false;
+				if (sql_expr_coll(pParse, pFarg->a[0].pExpr,
+						  &is_curr_forced, &curr_id,
+						  &unused) != 0)
+					return 0;
 
-				for (int i = 0; i < nFarg; i++) {
+				for (int j = 1; j < nFarg; j++) {
 					if (sql_expr_coll(pParse,
-							  pFarg->a[i].pExpr,
-							  &is_lhs_forced,
-							  &lhs_id,
+							  pFarg->a[j].pExpr,
+							  &is_next_forced,
+							  &next_id,
 							  &unused) != 0)
 						return 0;
 
-					for (int j = i + 1; j < nFarg; j++) {
-						if (sql_expr_coll(pParse,
-								  pFarg->a[j].pExpr,
-								  &is_rhs_forced,
-								  &rhs_id,
-								  &unused) != 0)
-							return 0;
-
-						if (collations_check_compatibility(
-							lhs_id, is_lhs_forced,
-							rhs_id, is_rhs_forced,
-							&temp_id) != 0) {
-							pParse->is_aborted = true;
-							return 0;
-						}
-
-						is_temp_forced = (temp_id ==
-								  lhs_id) ?
-								 is_lhs_forced :
-								 is_rhs_forced;
-
-						if (collations_check_compatibility(
-							curr_id, is_curr_forced,
-							temp_id, is_temp_forced,
-							&curr_id) != 0) {
-							pParse->is_aborted = true;
-							return 0;
-						}
+					if (collations_check_compatibility(
+						curr_id, is_curr_forced,
+						next_id, is_next_forced,
+						&curr_id) != 0) {
+						pParse->is_aborted = true;
+						return 0;
 					}
 				}
 				coll = coll_by_id(curr_id)->coll;

>
>> +
>> +						if (collations_check_compatibility(
>> +							lhs_id, is_lhs_forced,
>> +							rhs_id, is_rhs_forced,
>> +							&temp_id) != 0) {
>> +							pParse->rc =
>> +								SQL_TARANTOOL_ERROR;
>> +							pParse->nErr++;
>> +							return 0;
>> +						}
>> +
>> +						is_temp_forced = (temp_id ==
>> +								  lhs_id) ?
>> +								 is_lhs_forced :
>> +								 is_rhs_forced;
>> +
>> +						if (collations_check_compatibility(
>> +							curr_id, is_curr_forced,
>> +							temp_id, is_temp_forced,
>> +							&curr_id) != 0) {
>> +							pParse->rc =
>> +								SQL_TARANTOOL_ERROR;
>> +							pParse->nErr++;
>> +							return 0;
>> +						}
>> +					}
>>  				}
>> +				coll = coll_by_id(curr_id)->coll;
>>  			}
>>  			if (pFarg) {
>>  				if (constMask) {
>> diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h
>> index 8967ea3e0..47ee474bb 100644
>> --- a/src/box/sql/sqlInt.h
>> +++ b/src/box/sql/sqlInt.h
>> @@ -1660,7 +1660,13 @@ struct FuncDestructor {
>>  #define SQL_FUNC_LIKE 0x0004 /* Candidate for the LIKE optimization */
>>  #define SQL_FUNC_CASE 0x0008 /* Case-sensitive LIKE-type function */
>>  #define SQL_FUNC_EPHEM 0x0010 /* Ephemeral. Delete with VDBE */
>> -#define SQL_FUNC_NEEDCOLL 0x0020 /* sqlGetFuncCollSeq() might be called */
>> +#define SQL_FUNC_NEEDCOLL 0x0020 /* sqlGetFuncCollSeq() might be called.
>> +				  * The flag is set when the collation
>> +				  * of function arguments should be
>> +				  * determined, using rules in
>> +				  * collations_check_compatibility()
>> +				  * function.
>> +				  */
>>  #define SQL_FUNC_LENGTH 0x0040 /* Built-in length() function */
>>  #define SQL_FUNC_TYPEOF 0x0080 /* Built-in typeof() function */
>>  #define SQL_FUNC_COUNT 0x0100 /* Built-in count(*) aggregate */
>> --
>
> Please provide basic test cases involving one or more built-in
> functions and incompatible arguments (at least the min-max funcs use it).
> Moreover, this flag can’t be set for user-defined functions, which is pretty sad.

Added a few tests:

diff --git a/test/sql-tap/func5.test.lua b/test/sql-tap/func5.test.lua
index 6605a2ba1..4282fdac8 100755
--- a/test/sql-tap/func5.test.lua
+++ b/test/sql-tap/func5.test.lua
@@ -1,6 +1,6 @@
 #!/usr/bin/env tarantool
 test = require("sqltester")
-test:plan(5)
+test:plan(9)
 
 --!./tcltestrunner.lua
 -- 2010 August 27
@@ -98,5 +98,59 @@ test:do_execsql_test(
     --
     })
 
+-- The following tests ensure that the max() and min() functions
+-- raise an error if the arguments' collations are incompatible.
+
+test:do_catchsql_test(
+    "func-5-3.1",
+    [[
+        SELECT max('a' COLLATE "unicode", 'A' COLLATE "unicode_ci");
+    ]],
+    {
+        --
+        1, "Illegal mix of collations"
+        --
+    }
+)
+
+test:do_catchsql_test(
+    "func-5-3.2",
+    [[
+        CREATE TABLE test1 (s1 VARCHAR(5) PRIMARY KEY COLLATE "unicode");
+        CREATE TABLE test2 (s2 VARCHAR(5) PRIMARY KEY COLLATE "unicode_ci");
+        INSERT INTO test1 VALUES ('a');
+        INSERT INTO test2 VALUES ('a');
+        SELECT max(s1, s2) FROM test1 JOIN test2;
+    ]],
+    {
+        --
+        1, "Illegal mix of collations"
+        --
+    }
+)
+
+test:do_catchsql_test(
+    "func-5-3.3",
+    [[
+        SELECT min('a' COLLATE "unicode", 'A' COLLATE "unicode_ci");
+    ]],
+    {
+        --
+        1, "Illegal mix of collations"
+        --
+    }
+)
+
+test:do_catchsql_test(
+    "func-5-3.4",
+    [[
+        SELECT min(s1, s2) FROM test1 JOIN test2;
+    ]],
+    {
+        --
+        1, "Illegal mix of collations"
+        --
+    }
+)
 
 test:finish_test()
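For reference, the single-pass resolution discussed in this thread can be sketched as a standalone C function. This is a minimal sketch, not Tarantool's collations_check_compatibility(): struct arg_coll, coll_resolve() and the numeric collation ids are hypothetical, and the rules are assumptions modeled on the commit message and on the expected results of the func5 tests (two different explicit collations are an illegal mix; an explicit collation wins over implicit ones; two different implicit collations are an illegal mix only when no explicit collation is present). Each argument is visited once, and the implicit-conflict check is deferred to the end, so the result does not depend on argument order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical collation ids for this sketch only. COLL_NONE mirrors
 * the constant used in the patch; the other values are made up.
 */
enum {
	COLL_NONE = 0,
	COLL_UNICODE = 1,
	COLL_UNICODE_CI = 2,
};

/* Collation of one function argument (hypothetical type). */
struct arg_coll {
	uint32_t id;
	/* True if the collation comes from an explicit COLLATE clause. */
	bool is_forced;
};

/*
 * Resolve the collation of n arguments in one left-to-right pass.
 * Returns 0 and stores the result in *res_id, or -1 on an illegal mix.
 */
static int
coll_resolve(const struct arg_coll *args, size_t n, uint32_t *res_id)
{
	assert(args != NULL || n == 0);
	bool has_explicit = false;
	uint32_t explicit_id = COLL_NONE;
	/* First non-NONE implicit collation seen so far. */
	uint32_t implicit_id = COLL_NONE;
	/* Set once two different implicit collations have been seen. */
	bool implicit_conflict = false;

	for (size_t i = 0; i < n; i++) {
		const struct arg_coll *a = &args[i];
		if (a->is_forced) {
			/* Two different explicit collations: always an error. */
			if (has_explicit && explicit_id != a->id)
				return -1;
			has_explicit = true;
			explicit_id = a->id;
		} else if (a->id != COLL_NONE) {
			if (implicit_id == COLL_NONE)
				implicit_id = a->id;
			else if (implicit_id != a->id)
				implicit_conflict = true;
		}
	}
	/* An explicit collation overrides any implicit ones. */
	if (has_explicit) {
		*res_id = explicit_id;
		return 0;
	}
	/* Conflicting implicit collations are reported only now. */
	if (implicit_conflict)
		return -1;
	*res_id = implicit_id;
	return 0;
}
```

Under these assumed rules, max('a', 'b' COLLATE "unicode", 'c' COLLATE "unicode_ci") is rejected in any argument order, because two different explicit collations appear. A pairwise fold like the loop in the fixed expr.c hunk can reach the same result only if it keeps the running forced flag in sync with the running id; in the hunk as posted, is_curr_forced keeps the first argument's value, so (assuming collations_check_compatibility() lets a forced collation win over a non-forced one) that call may be accepted.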