From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id CE9CB29166 for ; Mon, 25 Mar 2019 15:26:44 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id FDrn8kfZlW5v for ; Mon, 25 Mar 2019 15:26:44 -0400 (EDT) Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id EABB928B33 for ; Mon, 25 Mar 2019 15:26:43 -0400 (EDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: [tarantool-patches] Re: [PATCH 1/2] sql: add better collation determination in functions From: "n.pettik" In-Reply-To: Date: Mon, 25 Mar 2019 22:26:41 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <6472870C-8952-43AC-9B86-8BB2E006502A@tarantool.org> References: Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org Cc: Ivan Koptelov > On 20 Mar 2019, at 14:11, Ivan Koptelov = wrote: >=20 > Before the patch determination of collation in SQL functions > (used only in instr) was too narrow: the arguments were scanned > from left to right, till the argument with collation was > encountered, then its collation was used. > Now every arguments collation is considered. The right collation > which would be used in function is determined using ANSI > compatibility rules ("type combination" rules). >=20 > Note: currently only instr() a.k.a position() uses mechanism > described above, other functions (except aggregate) simply > ignores collations. That=E2=80=99s not true: I see that min-max aggregates also feature this = flag: FUNCTION(min, -1, 0, 1, minmaxFunc, FIELD_TYPE_SCALAR), Fourth param indicates whether SQL_FUNC_NEEDCOLL is set or not. > --- > src/box/sql/expr.c | 69 ++++++++++++++++++++++++++++++++++++++++---- > src/box/sql/sqlInt.h | 8 ++++- > 2 files changed, 70 insertions(+), 7 deletions(-) >=20 > diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c > index a2c70935e..2f48d90c6 100644 > --- a/src/box/sql/expr.c > +++ b/src/box/sql/expr.c > @@ -4093,16 +4093,73 @@ sqlExprCodeTarget(Parse * pParse, Expr * = pExpr, int target) > testcase(i =3D=3D 31); > constMask |=3D MASKBIT32(i); > } > - if ((pDef->funcFlags & = SQL_FUNC_NEEDCOLL) !=3D > - 0 && coll =3D=3D NULL) { > - bool unused; > - uint32_t id; > + } > + /* > + * Function arguments may have different > + * collations. The following code > + * checks if they are compatible and > + * finds the collation to be used. This > + * is done using ANSI rules from > + * collations_check_compatibility(). > + */ > + if ((pDef->funcFlags & SQL_FUNC_NEEDCOLL) !=3D 0 = && > + coll =3D=3D NULL) { > + struct coll *unused =3D NULL; > + uint32_t curr_id =3D COLL_NONE; > + bool is_curr_forced =3D false; > + > + uint32_t temp_id =3D COLL_NONE; > + bool is_temp_forced =3D false; > + > + uint32_t lhs_id =3D COLL_NONE; > + bool is_lhs_forced =3D false; > + > + uint32_t rhs_id =3D COLL_NONE; > + bool is_rhs_forced =3D false; > + > + for (int i =3D 0; i < nFarg; i++) { > if (sql_expr_coll(pParse, > = pFarg->a[i].pExpr, > - &unused, &id, > - &coll) !=3D 0) > + = &is_lhs_forced, > + &lhs_id, > + &unused) !=3D = 0) > return 0; > + > + for (int j =3D i + 1; j < nFarg; = j++) { > + if = (sql_expr_coll(pParse, > + = pFarg->a[j].pExpr, > + = &is_rhs_forced, > + = &rhs_id, > + = &unused) !=3D 0) > + return 0; Seems like you need only one pass saving resulting collation. Resulting collation shouldn=E2=80=99t depend on way of passing through arguments. And second call of collations_check_copatiility() is redundant as well. Now you are using 2n^2 calls of this function, but n is enough: you compare collation of first argument with second one and save result in tmp. Then, compare tmp with third argument etc. > + > + if = (collations_check_compatibility( > + lhs_id, = is_lhs_forced, > + rhs_id, = is_rhs_forced, > + &temp_id) !=3D = 0) { > + pParse->rc =3D > + = SQL_TARANTOOL_ERROR; > + pParse->nErr++; > + return 0; > + } > + > + is_temp_forced =3D = (temp_id =3D=3D > + = lhs_id) ? > + = is_lhs_forced : > + = is_rhs_forced; > + > + if = (collations_check_compatibility( > + curr_id, = is_curr_forced, > + temp_id, = is_temp_forced, > + &curr_id) !=3D = 0) { > + pParse->rc =3D > + = SQL_TARANTOOL_ERROR; > + pParse->nErr++; > + return 0; > + } > + } > } > + coll =3D coll_by_id(curr_id)->coll; > } > if (pFarg) { > if (constMask) { > diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h > index 8967ea3e0..47ee474bb 100644 > --- a/src/box/sql/sqlInt.h > +++ b/src/box/sql/sqlInt.h > @@ -1660,7 +1660,13 @@ struct FuncDestructor { > #define SQL_FUNC_LIKE 0x0004 /* Candidate for the LIKE = optimization */ > #define SQL_FUNC_CASE 0x0008 /* Case-sensitive LIKE-type = function */ > #define SQL_FUNC_EPHEM 0x0010 /* Ephemeral. Delete with VDBE = */ > -#define SQL_FUNC_NEEDCOLL 0x0020 /* sqlGetFuncCollSeq() might be = called */ > +#define SQL_FUNC_NEEDCOLL 0x0020 /* sqlGetFuncCollSeq() might be = called. > + * The flag is set when the = collation > + * of function arguments should = be > + * determined, using rules in > + * = collations_check_compatibility() > + * function. > + */ > #define SQL_FUNC_LENGTH 0x0040 /* Built-in length() function */ > #define SQL_FUNC_TYPEOF 0x0080 /* Built-in typeof() function */ > #define SQL_FUNC_COUNT 0x0100 /* Built-in count(*) aggregate = */ > --=20 Please, provide basic test cases involving one or more built-in functions and incompatible arguments (at least min-max funcs use it). Moreover, this flag can=E2=80=99t be set for user-defined functions, = which is pretty sad.