From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id B287E259C0 for ; Thu, 14 Feb 2019 18:26:29 -0500 (EST) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id A12KJNCA6flP for ; Thu, 14 Feb 2019 18:26:29 -0500 (EST) Received: from smtp37.i.mail.ru (smtp37.i.mail.ru [94.100.177.97]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 450AB27E1E for ; Thu, 14 Feb 2019 18:26:29 -0500 (EST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: [tarantool-patches] Re: [PATCH 2/2] sql: compute resulting collation for concatenation From: "n.pettik" In-Reply-To: <0c99e4e0-972a-2013-45b1-5c60747ee2ef@tarantool.org> Date: Fri, 15 Feb 2019 02:26:27 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <5AD6A399-D158-44CF-9180-F57185F2C376@tarantool.org> References: <652a9e6a4514a03ef93133961b09c2f5d45721d8.1547644180.git.korablev@tarantool.org> <0c99e4e0-972a-2013-45b1-5c60747ee2ef@tarantool.org> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: tarantool-patches@freelists.org Cc: Vladislav Shpilevoy > On 24 Jan 2019, at 21:29, Vladislav Shpilevoy = wrote: >=20 > Thanks for the patch! See 7 comments below. Sorry for delayed answer. > On 16/01/2019 16:13, Nikita Pettik wrote: >> According to ANSI, result of concatenation operation should derive >> collation sequence from its operands. Now it is not true: result is >> always comes with no ("none") collation. >> In a nutshell*, rules are quite simple: >=20 > 1. If you want to put a reference, then use [], like in > literature and like Vova does in his commits. Firstly, I thought > that '*' was a typo. Ok, fixed. >> a) If some data type has an explicit collation EC1, then every data >=20 > 2. Double space after 'type'. In some other places below too. Ok, fixed. As a rule I use vim auto-formatting utility, which aligns borders to 72 chars automatically putting extra spaces between sentences. I think it is kind of normal. >> type that has an explicit collation shall have declared type = collation >> that is EC1. The collation derivation is explicit and the collation = is >> EC1. >> b) If every data type has an implicit collation, then: >> - If every data type has the same declared type collation IC1, then >> the collation derivation is implicit and the collation is IC1. >> - Otherwise, the collation derivation is none. >> c) Otherwise, the collation derivation is none. >> *Read complete statement at 9.5 Result of data type combinations >=20 > 3. Please, say a bit more words: that 9.5 is a chapter in an SQL > standard, and the standard of which year it is. Ok, done. Fixed commit message: Author: Nikita Pettik Date: Tue Jan 15 15:18:37 2019 +0300 sql: compute resulting collation for concatenation =20 According to ANSI, result of concatenation operation should derive collation sequence from its operands. Now it is not true: result is always comes with no ("none") collation. =20 In a nutshell[1], rules are quite simple: =20 a) If some data type has an explicit collation EC1, then every data = type that has an explicit collation shall have declared type collation = that is EC1. The collation derivation is explicit and the collation is = EC1. =20 b) If every data type has an implicit collation, then: =20 - If every data type has the same declared type collation IC1, then the collation derivation is implicit and the collation is IC1. =20 - Otherwise, the collation derivation is none. =20 c) Otherwise, the collation derivation is none. =20 [1] Read complete statement at chapter 9.5 Result of data type combinations, ANSI 2013, Part 2: Foundations. =20 Closes #3937 >> Closes #3937 >> --- >> src/box/sql/expr.c | 47 +++++++++++++++++++- >> test/sql/collation.result | 102 = ++++++++++++++++++++++++++++++++++++++++++++ >> test/sql/collation.test.lua | 46 ++++++++++++++++++++ >> 3 files changed, 193 insertions(+), 2 deletions(-) >> diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c >> index f8819f779..e6f536757 100644 >> --- a/src/box/sql/expr.c >> +++ b/src/box/sql/expr.c >> @@ -221,6 +221,45 @@ sql_expr_coll(Parse *parse, Expr *p, bool = *is_explicit_coll, uint32_t *coll_id) >> } >> break; >> } >> + if (op =3D=3D TK_CONCAT) { >> + /* >> + * Despite the fact that procedure below >> + * is very similar to = collation_check_compatability(), >> + * it is slightly different: when both >> + * operands have different implicit collations, >> + * derived collation should be "none", >> + * i.e. no collation is used at all >> + * (instead of raising error). >=20 > 4. Typo: collation_check_compatability -> = collation*S*_check_compat*I*bility. >=20 > Also, I think that it is not worth mentioning the difference here > with that function, especially in such a big comment, looks like an = excuse. > It is better to put a link to the standard. As you wish: diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c index e6f536757..031b1f16f 100644 --- a/src/box/sql/expr.c +++ b/src/box/sql/expr.c @@ -223,13 +223,10 @@ sql_expr_coll(Parse *parse, Expr *p, bool = *is_explicit_coll, uint32_t *coll_id) } if (op =3D=3D TK_CONCAT) { /* - * Despite the fact that procedure below - * is very similar to = collation_check_compatability(), - * it is slightly different: when both - * operands have different implicit collations, - * derived collation should be "none", - * i.e. no collation is used at all - * (instead of raising error). + * Procedure below provides compatibility + * checks declared in ANSI SQL 2013: + * chapter 9.5 Result of data type + * combinations. */ >> + bool is_rhs_forced; >> + uint32_t rhs_coll_id; >> + if (sql_expr_coll(parse, p->pRight, = &is_rhs_forced, >> + &rhs_coll_id) !=3D 0) >> + return -1; >> + if (is_lhs_forced && is_rhs_forced) { >> + if (lhs_coll_id !=3D rhs_coll_id) >> + return -1; >=20 > 5. Did you miss diag_set? No, I did it on purpose. Firstly, this function is recursive, so in case error occurred on bottom levels of recursion, diag would be reseted on each level above (and parser=E2=80=99s counter of errors would be incremented several times as well). No terrible consequences would take place in this case, but it looks like a bad pattern. Anyway, fixed it according to your suggestion. Secondly, many functions which call sql_expr_coll don=E2=80=99t expect that it can return real error and they are not adapted to handle them (e.g. wherePathSatisfiesOrderBy). In original SQLite this function don=E2=80=99t fail under any = circumstances. Sure, if error is set as Parse->err, sooner or later it will arise (I guess this is likely to be global problem of SQLite). Diff (cut from context, sorry): + if (is_lhs_forced && is_rhs_forced) { + if (lhs_coll_id !=3D rhs_coll_id) { + diag_set(ClientError, + = ER_ILLEGAL_COLLATION_MIX); + parse->nErr++; + parse->rc =3D = SQL_TARANTOOL_ERROR; + return -1; + } + } >> + } >> + if (is_lhs_forced) { >> + *coll_id =3D lhs_coll_id; >> + *is_explicit_coll =3D true; >> + return 0; >=20 > 6. In this function (sql_expr_coll) to break the cycle 'break' > keyword is used, so lets be consistent and use 'break' as well. Ok: @@ -248,17 +245,17 @@ sql_expr_coll(Parse *parse, Expr *p, bool = *is_explicit_coll, uint32_t *coll_id) if (is_lhs_forced) { *coll_id =3D lhs_coll_id; *is_explicit_coll =3D true; - return 0; + break; } if (is_rhs_forced) { *coll_id =3D rhs_coll_id; *is_explicit_coll =3D true; - return 0; + break; } if (rhs_coll_id !=3D lhs_coll_id) - return 0; + break; *coll_id =3D lhs_coll_id; - return 0; + break; } if (p->flags & EP_Collate) { if (p->pLeft && (p->pLeft->flags & EP_Collate) = !=3D 0) { >>=20 >> @@ -384,10 +423,14 @@ sql_binary_compare_coll_seq(Parse *parser, Expr = *left, Expr *right) >> bool is_rhs_forced; >> uint32_t lhs_coll_id; >> uint32_t rhs_coll_id; >> - if (sql_expr_coll(parser, left, &is_lhs_forced, &lhs_coll_id) !=3D= 0) >> + if (sql_expr_coll(parser, left, &is_lhs_forced, &lhs_coll_id) !=3D= 0) { >> + diag_set(ClientError, ER_ILLEGAL_COLLATION_MIX); >> goto err; >> - if (sql_expr_coll(parser, right, &is_rhs_forced, &rhs_coll_id) = !=3D 0) >> + } >> + if (sql_expr_coll(parser, right, &is_rhs_forced, &rhs_coll_id) = !=3D 0) { >> + diag_set(ClientError, ER_ILLEGAL_COLLATION_MIX); >=20 > 7. Why do you set it here and not on the point 5 above? sql_expr_coll > can return an error not only because of illegal collation mix, but = also > if a collation does not exist, for example. Hm, I missed this fact. Ok, I=E2=80=99ve done it, see above.