From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 7A52222DD5 for ; Tue, 2 Jul 2019 13:21:07 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id n9-K2s8UmBGb for ; Tue, 2 Jul 2019 13:21:07 -0400 (EDT) Received: from smtp58.i.mail.ru (smtp58.i.mail.ru [217.69.128.38]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 84FBF22D99 for ; Tue, 2 Jul 2019 13:21:06 -0400 (EDT) From: "n.pettik" Message-Id: Content-Type: multipart/alternative; boundary="Apple-Mail=_ED41C219-89E4-4DE4-9ED2-09ADEBA5FB16" Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.8\)) Subject: [tarantool-patches] Re: [PATCH 1/2 v2] test: check that collations isn't ignored in SELECTs Date: Tue, 2 Jul 2019 20:21:02 +0300 In-Reply-To: References: <49f3a1279811efc66c580c6039cf1b890ff37889.1561372731.git.roman.habibov@tarantool.org> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org Cc: Roman Khabibov --Apple-Mail=_ED41C219-89E4-4DE4-9ED2-09ADEBA5FB16 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On 28 Jun 2019, at 16:11, Roman Khabibov = wrote: >=20 > Hi! Thanks for the review. >=20 >> On Jun 25, 2019, at 8:30 PM, n.pettik > wrote: >>> On 24 Jun 2019, at 13:54, Roman Khabibov = > = wrote: >>>=20 >>> Add test to check that a new collation isn't ignored regardless >>> of a name of a previous one in the following patterns of quries: >>=20 >> Nit: quries -> queries > Fixed. >=20 >>>=20 >>> SELECT s COLLATE "unicode_ci" FROM a ORDER BY s COLLATE = =E2=80=9Cunicode_ci" >>=20 >> Why do you consider this kind of query? What can be wrong with it? >> Specifying collation for result set members doesn=E2=80=99t make much = sense btw. > I consider it, because it didn=E2=80=99t work after my addition of = checking from the > second patch: > +static int > +check_collate_arg(struct Parse *parse, struct Expr *expr) > +{ > + struct Expr *left =3D expr->pLeft; > + while (left->op =3D=3D TK_COLLATE) > + left =3D left->pLeft; > + enum field_type type =3D sql_expr_type(left); > + if (type !=3D FIELD_TYPE_STRING && type !=3D FIELD_TYPE_SCALAR) = { > + diag_set(ClientError, ER_SQL_PARSER_GENERIC, > + "COLLATE clause can't be used with non-string " > + "arguments"); > + parse->is_aborted =3D true; > + return -1; > + } > + return 0; > +} > The tree of this query included sequence of collations on left nodes. = This is > the result of optimizations in multiSelectOrderBy. Now I can not = exactly say how > It occurs. At first time, I didn=E2=80=99t use loop to descend = throughout left nodes. > I added this loop, because of that reason. Wait, in the example like this: SELECT s COLLATE "unicode_ci" FROM a ORDER BY s COLLATE =E2=80=9Cunicode_= ci" collate clauses refer to independent parts of query and can=E2=80=99t contradict to each other. As I already said: "First one is used during sorting routines, second one is simply ignored (since values of resulting set are not supposed to be compared with anything else)." Why then are they organised into list? Please, investigate reasons of this =E2=80=9Coptimisation=E2=80=9D. >>> Also note: It is disallowed to compare strings with different >>> collations: ISO/IEC JTC 1/SC 32, Part 2: Foundation, page 531 >>> --- >>> src/box/sql/resolve.c | 7 +++++++ >>> test/sql-tap/collation.test.lua | 19 ++++++++++++++++++- >>> 2 files changed, 25 insertions(+), 1 deletion(-) >>>=20 >>> diff --git a/src/box/sql/resolve.c b/src/box/sql/resolve.c >>> index fdf3703da..348b3ea9a 100644 >>> --- a/src/box/sql/resolve.c >>> +++ b/src/box/sql/resolve.c >>> @@ -109,6 +109,13 @@ resolveAlias(Parse * pParse, /* Parsing = context */ >>> return; >>> if (zType[0] !=3D 'G') >>> incrAggFunctionDepth(pDup, nSubquery); >>> + /* >>> + * If there was typed more than one explicit collations in >>> + * query, it will be a sequence of left nodes with the >>> + * collations in a tree. There is nothing special about >>> + * keeping the sequence. Only one collation could be >>> + * stored, but the present solution is simpler. >>> + */ >>=20 >> Do not understand how mentioned example is related to this code. >> I suppose you might mean example like: >>=20 >> SELECT s COLLATE =E2=80=9Cunicode=E2=80=9D COLLATE =E2=80=9Cbinary=E2=80= =9D COLLATE =E2=80=9Cunicode_ci=E2=80=9D =E2=80=A6 >>=20 >> Is this syntax allowed by ANSI? If so, which one (first or last) = collation must be used? >>=20 >>> if (pExpr->op =3D=3D TK_COLLATE) { >>> pDup =3D >>> sqlExprAddCollateString(pParse, pDup, = pExpr->u.zToken); >>> diff --git a/test/sql-tap/collation.test.lua = b/test/sql-tap/collation.test.lua >>> index 79547361c..f36540fc2 100755 >>> --- a/test/sql-tap/collation.test.lua >>> +++ b/test/sql-tap/collation.test.lua >>> @@ -1,6 +1,6 @@ >>> #!/usr/bin/env tarantool >>> test =3D require("sqltester") >>> -test:plan(174) >>> +test:plan(177) >>>=20 >>> local prefix =3D "collation-" >>>=20 >>> @@ -529,4 +529,21 @@ test:do_catchsql_test( >>> 'CREATE TABLE test3 (a int, b int, c int, PRIMARY KEY (a, a = COLLATE foo, b, c))', >>> {1, "Collation 'FOO' does not exist"}) >>>=20 >>> +-- gh-3805 Check that collation is not ignored. >>=20 >> Why these test cases are related to the mentioned ticket? >> Which collation might be ignored? I see two collations in >> two different parts of query. First one is used during sorting >> routines, second one is simply ignored (since values of >> resulting set are not supposed to be compared with anything else). > See above. See above what?:) Please drop this commit or provide decent explanation. Now I can=E2=80=99t understand how provided test is related to the ticket and what=E2=80=99s wrong with this case at all. --Apple-Mail=_ED41C219-89E4-4DE4-9ED2-09ADEBA5FB16 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8

On 28 Jun 2019, at 16:11, Roman Khabibov <roman.habibov@tarantool.org> wrote:

Hi! Thanks for the = review.

On = Jun 25, 2019, at 8:30 PM, n.pettik <korablev@tarantool.org> wrote:
On 24 Jun 2019, at 13:54, Roman Khabibov = <roman.habibov@tarantool.org> wrote:

Add test to check that a new collation isn't ignored = regardless
of a name of a previous one in the following = patterns of quries:

Nit: = quries -> queries
Fixed.


SELECT s = COLLATE "unicode_ci" FROM a ORDER BY s COLLATE =E2=80=9Cunicode_ci"

Why do you consider this kind of = query? What can be wrong with it?
Specifying collation for = result set members doesn=E2=80=99t make much sense btw.
I consider it, because it didn=E2=80=99t work after my = addition of checking from the
second patch:
+static int
+check_collate_arg(struct Parse *parse, struct Expr = *expr)
+{
+ struct Expr = *left =3D expr->pLeft;
+ while (left->op =3D=3D TK_COLLATE)
+ left =3D left->pLeft;
+ enum field_type type =3D sql_expr_type(left);
+ if (type !=3D = FIELD_TYPE_STRING && type !=3D FIELD_TYPE_SCALAR) {
+ diag_set(ClientError, ER_SQL_PARSER_GENERIC,
+ =  "COLLATE clause can't be = used with non-string "
+ =  "arguments");
+ parse->is_aborted =3D true;
+ = return = -1;
+ }
+ return = 0;
+}
The tree of this query included = sequence of collations on left nodes. This is
the result of optimizations in = multiSelectOrderBy. Now I can not exactly say how
It occurs. At first time, I = didn=E2=80=99t use loop to descend throughout left nodes.
I added this loop, because of = that reason.

Wait, in = the example like this:

 SELECT s COLLATE "unicode_ci" FROM a ORDER BY s = COLLATE =E2=80=9Cunicode_ci"

collate = clauses refer to independent parts of query and
can=E2=80=99t contradict = to each = other. As I already said:

"First one is used during = sorting
routines, second one is simply ignored (since values = of
resulting set are not supposed to be compared with = anything else)."

Why then are = they organised into list? Please, investigate reasons
of this = =E2=80=9Coptimisation=E2=80=9D.

Also note: It is = disallowed to compare strings with different
collations: = ISO/IEC JTC 1/SC 32, Part 2: Foundation, page 531
---
src/box/sql/resolve.c =           |  7 = +++++++
test/sql-tap/collation.test.lua | 19 = ++++++++++++++++++-
2 files changed, 25 insertions(+), 1 = deletion(-)

diff --git = a/src/box/sql/resolve.c b/src/box/sql/resolve.c
index = fdf3703da..348b3ea9a 100644
--- a/src/box/sql/resolve.c
+++ b/src/box/sql/resolve.c
@@ -109,6 +109,13 = @@ resolveAlias(Parse * pParse, /* Parsing context */
= = return;
if (zType[0] !=3D 'G')
= = incrAggFunctionDepth(pDup, nSubquery);
+ /*
+ =  * If there was = typed more than one explicit collations in
+  * query, it will be a = sequence of left nodes with the
+  * collations in a tree. = There is nothing special about
+  * keeping the sequence. = Only one collation could be
+  * stored, but the present = solution is simpler.
+  */

Do not understand how mentioned = example is related to this code.
I suppose you might mean = example like:

SELECT s COLLATE = =E2=80=9Cunicode=E2=80=9D COLLATE =E2=80=9Cbinary=E2=80=9D COLLATE = =E2=80=9Cunicode_ci=E2=80=9D =E2=80=A6

Is = this syntax allowed by ANSI? If so, which one (first or last) collation = must be used?

= if (pExpr->op =3D=3D TK_COLLATE) {
pDup =3D = =     sqlExprAddC= ollateString(pParse, pDup, pExpr->u.zToken);
diff --git = a/test/sql-tap/collation.test.lua b/test/sql-tap/collation.test.lua
index 79547361c..f36540fc2 100755
--- = a/test/sql-tap/collation.test.lua
+++ = b/test/sql-tap/collation.test.lua
@@ -1,6 +1,6 @@
#!/usr/bin/env tarantool
test =3D = require("sqltester")
-test:plan(174)
+test:plan(177)

local prefix =3D = "collation-"

@@ -529,4 +529,21 @@ = test:do_catchsql_test(
      'CREATE TABLE test3 (a = int, b int, c int, PRIMARY KEY (a, a COLLATE foo, b, c))',
      {1, "Collation 'FOO' does = not exist"})

+-- gh-3805 Check that = collation is not ignored.

Why = these test cases are related to the mentioned ticket?
Which = collation might be ignored? I see two collations in
two = different parts of query. First one is used during sorting
routines, second one is simply ignored (since values of
resulting set are not supposed to be compared with anything = else).
See above.

See above what?:) Please drop this commit or = provide
decent explanation. Now I can=E2=80=99t understand how = provided
test is related to the ticket and what=E2=80=99s = wrong with this case
at all.

= --Apple-Mail=_ED41C219-89E4-4DE4-9ED2-09ADEBA5FB16--