From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 09B3922E09 for ; Thu, 19 Jul 2018 07:14:34 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id p1MOHxrlO9PC for ; Thu, 19 Jul 2018 07:14:33 -0400 (EDT) Received: from mail-lf0-f67.google.com (mail-lf0-f67.google.com [209.85.215.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 7D0F021F7B for ; Thu, 19 Jul 2018 07:14:33 -0400 (EDT) Received: by mail-lf0-f67.google.com with SMTP id a4-v6so5745140lff.5 for ; Thu, 19 Jul 2018 04:14:33 -0700 (PDT) MIME-Version: 1.0 References: <1530190036-10105-1-git-send-email-hollow653@gmail.com> <20180718024314.be245cmsgklxuvnk@tkn_work_nb> <20180718171024.ry2cltazdl2hsrpc@tkn_work_nb> In-Reply-To: <20180718171024.ry2cltazdl2hsrpc@tkn_work_nb> From: Nikita Tatunov Date: Thu, 19 Jul 2018 14:14:20 +0300 Message-ID: Subject: [tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue Content-Type: multipart/alternative; boundary="000000000000917c8305715848e2" Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: alexander.turenko@tarantool.org Cc: tarantool-patches@freelists.org, avkhatskevich@tarantool.org --000000000000917c8305715848e2 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable =D1=81=D1=80, 18 =D0=B8=D1=8E=D0=BB. 2018 =D0=B3. =D0=B2 20:10, Alexander T= urenko < alexander.turenko@tarantool.org>: > Two minor comments are below. > > WBR, Alexander Turenko. > > On Wed, Jul 18, 2018 at 06:57:39PM +0300, Nikita Tatunov wrote: > > =D1=81=D1=80, 18 =D0=B8=D1=8E=D0=BB. 2018 =D0=B3. =D0=B2 18:53, Alex= Khatskevich > > <[1]avkhatskevich@tarantool.org>: > > > > Once you have fixed comments, please apply a new full diff at the en= d > > of email, to > > continue review process. > > Answer with a full diff to this email please (just paste as a plain > > text output of `git diff` command) > > On 18.07.2018 18:24, Nikita Tatunov wrote: > > > > -/* > > - * For LIKE and GLOB matching on EBCDIC machines, assume that every > > - * character is exactly one byte in size. Also, provde the > Utf8Read() > > - * macro for fast reading of the next character in the common case > > where > > - * the next character is ASCII. > > +/** > > + * Providing there are symbols in string s this macro returns > > + * UTF-8 code of character and pushes pointer to the next symbol > > + * in the string. Otherwise return code is SQL_END_OF_STRING. > > Nitpicking: pushes -> promotes? > > Both should be fine, I guess. But ok, changed it. > > */ > > -#define Utf8Read(s, e) ucnv_getNextUChar(pUtf8conv, &s, e, > &status) > > +#define Utf8Read(s, e) (s < e ?\ > > + ucnv_getNextUChar(pUtf8conv, &s, e, &status) : 0) > > + > > +#define SQL_END_OF_STRING 0 > > Always wrap argument usages of a function-like macro within its body > with parenthesis (consider example I give in the previous email). It is > usual way to handle the fact that a macros works as text replacement, > so, say > > #define SUB(x, y) x - y > > will give the wrong result for the expression SUB(10, 5 - 3), while > > #define SUB(x, y) ((x) - (y)) > > will produce the correct result. > Fixed. Also made few minor codestyle fixes in tests compared to prev. patch. diff --git a/src/box/sql/func.c b/src/box/sql/func.c index c06e3bd..a9784cc 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -617,13 +617,16 @@ struct compareInfo { u8 noCase; /* true to ignore case differences */ }; -/* - * For LIKE and GLOB matching on EBCDIC machines, assume that every - * character is exactly one byte in size. Also, provde the Utf8Read() - * macro for fast reading of the next character in the common case where - * the next character is ASCII. +/** + * Providing there are symbols in string s this + * macro returns UTF-8 code of character and + * promotes pointer to the next symbol in the string. + * Otherwise return code is SQL_END_OF_STRING. */ -#define Utf8Read(s, e) ucnv_getNextUChar(pUtf8conv, &s, e, &status) +#define Utf8Read(s, e) (((s) < (e)) ?\ + ucnv_getNextUChar(pUtf8conv, &s, e, &status) : 0) + +#define SQL_END_OF_STRING 0 --000000000000917c8305715848e2 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

=D1= =81=D1=80, 18 =D0=B8=D1=8E=D0=BB. 2018 =D0=B3. =D0=B2 20:10, Alexander Ture= nko <alexander.turenk= o@tarantool.org>:
Two minor comments are below.

WBR, Alexander Turenko.

On Wed, Jul 18, 2018 at 06:57:39PM +0300, Nikita Tatunov wrote:
>=C2=A0 =C2=A0 =D1=81=D1=80, 18 =D0=B8=D1=8E=D0=BB. 2018 =D0=B3. =D0=B2 = 18:53, Alex Khatskevich
>=C2=A0 =C2=A0 <[1]avkhatskevich@tarantool.org>:
>
>=C2=A0 =C2=A0 Once you have fixed comments, please apply a new full dif= f at the end
>=C2=A0 =C2=A0 of email, to
>=C2=A0 =C2=A0 continue review process.
>=C2=A0 =C2=A0 Answer with a full diff to this email please (just paste = as a plain
>=C2=A0 =C2=A0 text output of `git diff` command)
>=C2=A0 =C2=A0 On 18.07.2018 18:24, Nikita Tatunov wrote:
>
>=C2=A0 =C2=A0 -/*
>=C2=A0 =C2=A0 - * For LIKE and GLOB matching on EBCDIC machines, assume= that every
>=C2=A0 =C2=A0 - * character is exactly one byte in size.=C2=A0 Also, pr= ovde the Utf8Read()
>=C2=A0 =C2=A0 - * macro for fast reading of the next character in the c= ommon case
>=C2=A0 =C2=A0 where
>=C2=A0 =C2=A0 - * the next character is ASCII.
>=C2=A0 =C2=A0 +/**
>=C2=A0 =C2=A0 + * Providing there are symbols in string s this macro re= turns
>=C2=A0 =C2=A0 + * UTF-8 code of character and pushes pointer to the nex= t symbol
>=C2=A0 =C2=A0 + * in the string. Otherwise return code is SQL_END_OF_ST= RING.

Nitpicking: pushes -> promotes?


Both should be fine, I guess. But ok, = changed it.
=C2=A0
>=C2=A0 =C2=A0 =C2=A0 */
>=C2=A0 =C2=A0 -#define Utf8Read(s, e)=C2=A0 =C2=A0 ucnv_getNextUChar(pU= tf8conv, &s, e, &status)
>=C2=A0 =C2=A0 +#define Utf8Read(s, e) (s < e ?\
>=C2=A0 =C2=A0 + ucnv_getNextUChar(pUtf8conv, &s, e, &status) : = 0)
>=C2=A0 =C2=A0 +
>=C2=A0 =C2=A0 +#define SQL_END_OF_STRING=C2=A0 =C2=A0 =C2=A0 =C2=A00
Always wrap argument usages of a function-like macro within its body
with parenthesis (consider example I give in the previous email). It is
usual way to handle the fact that a macros works as text replacement,
so, say

#define SUB(x, y) x - y

will give the wrong result for the expression SUB(10, 5 - 3), while

#define SUB(x, y) ((x) - (y))

will produce the correct result.

Fixed.=

Also made few minor codestyle fixes in tests co= mpared to prev. patch.
=

=C2=A0diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index c06e3bd..a9784cc 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -617,13 +617,16 @@ struct compareIn= fo {
=C2=A0 u8 noCase; /* true to ignore case differences */<= /div>
=C2=A0};
=C2=A0
-/*
- * For LIKE an= d GLOB matching on EBCDIC machines, assume that every
- * charact= er is exactly one byte in size.=C2=A0 Also, provde the Utf8Read()
- * macro for fast reading of the next character in the common case where<= /div>
- * the next character is ASCII.
+/**
+ * Pro= viding there are symbols in string s this
+ * macro returns UTF-8= code of character and
+ * promotes pointer to the next symbol in= the string.=C2=A0
+ * Otherwise return code is SQL_END_OF_STRING= .
=C2=A0 */
-#define Utf8Read(s, e)=C2=A0 =C2=A0 ucnv_g= etNextUChar(pUtf8conv, &s, e, &status)
+#define Utf8Read(= s, e) (((s) < (e)) ?\
+ ucnv_getNextUChar(pUtf8conv, &s, e, &status) : 0)
+
+#define SQL_END_OF_STRING=C2=A0 =C2=A0 =C2=A0 =C2=A00
=C2= =A0
--000000000000917c8305715848e2--