From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounce@freelists.org>
Received: from localhost (localhost [127.0.0.1])
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 2E9D926BE9
	for <tarantool-patches@freelists.org>; Tue,  5 Feb 2019 08:50:54 -0500 (EST)
Received: from turing.freelists.org ([127.0.0.1])
	by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id Hmw1D3UPet6R for <tarantool-patches@freelists.org>;
	Tue,  5 Feb 2019 08:50:53 -0500 (EST)
Received: from smtpng1.m.smailru.net (smtpng1.m.smailru.net [94.100.181.251])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 20C472672D
	for <tarantool-patches@freelists.org>; Tue,  5 Feb 2019 08:50:53 -0500 (EST)
From: "n.pettik" <korablev@tarantool.org>
Message-Id: <07DBA796-6DD4-41DD-8438-104FE3AE05BB@tarantool.org>
Content-Type: multipart/alternative;
	boundary="Apple-Mail=_4BD75B59-5DCA-4354-81F9-C7E1AA433DAB"
Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\))
Subject: [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0'
Date: Tue, 5 Feb 2019 16:50:50 +0300
In-Reply-To: <fd70561d-6a0e-70dc-4c20-bdcac764040a@tarantool.org>
References: <15e143f4-3ea7-c7d6-d8ac-8a0e20b76449@tarantool.org>
 <1560FF96-FECD-4368-8AF8-F8F2AE7696E3@tarantool.org>
 <fd70561d-6a0e-70dc-4c20-bdcac764040a@tarantool.org>
Sender: tarantool-patches-bounce@freelists.org
Errors-to: tarantool-patches-bounce@freelists.org
Reply-To: tarantool-patches@freelists.org
List-help: <mailto:ecartis@freelists.org?Subject=help>
List-unsubscribe: <tarantool-patches-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: tarantool-patches <tarantool-patches.freelists.org>
List-subscribe: <tarantool-patches-request@freelists.org?Subject=subscribe>
List-owner: <mailto:>
List-post: <mailto:tarantool-patches@freelists.org>
List-archive: <http://www.freelists.org/archives/tarantool-patches>
To: tarantool-patches@freelists.org
Cc: Ivan Koptelov <ivan.koptelov@tarantool.org>


--Apple-Mail=_4BD75B59-5DCA-4354-81F9-C7E1AA433DAB
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8


> On 29/01/2019 19:35, n.pettik wrote:
>>> Fixes LIKE and LENGTH functions. '\0' now treated as
>> Nit: is treated.
> Fixed.
>>> a usual symbol. Strings with '\0' are now processed
>>> entirely. Consider examples:
>>>=20
>>> LENGTH(CHAR(65,00,65)) =3D=3D 3
>>> LIKE(CHAR(65,00,65), CHAR(65,00,66)) =3D=3D False
>> Also, I see that smth wrong with text in this mail again
> I hope now the mail text is ok.

Not quite. It is still highlighted in some way, and I have no idea why.
>  src/box/sql/func.c         |  88 +++++++++++++-----
>  src/box/sql/vdbeInt.h      |   2 +-
>  test/sql-tap/func.test.lua | 220 =
++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 284 insertions(+), 26 deletions(-)
>=20
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index e46b162d9..2978af983 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -128,6 +128,30 @@ typeofFunc(sqlite3_context * context, int =
NotUsed, sqlite3_value ** argv)
>  	sqlite3_result_text(context, z, -1, SQLITE_STATIC);
>  }
> =20
> +/**
> + * Return number of chars in the given string.
> + *
> + * Number of chars !=3D byte size of string because some characters
> + * are encoded with more than one byte. Also note that all
> + * characters from 'str' to 'str + byte_len' would be counted,
> + * even if there is a '\0' somewhere between them.
> + * @param str String to be counted.
> + * @param byte_len Byte length of given string.
> + * @return
Return what?
> + */
> +static int
> +count_chars(const unsigned char *str, size_t byte_len)
The naming is quite poor. I would call it utf8_str_len or something
else with a utf8 prefix. Maybe it is worth moving it to some utils
source file. Also, consider using the native U8_NEXT macro from utf8.c
instead of the custom SQLITE_SKIP_UTF8. It may be not as fast, but it
is safer, I suppose. I don't insist, though.
> +{
What if str is NULL? Add at least an assertion.
> +	int n_chars =3D 0;
> +	const unsigned char *prev_z;
> +	for (size_t cnt =3D 0; cnt < byte_len; cnt +=3D (str - prev_z)) =
{
> +		n_chars++;
> +		prev_z =3D str;
> +		SQLITE_SKIP_UTF8(str);
> +	}
> +	return n_chars;
> +}
You can rewrite this function in a simpler way without using SQLITE =
macros.
Read this topic: =
https://stackoverflow.com/questions/3911536/utf-8-unicode-whats-with-0xc0-=
and-0x80/3911566#3911566
It is quite useful. You may borrow the implementation from there.
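For instance, a minimal sketch of that idea. The name utf8_str_len is
only the one suggested above, not an existing function, and the sketch
assumes the input is valid UTF-8. It counts every byte that is not a
continuation byte of the form 10xxxxxx, so it may give a different
result than SQLITE_SKIP_UTF8 on malformed input:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: return the number
 * of UTF-8 characters in the first byte_len bytes of str.
 * Assumes valid UTF-8 input. Every byte that is not a
 * continuation byte (10xxxxxx) starts a new character, so an
 * embedded '\0' is counted as a usual one-byte character.
 */
static int
utf8_str_len(const unsigned char *str, size_t byte_len)
{
	/* A NULL pointer is tolerated only for an empty string. */
	assert(str != NULL || byte_len == 0);
	int char_count = 0;
	for (size_t i = 0; i < byte_len; i++) {
		if ((str[i] & 0xC0) != 0x80)
			char_count++;
	}
	return char_count;
}
```

With it, utf8_str_len((const unsigned char *)"A\0A", 3) gives 3, which
matches the LENGTH(CHAR(65,00,65)) example from the commit message.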
> +
>  /*
>   * Implementation of the length() function
>   */
> @@ -150,11 +174,7 @@ lengthFunc(sqlite3_context * context, int argc, =
sqlite3_value ** argv)
>  			const unsigned char *z =3D =
sqlite3_value_text(argv[0]);
>  			if (z =3D=3D 0)
>  				return;
> -			len =3D 0;
> -			while (*z) {
> -				len++;
> -				SQLITE_SKIP_UTF8(z);
> -			}
> +			len =3D count_chars(z, =
sqlite3_value_bytes(argv[0]));
>  			sqlite3_result_int(context, len);
>  			break;
>  		}
> @@ -340,11 +360,8 @@ substrFunc(sqlite3_context * context, int argc, =
sqlite3_value ** argv)
>  		if (z =3D=3D 0)
>  			return;
>  		len =3D 0;
> -		if (p1 < 0) {
> -			for (z2 =3D z; *z2; len++) {
> -				SQLITE_SKIP_UTF8(z2);
> -			}
> -		}
> +		if (p1 < 0)
> +			len =3D count_chars(z, =
sqlite3_value_bytes(argv[0]));
>  	}
>  #ifdef SQLITE_SUBSTR_COMPATIBILITY
>  	/* If SUBSTR_COMPATIBILITY is defined then substr(X,0,N) work =
the same as
> @@ -388,12 +405,21 @@ substrFunc(sqlite3_context * context, int argc, =
sqlite3_value ** argv)
>  	}
>  	assert(p1 >=3D 0 && p2 >=3D 0);
>  	if (p0type !=3D SQLITE_BLOB) {
> -		while (*z && p1) {
> +		/*
> +		 * In the code below 'cnt' and 'n_chars' is
> +		 * used because '\0' is not supposed to be
> +		 * end-of-string symbol.
> +		 */
> +		int n_chars =3D count_chars(z, =
sqlite3_value_bytes(argv[0]));
I=E2=80=99d rather call it char_count or symbol_count.
> diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua
> index b7de1d955..8c712bd5e 100755
> --- a/test/sql-tap/func.test.lua
> +++ b/test/sql-tap/func.test.lua
> +-- REPLACE
> +test:do_execsql_test(
> +    "func-62",
> +    "SELECT REPLACE(CHAR(00,65,00,65), CHAR(00), CHAR(65)) LIKE =
'AAAA';",
> +    {1})
> +
> +test:do_execsql_test(
> +    "func-63",
> +    "SELECT REPLACE(CHAR(00,65,00,65), CHAR(65), CHAR(00)) \
> +    LIKE CHAR(00,00,00,00);",
> +    {1})
> +
> +-- SUBSTR
> +test:do_execsql_test(
> +    "func-64",
> +    "SELECT SUBSTR(CHAR(65,00,66,67), 3, 2) LIKE CHAR(66, 67);",
> +    {1})
> +
> +test:do_execsql_test(
> +    "func-65",
> +    "SELECT SUBSTR(CHAR(00,00,00,65), 1, 4) LIKE CHAR(00,00,00,65);",
> +    {1})
> +
Just wondering: why do you use the LIKE function in almost all tests?


--Apple-Mail=_4BD75B59-5DCA-4354-81F9-C7E1AA433DAB--