From: "n.pettik"
Subject: [tarantool-patches] Re: [PATCH] sql: fix bug with BLOB TRIM() when X'00' in char set
Date: Tue, 25 Dec 2018 13:40:50 +0200
In-Reply-To: <3b2ab896-d36f-3d19-e740-34d9db5d91f0@tarantool.org>
References: <20181215105741.28464-1-roman.habibov@tarantool.org> <3b2ab896-d36f-3d19-e740-34d9db5d91f0@tarantool.org>
To: tarantool-patches@freelists.org
Cc: Roman Khabibov

> commit 39b2481341cd2e71d67a04bec9aeed0e1189740c
> Author: Roman Khabibov
> Date:   Sat Dec 15 13:21:59 2018 +0300
>
>     sql: fix bug with BLOB TRIM() when X'00' in char set
>
>     The reason for the bug was that X'00' is a terminal symbol. If the char set
>     contained X'00', all characters are ignored after it (including itself).
>
>     Closes #3543
>
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index 9667aead5..f397e23c1 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -1223,9 +1223,19 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv)
>  	} else if ((zCharSet = sqlite3_value_text(argv[1])) == 0) {
>  		return;
>  	} else {
> -		const unsigned char *z;
> -		for (z = zCharSet, nChar = 0; *z; nChar++) {
> +		const unsigned char *z = zCharSet;
> +		int sizeOfCharSet = \

We don’t use backslashes to wrap lines of code (they are used for macros
and sometimes for comments).

> +		sqlite3_value_bytes(argv[1]); /* Size of char set in bytes. */

We put comments above the code they describe:

/* Size of char set in bytes. */
int sizeOfCharSet = sqlite3_value_bytes(argv[1]);

I guess this comment is completely useless: the names of the variable and
the function say exactly the same thing as the comment does.

> +		int nProcessedBytes = 0;
> +		nChar = 0;
> +		const unsigned char *zStepBack;
> +		/* Count the number of UTF-8 characters passing through the
> +		 * entire char set, but not up to the '\0' or X'00' character. */

Use tnt-style comments.

> +		while(sizeOfCharSet - nProcessedBytes > 0) {
> +			zStepBack = z;
>  			SQLITE_SKIP_UTF8(z);
> +			nProcessedBytes += z - zStepBack;
> +			nChar++;
>  		}
>  		if (nChar > 0) {
>  			azChar =
> @@ -1235,10 +1245,18 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv)
>  				return;
>  			}
>  			aLen = (unsigned char *)&azChar[nChar];
> -			for (z = zCharSet, nChar = 0; *z; nChar++) {
> +			z = zCharSet;
> +			nChar = 0;
> +			nProcessedBytes = 0;

The comment below is again useless.

> +			/* Similar to the previous cycle. But 

I see a trailing space here. Use git diff to spot such places.

> +			 * now write into "azCharSet". */

Use tnt-style comments.

> +			while(sizeOfCharSet - nProcessedBytes > 0) {
>  				azChar[nChar] = (unsigned char *)z;
> +				zStepBack = z;

You don’t need ‘zStepBack’ here: you have already saved the current
string position in azChar.

>  				SQLITE_SKIP_UTF8(z);
> +				nProcessedBytes += z - zStepBack;
>  				aLen[nChar] = (u8) (z - azChar[nChar]);
> +				nChar++;
>  			}
>  		}
>  	}

All points considered, I suggest a diff like this:

diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index f397e23c1..9b5773321 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -1203,7 +1203,8 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv)
 	int i;			/* Loop counter */
 	unsigned char *aLen = 0;	/* Length of each character in zCharSet */
 	unsigned char **azChar = 0;	/* Individual characters in zCharSet */
-	int nChar;		/* Number of characters in zCharSet */
+	/* Number of UTF-8 characters in zCharSet. */
+	int nChar;
 
 	if (sqlite3_value_type(argv[0]) == SQLITE_NULL) {
 		return;
@@ -1224,17 +1225,20 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv)
 		return;
 	} else {
 		const unsigned char *z = zCharSet;
-		int sizeOfCharSet = \
-		sqlite3_value_bytes(argv[1]); /* Size of char set in bytes. */
-		int nProcessedBytes = 0;
+		int trim_set_sz = sqlite3_value_bytes(argv[1]);
+		int handled_bytes_cnt = trim_set_sz;
 		nChar = 0;
-		const unsigned char *zStepBack;
-		/* Count the number of UTF-8 characters passing through the
-		 * entire char set, but not up to the '\0' or X'00' character. */
-		while(sizeOfCharSet - nProcessedBytes > 0) {
-			zStepBack = z;
+		/*
+		 * Count the number of UTF-8 characters passing
+		 * through the entire char set, but not up
+		 * to the '\0' or X'00' character. This allows
+		 * to handle trimming set containing such
+		 * characters.
+		 */
+		while(handled_bytes_cnt > 0) {
+			const unsigned char *prev_byte = z;
 			SQLITE_SKIP_UTF8(z);
-			nProcessedBytes += z - zStepBack;
+			handled_bytes_cnt -= (z - prev_byte);
 			nChar++;
 		}
 		if (nChar > 0) {
@@ -1247,15 +1251,12 @@ trimFunc(sqlite3_context * context, int argc, sqlite3_value ** argv)
 			aLen = (unsigned char *)&azChar[nChar];
 			z = zCharSet;
 			nChar = 0;
-			nProcessedBytes = 0;
-			/* Similar to the previous cycle. But 
-			 * now write into "azCharSet". */
-			while(sizeOfCharSet - nProcessedBytes > 0) {
+			handled_bytes_cnt = trim_set_sz;
+			while(handled_bytes_cnt > 0) {
 				azChar[nChar] = (unsigned char *)z;
-				zStepBack = z;
 				SQLITE_SKIP_UTF8(z);
-				nProcessedBytes += z - zStepBack;
 				aLen[nChar] = (u8) (z - azChar[nChar]);
+				handled_bytes_cnt -= aLen[nChar];
 				nChar++;

Check it out. If you are ok with it, you can apply it (partially or fully).