From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 9268726B60 for ; Tue, 17 Jul 2018 22:43:12 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 1_ECaqjs9OC9 for ; Tue, 17 Jul 2018 22:43:12 -0400 (EDT) Received: from smtp39.i.mail.ru (smtp39.i.mail.ru [94.100.177.99]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id D256724742 for ; Tue, 17 Jul 2018 22:43:11 -0400 (EDT) Date: Wed, 18 Jul 2018 05:43:14 +0300 From: Alexander Turenko Subject: [tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue Message-ID: <20180718024314.be245cmsgklxuvnk@tkn_work_nb> References: <1530190036-10105-1-git-send-email-hollow653@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1530190036-10105-1-git-send-email-hollow653@gmail.com> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: Nikita Tatunov , Alex Khatskevich Cc: tarantool-patches@freelists.org i Nikita! I don't have objections against this patch in general, just several minor comments and side notes. Alexey, as I see you are familiar with the code. Can you give a review for the patch? WBR, Alexander Turenko. On Thu, Jun 28, 2018 at 03:47:16PM +0300, N.Tatunov wrote: > Currently function that compares pattern and > string for GLOB & LIKE operators doesn't work properly. > It uses ICU reading function which perhaps was working > differently before and the implementation for the > comparison ending isn't paying attention to some > special cases, hence in those cases it works improperly. > Now the checks for comparison should work fine. > > Сloses: #3251 > Сloses: #3334 Use 'Closes #xxxx' form. Don't sure the form with a colon will work. Utf8Read macro comment now completely irrelevant to its definition. Compare to sqlite3: 584 /* 585 ** For LIKE and GLOB matching on EBCDIC machines, assume that every 586 ** character is exactly one byte in size. Also, provde the Utf8Read() 587 ** macro for fast reading of the next character in the common case where 588 ** the next character is ASCII. 589 */ 590 #if defined(SQLITE_EBCDIC) 591 # define sqlite3Utf8Read(A) (*((*A)++)) 592 # define Utf8Read(A) (*(A++)) 593 #else 594 # define Utf8Read(A) (A[0]<0x80?*(A++):sqlite3Utf8Read(&A)) 595 #endif > --- > src/box/sql/func.c | 25 ++++---- > test/sql-tap/like1.test.lua | 152 ++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 165 insertions(+), 12 deletions(-) > create mode 100755 test/sql-tap/like1.test.lua > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index c06e3bd..dcbd7e0 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -643,6 +643,7 @@ static const struct compareInfo likeInfoAlt = { '%', '_', 0, 0 }; > #define SQLITE_MATCH 0 > #define SQLITE_NOMATCH 1 > #define SQLITE_NOWILDCARDMATCH 2 > +#define SQL_NO_SYMBOLS_LEFT 65535 > 0xffff looks better. I would use name like ICU_ERROR. By the way, it can mean not only end of a string, but also other errors. It is okay to give 'not matched' in the case, I think. Hmm, we can misinterpret some error and give 'matched' result, I guess... maybe we really should check start < end in the macro to distinguish this cases. Don't sure about the test case. As I see from ICU source code it can be some internal buffer overflow error. We can do the check like so: -#define Utf8Read(s, e) ucnv_getNextUChar(pUtf8conv, &s, e, &status) +#define Utf8Read(s, e) (((s) < (e)) ? ucnv_getNextUChar(pUtf8conv, &(s), (e), \ + &(status)) : 0) -#define SQL_NO_SYMBOLS_LEFT 65535 +#define SQL_NO_SYMBOLS_LEFT 0 > /* > * Compare two UTF-8 strings for equality where the first string is > @@ -698,29 +699,28 @@ patternCompare(const char * pattern, /* The glob pattern */ > const char * string_end = string + strlen(string); > UErrorCode status = U_ZERO_ERROR; > > - while (pattern < pattern_end){ > - c = Utf8Read(pattern, pattern_end); > + while ((c = Utf8Read(pattern, pattern_end)) != SQL_NO_SYMBOLS_LEFT) { Here and everywhere below we lean on the fact that ucnv_getNextUChar compares pointers and that is so: 1860 s=*source; 1861 if(sourceLimit diff --git a/test/sql-tap/like1.test.lua b/test/sql-tap/like1.test.lua > new file mode 100755 > index 0000000..42b4d43 > --- /dev/null > +++ b/test/sql-tap/like1.test.lua > @@ -0,0 +1,152 @@ > +#!/usr/bin/env tarantool > +test = require("sqltester") > +test:plan(13) > + Maybe case with % at the end? I tried, that works, but it would be good to have a regression test.