From: Nikita Tatunov <hollow653@gmail.com> To: alexander.turenko@tarantool.org Cc: avkhatskevich@tarantool.org, tarantool-patches@freelists.org Subject: [tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue Date: Wed, 18 Jul 2018 18:24:44 +0300 [thread overview] Message-ID: <CAEi+_apXUVK8QBrBDm05JYcNxYfB43R0vvsQ_2LHK5j+tv25tg@mail.gmail.com> (raw) In-Reply-To: <20180718024314.be245cmsgklxuvnk@tkn_work_nb> [-- Attachment #1: Type: text/plain, Size: 4668 bytes --] ср, 18 июл. 2018 г. в 5:43, Alexander Turenko < alexander.turenko@tarantool.org>: > i Nikita! > > I don't have objections against this patch in general, just several > minor comments and side notes. > > Alexey, as I see you are familiar with the code. Can you give a review > for the patch? > > WBR, Alexander Turenko. > > On Thu, Jun 28, 2018 at 03:47:16PM +0300, N.Tatunov wrote: > > Currently function that compares pattern and > > string for GLOB & LIKE operators doesn't work properly. > > It uses ICU reading function which perhaps was working > > differently before and the implementation for the > > comparison ending isn't paying attention to some > > special cases, hence in those cases it works improperly. > > Now the checks for comparison should work fine. > > > > Сloses: #3251 > > Сloses: #3334 > > Use 'Closes #xxxx' form. Don't sure the form with a colon will work. > > Fixed it. > Utf8Read macro comment now completely irrelevant to its definition. > Compare to sqlite3: > > 584 /* > 585 ** For LIKE and GLOB matching on EBCDIC machines, assume that every > 586 ** character is exactly one byte in size. Also, provde the Utf8Read() > 587 ** macro for fast reading of the next character in the common case > where > 588 ** the next character is ASCII. > 589 */ > 590 #if defined(SQLITE_EBCDIC) > 591 # define sqlite3Utf8Read(A) (*((*A)++)) > 592 # define Utf8Read(A) (*(A++)) > 593 #else > 594 # define Utf8Read(A) > (A[0]<0x80?*(A++):sqlite3Utf8Read(&A)) > 595 #endif > > Yeah, you're right. Fixed it. > > --- > > src/box/sql/func.c | 25 ++++---- > > test/sql-tap/like1.test.lua | 152 > ++++++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 165 insertions(+), 12 deletions(-) > > create mode 100755 test/sql-tap/like1.test.lua > > > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > > index c06e3bd..dcbd7e0 100644 > > --- a/src/box/sql/func.c > > +++ b/src/box/sql/func.c > > @@ -643,6 +643,7 @@ static const struct compareInfo likeInfoAlt = { '%', > '_', 0, 0 }; > > #define SQLITE_MATCH 0 > > #define SQLITE_NOMATCH 1 > > #define SQLITE_NOWILDCARDMATCH 2 > > +#define SQL_NO_SYMBOLS_LEFT 65535 > > > > 0xffff looks better. > Isn't actual with the fix. > > I would use name like ICU_ERROR. By the way, it can mean not only end of > a string, but also other errors. It is okay to give 'not matched' in the > case, I think. > > Hmm, we can misinterpret some error and give 'matched' result, I > guess... maybe we really should check start < end in the macro to > distinguish this cases. Don't sure about the test case. As I see from > ICU source code it can be some internal buffer overflow error. We can do > the check like so: > > -#define Utf8Read(s, e) ucnv_getNextUChar(pUtf8conv, &s, e, &status) > +#define Utf8Read(s, e) (((s) < (e)) ? ucnv_getNextUChar(pUtf8conv, &(s), > (e), \ > + &(status)) : 0) > > Yes, nice idea, I wouldn't guess it, thank you for advice. Therefore changed SQL_NO_SYMBOLS_LEFT to SQL_END_OF_STRING, I think it is more understandable. -#define SQL_NO_SYMBOLS_LEFT 65535 > +#define SQL_NO_SYMBOLS_LEFT 0 > /* > > * Compare two UTF-8 strings for equality where the first string is > > @@ -698,29 +699,28 @@ patternCompare(const char * pattern, /* The > glob pattern */ > > const char * string_end = string + strlen(string); > > UErrorCode status = U_ZERO_ERROR; > > > > - while (pattern < pattern_end){ > > - c = Utf8Read(pattern, pattern_end); > > + while ((c = Utf8Read(pattern, pattern_end)) != > SQL_NO_SYMBOLS_LEFT) { > > Here and everywhere below we lean on the fact that ucnv_getNextUChar > compares pointers and that is so: > > 1860 s=*source; > 1861 if(sourceLimit<s) { > 1862 *err=U_ILLEGAL_ARGUMENT_ERROR; > 1863 return 0xffff; > 1864 } > > By the way, the patch reverts partially implemented approach to > preliminary check start < end introduced in 198d59ce93. I think it is > okay too, because some checks were missed and now they exist again. > > > diff --git a/test/sql-tap/like1.test.lua b/test/sql-tap/like1.test.lua > > new file mode 100755 > > index 0000000..42b4d43 > > --- /dev/null > > +++ b/test/sql-tap/like1.test.lua > > @@ -0,0 +1,152 @@ > > +#!/usr/bin/env tarantool > > +test = require("sqltester") > > +test:plan(13) > > + > > Maybe case with % at the end? I tried, that works, but it would be good > to have a regression test. > Yeah, sure. [-- Attachment #2: Type: text/html, Size: 6679 bytes --]
next prev parent reply other threads:[~2018-07-18 15:24 UTC|newest] Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-06-28 12:47 [tarantool-patches] " N.Tatunov 2018-06-28 12:54 ` [tarantool-patches] " Hollow111 2018-07-18 2:43 ` Alexander Turenko 2018-07-18 5:51 ` Alex Khatskevich 2018-07-18 15:24 ` Nikita Tatunov [this message] 2018-07-18 15:53 ` Alex Khatskevich 2018-07-18 15:57 ` Nikita Tatunov 2018-07-18 17:10 ` Alexander Turenko 2018-07-19 11:14 ` Nikita Tatunov 2018-07-19 11:56 ` Alex Khatskevich 2018-07-27 11:28 ` Nikita Tatunov 2018-07-27 13:06 ` Alexander Turenko 2018-07-27 19:11 ` Nikita Tatunov 2018-07-27 20:22 ` Alexander Turenko 2018-07-31 13:27 ` Nikita Tatunov 2018-07-31 13:47 ` Alexander Turenko 2018-08-01 10:35 ` Nikita Tatunov 2018-08-01 10:51 ` Nikita Tatunov 2018-08-01 13:56 ` Alex Khatskevich 2018-08-01 18:10 ` Nikita Tatunov 2018-08-01 18:14 ` Nikita Tatunov 2018-08-08 12:38 ` Alex Khatskevich
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=CAEi+_apXUVK8QBrBDm05JYcNxYfB43R0vvsQ_2LHK5j+tv25tg@mail.gmail.com \ --to=hollow653@gmail.com \ --cc=alexander.turenko@tarantool.org \ --cc=avkhatskevich@tarantool.org \ --cc=tarantool-patches@freelists.org \ --subject='[tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox