<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">ср, 18 июл. 2018 г. в 5:43, Alexander Turenko <<a href="mailto:alexander.turenko@tarantool.org">alexander.turenko@tarantool.org</a>>:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">i Nikita!<br>

<br>

I don't have objections against this patch in general, just several<br>

minor comments and side notes.<br>

<br>

Alexey, as I see you are familiar with the code. Can you give a review<br>

for the patch?<br>

<br>

WBR, Alexander Turenko.<br>

<br>

On Thu, Jun 28, 2018 at 03:47:16PM +0300, N.Tatunov wrote:<br>

> Currently function that compares pattern and<br>

> string for GLOB & LIKE operators doesn't work properly.<br>

> It uses ICU reading function which perhaps was working<br>

> differently before and the implementation for the<br>

> comparison ending isn't paying attention to some<br>

> special cases, hence in those cases it works improperly.<br>

> Now the checks for comparison should work fine.<br>

> <br>

> Сloses: #3251<br>

> Сloses: #3334<br>

<br>

Use 'Closes #xxxx' form. Don't sure the form with a colon will work.<br>

<br></blockquote><div><br></div><div>Fixed it.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Utf8Read macro comment now completely irrelevant to its definition.<br>

Compare to sqlite3:<br>

<br>

 584 /*<br>

 585 ** For LIKE and GLOB matching on EBCDIC machines, assume that every<br>

 586 ** character is exactly one byte in size.  Also, provde the Utf8Read()<br>

 587 ** macro for fast reading of the next character in the common case where<br>

 588 ** the next character is ASCII.<br>

 589 */<br>

 590 #if defined(SQLITE_EBCDIC)<br>

 591 # define sqlite3Utf8Read(A)        (*((*A)++))<br>

 592 # define Utf8Read(A)               (*(A++))<br>

 593 #else<br>

 594 # define Utf8Read(A)               (A[0]<0x80?*(A++):sqlite3Utf8Read(&A))<br>

 595 #endif<br>

<br></blockquote><div><br></div><div>Yeah, you're right. Fixed it.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

> ---<br>

>  src/box/sql/func.c          |  25 ++++----<br>

>  test/sql-tap/like1.test.lua | 152 ++++++++++++++++++++++++++++++++++++++++++++<br>

>  2 files changed, 165 insertions(+), 12 deletions(-)<br>

>  create mode 100755 test/sql-tap/like1.test.lua<br>

> <br>

> diff --git a/src/box/sql/func.c b/src/box/sql/func.c<br>

> index c06e3bd..dcbd7e0 100644<br>

> --- a/src/box/sql/func.c<br>

> +++ b/src/box/sql/func.c<br>

> @@ -643,6 +643,7 @@ static const struct compareInfo likeInfoAlt = { '%', '_', 0, 0 };<br>

>  #define SQLITE_MATCH             0<br>

>  #define SQLITE_NOMATCH           1<br>

>  #define SQLITE_NOWILDCARDMATCH   2<br>

> +#define SQL_NO_SYMBOLS_LEFT      65535<br>

>  <br>

<br>

0xffff looks better.<br></blockquote><div><br></div><div>Isn't actual with the fix.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

I would use name like ICU_ERROR. By the way, it can mean not only end of<br>

a string, but also other errors. It is okay to give 'not matched' in the<br>

case, I think.<br>

<br>

Hmm, we can misinterpret some error and give 'matched' result, I<br>

guess... maybe we really should check start < end in the macro to<br>

distinguish this cases. Don't sure about the test case. As I see from<br>

ICU source code it can be some internal buffer overflow error. We can do<br>

the check like so:<br>

<br>

-#define Utf8Read(s, e)    ucnv_getNextUChar(pUtf8conv, &s, e, &status)<br>

+#define Utf8Read(s, e) (((s) < (e)) ? ucnv_getNextUChar(pUtf8conv, &(s), (e), \<br>

+       &(status)) : 0)<br>

<br></blockquote><div><br></div><div><span style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Yes, nice idea, I wouldn't guess it, thank you for advice.</span></div><div><div style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">Therefore changed SQL_NO_SYMBOLS_LEFT to SQL_END_OF_STRING,</div><div style="font-size:small;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">I think it is more understandable.</div> </div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

-#define SQL_NO_SYMBOLS_LEFT      65535<br>

+#define SQL_NO_SYMBOLS_LEFT      0</blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

>  /*<br>

>   * Compare two UTF-8 strings for equality where the first string is<br>

> @@ -698,29 +699,28 @@ patternCompare(const char * pattern,    /* The glob pattern */<br>

>       const char * string_end = string + strlen(string);<br>

>       UErrorCode status = U_ZERO_ERROR;<br>

>  <br>

> -     while (pattern < pattern_end){<br>

> -             c = Utf8Read(pattern, pattern_end);<br>

> +     while ((c = Utf8Read(pattern, pattern_end)) != SQL_NO_SYMBOLS_LEFT) {<br>

<br>

Here and everywhere below we lean on the fact that ucnv_getNextUChar<br>

compares pointers and that is so:<br>

<br>

1860     s=*source;<br>

1861     if(sourceLimit<s) {<br>

1862         *err=U_ILLEGAL_ARGUMENT_ERROR;<br>

1863         return 0xffff;<br>

1864     }<br>

<br>

By the way, the patch reverts partially implemented approach to<br>

preliminary check start < end introduced in 198d59ce93. I think it is<br>

okay too, because some checks were missed and now they exist again.<br>

<br>

> diff --git a/test/sql-tap/like1.test.lua b/test/sql-tap/like1.test.lua<br>

> new file mode 100755<br>

> index 0000000..42b4d43<br>

> --- /dev/null<br>

> +++ b/test/sql-tap/like1.test.lua<br>

> @@ -0,0 +1,152 @@<br>

> +#!/usr/bin/env tarantool<br>

> +test = require("sqltester")<br>

> +test:plan(13)<br>

> +<br>

<br>

Maybe case with % at the end? I tried, that works, but it would be good<br>

to have a regression test.<br></blockquote><div><br></div><div>Yeah, sure. </div></div></div>