Tarantool development patches archive
 help / color / mirror / Atom feed
From: Nikita Tatunov <hollow653@gmail.com>
To: alexander.turenko@tarantool.org
Cc: avkhatskevich@tarantool.org, tarantool-patches@freelists.org
Subject: [tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue
Date: Wed, 18 Jul 2018 18:24:44 +0300	[thread overview]
Message-ID: <CAEi+_apXUVK8QBrBDm05JYcNxYfB43R0vvsQ_2LHK5j+tv25tg@mail.gmail.com> (raw)
In-Reply-To: <20180718024314.be245cmsgklxuvnk@tkn_work_nb>

[-- Attachment #1: Type: text/plain, Size: 4668 bytes --]

ср, 18 июл. 2018 г. в 5:43, Alexander Turenko <
alexander.turenko@tarantool.org>:

> i Nikita!
>
> I don't have objections against this patch in general, just several
> minor comments and side notes.
>
> Alexey, as I see you are familiar with the code. Can you give a review
> for the patch?
>
> WBR, Alexander Turenko.
>
> On Thu, Jun 28, 2018 at 03:47:16PM +0300, N.Tatunov wrote:
> > Currently function that compares pattern and
> > string for GLOB & LIKE operators doesn't work properly.
> > It uses ICU reading function which perhaps was working
> > differently before and the implementation for the
> > comparison ending isn't paying attention to some
> > special cases, hence in those cases it works improperly.
> > Now the checks for comparison should work fine.
> >
> > Сloses: #3251
> > Сloses: #3334
>
> Use 'Closes #xxxx' form. Don't sure the form with a colon will work.
>
>
Fixed it.


> Utf8Read macro comment now completely irrelevant to its definition.
> Compare to sqlite3:
>
>  584 /*
>  585 ** For LIKE and GLOB matching on EBCDIC machines, assume that every
>  586 ** character is exactly one byte in size.  Also, provde the Utf8Read()
>  587 ** macro for fast reading of the next character in the common case
> where
>  588 ** the next character is ASCII.
>  589 */
>  590 #if defined(SQLITE_EBCDIC)
>  591 # define sqlite3Utf8Read(A)        (*((*A)++))
>  592 # define Utf8Read(A)               (*(A++))
>  593 #else
>  594 # define Utf8Read(A)
>  (A[0]<0x80?*(A++):sqlite3Utf8Read(&A))
>  595 #endif
>
>
Yeah, you're right. Fixed it.


> > ---
> >  src/box/sql/func.c          |  25 ++++----
> >  test/sql-tap/like1.test.lua | 152
> ++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 165 insertions(+), 12 deletions(-)
> >  create mode 100755 test/sql-tap/like1.test.lua
> >
> > diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> > index c06e3bd..dcbd7e0 100644
> > --- a/src/box/sql/func.c
> > +++ b/src/box/sql/func.c
> > @@ -643,6 +643,7 @@ static const struct compareInfo likeInfoAlt = { '%',
> '_', 0, 0 };
> >  #define SQLITE_MATCH             0
> >  #define SQLITE_NOMATCH           1
> >  #define SQLITE_NOWILDCARDMATCH   2
> > +#define SQL_NO_SYMBOLS_LEFT      65535
> >
>
> 0xffff looks better.
>

Isn't actual with the fix.


>
> I would use name like ICU_ERROR. By the way, it can mean not only end of
> a string, but also other errors. It is okay to give 'not matched' in the
> case, I think.
>
> Hmm, we can misinterpret some error and give 'matched' result, I
> guess... maybe we really should check start < end in the macro to
> distinguish this cases. Don't sure about the test case. As I see from
> ICU source code it can be some internal buffer overflow error. We can do
> the check like so:
>
> -#define Utf8Read(s, e)    ucnv_getNextUChar(pUtf8conv, &s, e, &status)
> +#define Utf8Read(s, e) (((s) < (e)) ? ucnv_getNextUChar(pUtf8conv, &(s),
> (e), \
> +       &(status)) : 0)
>
>
Yes, nice idea, I wouldn't guess it, thank you for advice.
Therefore changed SQL_NO_SYMBOLS_LEFT to SQL_END_OF_STRING,
I think it is more understandable.


-#define SQL_NO_SYMBOLS_LEFT      65535
> +#define SQL_NO_SYMBOLS_LEFT      0

>  /*
> >   * Compare two UTF-8 strings for equality where the first string is
> > @@ -698,29 +699,28 @@ patternCompare(const char * pattern,    /* The
> glob pattern */
> >       const char * string_end = string + strlen(string);
> >       UErrorCode status = U_ZERO_ERROR;
> >
> > -     while (pattern < pattern_end){
> > -             c = Utf8Read(pattern, pattern_end);
> > +     while ((c = Utf8Read(pattern, pattern_end)) !=
> SQL_NO_SYMBOLS_LEFT) {
>
> Here and everywhere below we lean on the fact that ucnv_getNextUChar
> compares pointers and that is so:
>
> 1860     s=*source;
> 1861     if(sourceLimit<s) {
> 1862         *err=U_ILLEGAL_ARGUMENT_ERROR;
> 1863         return 0xffff;
> 1864     }
>
> By the way, the patch reverts partially implemented approach to
> preliminary check start < end introduced in 198d59ce93. I think it is
> okay too, because some checks were missed and now they exist again.
>
> > diff --git a/test/sql-tap/like1.test.lua b/test/sql-tap/like1.test.lua
> > new file mode 100755
> > index 0000000..42b4d43
> > --- /dev/null
> > +++ b/test/sql-tap/like1.test.lua
> > @@ -0,0 +1,152 @@
> > +#!/usr/bin/env tarantool
> > +test = require("sqltester")
> > +test:plan(13)
> > +
>
> Maybe case with % at the end? I tried, that works, but it would be good
> to have a regression test.
>

Yeah, sure.

[-- Attachment #2: Type: text/html, Size: 6679 bytes --]

  parent reply	other threads:[~2018-07-18 15:24 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-28 12:47 [tarantool-patches] " N.Tatunov
2018-06-28 12:54 ` [tarantool-patches] " Hollow111
2018-07-18  2:43 ` Alexander Turenko
2018-07-18  5:51   ` Alex Khatskevich
2018-07-18 15:24   ` Nikita Tatunov [this message]
2018-07-18 15:53     ` Alex Khatskevich
2018-07-18 15:57       ` Nikita Tatunov
2018-07-18 17:10         ` Alexander Turenko
2018-07-19 11:14           ` Nikita Tatunov
2018-07-19 11:56         ` Alex Khatskevich
2018-07-27 11:28           ` Nikita Tatunov
2018-07-27 13:06             ` Alexander Turenko
2018-07-27 19:11               ` Nikita Tatunov
2018-07-27 20:22                 ` Alexander Turenko
2018-07-31 13:27                   ` Nikita Tatunov
2018-07-31 13:47                     ` Alexander Turenko
2018-08-01 10:35                       ` Nikita Tatunov
2018-08-01 10:51                         ` Nikita Tatunov
2018-08-01 13:56                           ` Alex Khatskevich
2018-08-01 18:10                             ` Nikita Tatunov
2018-08-01 18:14                               ` Nikita Tatunov
2018-08-08 12:38                               ` Alex Khatskevich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEi+_apXUVK8QBrBDm05JYcNxYfB43R0vvsQ_2LHK5j+tv25tg@mail.gmail.com \
    --to=hollow653@gmail.com \
    --cc=alexander.turenko@tarantool.org \
    --cc=avkhatskevich@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='[tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox