Tarantool development patches archive
 help / color / mirror / Atom feed
From: Alexander Turenko <alexander.turenko@tarantool.org>
To: Nikita Tatunov <hollow653@gmail.com>
Cc: tarantool-patches@freelists.org
Subject: [tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue
Date: Fri, 27 Jul 2018 16:06:01 +0300	[thread overview]
Message-ID: <20180727130601.b2oby7dleapd5upg@tkn_work_nb> (raw)
In-Reply-To: <CAEi+_aoW02p5fUH=HZEPoSa2ebMa9meT=Z6NB73kXCLmzqQ7xw@mail.gmail.com>

Hi,

See comments below.

Don't sure we should return 'no match' situation when a left hand
expression in not correct unicode string. How other DBs handle that?

WBR, Alexander Turenko.

> sql: LIKE & GLOB pattern comparison issue
>
> Currently function that compares pattern and string for GLOB & LIKE
> operators doesn't work properly. It uses ICU reading function which
> perhaps was working differently before and the implementation for the
> comparison ending isn't paying attention to some special cases, hence
> in those cases it works improperly.
>

It used non-ICU function before, consider commit 198d59ce.

> With the patch applied an error will be returned in case there's
> invalid UTF-8 symbol in pattern & pattern will not be matched
> with the string that contains it. Some minor corrections to function
> were made as well.
>

Nitpicking: 'error will be returned' is third situatuon, other then
'matched' and 'not matched'.

> Сloses #3251
> Сloses #3334
> Part of #3572

> > 2. The thing you test is not related to a table and other columns.
> >     Please, convert the tests to the next format: {[[select '' like
> > '_B';]], {1}]]}.
> >     To make it more readable, you can do it like `like_testcases` in
> > `sql-tap/collation.test.lua`.
> >
> 
> Done.

You are still perform comparisons with table columns.

> -/*
> - * For LIKE and GLOB matching on EBCDIC machines, assume that every
> - * character is exactly one byte in size.  Also, provde the Utf8Read()
> - * macro for fast reading of the next character in the common case where
> - * the next character is ASCII.
> +/**
> + * Providing there are symbols in string s this
> + * macro returns UTF-8 code of character and
> + * promotes pointer to the next symbol in the string.
> + * Otherwise return code is SQL_END_OF_STRING.
>   */
> -#define Utf8Read(s, e)    ucnv_getNextUChar(pUtf8conv, &s, e, &status)
> +#define Utf8Read(s, e) (((s) < (e)) ?\
> + ucnv_getNextUChar(pUtf8conv, &(s), (e), &(status)) : 0)

I suggest to add a whitespace before backslash to make it looks better.

> @@ -643,51 +647,61 @@ static const struct compareInfo likeInfoAlt = { '%',
> '_', 0, 0 };
>  #define SQLITE_MATCH             0
>  #define SQLITE_NOMATCH           1
>  #define SQLITE_NOWILDCARDMATCH   2
> +#define SQL_PROHIBITED_PATTERN   3

Update comment before that has a reference to 'patternMatch' function.

>   *
>   * Globbing rules:
>   *
> - *      '*'       Matches any sequence of zero or more characters.
> + *      '*'      Matches any sequence of zero or more characters.

Why?

>   *
> - *      '?'       Matches exactly one character.
> + *      '?'      Matches exactly one character.
>   *
> - *     [...]      Matches one character from the enclosed list of
> - *                characters.
> + *     [...]     Matches one character from the enclosed list of
> + *               characters.
>   *
> - *     [^...]     Matches one character not in the enclosed list.
> + *     [^...]    Matches one character not in the enclosed list.
>   *

The same question.

> + * @retval SQLITE_MATCH:            Match.
> + *    SQLITE_NOMATCH:          No match.
> + *    SQLITE_NOWILDCARDMATCH:  No match in spite of having *
> + *     or % wildcards.
> + *    SQL_PROHIBITED_PATTERN   Pattern contains invalid
> + *     symbol.

Nitpicking: no colon after SQL_PROHIBITED_PATTERN.

>   */
>  static int
> -patternCompare(const char * pattern, /* The glob pattern */
> -        const char * string, /* The string to compare against the glob */
> -        const struct compareInfo *pInfo, /* Information about how to do
> the compare */
> -        UChar32 matchOther /* The escape char (LIKE) or '[' (GLOB) */
> -    )
> +sql_utf8_pattern_compare(const char * pattern,
> +        const char * string,
> +        const struct compareInfo *pInfo,
> +        UChar32 matchOther)

Don't get why indent is tabulation + 7 spaces.

The function itself looks good except lines longer then 80 columns.

> @@ -853,8 +903,7 @@ patternCompare(const char * pattern, /* The glob
> pattern */
>  int
>  sqlite3_strglob(const char *zGlobPattern, const char *zString)
>  {
> - return patternCompare(zGlobPattern, zString, &globInfo,
> -       '[');
> + return sql_utf8_pattern_compare(zGlobPattern, zString, &globInfo, '[');
>  }
> 
>  /*
> @@ -864,7 +913,7 @@ sqlite3_strglob(const char *zGlobPattern, const char
> *zString)
>  int
>  sqlite3_strlike(const char *zPattern, const char *zStr, unsigned int esc)
>  {
> - return patternCompare(zPattern, zStr, &likeInfoNorm, esc);
> + return sql_utf8_pattern_compare(zPattern, zStr, &likeInfoNorm, esc);
>  }
> 

Should we add sqlite3_result_error is case of SQL_PROHIBITED_PATTERN for
these two functions?

> -    {"LIKE", "like"},
> -    {"GLOB", "glob"},
> +-- NOTE: This test needs refactoring after deletion of GLOB &
> +-- type restrictions for LIKE.
> +--    {"LIKE", "like"},
> +--    {"GLOB", "glob"},

I would add related issue number.

>      {"AND", "and"},
>      {"OR", "or"},
>      {"MATCH", "match"},
> @@ -96,7 +98,7 @@ operations = {
>      {"+", "-"},
>      {"<<", ">>", "&", "|"},
>      {"<", "<=", ">", ">="},
> -    {"=", "==", "!=", "<>", "LIKE", "GLOB"}, --"MATCH", "REGEXP"},
> +    {"=", "==", "!=", "<>"}, --"LIKE", "GLOB"}, "MATCH", "REGEXP"},

Leave comment before MATCH and REGEXP to avoid surprises after removing
your comment.

> --- /dev/null
> +++ b/test/sql-tap/gh-3251-string-pattern-comparison.lua
> @@ -0,0 +1,238 @@
> +#!/usr/bin/env tarantool
> +test = require("sqltester")
> +test:plan(106)
> +
> +local prefix = "like-test-"
> +
> +-- Unciode byte sequences.
> +

1. Typo: unicode.
2. Extra empty line.

> +-- Invalid testcases.
> +
> +for i, tested_string in ipairs(invalid_testcases) do
> + local test_name = prefix .. "2." .. tostring(i)
> + local test_itself = "SELECT 'abc' LIKE 'ab" .. tested_string .. "';"
> +
> +-- We should raise an error if pattern contains invalid characters.
> +

1. Trailing whitespaces.
2. Why comments are not indented as the other code?
3. test_name and test_itself before the section comment looks strange
   for me here.

  reply	other threads:[~2018-07-27 13:06 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-28 12:47 [tarantool-patches] " N.Tatunov
2018-06-28 12:54 ` [tarantool-patches] " Hollow111
2018-07-18  2:43 ` Alexander Turenko
2018-07-18  5:51   ` Alex Khatskevich
2018-07-18 15:24   ` Nikita Tatunov
2018-07-18 15:53     ` Alex Khatskevich
2018-07-18 15:57       ` Nikita Tatunov
2018-07-18 17:10         ` Alexander Turenko
2018-07-19 11:14           ` Nikita Tatunov
2018-07-19 11:56         ` Alex Khatskevich
2018-07-27 11:28           ` Nikita Tatunov
2018-07-27 13:06             ` Alexander Turenko [this message]
2018-07-27 19:11               ` Nikita Tatunov
2018-07-27 20:22                 ` Alexander Turenko
2018-07-31 13:27                   ` Nikita Tatunov
2018-07-31 13:47                     ` Alexander Turenko
2018-08-01 10:35                       ` Nikita Tatunov
2018-08-01 10:51                         ` Nikita Tatunov
2018-08-01 13:56                           ` Alex Khatskevich
2018-08-01 18:10                             ` Nikita Tatunov
2018-08-01 18:14                               ` Nikita Tatunov
2018-08-08 12:38                               ` Alex Khatskevich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180727130601.b2oby7dleapd5upg@tkn_work_nb \
    --to=alexander.turenko@tarantool.org \
    --cc=hollow653@gmail.com \
    --cc=tarantool-patches@freelists.org \
    --subject='[tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox