[tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue

Alexander Turenko alexander.turenko at tarantool.org
Fri Jul 27 16:06:01 MSK 2018


Hi,

See comments below.

Don't sure we should return 'no match' situation when a left hand
expression in not correct unicode string. How other DBs handle that?

WBR, Alexander Turenko.

> sql: LIKE & GLOB pattern comparison issue
>
> Currently function that compares pattern and string for GLOB & LIKE
> operators doesn't work properly. It uses ICU reading function which
> perhaps was working differently before and the implementation for the
> comparison ending isn't paying attention to some special cases, hence
> in those cases it works improperly.
>

It used non-ICU function before, consider commit 198d59ce.

> With the patch applied an error will be returned in case there's
> invalid UTF-8 symbol in pattern & pattern will not be matched
> with the string that contains it. Some minor corrections to function
> were made as well.
>

Nitpicking: 'error will be returned' is third situatuon, other then
'matched' and 'not matched'.

> Сloses #3251
> Сloses #3334
> Part of #3572

> > 2. The thing you test is not related to a table and other columns.
> >     Please, convert the tests to the next format: {[[select '' like
> > '_B';]], {1}]]}.
> >     To make it more readable, you can do it like `like_testcases` in
> > `sql-tap/collation.test.lua`.
> >
> 
> Done.

You are still perform comparisons with table columns.

> -/*
> - * For LIKE and GLOB matching on EBCDIC machines, assume that every
> - * character is exactly one byte in size.  Also, provde the Utf8Read()
> - * macro for fast reading of the next character in the common case where
> - * the next character is ASCII.
> +/**
> + * Providing there are symbols in string s this
> + * macro returns UTF-8 code of character and
> + * promotes pointer to the next symbol in the string.
> + * Otherwise return code is SQL_END_OF_STRING.
>   */
> -#define Utf8Read(s, e)    ucnv_getNextUChar(pUtf8conv, &s, e, &status)
> +#define Utf8Read(s, e) (((s) < (e)) ?\
> + ucnv_getNextUChar(pUtf8conv, &(s), (e), &(status)) : 0)

I suggest to add a whitespace before backslash to make it looks better.

> @@ -643,51 +647,61 @@ static const struct compareInfo likeInfoAlt = { '%',
> '_', 0, 0 };
>  #define SQLITE_MATCH             0
>  #define SQLITE_NOMATCH           1
>  #define SQLITE_NOWILDCARDMATCH   2
> +#define SQL_PROHIBITED_PATTERN   3

Update comment before that has a reference to 'patternMatch' function.

>   *
>   * Globbing rules:
>   *
> - *      '*'       Matches any sequence of zero or more characters.
> + *      '*'      Matches any sequence of zero or more characters.

Why?

>   *
> - *      '?'       Matches exactly one character.
> + *      '?'      Matches exactly one character.
>   *
> - *     [...]      Matches one character from the enclosed list of
> - *                characters.
> + *     [...]     Matches one character from the enclosed list of
> + *               characters.
>   *
> - *     [^...]     Matches one character not in the enclosed list.
> + *     [^...]    Matches one character not in the enclosed list.
>   *

The same question.

> + * @retval SQLITE_MATCH:            Match.
> + *    SQLITE_NOMATCH:          No match.
> + *    SQLITE_NOWILDCARDMATCH:  No match in spite of having *
> + *     or % wildcards.
> + *    SQL_PROHIBITED_PATTERN   Pattern contains invalid
> + *     symbol.

Nitpicking: no colon after SQL_PROHIBITED_PATTERN.

>   */
>  static int
> -patternCompare(const char * pattern, /* The glob pattern */
> -        const char * string, /* The string to compare against the glob */
> -        const struct compareInfo *pInfo, /* Information about how to do
> the compare */
> -        UChar32 matchOther /* The escape char (LIKE) or '[' (GLOB) */
> -    )
> +sql_utf8_pattern_compare(const char * pattern,
> +        const char * string,
> +        const struct compareInfo *pInfo,
> +        UChar32 matchOther)

Don't get why indent is tabulation + 7 spaces.

The function itself looks good except lines longer then 80 columns.

> @@ -853,8 +903,7 @@ patternCompare(const char * pattern, /* The glob
> pattern */
>  int
>  sqlite3_strglob(const char *zGlobPattern, const char *zString)
>  {
> - return patternCompare(zGlobPattern, zString, &globInfo,
> -       '[');
> + return sql_utf8_pattern_compare(zGlobPattern, zString, &globInfo, '[');
>  }
> 
>  /*
> @@ -864,7 +913,7 @@ sqlite3_strglob(const char *zGlobPattern, const char
> *zString)
>  int
>  sqlite3_strlike(const char *zPattern, const char *zStr, unsigned int esc)
>  {
> - return patternCompare(zPattern, zStr, &likeInfoNorm, esc);
> + return sql_utf8_pattern_compare(zPattern, zStr, &likeInfoNorm, esc);
>  }
> 

Should we add sqlite3_result_error is case of SQL_PROHIBITED_PATTERN for
these two functions?

> -    {"LIKE", "like"},
> -    {"GLOB", "glob"},
> +-- NOTE: This test needs refactoring after deletion of GLOB &
> +-- type restrictions for LIKE.
> +--    {"LIKE", "like"},
> +--    {"GLOB", "glob"},

I would add related issue number.

>      {"AND", "and"},
>      {"OR", "or"},
>      {"MATCH", "match"},
> @@ -96,7 +98,7 @@ operations = {
>      {"+", "-"},
>      {"<<", ">>", "&", "|"},
>      {"<", "<=", ">", ">="},
> -    {"=", "==", "!=", "<>", "LIKE", "GLOB"}, --"MATCH", "REGEXP"},
> +    {"=", "==", "!=", "<>"}, --"LIKE", "GLOB"}, "MATCH", "REGEXP"},

Leave comment before MATCH and REGEXP to avoid surprises after removing
your comment.

> --- /dev/null
> +++ b/test/sql-tap/gh-3251-string-pattern-comparison.lua
> @@ -0,0 +1,238 @@
> +#!/usr/bin/env tarantool
> +test = require("sqltester")
> +test:plan(106)
> +
> +local prefix = "like-test-"
> +
> +-- Unciode byte sequences.
> +

1. Typo: unicode.
2. Extra empty line.

> +-- Invalid testcases.
> +
> +for i, tested_string in ipairs(invalid_testcases) do
> + local test_name = prefix .. "2." .. tostring(i)
> + local test_itself = "SELECT 'abc' LIKE 'ab" .. tested_string .. "';"
> +
> +-- We should raise an error if pattern contains invalid characters.
> +

1. Trailing whitespaces.
2. Why comments are not indented as the other code?
3. test_name and test_itself before the section comment looks strange
   for me here.




More information about the Tarantool-patches mailing list