[tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue
Alexander Turenko
alexander.turenko at tarantool.org
Fri Jul 27 16:06:01 MSK 2018
Hi,
See comments below.
Don't sure we should return 'no match' situation when a left hand
expression in not correct unicode string. How other DBs handle that?
WBR, Alexander Turenko.
> sql: LIKE & GLOB pattern comparison issue
>
> Currently function that compares pattern and string for GLOB & LIKE
> operators doesn't work properly. It uses ICU reading function which
> perhaps was working differently before and the implementation for the
> comparison ending isn't paying attention to some special cases, hence
> in those cases it works improperly.
>
It used non-ICU function before, consider commit 198d59ce.
> With the patch applied an error will be returned in case there's
> invalid UTF-8 symbol in pattern & pattern will not be matched
> with the string that contains it. Some minor corrections to function
> were made as well.
>
Nitpicking: 'error will be returned' is third situatuon, other then
'matched' and 'not matched'.
> Сloses #3251
> Сloses #3334
> Part of #3572
> > 2. The thing you test is not related to a table and other columns.
> > Please, convert the tests to the next format: {[[select '' like
> > '_B';]], {1}]]}.
> > To make it more readable, you can do it like `like_testcases` in
> > `sql-tap/collation.test.lua`.
> >
>
> Done.
You are still perform comparisons with table columns.
> -/*
> - * For LIKE and GLOB matching on EBCDIC machines, assume that every
> - * character is exactly one byte in size. Also, provde the Utf8Read()
> - * macro for fast reading of the next character in the common case where
> - * the next character is ASCII.
> +/**
> + * Providing there are symbols in string s this
> + * macro returns UTF-8 code of character and
> + * promotes pointer to the next symbol in the string.
> + * Otherwise return code is SQL_END_OF_STRING.
> */
> -#define Utf8Read(s, e) ucnv_getNextUChar(pUtf8conv, &s, e, &status)
> +#define Utf8Read(s, e) (((s) < (e)) ?\
> + ucnv_getNextUChar(pUtf8conv, &(s), (e), &(status)) : 0)
I suggest to add a whitespace before backslash to make it looks better.
> @@ -643,51 +647,61 @@ static const struct compareInfo likeInfoAlt = { '%',
> '_', 0, 0 };
> #define SQLITE_MATCH 0
> #define SQLITE_NOMATCH 1
> #define SQLITE_NOWILDCARDMATCH 2
> +#define SQL_PROHIBITED_PATTERN 3
Update comment before that has a reference to 'patternMatch' function.
> *
> * Globbing rules:
> *
> - * '*' Matches any sequence of zero or more characters.
> + * '*' Matches any sequence of zero or more characters.
Why?
> *
> - * '?' Matches exactly one character.
> + * '?' Matches exactly one character.
> *
> - * [...] Matches one character from the enclosed list of
> - * characters.
> + * [...] Matches one character from the enclosed list of
> + * characters.
> *
> - * [^...] Matches one character not in the enclosed list.
> + * [^...] Matches one character not in the enclosed list.
> *
The same question.
> + * @retval SQLITE_MATCH: Match.
> + * SQLITE_NOMATCH: No match.
> + * SQLITE_NOWILDCARDMATCH: No match in spite of having *
> + * or % wildcards.
> + * SQL_PROHIBITED_PATTERN Pattern contains invalid
> + * symbol.
Nitpicking: no colon after SQL_PROHIBITED_PATTERN.
> */
> static int
> -patternCompare(const char * pattern, /* The glob pattern */
> - const char * string, /* The string to compare against the glob */
> - const struct compareInfo *pInfo, /* Information about how to do
> the compare */
> - UChar32 matchOther /* The escape char (LIKE) or '[' (GLOB) */
> - )
> +sql_utf8_pattern_compare(const char * pattern,
> + const char * string,
> + const struct compareInfo *pInfo,
> + UChar32 matchOther)
Don't get why indent is tabulation + 7 spaces.
The function itself looks good except lines longer then 80 columns.
> @@ -853,8 +903,7 @@ patternCompare(const char * pattern, /* The glob
> pattern */
> int
> sqlite3_strglob(const char *zGlobPattern, const char *zString)
> {
> - return patternCompare(zGlobPattern, zString, &globInfo,
> - '[');
> + return sql_utf8_pattern_compare(zGlobPattern, zString, &globInfo, '[');
> }
>
> /*
> @@ -864,7 +913,7 @@ sqlite3_strglob(const char *zGlobPattern, const char
> *zString)
> int
> sqlite3_strlike(const char *zPattern, const char *zStr, unsigned int esc)
> {
> - return patternCompare(zPattern, zStr, &likeInfoNorm, esc);
> + return sql_utf8_pattern_compare(zPattern, zStr, &likeInfoNorm, esc);
> }
>
Should we add sqlite3_result_error is case of SQL_PROHIBITED_PATTERN for
these two functions?
> - {"LIKE", "like"},
> - {"GLOB", "glob"},
> +-- NOTE: This test needs refactoring after deletion of GLOB &
> +-- type restrictions for LIKE.
> +-- {"LIKE", "like"},
> +-- {"GLOB", "glob"},
I would add related issue number.
> {"AND", "and"},
> {"OR", "or"},
> {"MATCH", "match"},
> @@ -96,7 +98,7 @@ operations = {
> {"+", "-"},
> {"<<", ">>", "&", "|"},
> {"<", "<=", ">", ">="},
> - {"=", "==", "!=", "<>", "LIKE", "GLOB"}, --"MATCH", "REGEXP"},
> + {"=", "==", "!=", "<>"}, --"LIKE", "GLOB"}, "MATCH", "REGEXP"},
Leave comment before MATCH and REGEXP to avoid surprises after removing
your comment.
> --- /dev/null
> +++ b/test/sql-tap/gh-3251-string-pattern-comparison.lua
> @@ -0,0 +1,238 @@
> +#!/usr/bin/env tarantool
> +test = require("sqltester")
> +test:plan(106)
> +
> +local prefix = "like-test-"
> +
> +-- Unciode byte sequences.
> +
1. Typo: unicode.
2. Extra empty line.
> +-- Invalid testcases.
> +
> +for i, tested_string in ipairs(invalid_testcases) do
> + local test_name = prefix .. "2." .. tostring(i)
> + local test_itself = "SELECT 'abc' LIKE 'ab" .. tested_string .. "';"
> +
> +-- We should raise an error if pattern contains invalid characters.
> +
1. Trailing whitespaces.
2. Why comments are not indented as the other code?
3. test_name and test_itself before the section comment looks strange
for me here.
More information about the Tarantool-patches
mailing list