From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 6C924285E7 for ; Wed, 1 Aug 2018 14:10:43 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id qxpOmDl8MsLq for ; Wed, 1 Aug 2018 14:10:43 -0400 (EDT) Received: from mail-lf1-f67.google.com (mail-lf1-f67.google.com [209.85.167.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 69E0022C05 for ; Wed, 1 Aug 2018 14:10:42 -0400 (EDT) Received: by mail-lf1-f67.google.com with SMTP id u202-v6so13958257lff.9 for ; Wed, 01 Aug 2018 11:10:42 -0700 (PDT) MIME-Version: 1.0 References: <20180718024314.be245cmsgklxuvnk@tkn_work_nb> <20180727130601.b2oby7dleapd5upg@tkn_work_nb> <20180727202219.ikwbax7tysfnmgr4@tkn_work_nb> <20180731134705.3pij4hwyyirhiwr7@tkn_work_nb> In-Reply-To: From: Nikita Tatunov Date: Wed, 1 Aug 2018 21:10:28 +0300 Message-ID: Subject: [tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue Content-Type: multipart/alternative; boundary="000000000000bf266d0572639c61" Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: avkhatskevich@tarantool.org Cc: Alexander Turenko , tarantool-patches@freelists.org --000000000000bf266d0572639c61 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable =D1=81=D1=80, 1 =D0=B0=D0=B2=D0=B3. 2018 =D0=B3. =D0=B2 16:56, Alex Khatske= vich : > > > On 01.08.2018 13:51, Nikita Tatunov wrote: > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index c06e3bd..7f93ef6 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -617,13 +617,17 @@ struct compareInfo { > u8 noCase; /* true to ignore case differences */ > }; > > -/* > - * For LIKE and GLOB matching on EBCDIC machines, assume that every > - * character is exactly one byte in size. Also, provde the Utf8Read() > - * macro for fast reading of the next character in the common case where > - * the next character is ASCII. > +/** > + * Providing there are symbols in string s this > + * macro returns UTF-8 code of character and > + * promotes pointer to the next symbol in the string. > + * Otherwise return code is SQL_END_OF_STRING. > */ > -#define Utf8Read(s, e) ucnv_getNextUChar(pUtf8conv, &s, e, &status) > +#define Utf8Read(s, e) (((s) < (e)) ? \ > + ucnv_getNextUChar(pUtf8conv, &(s), (e), &(status)) : 0) > > [Later I will ask you to return this macro back, so, you may not do this] > As I understand, you are returning `0` from Utf8Read in case of end of th= e > string. > Let's return `SQL_END_OF_STRING` instead of just `0`. > Yeah, thank you, didn't notice it. > + > +#define SQL_END_OF_STRING 0 > +#define SQL_INVALID_UTF8_SYMBOL 0xfffd > > static const struct compareInfo globInfo =3D { '*', '?', '[', 0 }; > > @@ -638,19 +642,16 @@ static const struct compareInfo likeInfoNorm =3D { > '%', '_', 0, 1 }; > static const struct compareInfo likeInfoAlt =3D { '%', '_', 0, 0 }; > > /* > - * Possible error returns from patternMatch() > + * Possible error returns from sql_utf8_pattern_compare() > */ > #define SQLITE_MATCH 0 > #define SQLITE_NOMATCH 1 > #define SQLITE_NOWILDCARDMATCH 2 > +#define SQL_PROHIBITED_PATTERN 3 > > I am not sure that the invalid (with invalid symbols) pattern can be > called `prohibited`. > Rename somehow? My proposal: SQL_INVALID_PATTERN. > Probably you're right, I was also thinking of changing it somehow. > Moreover, You have named this definition with the `SQL` prefix, which is > good, however, > similar definitions are still prefixed with `SQLITE`. I would like you to > rename those in > this (preferred) or in a separate commit for consistency. > > > > Tarantool !=3D SQLite and I think we should get away from this approach. The thing that names haven't been refactored yet isn't in my control. You can ask Nikita's opinion on it, I guess he will tell you almost the same. > -/* > - * Compare two UTF-8 strings for equality where the first string is > - * a GLOB or LIKE expression. Return values: > - * > - * SQLITE_MATCH: Match > - * SQLITE_NOMATCH: No match > - * SQLITE_NOWILDCARDMATCH: No match in spite of having * or % > wildcards. > +/** > + * Compare two UTF-8 strings for equality where the first string > + * is a GLOB or LIKE expression. > * > * Globbing rules: > * > @@ -663,92 +664,136 @@ static const struct compareInfo likeInfoAlt =3D { > '%', '_', 0, 0 }; > * > * [^...] Matches one character not in the enclosed list. > * > - * With the [...] and [^...] matching, a ']' character can be included > - * in the list by making it the first character after '[' or '^'. A > - * range of characters can be specified using '-'. Example: > - * "[a-z]" matches any single lower-case letter. To match a '-', make > - * it the last character in the list. > + * With the [...] and [^...] matching, a ']' character can be > + * included in the list by making it the first character after > + * '[' or '^'. A range of characters can be specified using '-'. > + * Example: "[a-z]" matches any single lower-case letter. > + * To match a '-', make it the last character in the list. > > Does it work for UTF characters? I suppose no. > Let's write about it here + let's file an issue to make it > work with UTF characters. > Soon this function is gonna be refactored & GLOB is gonna be removed anyways. > * > * Like matching rules: > * > - * '%' Matches any sequence of zero or more characters > + * '%' Matches any sequence of zero or more characters. > * > - ** '_' Matches any one character > + ** '_' Matches any one character. > * > * Ec Where E is the "esc" character and c is any other > - * character, including '%', '_', and esc, match exactly = c. > + * character, including '%', '_', and esc, match > + * exactly c. > * > * The comments within this routine usually assume glob matching. > * > - * This routine is usually quick, but can be N**2 in the worst case. > + * This routine is usually quick, but can be N**2 in the worst > + * case. > + * > + * @param pattern String containing comparison pattern. > + * @param string String being compared. > + * @param compareInfo Information about how to compare. > + * @param matchOther The escape char (LIKE) or '[' (GLOB). > + * > + * @retval SQLITE_MATCH: Match. > + * SQLITE_NOMATCH: No match. > + * SQLITE_NOWILDCARDMATCH: No match in spite of having * > + * or % wildcards. > + * SQL_PROHIBITED_PATTERN: Pattern contains invalid > + * symbol. > > Minor: It is not very good that you use symbol and character > interchangeably. > I suppose that `character` should be used everywhere. > They're synonyms, aren't they? > */ > static int > -patternCompare(const char * pattern, /* The glob pattern */ > - const char * string, /* The string to compare against the glob *= / > - const struct compareInfo *pInfo, /* Information about how to do > the compare */ > - UChar32 matchOther /* The escape char (LIKE) or '[' (GLOB) */ > - ) > +sql_utf8_pattern_compare(const char * pattern, > + const char * string, > + const struct compareInfo *pInfo, > + UChar32 matchOther) > > "star" sign should stick to the attribute name. > https://tarantool.io/en/doc/1.9/dev_guide/c_style_guide/#chapter-3-1-spac= es > > To prevent such typos in the future, you can use special perl > script which checks coding style in Linux kernel patches. > Subject of the ticket wasn't about refactoring function, but thnx, changed it. REMEMBER THIS POINT #1 > { > - UChar32 c, c2; /* Next pattern and input string chars */ > - UChar32 matchOne =3D pInfo->matchOne; /* "?" or "_" */ > - UChar32 matchAll =3D pInfo->matchAll; /* "*" or "%" */ > - UChar32 noCase =3D pInfo->noCase; /* True if uppercase=3D=3Dlowercase *= / > - const char *zEscaped =3D 0; /* One past the last escaped input char */ > + /* Next pattern and input string chars */ > + UChar32 c, c2; > + /* "?" or "_" */ > + UChar32 matchOne =3D pInfo->matchOne; > + /* "*" or "%" */ > + UChar32 matchAll =3D pInfo->matchAll; > + /* True if uppercase=3D=3Dlowercase */ > + UChar32 noCase =3D pInfo->noCase; > + /* One past the last escaped input char */ > + const char *zEscaped =3D 0; > const char * pattern_end =3D pattern + strlen(pattern); > const char * string_end =3D string + strlen(string); > UErrorCode status =3D U_ZERO_ERROR; > > - while (pattern < pattern_end){ > - c =3D Utf8Read(pattern, pattern_end); > + while ((c =3D Utf8Read(pattern, pattern_end)) !=3D SQL_END_OF_STRING) { > > REMEMBER THIS POINT #1 > > + if (c =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQL_PROHIBITED_PATTERN; > if (c =3D=3D matchAll) { /* Match "*" */ > - /* Skip over multiple "*" characters in the pattern. If there > - * are also "?" characters, skip those as well, but consume a > - * single character of the input string for each "?" skipped > + /* Skip over multiple "*" characters in > + * the pattern. If there are also "?" > + * characters, skip those as well, but > + * consume a single character of the > + * input string for each "?" skipped. > */ > - while (pattern < pattern_end){ > - c =3D Utf8Read(pattern, pattern_end); > + while ((c =3D Utf8Read(pattern, pattern_end)) !=3D > + SQL_END_OF_STRING) { > + if (c =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQL_PROHIBITED_PATTERN; > if (c !=3D matchAll && c !=3D matchOne) > break; > - if (c =3D=3D matchOne > - && Utf8Read(string, string_end) =3D=3D 0) { > + if (c =3D=3D matchOne && > + (c2 =3D Utf8Read(string, string_end)) =3D=3D > + SQL_END_OF_STRING) > return SQLITE_NOWILDCARDMATCH; > - } > + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQLITE_NOMATCH; > } > - /* "*" at the end of the pattern matches */ > - if (pattern =3D=3D pattern_end) > + /* > + * "*" at the end of the pattern matches. > + */ > + if (c =3D=3D SQL_END_OF_STRING) { > + while ((c2 =3D Utf8Read(string, string_end)) !=3D > + SQL_END_OF_STRING) > + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQLITE_NOMATCH; > return SQLITE_MATCH; > + } > if (c =3D=3D matchOther) { > if (pInfo->matchSet =3D=3D 0) { > c =3D Utf8Read(pattern, pattern_end); > - if (c =3D=3D 0) > + if (c =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQL_PROHIBITED_PATTERN; > + if (c =3D=3D SQL_END_OF_STRING) > return SQLITE_NOWILDCARDMATCH; > } else { > - /* "[...]" immediately follows the "*". We have to do a slow > - * recursive search in this case, but it is an unusual case. > + /* "[...]" immediately > + * follows the "*". We > + * have to do a slow > + * recursive search in > + * this case, but it is > + * an unusual case. > */ > - assert(matchOther < 0x80); /* '[' is a single-byte character */ > + assert(matchOther < 0x80); > while (string < string_end) { > > REMEMBER THIS POINT #2 > > int bMatch =3D > - patternCompare(&pattern[-1], > - string, > - pInfo, > - matchOther); > + sql_utf8_pattern_compare( > + &pattern[-1], > + string, > + pInfo, > + matchOther); > if (bMatch !=3D SQLITE_NOMATCH) > return bMatch; > - Utf8Read(string, string_end); > + c =3D Utf8Read(string, string_end); > + if (c =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQLITE_NOMATCH; > > look at and other `Utf8Read` usages. > You have introduced SQL_END_OF_STRING and changed `Utf8Read` pattern to > use it in > half of cases? > > 1) Look at . 2) I have introduced it not to use explicit 0 and to fix the bug. 3) Fixed it though. > Moreover,in that place you do check `string < string_end` implicitly > inside of > `Utf8Read` but you never use that result. > I suppose you should return old iteration style and `Utf8Read` macro. > ``` > while (string < string_end) { > c =3D Utf8Read(string, string_end); > if (c =3D=3D SQL_INVALID_UTF8_SYMBOL) > return SQLITE_NOMATCH; > ``` > I think it's better to use this approach, but yes: return prev. version of macro: 1) 'zero byte' ticket will be partially fixed. 2) 0xffff is non-unicode anyways. > } > return SQLITE_NOWILDCARDMATCH; > } > } > > - /* At this point variable c contains the first character of the > - * pattern string past the "*". Search in the input string for the > - * first matching character and recursively continue the match from > - * that point. > + /* At this point variable c contains the > + * first character of the pattern string > + * past the "*". Search in the input > + * string for the first matching > + * character and recursively continue the > + * match from that point. > * > - * For a case-insensitive search, set variable cx to be the same as > - * c but in the other case and search the input string for either > - * c or cx. > + * For a case-insensitive search, set > + * variable cx to be the same as c but in > + * the other case and search the input > + * string for either c or cx. > */ > > int bMatch; > @@ -756,14 +801,18 @@ patternCompare(const char * pattern, /* The glob > pattern */ > c =3D u_tolower(c); > while (string < string_end){ > /** > - * This loop could have been implemented > - * without if converting c2 to lower case > - * (by holding c_upper and c_lower), however > - * it is implemented this way because lower > - * works better with German and Turkish > - * languages. > + * This loop could have been > + * implemented without if > + * converting c2 to lower case > + * by holding c_upper and > + * c_lower,however it is > + * implemented this way because > + * lower works better with German > + * and Turkish languages. > */ > c2 =3D Utf8Read(string, string_end); > + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQLITE_NOMATCH; > if (!noCase) { > if (c2 !=3D c) > continue; > @@ -771,9 +820,10 @@ patternCompare(const char * pattern, /* The glob > pattern */ > if (c2 !=3D c && u_tolower(c2) !=3D c) > continue; > } > - bMatch =3D > - patternCompare(pattern, string, > - pInfo, matchOther); > + bMatch =3D sql_utf8_pattern_compare(pattern, > + string, > + pInfo, > + matchOther); > if (bMatch !=3D SQLITE_NOMATCH) > return bMatch; > } > @@ -782,7 +832,9 @@ patternCompare(const char * pattern, /* The glob > pattern */ > if (c =3D=3D matchOther) { > if (pInfo->matchSet =3D=3D 0) { > c =3D Utf8Read(pattern, pattern_end); > - if (c =3D=3D 0) > + if (c =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQL_PROHIBITED_PATTERN; > + if (c =3D=3D SQL_END_OF_STRING) > return SQLITE_NOMATCH; > zEscaped =3D pattern; > } else { > @@ -790,23 +842,33 @@ patternCompare(const char * pattern, /* The glob > pattern */ > int seen =3D 0; > int invert =3D 0; > c =3D Utf8Read(string, string_end); > + if (c =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQLITE_NOMATCH; > if (string =3D=3D string_end) > return SQLITE_NOMATCH; > c2 =3D Utf8Read(pattern, pattern_end); > + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQL_PROHIBITED_PATTERN; > if (c2 =3D=3D '^') { > invert =3D 1; > c2 =3D Utf8Read(pattern, pattern_end); > + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQL_PROHIBITED_PATTERN; > } > if (c2 =3D=3D ']') { > if (c =3D=3D ']') > seen =3D 1; > c2 =3D Utf8Read(pattern, pattern_end); > + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQL_PROHIBITED_PATTERN; > } > - while (c2 && c2 !=3D ']') { > + while (c2 !=3D SQL_END_OF_STRING && c2 !=3D ']') { > if (c2 =3D=3D '-' && pattern[0] !=3D ']' > && pattern < pattern_end > && prior_c > 0) { > c2 =3D Utf8Read(pattern, pattern_end); > + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQL_PROHIBITED_PATTERN; > if (c >=3D prior_c && c <=3D c2) > seen =3D 1; > prior_c =3D 0; > @@ -817,29 +879,36 @@ patternCompare(const char * pattern, /* The glob > pattern */ > prior_c =3D c2; > } > c2 =3D Utf8Read(pattern, pattern_end); > + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQL_PROHIBITED_PATTERN; > } > - if (pattern =3D=3D pattern_end || (seen ^ invert) =3D=3D 0) { > + if (pattern =3D=3D pattern_end || > + (seen ^ invert) =3D=3D 0) { > return SQLITE_NOMATCH; > } > continue; > } > } > c2 =3D Utf8Read(string, string_end); > + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) > + return SQLITE_NOMATCH; > if (c =3D=3D c2) > continue; > if (noCase){ > /** > - * Small optimisation. Reduce number of calls > - * to u_tolower function. > - * SQL standards suggest use to_upper for symbol > - * normalisation. However, using to_lower allows to > - * respect Turkish '=C4=B0' in default locale. > + * Small optimisation. Reduce number of > + * calls to u_tolower function. SQL > + * standards suggest use to_upper for > + * symbol normalisation. However, using > + * to_lower allows to respect Turkish '=C4=B0' > + * in default locale. > */ > if (u_tolower(c) =3D=3D c2 || > c =3D=3D u_tolower(c2)) > continue; > } > - if (c =3D=3D matchOne && pattern !=3D zEscaped && c2 !=3D 0) > + if (c =3D=3D matchOne && pattern !=3D zEscaped && > + c2 !=3D SQL_END_OF_STRING) > continue; > return SQLITE_NOMATCH; > } > @@ -853,8 +922,7 @@ patternCompare(const char * pattern, /* The glob > pattern */ > int > sqlite3_strglob(const char *zGlobPattern, const char *zString) > { > - return patternCompare(zGlobPattern, zString, &globInfo, > - '['); > + return sql_utf8_pattern_compare(zGlobPattern, zString, &globInfo, '['); > } > > /* > @@ -864,7 +932,7 @@ sqlite3_strglob(const char *zGlobPattern, const char > *zString) > int > sqlite3_strlike(const char *zPattern, const char *zStr, unsigned int esc= ) > { > - return patternCompare(zPattern, zStr, &likeInfoNorm, esc); > + return sql_utf8_pattern_compare(zPattern, zStr, &likeInfoNorm, esc); > } > > /* > @@ -910,8 +978,9 @@ likeFunc(sqlite3_context * context, int argc, > sqlite3_value ** argv) > zB =3D (const char *) sqlite3_value_text(argv[0]); > zA =3D (const char *) sqlite3_value_text(argv[1]); > > - /* Limit the length of the LIKE or GLOB pattern to avoid problems > - * of deep recursion and N*N behavior in patternCompare(). > + /* Limit the length of the LIKE or GLOB pattern to avoid > + * problems of deep recursion and N*N behavior in > + * sql_utf8_pattern_compare(). > */ > nPat =3D sqlite3_value_bytes(argv[0]); > testcase(nPat =3D=3D db->aLimit[SQLITE_LIMIT_LIKE_PATTERN_LENGTH]); > @@ -947,7 +1016,12 @@ likeFunc(sqlite3_context * context, int argc, > sqlite3_value ** argv) > sqlite3_like_count++; > #endif > int res; > - res =3D patternCompare(zB, zA, pInfo, escape); > + res =3D sql_utf8_pattern_compare(zB, zA, pInfo, escape); > + if (res =3D=3D SQL_PROHIBITED_PATTERN) { > + sqlite3_result_error(context, "LIKE or GLOB pattern can only" > + " contain UTF-8 characters", -1); > + return; > + } > sqlite3_result_int(context, res =3D=3D SQLITE_MATCH); > } > > diff --git a/test-run b/test-run > index 77e9327..95562e9 160000 > --- a/test-run > +++ b/test-run > @@ -1 +1 @@ > -Subproject commit 77e93279210f8c5c1fd0ed03416fa19a184f0b6d > +Subproject commit 95562e95401fef4e0b755ab0bb430974b5d1a29a > diff --git a/test/sql-tap/e_expr.test.lua b/test/sql-tap/e_expr.test.lua > index 13d3a96..9780d2c 100755 > --- a/test/sql-tap/e_expr.test.lua > +++ b/test/sql-tap/e_expr.test.lua > @@ -1,6 +1,6 @@ > #!/usr/bin/env tarantool > test =3D require("sqltester") > -test:plan(12431) > +test:plan(10665) > > --!./tcltestrunner.lua > -- 2010 July 16 > @@ -77,8 +77,10 @@ local operations =3D { > {"<>", "ne1"}, > {"!=3D", "ne2"}, > {"IS", "is"}, > - {"LIKE", "like"}, > - {"GLOB", "glob"}, > +-- NOTE: This test needs refactoring after deletion of GLOB & > +-- type restrictions for LIKE. (See #3572) > +-- {"LIKE", "like"}, > +-- {"GLOB", "glob"}, > > Yes, this behavior is not valid anymore. > To make sure that likes and globs will be tested in the future, please, > delete this > commented lines and add your own simple test, which tries to call `like` > and `glob` > with inappropriate types. > It is important to have a functional tests for any possible behavior. > 1) Globs are going to be deleted as you can see few lines above. 2) I'm gonna refactor that when LIKE is in its final state (basically after #3572 is closed & static build is into 2.1). > {"AND", "and"}, > {"OR", "or"}, > {"MATCH", "match"}, > @@ -96,7 +98,12 @@ operations =3D { > {"+", "-"}, > {"<<", ">>", "&", "|"}, > {"<", "<=3D", ">", ">=3D"}, > - {"=3D", "=3D=3D", "!=3D", "<>", "LIKE", "GLOB"}, --"MATCH", "REGEXP"= }, > +-- NOTE: This test needs refactoring after deletion of GLOB & > +-- type restrictions for LIKE. (See #3572) > +-- Another NOTE: MATCH & REGEXP aren't supported in Tarantool & > +-- are waiting for their hour, don't confuse them > +-- being commented with ticket above. > + {"=3D", "=3D=3D", "!=3D", "<>"}, --"LIKE", "GLOB"}, --"MATCH", "REGE= XP"}, > {"AND"}, > {"OR"}, > } > @@ -475,6 +482,7 @@ for _, op in ipairs(oplist) do > end > end > end > + > > ------------------------------------------------------------------------= --- > -- Test the IS and IS NOT operators. > -- > diff --git a/test/sql-tap/gh-3251-string-pattern-comparison.test.lua > b/test/sql-tap/gh-3251-string-pattern-comparison.test.lua > new file mode 100755 > index 0000000..2a787f2 > --- /dev/null > +++ b/test/sql-tap/gh-3251-string-pattern-comparison.test.lua > @@ -0,0 +1,213 @@ > +#!/usr/bin/env tarantool > +test =3D require("sqltester") > +test:plan(128) > + > +local prefix =3D "like-test-" > + > +-- Unicode byte sequences. > +local valid_testcases =3D { > + '\x01', > + '\x09', > + '\x1F', > + '\x7F', > + '\xC2\x80', > + '\xC2\x90', > + '\xC2\x9F', > + '\xE2\x80\xA8', > + '\x20\x0B', > + '\xE2\x80\xA9', > +} > > optional: add descriptions to those byte sequences (what it is). > Some valid unicode symbols which is only that matters for LIKE operator. > + > +-- Non-Unicode byte sequences. > +local invalid_testcases =3D { > + '\xE2\x80', > + '\xFE\xFF', > + '\xC2', > + '\xED\xB0\x80', > + '\xD0', > +} > > Place that after like_test_cases, just before it is used. > Changed it. > + > +local like_test_cases =3D > +{ > + {"1.1", > + "SELECT 'AB' LIKE '_B';", > + {0, {1}} }, > + {"1.2", > + "SELECT 'CD' LIKE '_B';", > + {0, {0}} }, > + {"1.3", > + "SELECT '' LIKE '_B';", > + {0, {0}} }, > + {"1.4", > + "SELECT 'AB' LIKE '%B';", > + {0, {1}} }, > + {"1.5", > + "SELECT 'CD' LIKE '%B';", > + {0, {0}} }, > + {"1.6", > + "SELECT '' LIKE '%B';", > + {0, {0}} }, > + {"1.7", > + "SELECT 'AB' LIKE 'A__';", > + {0, {0}} }, > + {"1.8", > + "SELECT 'CD' LIKE 'A__';", > + {0, {0}} }, > + {"1.9", > + "SELECT '' LIKE 'A__';", > + {0, {0}} }, > + {"1.10", > + "SELECT 'AB' LIKE 'A_';", > + {0, {1}} }, > + {"1.11", > + "SELECT 'CD' LIKE 'A_';", > + {0, {0}} }, > + {"1.12", > + "SELECT '' LIKE 'A_';", > + {0, {0}} }, > + {"1.13", > + "SELECT 'AB' LIKE 'A';", > + {0, {0}} }, > + {"1.14", > + "SELECT 'CD' LIKE 'A';", > + {0, {0}} }, > + {"1.15", > + "SELECT '' LIKE 'A';", > + {0, {0}} }, > + {"1.16", > + "SELECT 'AB' LIKE '_';", > + {0, {0}} }, > + {"1.17", > + "SELECT 'CD' LIKE '_';", > + {0, {0}} }, > + {"1.18", > + "SELECT '' LIKE '_';", > + {0, {0}} }, > + {"1.19", > + "SELECT 'AB' LIKE '__';", > + {0, {1}} }, > + {"1.20", > + "SELECT 'CD' LIKE '__';", > + {0, {1}} }, > + {"1.21", > + "SELECT '' LIKE '__';", > + {0, {0}} }, > + {"1.22", > + "SELECT 'AB' LIKE '%A';", > + {0, {0}} }, > + {"1.23", > + "SELECT 'AB' LIKE '%C';", > + {0, {0}} }, > + {"1.24", > + "SELECT 'ab' LIKE '%df';", > + {0, {0}} }, > + {"1.25", > + "SELECT 'abCDF' LIKE '%df';", > + {0, {1}} }, > + {"1.26", > + "SELECT 'CDF' LIKE '%df';", > + {0, {1}} }, > + {"1.27", > + "SELECT 'ab' LIKE 'a_';", > + {0, {1}} }, > + {"1.28", > + "SELECT 'abCDF' LIKE 'a_';", > + {0, {0}} }, > + {"1.29", > + "SELECT 'CDF' LIKE 'a_';", > + {0, {0}} }, > + {"1.30", > + "SELECT 'ab' LIKE 'ab%';", > + {0, {1}} }, > + {"1.31", > + "SELECT 'abCDF' LIKE 'ab%';", > + {0, {1}} }, > + {"1.32", > + "SELECT 'CDF' LIKE 'ab%';", > + {0, {0}} }, > + {"1.33", > + "SELECT 'ab' LIKE 'abC%';", > + {0, {0}} }, > + {"1.34", > + "SELECT 'abCDF' LIKE 'abC%';", > + {0, {1}} }, > + {"1.35", > + "SELECT 'CDF' LIKE 'abC%';", > + {0, {0}} }, > + {"1.36", > + "SELECT 'ab' LIKE 'a_%';", > + {0, {1}} }, > + {"1.37", > + "SELECT 'abCDF' LIKE 'a_%';", > + {0, {1}} }, > + {"1.38", > + "SELECT 'CDF' LIKE 'a_%';", > + {0, {0}} }, > +} > > Please, add some tests for unicode strings. (or replace letters in those > tests with unicode letters) > Changed existing tests a little bit. > + > +test:do_catchsql_set_test(like_test_cases, prefix) > + > +-- Invalid testcases. > +for i, tested_string in ipairs(invalid_testcases) do > + > + -- We should raise an error in case > + -- pattern contains invalid characters. > + > + local test_name =3D prefix .. "2." .. tostring(i) > + local test_itself =3D "SELECT 'abc' LIKE 'ab" .. tested_string .. "'= ;" > + test:do_catchsql_test(test_name, test_itself, > + {1, "LIKE or GLOB pattern can only contain > UTF-8 characters"}) > + > + test_name =3D prefix .. "3." .. tostring(i) > + test_itself =3D "SELECT 'abc' LIKE 'abc" .. tested_string .. "';" > + test:do_catchsql_test(test_name, test_itself, > + {1, "LIKE or GLOB pattern can only contain > UTF-8 characters"}) > + > + test_name =3D prefix .. "4." .. tostring(i) > + test_itself =3D "SELECT 'abc' LIKE 'ab" .. tested_string .. "c';" > + test:do_catchsql_test(test_name, test_itself, > + {1, "LIKE or GLOB pattern can only contain > UTF-8 characters"}) > + > + -- Just skipping if row value predicand contains invalid character. > > What the predicand is? Is it a typo? > You can find it in ANSI SQL: . Basically it's just operand inside of a predicate. > + > + test_name =3D prefix .. "5." .. tostring(i) > + test_itself =3D "SELECT 'ab" .. tested_string .. "' LIKE 'abc';" > + test:do_execsql_test(test_name, test_itself, {0}) > + > + test_name =3D prefix .. "6." .. tostring(i) > + test_itself =3D "SELECT 'abc" .. tested_string .. "' LIKE 'abc';" > + test:do_execsql_test(test_name, test_itself, {0}) > + > + test_name =3D prefix .. "7." .. tostring(i) > + test_itself =3D "SELECT 'ab" .. tested_string .. "c' LIKE 'abc';" > + test:do_execsql_test(test_name, test_itself, {0}) > +end > + > +-- Valid testcases. > +for i, tested_string in ipairs(valid_testcases) do > + test_name =3D prefix .. "8." .. tostring(i) > + local test_itself =3D "SELECT 'abc' LIKE 'ab" .. tested_string .. "'= ;" > + test:do_execsql_test(test_name, test_itself, {0}) > + > + test_name =3D prefix .. "9." .. tostring(i) > + test_itself =3D "SELECT 'abc' LIKE 'abc" .. tested_string .. "';" > + test:do_execsql_test(test_name, test_itself, {0}) > + > + test_name =3D prefix .. "10." .. tostring(i) > + test_itself =3D "SELECT 'abc' LIKE 'ab" .. tested_string .. "c';" > + test:do_execsql_test(test_name, test_itself, {0}) > + > + test_name =3D prefix .. "11." .. tostring(i) > + test_itself =3D "SELECT 'ab" .. tested_string .. "' LIKE 'abc';" > + test:do_execsql_test(test_name, test_itself, {0}) > + > + test_name =3D prefix .. "12." .. tostring(i) > + test_itself =3D "SELECT 'abc" .. tested_string .. "' LIKE 'abc';" > + test:do_execsql_test(test_name, test_itself, {0}) > + > + test_name =3D prefix .. "13." .. tostring(i) > + test_itself =3D "SELECT 'ab" .. tested_string .. "c' LIKE 'abc';" > + test:do_execsql_test(test_name, test_itself, {0}) > +end > + > +test:finish_test() > > Why I cannot find a test of `GLOB`? Even if we delete it in the future, i= t > should be tested. You can write much less tests for glob. > E.g. this > ``` > select '1' glob '[0-4]'; > ``` > somewhy returns 0. > I actually don't think that operator that is going to be deleted in a few days should be tested. It's just useless and redundant code. > > Sorry, some of the tests I ask you to write are a little out of scope of > the ticket and they should already have been written. > But I suppose most of ambiguity should be clarified now. This ticket has > raised important questions related to those functions. > commit f15fbaef2182084dec7cdc6c661509cb908df892 Author: N.Tatunov Date: Thu Jun 28 15:17:32 2018 +0300 sql: LIKE & GLOB pattern comparison issue Currently function that compares pattern and string for GLOB & LIKE operators doesn't work properly. It uses ICU reading function which was assumed having other return codes and the implementation for the comparison ending isn't paying attention to some special cases, hence in those cases it works improperly. With the patch applied an error will be returned in case there's an invalid UTF-8 symbol in pattern & pattern containing only valid UTF-8 symbols will not be matched with the string that contains invalid symbol. =D0=A1loses #3251 =D0=A1loses #3334 Part of #3572 diff --git a/src/box/sql/func.c b/src/box/sql/func.c index c06e3bd..7f93ef6 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -617,13 +617,17 @@ struct compareInfo { u8 noCase; /* true to ignore case differences */ }; -/* - * For LIKE and GLOB matching on EBCDIC machines, assume that every - * character is exactly one byte in size. Also, provde the Utf8Read() - * macro for fast reading of the next character in the common case where - * the next character is ASCII. +/** + * Providing there are symbols in string s this + * macro returns UTF-8 code of character and + * promotes pointer to the next symbol in the string. + * Otherwise return code is SQL_END_OF_STRING. */ -#define Utf8Read(s, e) ucnv_getNextUChar(pUtf8conv, &s, e, &status) +#define Utf8Read(s, e) (((s) < (e)) ? \ + ucnv_getNextUChar(pUtf8conv, &(s), (e), &(status)) : 0) + +#define SQL_END_OF_STRING 0 +#define SQL_INVALID_UTF8_SYMBOL 0xfffd static const struct compareInfo globInfo =3D { '*', '?', '[', 0 }; @@ -638,19 +642,16 @@ static const struct compareInfo likeInfoNorm =3D { '%= ', '_', 0, 1 }; static const struct compareInfo likeInfoAlt =3D { '%', '_', 0, 0 }; /* - * Possible error returns from patternMatch() + * Possible error returns from sql_utf8_pattern_compare() */ #define SQLITE_MATCH 0 #define SQLITE_NOMATCH 1 #define SQLITE_NOWILDCARDMATCH 2 +#define SQL_PROHIBITED_PATTERN 3 -/* - * Compare two UTF-8 strings for equality where the first string is - * a GLOB or LIKE expression. Return values: - * - * SQLITE_MATCH: Match - * SQLITE_NOMATCH: No match - * SQLITE_NOWILDCARDMATCH: No match in spite of having * or % wildcards. +/** + * Compare two UTF-8 strings for equality where the first string + * is a GLOB or LIKE expression. * * Globbing rules: * @@ -663,92 +664,136 @@ static const struct compareInfo likeInfoAlt =3D { '%= ', '_', 0, 0 }; * * [^...] Matches one character not in the enclosed list. * - * With the [...] and [^...] matching, a ']' character can be included - * in the list by making it the first character after '[' or '^'. A - * range of characters can be specified using '-'. Example: - * "[a-z]" matches any single lower-case letter. To match a '-', make - * it the last character in the list. + * With the [...] and [^...] matching, a ']' character can be + * included in the list by making it the first character after + * '[' or '^'. A range of characters can be specified using '-'. + * Example: "[a-z]" matches any single lower-case letter. + * To match a '-', make it the last character in the list. * * Like matching rules: * - * '%' Matches any sequence of zero or more characters + * '%' Matches any sequence of zero or more characters. * - ** '_' Matches any one character + ** '_' Matches any one character. * * Ec Where E is the "esc" character and c is any other - * character, including '%', '_', and esc, match exactly c. + * character, including '%', '_', and esc, match + * exactly c. * * The comments within this routine usually assume glob matching. * - * This routine is usually quick, but can be N**2 in the worst case. + * This routine is usually quick, but can be N**2 in the worst + * case. + * + * @param pattern String containing comparison pattern. + * @param string String being compared. + * @param compareInfo Information about how to compare. + * @param matchOther The escape char (LIKE) or '[' (GLOB). + * + * @retval SQLITE_MATCH: Match. + * SQLITE_NOMATCH: No match. + * SQLITE_NOWILDCARDMATCH: No match in spite of having * + * or % wildcards. + * SQL_PROHIBITED_PATTERN: Pattern contains invalid + * symbol. */ static int -patternCompare(const char * pattern, /* The glob pattern */ - const char * string, /* The string to compare against the glob */ - const struct compareInfo *pInfo, /* Information about how to do the compare */ - UChar32 matchOther /* The escape char (LIKE) or '[' (GLOB) */ - ) +sql_utf8_pattern_compare(const char * pattern, + const char * string, + const struct compareInfo *pInfo, + UChar32 matchOther) { - UChar32 c, c2; /* Next pattern and input string chars */ - UChar32 matchOne =3D pInfo->matchOne; /* "?" or "_" */ - UChar32 matchAll =3D pInfo->matchAll; /* "*" or "%" */ - UChar32 noCase =3D pInfo->noCase; /* True if uppercase=3D=3Dlowercase */ - const char *zEscaped =3D 0; /* One past the last escaped input char */ + /* Next pattern and input string chars */ + UChar32 c, c2; + /* "?" or "_" */ + UChar32 matchOne =3D pInfo->matchOne; + /* "*" or "%" */ + UChar32 matchAll =3D pInfo->matchAll; + /* True if uppercase=3D=3Dlowercase */ + UChar32 noCase =3D pInfo->noCase; + /* One past the last escaped input char */ + const char *zEscaped =3D 0; const char * pattern_end =3D pattern + strlen(pattern); const char * string_end =3D string + strlen(string); UErrorCode status =3D U_ZERO_ERROR; - while (pattern < pattern_end){ - c =3D Utf8Read(pattern, pattern_end); + while ((c =3D Utf8Read(pattern, pattern_end)) !=3D SQL_END_OF_STRING) { + if (c =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQL_PROHIBITED_PATTERN; if (c =3D=3D matchAll) { /* Match "*" */ - /* Skip over multiple "*" characters in the pattern. If there - * are also "?" characters, skip those as well, but consume a - * single character of the input string for each "?" skipped + /* Skip over multiple "*" characters in + * the pattern. If there are also "?" + * characters, skip those as well, but + * consume a single character of the + * input string for each "?" skipped. */ - while (pattern < pattern_end){ - c =3D Utf8Read(pattern, pattern_end); + while ((c =3D Utf8Read(pattern, pattern_end)) !=3D + SQL_END_OF_STRING) { + if (c =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQL_PROHIBITED_PATTERN; if (c !=3D matchAll && c !=3D matchOne) break; - if (c =3D=3D matchOne - && Utf8Read(string, string_end) =3D=3D 0) { + if (c =3D=3D matchOne && + (c2 =3D Utf8Read(string, string_end)) =3D=3D + SQL_END_OF_STRING) return SQLITE_NOWILDCARDMATCH; - } + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQLITE_NOMATCH; } - /* "*" at the end of the pattern matches */ - if (pattern =3D=3D pattern_end) + /* + * "*" at the end of the pattern matches. + */ + if (c =3D=3D SQL_END_OF_STRING) { + while ((c2 =3D Utf8Read(string, string_end)) !=3D + SQL_END_OF_STRING) + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQLITE_NOMATCH; return SQLITE_MATCH; + } if (c =3D=3D matchOther) { if (pInfo->matchSet =3D=3D 0) { c =3D Utf8Read(pattern, pattern_end); - if (c =3D=3D 0) + if (c =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQL_PROHIBITED_PATTERN; + if (c =3D=3D SQL_END_OF_STRING) return SQLITE_NOWILDCARDMATCH; } else { - /* "[...]" immediately follows the "*". We have to do a slow - * recursive search in this case, but it is an unusual case. + /* "[...]" immediately + * follows the "*". We + * have to do a slow + * recursive search in + * this case, but it is + * an unusual case. */ - assert(matchOther < 0x80); /* '[' is a single-byte character */ + assert(matchOther < 0x80); while (string < string_end) { int bMatch =3D - patternCompare(&pattern[-1], - string, - pInfo, - matchOther); + sql_utf8_pattern_compare( + &pattern[-1], + string, + pInfo, + matchOther); if (bMatch !=3D SQLITE_NOMATCH) return bMatch; - Utf8Read(string, string_end); + c =3D Utf8Read(string, string_end); + if (c =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQLITE_NOMATCH; } return SQLITE_NOWILDCARDMATCH; } } - /* At this point variable c contains the first character of the - * pattern string past the "*". Search in the input string for the - * first matching character and recursively continue the match from - * that point. + /* At this point variable c contains the + * first character of the pattern string + * past the "*". Search in the input + * string for the first matching + * character and recursively continue the + * match from that point. * - * For a case-insensitive search, set variable cx to be the same as - * c but in the other case and search the input string for either - * c or cx. + * For a case-insensitive search, set + * variable cx to be the same as c but in + * the other case and search the input + * string for either c or cx. */ int bMatch; @@ -756,14 +801,18 @@ patternCompare(const char * pattern, /* The glob pattern */ c =3D u_tolower(c); while (string < string_end){ /** - * This loop could have been implemented - * without if converting c2 to lower case - * (by holding c_upper and c_lower), however - * it is implemented this way because lower - * works better with German and Turkish - * languages. + * This loop could have been + * implemented without if + * converting c2 to lower case + * by holding c_upper and + * c_lower,however it is + * implemented this way because + * lower works better with German + * and Turkish languages. */ c2 =3D Utf8Read(string, string_end); + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQLITE_NOMATCH; if (!noCase) { if (c2 !=3D c) continue; @@ -771,9 +820,10 @@ patternCompare(const char * pattern, /* The glob pattern */ if (c2 !=3D c && u_tolower(c2) !=3D c) continue; } - bMatch =3D - patternCompare(pattern, string, - pInfo, matchOther); + bMatch =3D sql_utf8_pattern_compare(pattern, + string, + pInfo, + matchOther); if (bMatch !=3D SQLITE_NOMATCH) return bMatch; } @@ -782,7 +832,9 @@ patternCompare(const char * pattern, /* The glob pattern */ if (c =3D=3D matchOther) { if (pInfo->matchSet =3D=3D 0) { c =3D Utf8Read(pattern, pattern_end); - if (c =3D=3D 0) + if (c =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQL_PROHIBITED_PATTERN; + if (c =3D=3D SQL_END_OF_STRING) return SQLITE_NOMATCH; zEscaped =3D pattern; } else { @@ -790,23 +842,33 @@ patternCompare(const char * pattern, /* The glob pattern */ int seen =3D 0; int invert =3D 0; c =3D Utf8Read(string, string_end); + if (c =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQLITE_NOMATCH; if (string =3D=3D string_end) return SQLITE_NOMATCH; c2 =3D Utf8Read(pattern, pattern_end); + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQL_PROHIBITED_PATTERN; if (c2 =3D=3D '^') { invert =3D 1; c2 =3D Utf8Read(pattern, pattern_end); + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQL_PROHIBITED_PATTERN; } if (c2 =3D=3D ']') { if (c =3D=3D ']') seen =3D 1; c2 =3D Utf8Read(pattern, pattern_end); + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQL_PROHIBITED_PATTERN; } - while (c2 && c2 !=3D ']') { + while (c2 !=3D SQL_END_OF_STRING && c2 !=3D ']') { if (c2 =3D=3D '-' && pattern[0] !=3D ']' && pattern < pattern_end && prior_c > 0) { c2 =3D Utf8Read(pattern, pattern_end); + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQL_PROHIBITED_PATTERN; if (c >=3D prior_c && c <=3D c2) seen =3D 1; prior_c =3D 0; @@ -817,29 +879,36 @@ patternCompare(const char * pattern, /* The glob pattern */ prior_c =3D c2; } c2 =3D Utf8Read(pattern, pattern_end); + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQL_PROHIBITED_PATTERN; } - if (pattern =3D=3D pattern_end || (seen ^ invert) =3D=3D 0) { + if (pattern =3D=3D pattern_end || + (seen ^ invert) =3D=3D 0) { return SQLITE_NOMATCH; } continue; } } c2 =3D Utf8Read(string, string_end); + if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL) + return SQLITE_NOMATCH; if (c =3D=3D c2) continue; if (noCase){ /** - * Small optimisation. Reduce number of calls - * to u_tolower function. - * SQL standards suggest use to_upper for symbol - * normalisation. However, using to_lower allows to - * respect Turkish '=C4=B0' in default locale. + * Small optimisation. Reduce number of + * calls to u_tolower function. SQL + * standards suggest use to_upper for + * symbol normalisation. However, using + * to_lower allows to respect Turkish '=C4=B0' + * in default locale. */ if (u_tolower(c) =3D=3D c2 || c =3D=3D u_tolower(c2)) continue; } - if (c =3D=3D matchOne && pattern !=3D zEscaped && c2 !=3D 0) + if (c =3D=3D matchOne && pattern !=3D zEscaped && + c2 !=3D SQL_END_OF_STRING) continue; return SQLITE_NOMATCH; } @@ -853,8 +922,7 @@ patternCompare(const char * pattern, /* The glob pattern */ int sqlite3_strglob(const char *zGlobPattern, const char *zString) { - return patternCompare(zGlobPattern, zString, &globInfo, - '['); + return sql_utf8_pattern_compare(zGlobPattern, zString, &globInfo, '['); } /* @@ -864,7 +932,7 @@ sqlite3_strglob(const char *zGlobPattern, const char *zString) int sqlite3_strlike(const char *zPattern, const char *zStr, unsigned int esc) { - return patternCompare(zPattern, zStr, &likeInfoNorm, esc); + return sql_utf8_pattern_compare(zPattern, zStr, &likeInfoNorm, esc); } /* @@ -910,8 +978,9 @@ likeFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) zB =3D (const char *) sqlite3_value_text(argv[0]); zA =3D (const char *) sqlite3_value_text(argv[1]); - /* Limit the length of the LIKE or GLOB pattern to avoid problems - * of deep recursion and N*N behavior in patternCompare(). + /* Limit the length of the LIKE or GLOB pattern to avoid + * problems of deep recursion and N*N behavior in + * sql_utf8_pattern_compare(). */ nPat =3D sqlite3_value_bytes(argv[0]); testcase(nPat =3D=3D db->aLimit[SQLITE_LIMIT_LIKE_PATTERN_LENGTH]); @@ -947,7 +1016,12 @@ likeFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) sqlite3_like_count++; #endif int res; - res =3D patternCompare(zB, zA, pInfo, escape); + res =3D sql_utf8_pattern_compare(zB, zA, pInfo, escape); + if (res =3D=3D SQL_PROHIBITED_PATTERN) { + sqlite3_result_error(context, "LIKE or GLOB pattern can only" + " contain UTF-8 characters", -1); + return; + } sqlite3_result_int(context, res =3D=3D SQLITE_MATCH); } diff --git a/test/sql-tap/e_expr.test.lua b/test/sql-tap/e_expr.test.lua index 13d3a96..9780d2c 100755 --- a/test/sql-tap/e_expr.test.lua +++ b/test/sql-tap/e_expr.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool test =3D require("sqltester") -test:plan(12431) +test:plan(10665) --!./tcltestrunner.lua -- 2010 July 16 @@ -77,8 +77,10 @@ local operations =3D { {"<>", "ne1"}, {"!=3D", "ne2"}, {"IS", "is"}, - {"LIKE", "like"}, - {"GLOB", "glob"}, +-- NOTE: This test needs refactoring after deletion of GLOB & +-- type restrictions for LIKE. (See #3572) +-- {"LIKE", "like"}, +-- {"GLOB", "glob"}, {"AND", "and"}, {"OR", "or"}, {"MATCH", "match"}, @@ -96,7 +98,12 @@ operations =3D { {"+", "-"}, {"<<", ">>", "&", "|"}, {"<", "<=3D", ">", ">=3D"}, - {"=3D", "=3D=3D", "!=3D", "<>", "LIKE", "GLOB"}, --"MATCH", "REGEXP"}, +-- NOTE: This test needs refactoring after deletion of GLOB & +-- type restrictions for LIKE. (See #3572) +-- Another NOTE: MATCH & REGEXP aren't supported in Tarantool & +-- are waiting for their hour, don't confuse them +-- being commented with ticket above. + {"=3D", "=3D=3D", "!=3D", "<>"}, --"LIKE", "GLOB"}, --"MATCH", "REGEXP= "}, {"AND"}, {"OR"}, } @@ -475,6 +482,7 @@ for _, op in ipairs(oplist) do end end end + --------------------------------------------------------------------------= - -- Test the IS and IS NOT operators. -- diff --git a/test/sql-tap/gh-3251-string-pattern-comparison.test.lua b/test/sql-tap/gh-3251-string-pattern-comparison.test.lua new file mode 100755 index 0000000..2a787f2 --- /dev/null +++ b/test/sql-tap/gh-3251-string-pattern-comparison.test.lua @@ -0,0 +1,213 @@ +#!/usr/bin/env tarantool +test =3D require("sqltester") +test:plan(128) + +local prefix =3D "like-test-" + +-- Unicode byte sequences. +local valid_testcases =3D { + '\x01', + '\x09', + '\x1F', + '\x7F', + '\xC2\x80', + '\xC2\x90', + '\xC2\x9F', + '\xE2\x80\xA8', + '\x20\x0B', + '\xE2\x80\xA9', +} + +-- Non-Unicode byte sequences. +local invalid_testcases =3D { + '\xE2\x80', + '\xFE\xFF', + '\xC2', + '\xED\xB0\x80', + '\xD0', +} + +local like_test_cases =3D +{ + {"1.1", + "SELECT 'AB' LIKE '_B';", + {0, {1}} }, + {"1.2", + "SELECT 'CD' LIKE '_B';", + {0, {0}} }, + {"1.3", + "SELECT '' LIKE '_B';", + {0, {0}} }, + {"1.4", + "SELECT 'AB' LIKE '%B';", + {0, {1}} }, + {"1.5", + "SELECT 'CD' LIKE '%B';", + {0, {0}} }, + {"1.6", + "SELECT '' LIKE '%B';", + {0, {0}} }, + {"1.7", + "SELECT 'AB' LIKE 'A__';", + {0, {0}} }, + {"1.8", + "SELECT 'CD' LIKE 'A__';", + {0, {0}} }, + {"1.9", + "SELECT '' LIKE 'A__';", + {0, {0}} }, + {"1.10", + "SELECT 'AB' LIKE 'A_';", + {0, {1}} }, + {"1.11", + "SELECT 'CD' LIKE 'A_';", + {0, {0}} }, + {"1.12", + "SELECT '' LIKE 'A_';", + {0, {0}} }, + {"1.13", + "SELECT 'AB' LIKE 'A';", + {0, {0}} }, + {"1.14", + "SELECT 'CD' LIKE 'A';", + {0, {0}} }, + {"1.15", + "SELECT '' LIKE 'A';", + {0, {0}} }, + {"1.16", + "SELECT 'AB' LIKE '_';", + {0, {0}} }, + {"1.17", + "SELECT 'CD' LIKE '_';", + {0, {0}} }, + {"1.18", + "SELECT '' LIKE '_';", + {0, {0}} }, + {"1.19", + "SELECT 'AB' LIKE '__';", + {0, {1}} }, + {"1.20", + "SELECT 'CD' LIKE '__';", + {0, {1}} }, + {"1.21", + "SELECT '' LIKE '__';", + {0, {0}} }, + {"1.22", + "SELECT 'AB' LIKE '%A';", + {0, {0}} }, + {"1.23", + "SELECT 'AB' LIKE '%C';", + {0, {0}} }, + {"1.24", + "SELECT 'ab' LIKE '%df';", + {0, {0}} }, + {"1.25", + "SELECT 'abCDF' LIKE '%df';", + {0, {1}} }, + {"1.26", + "SELECT 'CDF' LIKE '%df';", + {0, {1}} }, + {"1.27", + "SELECT 'ab' LIKE 'a_';", + {0, {1}} }, + {"1.28", + "SELECT 'abCDF' LIKE 'a_';", + {0, {0}} }, + {"1.29", + "SELECT 'CDF' LIKE 'a_';", + {0, {0}} }, + {"1.30", + "SELECT 'ab' LIKE 'ab%';", + {0, {1}} }, + {"1.31", + "SELECT 'abCDF' LIKE 'ab%';", + {0, {1}} }, + {"1.32", + "SELECT 'CDF' LIKE 'ab%';", + {0, {0}} }, + {"1.33", + "SELECT 'ab' LIKE 'abC%';", + {0, {0}} }, + {"1.34", + "SELECT 'abCDF' LIKE 'abC%';", + {0, {1}} }, + {"1.35", + "SELECT 'CDF' LIKE 'abC%';", + {0, {0}} }, + {"1.36", + "SELECT 'ab' LIKE 'a_%';", + {0, {1}} }, + {"1.37", + "SELECT 'abCDF' LIKE 'a_%';", + {0, {1}} }, + {"1.38", + "SELECT 'CDF' LIKE 'a_%';", + {0, {0}} }, +} + +test:do_catchsql_set_test(like_test_cases, prefix) + +-- Invalid testcases. +for i, tested_string in ipairs(invalid_testcases) do + + -- We should raise an error in case + -- pattern contains invalid characters. + + local test_name =3D prefix .. "2." .. tostring(i) + local test_itself =3D "SELECT 'abc' LIKE 'ab" .. tested_string .. "';" + test:do_catchsql_test(test_name, test_itself, + {1, "LIKE or GLOB pattern can only contain UTF-8 characters"}) + + test_name =3D prefix .. "3." .. tostring(i) + test_itself =3D "SELECT 'abc' LIKE 'abc" .. tested_string .. "';" + test:do_catchsql_test(test_name, test_itself, + {1, "LIKE or GLOB pattern can only contain UTF-8 characters"}) + + test_name =3D prefix .. "4." .. tostring(i) + test_itself =3D "SELECT 'abc' LIKE 'ab" .. tested_string .. "c';" + test:do_catchsql_test(test_name, test_itself, + {1, "LIKE or GLOB pattern can only contain UTF-8 characters"}) + + -- Just skipping if row value predicand contains invalid character. + + test_name =3D prefix .. "5." .. tostring(i) + test_itself =3D "SELECT 'ab" .. tested_string .. "' LIKE 'abc';" + test:do_execsql_test(test_name, test_itself, {0}) + + test_name =3D prefix .. "6." .. tostring(i) + test_itself =3D "SELECT 'abc" .. tested_string .. "' LIKE 'abc';" + test:do_execsql_test(test_name, test_itself, {0}) + + test_name =3D prefix .. "7." .. tostring(i) + test_itself =3D "SELECT 'ab" .. tested_string .. "c' LIKE 'abc';" + test:do_execsql_test(test_name, test_itself, {0}) +end + +-- Valid testcases. +for i, tested_string in ipairs(valid_testcases) do + test_name =3D prefix .. "8." .. tostring(i) + local test_itself =3D "SELECT 'abc' LIKE 'ab" .. tested_string .. "';" + test:do_execsql_test(test_name, test_itself, {0}) + + test_name =3D prefix .. "9." .. tostring(i) + test_itself =3D "SELECT 'abc' LIKE 'abc" .. tested_string .. "';" + test:do_execsql_test(test_name, test_itself, {0}) + + test_name =3D prefix .. "10." .. tostring(i) + test_itself =3D "SELECT 'abc' LIKE 'ab" .. tested_string .. "c';" + test:do_execsql_test(test_name, test_itself, {0}) + + test_name =3D prefix .. "11." .. tostring(i) + test_itself =3D "SELECT 'ab" .. tested_string .. "' LIKE 'abc';" + test:do_execsql_test(test_name, test_itself, {0}) + + test_name =3D prefix .. "12." .. tostring(i) + test_itself =3D "SELECT 'abc" .. tested_string .. "' LIKE 'abc';" + test:do_execsql_test(test_name, test_itself, {0}) + + test_name =3D prefix .. "13." .. tostring(i) + test_itself =3D "SELECT 'ab" .. tested_string .. "c' LIKE 'abc';" + test:do_execsql_test(test_name, test_itself, {0}) +end + +test:finish_test() --000000000000bf266d0572639c61 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=D1=81= =D1=80, 1 =D0=B0=D0=B2=D0=B3. 2018 =D0=B3. =D0=B2 16:56, Alex Khatskevich &= lt;avkhatskevich@tarantool.o= rg>:
=20 =20 =20



On 01.08.201= 8 13:51, Nikita Tatunov wrote:
=20
diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index c06e3bd..7f93ef6 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -617,13 +617,17 @@ struct compareInfo {
=C2=A0 u8 noCase;<= span style=3D"white-space:pre-wrap"> /* true to ignore case differences */
=C2=A0};
=C2=A0
-/*
- * For LIKE and GLOB matching on EBCDIC machines, assume that every
- * character is exactly one byte in size.=C2=A0 Also, provde the Utf8Read()
- * macro for fast reading of the next character in the common case where
- * the next character is ASCII.
+/**
+ * Providing there are symbols in string s this
+ * macro returns UTF-8 code of character and
+ * promotes pointer to the next symbol in the string.
+ * Otherwise return code is SQL_END_OF_STRING.
=C2=A0 */
-#define Utf8Read(s, e)=C2=A0 =C2=A0 ucnv_getNextUChar(pUtf8co= nv, &s, e, &status)
+#define Utf8Read(s, e) (((s) < (e)) ? \
+ ucnv_getNextUCha= r(pUtf8conv, &(s), (e), &(status)) : 0)
[Later I will ask you to return this macro back, so, you may not do this]
As I understand, you are returning `0` from Utf8Read in case of end of the string.
Let's return `SQL_END_OF_STRING` instead of just `0`.

Yeah, thank you, didn't notice it.=C2=A0
+
+#define SQL_END_OF_STRING=C2=A0 =C2=A0 =C2=A0 =C2=A0 0
+#define SQL_INVALID_UTF8_SYMBOL=C2=A0 0xfffd
=C2=A0
=C2=A0static const struct compareInfo globInfo =3D { '*= 9;, '?', '[', 0 };
=C2=A0
@@ -638,19 +642,16 @@ static const struct compareInfo likeInfoNorm =3D { '%', '_', 0, 1 };
=C2=A0static const struct compareInfo likeInfoAlt =3D { '%= ', '_', 0, 0 };
=C2=A0
=C2=A0/*
- * Possible error returns from patternMatch()
+ * Possible error returns from sql_utf8_pattern_compare()
=C2=A0 */
=C2=A0#define SQLITE_MATCH=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A00
=C2=A0#define SQLITE_NOMATCH=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A01
=C2=A0#define SQLITE_NOWILDCARDMATCH=C2=A0 =C2=A02
+#define SQL_PROHIBITED_PATTERN=C2=A0 =C2=A03
I am not sure that the invalid (with invalid symbols) pattern can be called `prohibited`.
Rename somehow? My proposal: SQL_INVALID_PATTERN.

Probably you're right, I was also thinking of chan= ging it somehow.
=C2=A0
Moreover, You have named this definition with the `SQL` prefix, which is good, however,
similar definitions are still prefixed with `SQLITE`. I would like you to rename those in
this (preferred) or in a separate commit for consistency.
=C2=A0
Taranto= ol !=3D SQLite and I think we should get away from this approach.
The thing that names haven't been refactored yet isn't in my contr= ol.
You can ask Nikita's opinion on it, I guess he will tell = you almost the same.
=C2=A0
-/*
- * Compare two UTF-8 strings for equality where the first string is
- * a GLOB or LIKE expression.=C2=A0 Return values:
- *
- *=C2=A0 =C2=A0 SQLITE_MATCH:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 Match
- *=C2=A0 =C2=A0 SQLITE_NOMATCH:=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 No match
- *=C2=A0 =C2=A0 SQLITE_NOWILDCARDMATCH:=C2=A0 No match in spi= te of having * or % wildcards.
+/**
+ * Compare two UTF-8 strings for equality where the first string
+ * is a GLOB or LIKE expression.
=C2=A0 *
=C2=A0 * Globbing rules:
=C2=A0 *
@@ -663,92 +664,136 @@ static const struct compareInfo likeInfoAlt =3D { '%', '_', 0, 0 };
=C2=A0 *
=C2=A0 *=C2=A0 =C2=A0 =C2=A0[^...]=C2=A0 =C2=A0 =C2=A0Matches = one character not in the enclosed list.
=C2=A0 *
- * With the [...] and [^...] matching, a ']' characte= r can be included
- * in the list by making it the first character after '[&= #39; or '^'.=C2=A0 A
- * range of characters can be specified using '-'.=C2= =A0 Example:
- * "[a-z]" matches any single lower-case letter.=C2= =A0 To match a '-', make
- * it the last character in the list.
+ * With the [...] and [^...] matching, a ']' characte= r can be
+ * included in the list by making it the first character after
+ * '[' or '^'. A range of characters can be s= pecified using '-'.
+ * Example: "[a-z]" matches any single lower-case l= etter.
+ * To match a '-', make it the last character in the = list.
Does it work for UTF characters? I suppose no.
Let's write about it here + let's file an issue to make it
work with UTF characters.

Soon th= is function is gonna be refactored & GLOB is gonna be removed anyways.<= /div>
=C2=A0
=C2=A0 *
=C2=A0 * Like matching rules:
=C2=A0 *
- *=C2=A0 =C2=A0 =C2=A0 '%'=C2=A0 =C2=A0 =C2=A0 =C2=A0= Matches any sequence of zero or more characters
+ *=C2=A0 =C2=A0 =C2=A0 '%'=C2=A0 =C2=A0 =C2=A0 =C2=A0= Matches any sequence of zero or more characters.
=C2=A0 *
- **=C2=A0 =C2=A0 =C2=A0'_'=C2=A0 =C2=A0 =C2=A0 =C2=A0= Matches any one character
+ **=C2=A0 =C2=A0 =C2=A0'_'=C2=A0 =C2=A0 =C2=A0 =C2=A0= Matches any one character.
=C2=A0 *
=C2=A0 *=C2=A0 =C2=A0 =C2=A0 Ec=C2=A0 =C2=A0 =C2=A0 =C2=A0 Whe= re E is the "esc" character and c is any other
- *=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 cha= racter, including '%', '_', and esc, match exactly c.
+ *=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 cha= racter, including '%', '_', and esc, match
+ *=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 exa= ctly c.
=C2=A0 *
=C2=A0 * The comments within this routine usually assume glob matching.
=C2=A0 *
- * This routine is usually quick, but can be N**2 in the worst case.
+ * This routine is usually quick, but can be N**2 in the worst
+ * case.
+ *
+ * @param pattern String containing comparison pattern.
+ * @param string String being compared.
+ * @param compareInfo Information about how to compare.
+ * @param matchOther The escape char (LIKE) or '[' (G= LOB).
+ *
+ * @retval SQLITE_MATCH:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 Match.
+ * =C2=A0 =C2=A0SQLITE_NOMATCH:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 No match.=
+ * =C2=A0 =C2=A0SQLITE_NOWILDCARDMATCH:=C2=A0 No match in spite of having *=
+ * =C2=A0 =C2= =A0 or % wildcards.
+ * =C2=A0 =C2=A0SQL_PROHIBITED_PATTERN:=C2=A0 Pattern contains invalid
+ * =C2=A0 =C2= =A0 symbol.
Minor: It is not very good that you use symbol and character interchangeably.
I suppose that `character` should be used everywhere.

They're synonyms, aren't they?=C2=A0
=C2=A0 */
=C2=A0static int
-patternCompare(const char * pattern, /* The glob pattern */
- =C2=A0 =C2=A0 = =C2=A0 =C2=A0const char * string, /* The string= to compare against the glob */
- =C2=A0 =C2=A0 = =C2=A0 =C2=A0const struct compareInfo *pInfo, /= * Information about how to do the compare */
- =C2=A0 =C2=A0 = =C2=A0 =C2=A0UChar32 matchOther /* The esc= ape char (LIKE) or '[' (GLOB) */
-=C2=A0 =C2=A0 )
+sql_utf8_pattern_compare(const char * pattern,
+ const char * string,
+ const struct compareInfo *pInfo,
+ UChar32 matchOther)
"star" sign should stick to the attribute=C2=A0 name.
https://tarantool.io/en/doc/1.9/dev_guide/c_style_guide/#ch= apter-3-1-spaces

To prevent such typos in the future, you can use special perl
script which checks coding style in Linux kernel patches.

Subject of the ticket wasn't about refactoring= function, but thnx, changed it.
REMEMBER THIS POINT #1=C2=A0
=C2=A0{
- UChar32 c, c2; /* Next pattern and input string chars */
- UChar32 matchOne= =3D pInfo->matchOne; /= * "?" or "_" */
- UChar32 matchAll= =3D pInfo->matchAll; /= * "*" or "%" */
- UChar32 noCase = =3D pInfo->noCase; /* = True if uppercase=3D=3Dlowercase */
- const char *zEsc= aped =3D 0; /* One past th= e last escaped input char */
+ /* Next pattern = and input string chars */
+ UChar32 c, c2;
+ /* "?"= or "_" */
+ UChar32 matchOne= =3D pInfo->matchOne;
+ /* "*"= or "%" */
+ UChar32 matchAll= =3D pInfo->matchAll;
+ /* True if uppercase=3D=3Dlowercase */
+ UChar32 noCase = =3D pInfo->noCase;
+ /* One past the = last escaped input char */
+ const char *zEsc= aped =3D 0;
=C2=A0 const char = * pattern_end =3D pattern + strlen(pattern);
=C2=A0 const char = * string_end =3D string + strlen(string);
=C2=A0 UErrorCode = status =3D U_ZERO_ERROR;
=C2=A0
- while (pattern &= lt; pattern_end){
- c =3D Utf8Read(pattern, pattern_end);
+ while ((c =3D Utf8Read(pattern, pattern_end)) !=3D SQL_END_OF_STRING) {
REMEMBER THIS POINT #1
+ if (c =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PATTERN;
=C2=A0 if (c =3D= =3D matchAll) { /* Match "*&qu= ot; */
- /* Skip over multiple "*" characters in the pattern.=C2=A0 If there<= /div>
- * are also &q= uot;?" characters, skip those as well, but consume a
- * single character of the input string for each "?" skipped
+ /* Skip over multiple "*" characters in
+ * the pattern= . If there are also "?"
+ * characters, skip those as well, but
+ * consume a single character of the
+ * input strin= g for each "?" skipped.
=C2=A0 */
- while (pattern < pattern_end){
- c =3D Utf8Read(pattern, pattern_end);
+ while ((c =3D Utf8Read(pattern, pattern_end)) !=3D
+ =C2=A0 =C2=A0 = =C2=A0 =C2=A0SQL_END_OF_STRING) {
+ if (c =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PATTERN;
=C2=A0 if (c != =3D matchAll && c !=3D matchOne)
=C2=A0 break;<= /div>
- if (c =3D=3D = matchOne
- =C2=A0 =C2=A0= && Utf8Read(string, string_end) =3D=3D 0) {
+ if (c =3D=3D = matchOne &&
+ =C2=A0 =C2=A0= (c2 =3D Utf8Read(string, string_end)) =3D=3D
+ =C2=A0 =C2=A0 SQL_END_OF_STRING)
=C2=A0 return SQLITE_NOWILDCARDMATCH;
- }
+ if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQLITE_NOMATCH;
=C2=A0 }
- /* "*&quo= t; at the end of the pattern matches */
- if (pattern = =3D=3D pattern_end)
+ /*
+ * "*&quo= t; at the end of the pattern matches.
+ */
+ if (c =3D=3D SQL_END_OF_STRING) {
+ while ((c2 = =3D Utf8Read(string, string_end)) !=3D
+ =C2=A0 =C2=A0= =C2=A0 =C2=A0SQL_END_OF_STRING)
+ if (c2 =3D= =3D SQL_INVALID_UTF8_SYMBOL)
+ return SQLITE_NOMATCH;
=C2=A0 return SQLITE_MATCH;
+ }
=C2=A0 if (c =3D= =3D matchOther) {
=C2=A0 if (pInfo->matchSet =3D=3D 0) {
=C2=A0 c =3D Utf8Read(pattern, pattern_end);
- if (c =3D=3D= 0)
+ if (c =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PATTERN;
+ if (c =3D=3D SQL_END_OF_STRING)
=C2=A0 return SQLITE_NOWILDCARDMATCH;
=C2=A0 } else {=
- /* "[..= .]" immediately follows the "*".=C2=A0 We have to do a slow=
- * recursive search in this case, but it is an unusual case.
+ /* "[..= .]" immediately
+ * follows t= he "*". We
+ * have to d= o a slow
+ * recursive search in
+ * this case= , but it is
+ * an unusua= l case.
=C2=A0 */
- assert(match= Other < 0x80); /* '[= ' is a single-byte character */
+ assert(match= Other < 0x80);
=C2=A0 while (= string < string_end) {
REMEMBER THIS POINT #2

=C2=A0 int bM= atch =3D
- =C2=A0 =C2= =A0 patternCompare(&pattern[-1],
- =C2=A0 = =C2=A0string,
- =C2=A0 = =C2=A0pInfo,
- =C2=A0 =C2=A0matchOther);
+ =C2=A0 =C2= =A0 sql_utf8_pattern_compare(
+ &patt= ern[-1],
+ string,
+ pInfo,
+ matchOthe= r);
=C2=A0 if (bM= atch !=3D SQLITE_NOMATCH)
=C2=A0 retur= n bMatch;
- Utf8Read(st= ring, string_end);
+ c =3D Utf8Read(string, string_end);
+ if (c =3D= =3D SQL_INVALID_UTF8_SYMBOL)
+ return SQLITE_NOMATCH;
look at <REMEMBER THIS POINT #1,2> and other `Utf8Read` usages.
You have introduced SQL_END_OF_STRING and changed `Utf8Read` pattern to use it in
half of cases?


1) Look at <REMEMBER TH= IS POINT #1>.
2) I hav= e introduced it not to use explicit 0 and to fix the bug.
3) Fixed it though.
=C2=A0
Moreover,in that place you do check `string < string_end` implicitly inside of
`Utf8Read` but you never use that result.=C2=A0
<= blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-l= eft:1px solid rgb(204,204,204);padding-left:1ex">
I suppose you should return old iteration style and `Utf8Read` macro.
```
while (string < string_end) {
=C2=A0=C2=A0=C2=A0 c =3D Ut= f8Read(string, string_end);
if (c =3D=3D SQL_INVALID_UTF8_SYMBOL)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 return SQLITE_NOMATCH;
```

I think it's better t= o use this approach, but yes: return prev. version of macro:
1) &= #39;zero byte' ticket will be partially fixed.
2) 0xffff=C2=A0is=C2=A0non-unicode=C2=A0anyways.
=C2=A0 }
=C2=A0 return SQLITE_NOWILDCARDMATCH;
=C2=A0 }
=C2=A0 }
=C2=A0
- /* At this poi= nt variable c contains the first character of the
- * pattern str= ing past the "*".=C2=A0 Search in the input string for the<= /div>
- * first match= ing character and recursively continue the match from
- * that point.=
+ /* At this poi= nt variable c contains the
+ * first chara= cter of the pattern string
+ * past the &q= uot;*". Search in the input
+ * string for = the first matching
+ * character a= nd recursively continue the
+ * match from = that point.
=C2=A0 *
- * For a case-insensitive search, set variable cx to be the same as
- * c but in th= e other case and search the input string for either
- * c or cx.
+ * For a case-insensitive search, set
+ * variable cx= to be the same as c but in
+ * the other c= ase and search the input
+ * string for either c or cx.
=C2=A0 */
=C2=A0
=C2=A0 int bMatc= h;
@@ -756,14 +801,18 @@ patternCompare(const char * pattern, /* The glob pattern */
=C2=A0 c =3D u_= tolower(c);
=C2=A0 while (st= ring < string_end){
=C2=A0 /**
- * This loop could have been implemented
- * without if converting c2 to lower case
- * (by holdin= g c_upper and c_lower), however
- * it is implemented this way because lower
- * works bett= er with German and Turkish
- * languages.=
+ * This loop could have been
+ * implemente= d without if
+ * converting= c2 to lower case
+ * by holding c_upper and
+ * c_lower,however it is
+ * implemente= d this way because
+ * lower work= s better with German
+ * and Turkis= h languages.
=C2=A0 */
=C2=A0 c2 =3D Utf8Read(string, string_end);
+ if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQLITE_NOMATCH;
=C2=A0 if (!noC= ase) {
=C2=A0 if (c2 = !=3D c)
=C2=A0 contin= ue;
@@ -771,9 +820,10 @@ patternCompare(const char * pattern, /* The glob pattern */
=C2=A0 if (c2 = !=3D c && u_tolower(c2) !=3D c)
=C2=A0 contin= ue;
=C2=A0 }
- bMatch =3D
- =C2=A0 =C2=A0 patternCompare(pattern, string,
- =C2=A0 =C2= =A0pInfo, matchOther);
+ bMatch =3D sql_utf8_pattern_compare(pattern,
+ =C2=A0 st= ring,
+ =C2=A0 pI= nfo,
+ =C2=A0 matchOther);
=C2=A0 if (bMat= ch !=3D SQLITE_NOMATCH)
=C2=A0 return = bMatch;
=C2=A0 }
@@ -782,7 +832,9 @@ patternCompare(const char * pattern, /* The glob pattern */
=C2=A0 if (c =3D= =3D matchOther) {
=C2=A0 if (pInfo->matchSet =3D=3D 0) {
=C2=A0 c =3D Utf8Read(pattern, pattern_end);
- if (c =3D=3D = 0)
+ if (c =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PATTERN;
+ if (c =3D=3D SQL_END_OF_STRING)
=C2=A0 return SQLITE_NOMATCH;
=C2=A0 zEscaped= =3D pattern;
=C2=A0 } else {<= /div>
@@ -790,23 +842,33 @@ patternCompare(const char * pattern, /* The glob pattern */
=C2=A0 int seen= =3D 0;
=C2=A0 int inve= rt =3D 0;
=C2=A0 c =3D Utf8Read(string, string_end);
+ if (c =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQLITE_NOMATCH;
=C2=A0 if (stri= ng =3D=3D string_end)
=C2=A0 return SQLITE_NOMATCH;
=C2=A0 c2 =3D Utf8Read(pattern, pattern_end);
+ if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PATTERN;
=C2=A0 if (c2 = =3D=3D '^') {
=C2=A0 invert = =3D 1;
=C2=A0 c2 =3D Utf8Read(pattern, pattern_end);
+ if (c2 =3D= =3D SQL_INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PATTERN;
=C2=A0 }
=C2=A0 if (c2 = =3D=3D ']') {
=C2=A0 if (c = =3D=3D ']')
=C2=A0 seen = =3D 1;
=C2=A0 c2 =3D Utf8Read(pattern, pattern_end);
+ if (c2 =3D= =3D SQL_INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PATTERN;
=C2=A0 }
- while (c2 && c2 !=3D ']') {
+ while (c2 != =3D SQL_END_OF_STRING && c2 !=3D ']') {
=C2=A0 if (c2 = =3D=3D '-' && pattern[0] !=3D ']'
=C2=A0 =C2=A0 = =C2=A0 && pattern < pattern_end
=C2=A0 =C2=A0 = =C2=A0 && prior_c > 0) {
=C2=A0 c2 =3D Utf8Read(pattern, pattern_end);
+ if (c2 =3D= =3D SQL_INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PATTERN;
=C2=A0 if (c = >=3D prior_c && c <=3D c2)
=C2=A0 seen = =3D 1;
=C2=A0 prior_= c =3D 0;
@@ -817,29 +879,36 @@ patternCompare(const char * pattern, /* The glob pattern */
=C2=A0 prior_= c =3D c2;
=C2=A0 }
=C2=A0 c2 =3D Utf8Read(pattern, pattern_end);
+ if (c2 =3D= =3D SQL_INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PATTERN;
=C2=A0 }
- if (pattern = =3D=3D pattern_end || (seen ^ invert) =3D=3D 0) {
+ if (pattern = =3D=3D pattern_end ||
+ =C2=A0 =C2=A0= (seen ^ invert) =3D=3D 0) {
=C2=A0 return SQLITE_NOMATCH;
=C2=A0 }
=C2=A0 continue= ;
=C2=A0 }
=C2=A0 }
=C2=A0 c2 =3D Utf8Read(string, string_end);
+ if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQLITE_NOMATCH;
=C2=A0 if (c =3D= =3D c2)
=C2=A0 continue;=
=C2=A0 if (noCase= ){
=C2=A0 /**
- * Small optimisation. Reduce number of calls
- * to u_tolowe= r function.
- * SQL standar= ds suggest use to_upper for symbol
- * normalisati= on. However, using to_lower allows to
- * respect Tur= kish '=C4=B0' in default locale.
+ * Small optimisation. Reduce number of
+ * calls to u_tolower function. SQL
+ * standards suggest use to_upper for
+ * symbol normalisation. However, using
+ * to_lower al= lows to respect Turkish '=C4=B0'
+ * in default locale.
=C2=A0 */
=C2=A0 if (u_tol= ower(c) =3D=3D c2 ||
=C2=A0 =C2=A0 = =C2=A0 c =3D=3D u_tolower(c2))
=C2=A0 continue= ;
=C2=A0 }
- if (c =3D=3D ma= tchOne && pattern !=3D zEscaped && c2 !=3D 0)
+ if (c =3D=3D ma= tchOne && pattern !=3D zEscaped &&
+ =C2=A0 =C2=A0 c= 2 !=3D SQL_END_OF_STRING)
=C2=A0 continue;=
=C2=A0 return SQLITE_NOMATCH;
=C2=A0 }
@@ -853,8 +922,7 @@ patternCompare(const char * pattern, /* The glob pattern */
=C2=A0int
=C2=A0sqlite3_strglob(const char *zGlobPattern, const char *zString)
=C2=A0{
- return patternCompare(zGlobPattern, zString, &globInfo,
- =C2=A0 =C2=A0 = =C2=A0 '[');
+ return sql_utf8_pattern_compare(zGlobPattern, zString, &globInfo, '[');
=C2=A0}
=C2=A0
=C2=A0/*
@@ -864,7 +932,7 @@ sqlite3_strglob(const char *zGlobPattern, const char *zString)
=C2=A0int
=C2=A0sqlite3_strlike(const char *zPattern, const char *zStr, unsigned int esc)
=C2=A0{
- return patternCompare(zPattern, zStr, &likeInfoNorm, esc);
+ return sql_utf8_pattern_compare(zPattern, zStr, &likeInfoNorm, esc);
=C2=A0}
=C2=A0
=C2=A0/*
@@ -910,8 +978,9 @@ likeFunc(sqlite3_context * context, int argc, sqlite3_value ** argv)
=C2=A0 zB =3D (con= st char *) sqlite3_value_text(argv[0]);
=C2=A0 zA =3D (con= st char *) sqlite3_value_text(argv[1]);
=C2=A0
- /* Limit the len= gth of the LIKE or GLOB pattern to avoid problems
- * of deep recur= sion and N*N behavior in patternCompare().
+ /* Limit the len= gth of the LIKE or GLOB pattern to avoid
+ * problems of d= eep recursion and N*N behavior in
+ * sql_utf8_pattern_compare().
=C2=A0 */
=C2=A0 nPat =3D sqlite3_value_bytes(argv[0]);
=C2=A0 testcase(nP= at =3D=3D db->aLimit[SQLITE_LIMIT_LIKE_PATTERN_LENGTH]);
@@ -947,7 +1016,12 @@ likeFunc(sqlite3_context * context, int argc, sqlite3_value ** argv)
=C2=A0 sqlite3_lik= e_count++;
=C2=A0#endif
=C2=A0 int res;
- res =3D patternCompare(zB, zA, pInfo, escape);
+ res =3D sql_utf8_pattern_compare(zB, zA, pInfo, escape);
+ if (res =3D=3D SQL_PROHIBITED_PATTERN) {
+ sqlite3_result_= error(context, "LIKE or GLOB pattern can only"
+ =C2=A0 =C2=A0= =C2=A0" contain UTF-8 characters", -1);
+ return;
+ }
=C2=A0 sqlite3_res= ult_int(context, res =3D=3D SQLITE_MATCH);
=C2=A0}
=C2=A0
diff --git a/test-run b/test-run
index 77e9327..95562e9 160000
--- a/test-run
+++ b/test-run
@@ -1 +1 @@
-Subproject commit 77e93279210f8c5c1fd0ed03416fa19a184f0b6d
+Subproject commit 95562e95401fef4e0b755ab0bb430974b5d1a29a
diff --git a/test/sql-tap/e_expr.test.lua b/test/sql-tap/e_expr.test.lua
index 13d3a96..9780d2c 100755
--- a/test/sql-tap/e_expr.test.lua
+++ b/test/sql-tap/e_expr.test.lua
@@ -1,6 +1,6 @@
=C2=A0#!/usr/bin/env tarantool
=C2=A0test =3D require("sqltester")
-test:plan(12431)
+test:plan(10665)
=C2=A0
=C2=A0--!./tcltestrunner.lua
=C2=A0-- 2010 July 16
@@ -77,8 +77,10 @@ local operations =3D {
=C2=A0 =C2=A0 =C2=A0{"<>", "ne1"},
=C2=A0 =C2=A0 =C2=A0{"!=3D", "ne2"},
=C2=A0 =C2=A0 =C2=A0{"IS", "is"},
-=C2=A0 =C2=A0 {"LIKE", "like"},
-=C2=A0 =C2=A0 {"GLOB", "glob"},
+-- NOTE: This test needs refactoring after deletion of GLOB &
+-- type restrict= ions for LIKE. (See #3572)
+--=C2=A0 =C2=A0 {"LIKE", "like"},
+--=C2=A0 =C2=A0 {"GLOB", "glob"},
Yes, this behavior is not valid anymore.
To make sure that likes and globs will be tested in the future, please, delete this
commented lines and add your own simple test, which tries to call `like` and `glob`
with inappropriate types.
It is important to have a functional tests for any possible behavior.

1) Globs are going = to be deleted as you can see few lines above.
2) I'm gonna re= factor that when LIKE is in its final state (basically after #3572 is close= d
& static build is into 2.1).
=C2=A0
=C2=A0 =C2=A0 =C2=A0{"AND", "and"},
=C2=A0 =C2=A0 =C2=A0{"OR", "or"},
=C2=A0 =C2=A0 =C2=A0{"MATCH", "match"},
@@ -96,7 +98,12 @@ operations =3D {
=C2=A0 =C2=A0 =C2=A0{"+", "-"},
=C2=A0 =C2=A0 =C2=A0{"<<", ">>"= ;, "&", "|"},
=C2=A0 =C2=A0 =C2=A0{"<", "<=3D", &q= uot;>", ">=3D"},
-=C2=A0 =C2=A0 {"=3D", "=3D=3D", "!= =3D", "<>", "LIKE", "GLOB"}, --"MATCH", "REGEXP"},
+-- NOTE: This test needs refactoring after deletion of GLOB &
+-- type restrict= ions for LIKE. (See #3572)
+-- Another NOTE: MATCH & REGEXP aren't supported in Tarantool &
+-- are waiting for their hour, don't confuse them
+-- being commen= ted with ticket above.
+=C2=A0 =C2=A0 {"=3D", "=3D=3D", "!= =3D", "<>"}, --"LIKE", "GLOB"}, --"MATCH", "REGEXP"},
=C2=A0 =C2=A0 =C2=A0{"AND"},
=C2=A0 =C2=A0 =C2=A0{"OR"},
=C2=A0}
@@ -475,6 +482,7 @@ for _, op in ipairs(oplist) do
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0end
=C2=A0 =C2=A0 =C2=A0end
=C2=A0end
+
=C2=A0--------------------------------------------------------= -------------------
=C2=A0-- Test the IS and IS NOT operators.
=C2=A0--
diff --git a/test/sql-tap/gh-3251-string-pattern-comparison.test.lua b/test/sql-tap/gh-3251-string-pattern-comparison.test.lua
new file mode 100755
index 0000000..2a787f2
--- /dev/null
+++ b/test/sql-tap/gh-3251-string-pattern-comparison.test.lua
@@ -0,0 +1,213 @@
+#!/usr/bin/env tarantool
+test =3D require("sqltester")
+test:plan(128)
+
+local prefix =3D "like-test-"
+
+-- Unicode byte sequences.
+local valid_testcases =3D {
+=C2=A0 =C2=A0 '\x01',
+=C2=A0 =C2=A0 '\x09',
+=C2=A0 =C2=A0 '\x1F',
+=C2=A0 =C2=A0 '\x7F',
+=C2=A0 =C2=A0 '\xC2\x80',
+=C2=A0 =C2=A0 '\xC2\x90',
+=C2=A0 =C2=A0 '\xC2\x9F',
+=C2=A0 =C2=A0 '\xE2\x80\xA8',
+=C2=A0 =C2=A0 '\x20\x0B',
+=C2=A0 =C2=A0 '\xE2\x80\xA9',
+}
optional: add descriptions to those byte sequences (what it is).

Some valid unicode symbols which is onl= y that matters for LIKE operator.
+
+-- Non-Unicode byte sequences.
+local invalid_testcases =3D {
+=C2=A0 =C2=A0 '\xE2\x80',
+=C2=A0 =C2=A0 '\xFE\xFF',
+=C2=A0 =C2=A0 '\xC2',
+=C2=A0 =C2=A0 '\xED\xB0\x80',
+=C2=A0 =C2=A0 '\xD0',
+}
Place that after like_test_cases, just before it is used.

Changed it.
=C2=A0
+
+local like_test_cases =3D
+{
+=C2=A0 =C2=A0 {"1.1",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'AB' LIKE &#= 39;_B';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.2",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CD' LIKE &#= 39;_B';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.3",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT '' LIKE '= ;_B';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.4",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'AB' LIKE &#= 39;%B';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.5",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CD' LIKE &#= 39;%B';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.6",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT '' LIKE '= ;%B';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.7",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'AB' LIKE &#= 39;A__';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.8",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CD' LIKE &#= 39;A__';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.9",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT '' LIKE '= ;A__';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.10",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'AB' LIKE &#= 39;A_';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.11",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CD' LIKE &#= 39;A_';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.12",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT '' LIKE '= ;A_';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.13",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'AB' LIKE &#= 39;A';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.14",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CD' LIKE &#= 39;A';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.15",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT '' LIKE '= ;A';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.16",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'AB' LIKE &#= 39;_';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.17",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CD' LIKE &#= 39;_';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.18",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT '' LIKE '= ;_';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.19",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'AB' LIKE &#= 39;__';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.20",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CD' LIKE &#= 39;__';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.21",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT '' LIKE '= ;__';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.22",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'AB' LIKE &#= 39;%A';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.23",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'AB' LIKE &#= 39;%C';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.24",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'ab' LIKE &#= 39;%df';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.25",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'abCDF' LIKE= '%df';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.26",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CDF' LIKE &= #39;%df';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.27",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'ab' LIKE &#= 39;a_';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.28",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'abCDF' LIKE= 'a_';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.29",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CDF' LIKE &= #39;a_';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.30",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'ab' LIKE &#= 39;ab%';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.31",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'abCDF' LIKE= 'ab%';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.32",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CDF' LIKE &= #39;ab%';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.33",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'ab' LIKE &#= 39;abC%';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.34",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'abCDF' LIKE= 'abC%';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.35",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CDF' LIKE &= #39;abC%';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.36",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'ab' LIKE &#= 39;a_%';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.37",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'abCDF' LIKE= 'a_%';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.38",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CDF' LIKE &= #39;a_%';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+}
Please, add some tests for unicode strings. (or replace letters in those tests with unicode letters)

=
Changed existing tests a little bit.
=C2=A0
+
+test:do_catchsql_set_test(like_test_cases, prefix)
+
+-- Invalid testcases.
+for i, tested_string in ipairs(invalid_testcases) do
+
+=C2=A0 =C2=A0 -- We should raise an error in case
+=C2=A0 =C2=A0 -- pattern contains invalid characters.
+
+=C2=A0 =C2=A0 local test_name =3D prefix .. "2." ..= tostring(i)
+=C2=A0 =C2=A0 local test_itself =3D "SELECT 'abc'= ; LIKE 'ab" .. tested_string .. "';"
+=C2=A0 =C2=A0 test:do_catchsql_test(test_name, test_itself,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 {1, "LIKE or GLOB pattern can only contain UTF-8 characters"})
+
+=C2=A0 =C2=A0 test_name =3D prefix .. "3." .. tostr= ing(i)
+=C2=A0 =C2=A0 test_itself =3D "SELECT 'abc' LIKE= 'abc" .. tested_string .. "';"
+=C2=A0 =C2=A0 test:do_catchsql_test(test_name, test_itself,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 {1, "LIKE or GLOB pattern can only contain UTF-8 characters"})
+
+=C2=A0 =C2=A0 test_name =3D prefix .. "4." .. tostr= ing(i)
+=C2=A0 =C2=A0 test_itself =3D "SELECT 'abc' LIKE= 'ab" .. tested_string .. "c';"
+=C2=A0 =C2=A0 test:do_catchsql_test(test_name, test_itself,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 {1, "LIKE or GLOB pattern can only contain UTF-8 characters"})
+
+=C2=A0 =C2=A0 -- Just skipping if row value predicand contain= s invalid character.
What the predicand is? Is it a typo?

You can find it in ANSI SQL: <row value predicand>.
Basically it's just operand inside of a predicate.
=C2=A0
+
+=C2=A0 =C2=A0 test_name =3D prefix .. "5." .. tostr= ing(i)
+=C2=A0 =C2=A0 test_itself =3D "SELECT 'ab" .. t= ested_string .. "' LIKE 'abc';"
+=C2=A0 =C2=A0 test:do_execsql_test(test_name, test_itself, {0= })
+
+=C2=A0 =C2=A0 test_name =3D prefix .. "6." .. tostr= ing(i)
+=C2=A0 =C2=A0 test_itself =3D "SELECT 'abc" .. = tested_string .. "' LIKE 'abc';"
+=C2=A0 =C2=A0 test:do_execsql_test(test_name, test_itself, {0= })
+
+=C2=A0 =C2=A0 test_name =3D prefix .. "7." .. tostr= ing(i)
+=C2=A0 =C2=A0 test_itself =3D "SELECT 'ab" .. t= ested_string .. "c' LIKE 'abc';"
+=C2=A0 =C2=A0 test:do_execsql_test(test_name, test_itself, {0= })
+end
+
+-- Valid testcases.
+for i, tested_string in ipairs(valid_testcases) do
+=C2=A0 =C2=A0 test_name =3D prefix .. "8." .. tostr= ing(i)
+=C2=A0 =C2=A0 local test_itself =3D "SELECT 'abc'= ; LIKE 'ab" .. tested_string .. "';"
+=C2=A0 =C2=A0 test:do_execsql_test(test_name, test_itself, {0= })
+
+=C2=A0 =C2=A0 test_name =3D prefix .. "9." .. tostr= ing(i)
+=C2=A0 =C2=A0 test_itself =3D "SELECT 'abc' LIKE= 'abc" .. tested_string .. "';"
+=C2=A0 =C2=A0 test:do_execsql_test(test_name, test_itself, {0= })
+
+=C2=A0 =C2=A0 test_name =3D prefix .. "10." .. tost= ring(i)
+=C2=A0 =C2=A0 test_itself =3D "SELECT 'abc' LIKE= 'ab" .. tested_string .. "c';"
+=C2=A0 =C2=A0 test:do_execsql_test(test_name, test_itself, {0})
+
+=C2=A0 =C2=A0 test_name =3D prefix .. "11." .. tost= ring(i)
+=C2=A0 =C2=A0 test_itself =3D "SELECT 'ab" .. t= ested_string .. "' LIKE 'abc';"
+=C2=A0 =C2=A0 test:do_execsql_test(test_name, test_itself, {0})
+
+=C2=A0 =C2=A0 test_name =3D prefix .. "12." .. tost= ring(i)
+=C2=A0 =C2=A0 test_itself =3D "SELECT 'abc" .. = tested_string .. "' LIKE 'abc';"
+=C2=A0 =C2=A0 test:do_execsql_test(test_name, test_itself, {0= })
+
+=C2=A0 =C2=A0 test_name =3D prefix .. "13." .. tost= ring(i)
+=C2=A0 =C2=A0 test_itself =3D "SELECT 'ab" .. t= ested_string .. "c' LIKE 'abc';"
+=C2=A0 =C2=A0 test:do_execsql_test(test_name, test_itself, {0= })
+end
+
+test:finish_test()
Why I cannot find a test of `GLOB`? Even if we delete it in the future, it should be tested. You can write much less tests for glob.
E.g. this
```
select '1' glob '[0-4]';
```
somewhy returns 0.

=C2=A0
I actually don't think that operator that is goi= ng to be deleted in a few days should be tested.
It's just useless and redundant code.
=C2=A0

Sorry, some of the tests I ask you to write are a little out of scope of the ticket and they should already have been written.
But I suppose most of ambiguity should be clarified now. This ticket has raised important questions related to those functions.

commit f15fbaef2182084dec7cdc6c661509cb9= 08df892
Author: N.Tatunov <hollow653@gmail.com>
Date:=C2=A0 =C2=A0Thu Jun 28 15:1= 7:32 2018 +0300

=C2=A0 =C2=A0 sql: LIKE & GLOB= pattern comparison issue
=C2=A0 =C2=A0=C2=A0
=C2=A0 = =C2=A0 Currently function that compares pattern and string for GLOB & L= IKE
=C2=A0 =C2=A0 operators doesn't work properly. It uses IC= U reading function which
=C2=A0 =C2=A0 was assumed having other r= eturn codes and the implementation for the
=C2=A0 =C2=A0 comparis= on ending isn't paying attention to some special cases, hence
=C2=A0 =C2=A0 in those cases it works improperly.
=C2=A0 =C2=A0= =C2=A0
=C2=A0 =C2=A0 With the patch applied an error will be retu= rned in case there's an
=C2=A0 =C2=A0 invalid UTF-8 symbol in= pattern & pattern containing only valid UTF-8
=C2=A0 =C2=A0 = symbols will not be matched with the string that contains invalid
=C2=A0 =C2=A0 symbol.
=C2=A0 =C2=A0=C2=A0
=C2=A0 =C2= =A0 =D0=A1loses #3251
=C2=A0 =C2=A0 =D0=A1loses #3334
= =C2=A0 =C2=A0 Part of #3572

diff --git a/src/box/s= ql/func.c b/src/box/sql/func.c
index c06e3bd..7f93ef6 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -617,13 +617,17 @@ struct compareInfo {
=C2=A0 u8 noCase; /* true to ignore case differences */
=C2=A0};
=C2=A0<= /div>
-/*
- * For LIKE and GLOB matching on EBCDIC machines, = assume that every
- * character is exactly one byte in size.=C2= =A0 Also, provde the Utf8Read()
- * macro for fast reading of the= next character in the common case where
- * the next character i= s ASCII.
+/**
+ * Providing there are symbols in string= s this
+ * macro returns UTF-8 code of character and
+= * promotes pointer to the next symbol in the string.
+ * Otherwi= se return code is SQL_END_OF_STRING.
=C2=A0 */
-#define= Utf8Read(s, e)=C2=A0 =C2=A0 ucnv_getNextUChar(pUtf8conv, &s, e, &s= tatus)
+#define Utf8Read(s, e) (((s) < (e)) ? \
+ ucnv_getNextUChar(pUtf8conv, &(s),= (e), &(status)) : 0)
+
+#define SQL_END_OF_STRING= =C2=A0 =C2=A0 =C2=A0 =C2=A0 0
+#define SQL_INVALID_UTF8_SYMBOL=C2= =A0 0xfffd
=C2=A0
=C2=A0static const struct compareInfo= globInfo =3D { '*', '?', '[', 0 };
=C2= =A0
@@ -638,19 +642,16 @@ static const struct compareInfo likeInf= oNorm =3D { '%', '_', 0, 1 };
=C2=A0static const = struct compareInfo likeInfoAlt =3D { '%', '_', 0, 0 };
=C2=A0
=C2=A0/*
- * Possible error returns from p= atternMatch()
+ * Possible error returns from sql_utf8_pattern_co= mpare()
=C2=A0 */
=C2=A0#define SQLITE_MATCH=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00
=C2=A0#define SQLITE_NOMA= TCH=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01
=C2=A0#define SQLIT= E_NOWILDCARDMATCH=C2=A0 =C2=A02
+#define SQL_PROHIBITED_PATTERN= =C2=A0 =C2=A03
=C2=A0
-/*
- * Compare two UTF= -8 strings for equality where the first string is
- * a GLOB or L= IKE expression.=C2=A0 Return values:
- *
- *=C2=A0 =C2= =A0 SQLITE_MATCH:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Match
= - *=C2=A0 =C2=A0 SQLITE_NOMATCH:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 No match=
- *=C2=A0 =C2=A0 SQLITE_NOWILDCARDMATCH:=C2=A0 No match in spite= of having * or % wildcards.
+/**
+ * Compare two UTF-8= strings for equality where the first string
+ * is a GLOB or LIK= E expression.
=C2=A0 *
=C2=A0 * Globbing rules:
=C2=A0 *
@@ -663,92 +664,136 @@ static const struct compareInf= o likeInfoAlt =3D { '%', '_', 0, 0 };
=C2=A0 *
=C2=A0 *=C2=A0 =C2=A0 =C2=A0[^...]=C2=A0 =C2=A0 =C2=A0Matches one c= haracter not in the enclosed list.
=C2=A0 *
- * With th= e [...] and [^...] matching, a ']' character can be included
<= div>- * in the list by making it the first character after '[' or &= #39;^'.=C2=A0 A
- * range of characters can be specified usin= g '-'.=C2=A0 Example:
- * "[a-z]" matches any s= ingle lower-case letter.=C2=A0 To match a '-', make
- * i= t the last character in the list.
+ * With the [...] and [^...] m= atching, a ']' character can be
+ * included in the list = by making it the first character after
+ * '[' or '^&= #39;. A range of characters can be specified using '-'.
+= * Example: "[a-z]" matches any single lower-case letter.
+ * To match a '-', make it the last character in the list.
=C2=A0 *
=C2=A0 * Like matching rules:
=C2=A0 *
- *=C2=A0 =C2=A0 =C2=A0 '%'=C2=A0 =C2=A0 =C2=A0 =C2=A0Matc= hes any sequence of zero or more characters
+ *=C2=A0 =C2=A0 =C2= =A0 '%'=C2=A0 =C2=A0 =C2=A0 =C2=A0Matches any sequence of zero or m= ore characters.
=C2=A0 *
- **=C2=A0 =C2=A0 =C2=A0'_= '=C2=A0 =C2=A0 =C2=A0 =C2=A0Matches any one character
+ **=C2= =A0 =C2=A0 =C2=A0'_'=C2=A0 =C2=A0 =C2=A0 =C2=A0Matches any one char= acter.
=C2=A0 *
=C2=A0 *=C2=A0 =C2=A0 =C2=A0 Ec=C2=A0 = =C2=A0 =C2=A0 =C2=A0 Where E is the "esc" character and c is any = other
- *=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = character, including '%', '_', and esc, match exactly c.
+ *=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 characte= r, including '%', '_', and esc, match
+ *=C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 exactly c.
=C2= =A0 *
=C2=A0 * The comments within this routine usually assume gl= ob matching.
=C2=A0 *
- * This routine is usually quick= , but can be N**2 in the worst case.
+ * This routine is usually = quick, but can be N**2 in the worst
+ * case.
+ *
=
+ * @param pattern String containing comparison pattern.
+ *= @param string String being compared.
+ * @param compareInfo Info= rmation about how to compare.
+ * @param matchOther The escape ch= ar (LIKE) or '[' (GLOB).
+ *
+ * @retval SQLITE= _MATCH:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Match.
+ * =C2=A0 =C2=A0SQLITE_NOMATCH:=C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 No match.
+ * =C2=A0 =C2=A0SQLITE_NOWILDCARDMATCH:=C2=A0 No match in spite of h= aving *
+ * =C2=A0 =C2= =A0 or % wildcards.
+ * = =C2=A0 =C2=A0SQL_PROHIBITED_PATTERN:=C2=A0 Pattern contains invalid
+ * =C2=A0 =C2=A0 symbol.
=C2=A0 */
=C2=A0static int
-patternCompare(const = char * pattern, /* The glob pattern = */
- =C2=A0 =C2=A0 =C2=A0 = =C2=A0const char * string, /* The st= ring to compare against the glob */
- =C2=A0 =C2=A0 =C2=A0 =C2=A0const struct compareInfo *pInfo, /* Information about how to do the comp= are */
- =C2=A0 =C2=A0 =C2= =A0 =C2=A0UChar32 matchOther /* The = escape char (LIKE) or '[' (GLOB) */
-=C2=A0 =C2=A0 )
+sql_utf8_pattern_compare(const char * pattern,
+ const char * string,
+ const struct compareInfo *pInfo,
+ UChar32 matchOther)
= =C2=A0{
- UChar32 c, c2; /* Next pattern and input string cha= rs */
- UChar32 matchOne = =3D pInfo->matchOne; /* "?&q= uot; or "_" */
- UChar32 matchAll =3D pInfo->matchAll; <= /span>/* "*" or "%" */
- UChar32 noCase =3D pInfo->noCase; /* True if uppercase=3D=3Dlowercase */
- const char *zEscaped =3D 0; /* One past the last escaped input char */
+ /* Next pattern and input s= tring chars */
+ UChar32 c= , c2;
+ /* "?" o= r "_" */
+ UChar= 32 matchOne =3D pInfo->matchOne;
+ /* "*" or "%" */
+ UChar32 matchAll =3D pInfo->matchAll;
+ /* True if uppercase=3D=3Dlowerca= se */
+ UChar32 noCase =3D= pInfo->noCase;
+ /* On= e past the last escaped input char */
+ const char *zEscaped =3D 0;
=C2=A0 const char * pattern_end =3D pattern + strlen(patter= n);
=C2=A0 const char * st= ring_end =3D string + strlen(string);
=C2=A0 UErrorCode status =3D U_ZERO_ERROR;
=C2=A0
- while (pattern < pattern= _end){
- c =3D Utf8Read(p= attern, pattern_end);
+ wh= ile ((c =3D Utf8Read(pattern, pattern_end)) !=3D SQL_END_OF_STRING) {
=
+ if (c =3D=3D SQL_INVALID_UTF= 8_SYMBOL)
+ return SQL_P= ROHIBITED_PATTERN;
=C2=A0 if (c =3D=3D matchAll) { /* Match &= quot;*" */
- /* Ski= p over multiple "*" characters in the pattern.=C2=A0 If there
- * are also "?"= characters, skip those as well, but consume a
- * single character of the input string for each &= quot;?" skipped
+ /= * Skip over multiple "*" characters in
+ * the pattern. If there are also "?"<= /div>
+ * characters, skip th= ose as well, but
+ * co= nsume a single character of the
+= * input string for each "?" skipped.
=C2=A0<= span style=3D"white-space:pre"> */
- while (pattern < pattern_end){
- c =3D Utf8Read(pattern, pattern_end);
+ while ((c =3D Utf8Read(= pattern, pattern_end)) !=3D
+ <= /span>=C2=A0 =C2=A0 =C2=A0 =C2=A0SQL_END_OF_STRING) {
+ if (c =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PA= TTERN;
=C2=A0 if (c != =3D matchAll && c !=3D matchOne)
=C2=A0 break;
- if (c =3D=3D matchOne
- =C2=A0 =C2=A0 && Utf8Read(string, string_end) =3D=3D 0= ) {
+ if (c =3D=3D matc= hOne &&
+ =C2= =A0 =C2=A0 (c2 =3D Utf8Read(string, string_end)) =3D=3D
+ =C2=A0 =C2=A0 SQL_END_OF_STRING)
=C2=A0 return SQLITE_NOWILDCA= RDMATCH;
- }
= + if (c2 =3D=3D SQL_INVALID_UTF8_= SYMBOL)
+ return SQLIT= E_NOMATCH;
=C2=A0 }
- /* "*" at the en= d of the pattern matches */
- <= /span>if (pattern =3D=3D pattern_end)
+ /*
+ * = "*" at the end of the pattern matches.
+ */
+ = if (c =3D=3D SQL_END_OF_STRING) {
+ while ((c2 =3D Utf8Read(string, string_end)) !=3D
+ =C2=A0 =C2=A0 =C2=A0 =C2= =A0SQL_END_OF_STRING)
+ if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQLITE_NOMATCH;
=C2=A0 return SQLITE_MATCH;
+ }
=C2=A0 if (c =3D=3D matchOther) {
=C2=A0 if (pInfo->matchSet =3D=3D 0) {
=C2=A0= c =3D Utf8Read(pattern, pattern= _end);
- if (c =3D=3D = 0)
+ if (c =3D=3D SQL_= INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PATTERN;
+ if (c =3D=3D SQL_END_OF_STRING)
=C2=A0 return SQLITE_NOWILDCARDMATCH;
=C2= =A0 } else {
- /* "[...]" immediately follows= the "*".=C2=A0 We have to do a slow
- * recursive search in this case, but it is an u= nusual case.
+ /* &quo= t;[...]" immediately
+ <= /span> * follows the "*". We
+ * have to do a slow
+ * recursive search in
+ * this case, but it is
+ * an unusual case.
=C2=A0 */
- assert(matchOther < 0x80); /* '[' is a single-byte character */
+ assert(matchOther < 0x80);
=C2=A0 while (string < strin= g_end) {
=C2=A0 int b= Match =3D
- =C2=A0 = =C2=A0 patternCompare(&pattern[-1],
- =C2=A0 =C2=A0string,
- =C2=A0 =C2=A0pInfo,
- =C2=A0 =C2=A0matchOther);
+ =C2=A0 =C2=A0 sql_utf8_pattern_compare= (
+ &pattern[-1= ],
+ string,
<= div>+ pInfo,
+ matchOther);
=C2=A0 if (bMatch !=3D SQLITE_NOMATCH)
=C2=A0 return bMatch;<= /div>
- Utf8Read(string, st= ring_end);
+ c =3D Ut= f8Read(string, string_end);
+ = if (c =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQLITE_NOMATCH;
=C2=A0 }
=C2=A0 return SQLITE_NOWILDCARDMATCH;
=C2=A0 }
=C2=A0 }
=C2=A0
- /* At this point variable c contains the first charact= er of the
- * pattern s= tring past the "*".=C2=A0 Search in the input string for the
- * first matching characte= r and recursively continue the match from
- * that point.
+ /* At this point variable c contains the
+ * first character of the pattern string
+ * past the "*"= ;. Search in the input
+ * string for the first matching
+ * character and recursively continue the
+ * match from that point.
=C2=A0<= span style=3D"white-space:pre"> *
- * For a case-insensitive search, set variable cx to = be the same as
- * c bu= t in the other case and search the input string for either
- * c or cx.
+ * For a case-insensitive search, set
= + * variable cx to be the same as= c but in
+ * the other= case and search the input
+ * string for either c or cx.
=C2=A0 */
=C2=A0
=C2=A0 int bMatch;
@@ -756,14 +801,18 @@ patternCo= mpare(const char * pattern, /* The g= lob pattern */
=C2=A0 c= =3D u_tolower(c);
=C2=A0 while (string < string_end){
=C2=A0 /**
- * This loop could have been implemented
- * without if converting c2 to lower case
-<= span style=3D"white-space:pre"> * (by holding c_upper and c_lowe= r), however
- * it is = implemented this way because lower
- * works better with German and Turkish
- * languages.
+ * This loop could have been
+ * implemented without if
+ * converting c2 to lower case
=
+ * by holding c_upper and<= /div>
+ * c_lower,however it= is
+ * implemented th= is way because
+ * low= er works better with German
+ = * and Turkish languages.
=C2=A0 */
=C2=A0 c2 =3D Utf8Read(string, string_end);
+ if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQLITE_NOMATCH;
= =C2=A0 if (!noCase) {
= =C2=A0 if (c2 !=3D c)
= =C2=A0 continue;
@@ -= 771,9 +820,10 @@ patternCompare(const char * pattern, /* The glob pattern */
=C2=A0 if (c2 !=3D c && u_tolower(c2) !=3D c)
=C2=A0 continue;
<= div>=C2=A0 }
- bMatch =3D
- =C2=A0 =C2=A0 patternCompare(pattern, string,
=
- =C2=A0 =C2=A0pInfo, matc= hOther);
+ bMatch =3D s= ql_utf8_pattern_compare(pattern,
+ =C2=A0 string,
+ = =C2=A0 pInfo,
+ = =C2=A0 matchOther);
=C2=A0 if (bMatch !=3D SQLITE_NOMATCH)
=C2=A0 return bMatch;
=C2=A0 }
@@ -782,7 +832,9 @@ patternCompa= re(const char * pattern, /* The glob= pattern */
=C2=A0 if (c = =3D=3D matchOther) {
=C2=A0 if (pInfo->matchSet =3D=3D 0) {
=C2=A0 c =3D Utf8Read(pattern, pattern_end);
- if (c =3D=3D 0)
+ if (c =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_= PATTERN;
+ if (c =3D=3D= SQL_END_OF_STRING)
=C2=A0 return SQLITE_NOMATCH;
=C2=A0 zEscaped =3D pattern;
=C2=A0 } else {
@@ -790,23 +842,33 @@ patternCompare(co= nst char * pattern, /* The glob patt= ern */
=C2=A0 int seen = =3D 0;
=C2=A0 int inver= t =3D 0;
=C2=A0 c =3D U= tf8Read(string, string_end);
+ = if (c =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQLITE_NOMATCH;
=C2=A0 if (string =3D=3D string_end)
= =C2=A0 return SQLITE_NOMATCH;
=C2=A0 c2 =3D Utf8Read(pa= ttern, pattern_end);
+ = if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PATTERN;
=C2=A0 if (c2 =3D=3D '^') {
= =C2=A0 invert =3D 1;
= =C2=A0 c2 =3D Utf8Read(pattern, = pattern_end);
+ if (c2= =3D=3D SQL_INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PATTERN;
=C2=A0 }
=C2=A0 if (c2 =3D=3D ']') {
=C2=A0 if (c =3D=3D ']')
=C2=A0 seen =3D 1;
=C2=A0 c2 =3D Utf8Read(pattern, pattern_end)= ;
+ if (c2 =3D=3D SQL_= INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PATTERN;
=C2=A0 }
- wh= ile (c2 && c2 !=3D ']') {
+ while (c2 !=3D SQL_END_OF_STRING && c2 !=3D &#= 39;]') {
=C2=A0 if= (c2 =3D=3D '-' && pattern[0] !=3D ']'
= =C2=A0 =C2=A0 =C2=A0 && = pattern < pattern_end
=C2=A0 = =C2=A0 =C2=A0 && prior_c > 0) {
=C2=A0 c2 =3D Utf8Read(pattern, pattern_end= );
+ if (c2 =3D=3D SQ= L_INVALID_UTF8_SYMBOL)
+ return SQL_PROHIBITED_PATTERN;
=C2=A0 if (c >=3D prior_c && c <=3D c2)
<= div>=C2=A0 seen =3D 1;
=C2=A0 prior_c =3D 0;
<= div>@@ -817,29 +879,36 @@ patternCompare(const char * pattern, /* The glob pattern */
=C2=A0 prior_c =3D c2;
=C2=A0 }
=C2=A0 c2 =3D Utf8Read(pattern, pattern_end);
= + if (c2 =3D=3D SQL_INVALID_UTF8= _SYMBOL)
+ return SQL= _PROHIBITED_PATTERN;
=C2=A0 }
- if (pattern = =3D=3D pattern_end || (seen ^ invert) =3D=3D 0) {
+ if (pattern =3D=3D pattern_end ||
+<= span style=3D"white-space:pre"> =C2=A0 =C2=A0 (seen ^ invert) =3D= =3D 0) {
=C2=A0 return= SQLITE_NOMATCH;
=C2=A0 }
=C2=A0 continue;
=C2=A0 }
=C2=A0 }
=C2=A0 c2 =3D Utf8Read(string, string_end);
+ if (c2 =3D=3D SQL_INVALID_UTF8_SYMBOL)=
+ return SQLITE_NOMATCH= ;
=C2=A0 if (c =3D=3D c2)=
=C2=A0 continue;
<= div>=C2=A0 if (noCase){
= =C2=A0 /**
- * Small optimisation. Reduce number of call= s
- * to u_tolower func= tion.
- * SQL standards= suggest use to_upper for symbol
- * normalisation. However, using to_lower allows to
-<= span style=3D"white-space:pre"> * respect Turkish '=C4=B0'= ; in default locale.
+ = * Small optimisation. Reduce number of
+ * calls to u_tolower function. SQL
+ * standards suggest use to_upper for
<= div>+ * symbol normalisation. How= ever, using
+ * to_lowe= r allows to respect Turkish '=C4=B0'
+ * in default locale.
=C2=A0 */
=C2=A0 if (u_tolower(c) =3D=3D c2 ||
=C2=A0 =C2=A0 =C2=A0 c =3D=3D u_tolower(c2))
=C2=A0 continue;
=C2= =A0 }
- if (c =3D=3D matchOne && pattern !=3D zEscap= ed && c2 !=3D 0)
+ if (c =3D=3D matchOne && pattern !=3D zEscaped &&
+ =C2=A0 =C2=A0 c2 !=3D SQL_END_= OF_STRING)
=C2=A0 contin= ue;
=C2=A0 return SQLITE_= NOMATCH;
=C2=A0 }
@@ -853,8 +922,7 @@ patternCompare(const char * pattern, /* The glob pattern */
=C2=A0int
=C2=A0sqlite3_strglob(const char *zGlobPattern, const char *zString)
=
=C2=A0{
- return patt= ernCompare(zGlobPattern, zString, &globInfo,
- =C2=A0 =C2=A0 =C2=A0 '[');
+ return sql_utf8_pattern_compare(zGlob= Pattern, zString, &globInfo, '[');
=C2=A0}
= =C2=A0
=C2=A0/*
@@ -864,7 +932,7 @@ sqlite3_strglob(con= st char *zGlobPattern, const char *zString)
=C2=A0int
= =C2=A0sqlite3_strlike(const char *zPattern, const char *zStr, unsigned int = esc)
=C2=A0{
- r= eturn patternCompare(zPattern, zStr, &likeInfoNorm, esc);
+ return sql_utf8_pattern_compare(zPatt= ern, zStr, &likeInfoNorm, esc);
=C2=A0}
=C2=A0
=C2=A0/*
@@ -910,8 +978,9 @@ likeFunc(sqlite3_context * con= text, int argc, sqlite3_value ** argv)
=C2=A0 zB =3D (const char *) sqlite3_value_text(argv[0]);
=C2=A0 zA =3D (const char *) s= qlite3_value_text(argv[1]);
=C2=A0
- /* Limit the length of the LIKE or GLOB pattern to avo= id problems
- * of deep r= ecursion and N*N behavior in patternCompare().
+ /* Limit the length of the LIKE or GLOB pattern to a= void
+ * problems of deep= recursion and N*N behavior in
+ = * sql_utf8_pattern_compare().
=C2=A0 */
=C2=A0 nPat =3D sqlite3_value_bytes(argv[0]);
=C2=A0 testcase(nPat =3D=3D db->aLimit[SQLITE_LIMIT_LIK= E_PATTERN_LENGTH]);
@@ -947,7 +1016,12 @@ likeFunc(sqlite3_contex= t * context, int argc, sqlite3_value ** argv)
=C2=A0 sqlite3_like_count++;
=C2=A0#endif
=C2=A0 int res;
-= res =3D patternCompare(zB, zA, pInf= o, escape);
+ res =3D sql_= utf8_pattern_compare(zB, zA, pInfo, escape);
+ if (res =3D=3D SQL_PROHIBITED_PATTERN) {
+ sqlite3_result_error(context, "= LIKE or GLOB pattern can only"
+ =C2=A0 =C2=A0 =C2=A0" contain UTF-8 characters", -= 1);
+ return;
+= }
=C2=A0 sqlite3_result_int(context, res =3D=3D SQLITE_MATCH)= ;
=C2=A0}
=C2=A0
diff --git a/test/sql-tap/e_= expr.test.lua b/test/sql-tap/e_expr.test.lua
index 13d3a96..9780d= 2c 100755
--- a/test/sql-tap/e_expr.test.lua
+++ b/test= /sql-tap/e_expr.test.lua
@@ -1,6 +1,6 @@
=C2=A0#!/usr/b= in/env tarantool
=C2=A0test =3D require("sqltester")
-test:plan(12431)
+test:plan(10665)
=C2=A0
=C2=A0--!./tcltestrunner.lua
=C2=A0-- 2010 July 16
@@ -77,8 +77,10 @@ local operations =3D {
=C2=A0 =C2=A0 =C2=A0{= "<>", "ne1"},
=C2=A0 =C2=A0 =C2=A0{&quo= t;!=3D", "ne2"},
=C2=A0 =C2=A0 =C2=A0{"IS&quo= t;, "is"},
-=C2=A0 =C2=A0 {"LIKE", "like= "},
-=C2=A0 =C2=A0 {"GLOB", "glob"},
+-- NOTE: This test needs refactoring after deletion of GLOB &
+-- type restrictions for = LIKE. (See #3572)
+--=C2=A0 =C2=A0 {"LIKE", "like&= quot;},
+--=C2=A0 =C2=A0 {"GLOB", "glob"},
=C2=A0 =C2=A0 =C2=A0{"AND", "and"},
= =C2=A0 =C2=A0 =C2=A0{"OR", "or"},
=C2=A0 =C2= =A0 =C2=A0{"MATCH", "match"},
@@ -96,7 +98,12= @@ operations =3D {
=C2=A0 =C2=A0 =C2=A0{"+", "-&= quot;},
=C2=A0 =C2=A0 =C2=A0{"<<", ">>= ", "&", "|"},
=C2=A0 =C2=A0 =C2=A0{&= quot;<", "<=3D", ">", ">=3D"= },
-=C2=A0 =C2=A0 {"=3D", "=3D=3D", "!= =3D", "<>", "LIKE", "GLOB"}, --&qu= ot;MATCH", "REGEXP"},
+-- NOTE: This test needs re= factoring after deletion of GLOB &
+-- type restrictions for LIKE. (See #3572)
+-- Ano= ther NOTE: MATCH & REGEXP aren't supported in Tarantool &
=
+-- are waiting for their ho= ur, don't confuse them
+-- <= /span> being commented with ticket above.
+=C2=A0 =C2=A0 {"= =3D", "=3D=3D", "!=3D", "<>"}, --&= quot;LIKE", "GLOB"}, --"MATCH", "REGEXP"= },
=C2=A0 =C2=A0 =C2=A0{"AND"},
=C2=A0 =C2=A0= =C2=A0{"OR"},
=C2=A0}
@@ -475,6 +482,7 @@ fo= r _, op in ipairs(oplist) do
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0en= d
=C2=A0 =C2=A0 =C2=A0end
=C2=A0end
+
=C2=A0------------------------------------------------------------------= ---------
=C2=A0-- Test the IS and IS NOT operators.
= =C2=A0--
diff --git a/test/sql-tap/gh-3251-string-pattern-compari= son.test.lua b/test/sql-tap/gh-3251-string-pattern-comparison.test.lua
new file mode 100755
index 0000000..2a787f2
--- /= dev/null
+++ b/test/sql-tap/gh-3251-string-pattern-comparison.tes= t.lua
@@ -0,0 +1,213 @@
+#!/usr/bin/env tarantool
=
+test =3D require("sqltester")
+test:plan(128)
+
+local prefix =3D "like-test-"
+
+-- Unicode byte sequences.
+local valid_testcases =3D {
+=C2=A0 =C2=A0 '\x01',
+=C2=A0 =C2=A0 '\x09&= #39;,
+=C2=A0 =C2=A0 '\x1F',
+=C2=A0 =C2=A0 = 9;\x7F',
+=C2=A0 =C2=A0 '\xC2\x80',
+=C2=A0= =C2=A0 '\xC2\x90',
+=C2=A0 =C2=A0 '\xC2\x9F',
+=C2=A0 =C2=A0 '\xE2\x80\xA8',
+=C2=A0 =C2=A0 = 9;\x20\x0B',
+=C2=A0 =C2=A0 '\xE2\x80\xA9',
+}
+
+-- Non-Unicode byte sequences.
+local = invalid_testcases =3D {
+=C2=A0 =C2=A0 '\xE2\x80',
<= div>+=C2=A0 =C2=A0 '\xFE\xFF',
+=C2=A0 =C2=A0 '\xC2&#= 39;,
+=C2=A0 =C2=A0 '\xED\xB0\x80',
+=C2=A0 =C2= =A0 '\xD0',
+}
+
+local like_test_cas= es =3D
+{
+=C2=A0 =C2=A0 {"1.1",
+= =C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'AB' LIKE '_B';&qu= ot;,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 = =C2=A0 {"1.2",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELEC= T 'CD' LIKE '_B';",
+=C2=A0 =C2=A0 =C2=A0 = =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.3",
+= =C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT '' LIKE '_B';"= ;,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2= =A0 {"1.4",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT &= #39;AB' LIKE '%B';",
+=C2=A0 =C2=A0 =C2=A0 =C2= =A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.5",
+=C2= =A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CD' LIKE '%B';"= ,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2= =A0 {"1.6",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT &= #39;' LIKE '%B';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 = {0, {0}} },
+=C2=A0 =C2=A0 {"1.7",
+=C2=A0 = =C2=A0 =C2=A0 =C2=A0 "SELECT 'AB' LIKE 'A__';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 = {"1.8",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT '= CD' LIKE 'A__';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {= 0, {0}} },
+=C2=A0 =C2=A0 {"1.9",
+=C2=A0 =C2= =A0 =C2=A0 =C2=A0 "SELECT '' LIKE 'A__';",
<= div>+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {&quo= t;1.10",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'AB&#= 39; LIKE 'A_';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1= }} },
+=C2=A0 =C2=A0 {"1.11",
+=C2=A0 =C2=A0 = =C2=A0 =C2=A0 "SELECT 'CD' LIKE 'A_';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1= .12",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT '' L= IKE 'A_';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },=
+=C2=A0 =C2=A0 {"1.13",
+=C2=A0 =C2=A0 =C2= =A0 =C2=A0 "SELECT 'AB' LIKE 'A';",
+= =C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.1= 4",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CD' L= IKE 'A';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },<= /div>
+=C2=A0 =C2=A0 {"1.15",
+=C2=A0 =C2=A0 =C2=A0= =C2=A0 "SELECT '' LIKE 'A';",
+=C2=A0 = =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.16"= ,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'AB' LIKE = 9;_';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.17",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0= "SELECT 'CD' LIKE '_';",
+=C2=A0 =C2= =A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.18",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT '' LIKE '_&#= 39;;",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+= =C2=A0 =C2=A0 {"1.19",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 &qu= ot;SELECT 'AB' LIKE '__';",
+=C2=A0 =C2=A0 = =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.20",
=
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CD' LIKE '__= 9;;",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+= =C2=A0 =C2=A0 {"1.21",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 &qu= ot;SELECT '' LIKE '__';",
+=C2=A0 =C2=A0 =C2= =A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.22",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'AB' LIKE '%A';= ",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2= =A0 =C2=A0 {"1.23",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "= SELECT 'AB' LIKE '%C';",
+=C2=A0 =C2=A0 =C2= =A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.24",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'ab' LIKE '%df'= ;",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2= =A0 =C2=A0 {"1.25",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "= SELECT 'abCDF' LIKE '%df';",
+=C2=A0 =C2=A0 = =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.26",
=
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CDF' LIKE '%df&= #39;;",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+= =C2=A0 =C2=A0 {"1.27",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 &qu= ot;SELECT 'ab' LIKE 'a_';",
+=C2=A0 =C2=A0 = =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.28",
=
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'abCDF' LIKE 'a_= ';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
= +=C2=A0 =C2=A0 {"1.29",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 &q= uot;SELECT 'CDF' LIKE 'a_';",
+=C2=A0 =C2=A0= =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.30",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'ab' LIKE 'ab%&= #39;;",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+= =C2=A0 =C2=A0 {"1.31",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 &qu= ot;SELECT 'abCDF' LIKE 'ab%';",
+=C2=A0 =C2= =A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.32",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CDF' LIKE '= ab%';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.33",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0= "SELECT 'ab' LIKE 'abC%';",
+=C2=A0 = =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.34"= ,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'abCDF' LIKE = 'abC%';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"1.35",
+=C2=A0 =C2=A0 =C2=A0 = =C2=A0 "SELECT 'CDF' LIKE 'abC%';",
+= =C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0}} },
+=C2=A0 =C2=A0 {"1.3= 6",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'ab' L= IKE 'a_%';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} }= ,
+=C2=A0 =C2=A0 {"1.37",
+=C2=A0 =C2=A0 =C2= =A0 =C2=A0 "SELECT 'abCDF' LIKE 'a_%';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {1}} },
+=C2=A0 =C2=A0 {"= 1.38",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 "SELECT 'CDF= 9; LIKE 'a_%';",
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 {0, {0= }} },
+}
+
+test:do_catchsql_set_test(like_te= st_cases, prefix)
+
+-- Invalid testcases.
+f= or i, tested_string in ipairs(invalid_testcases) do
+
+= =C2=A0 =C2=A0 -- We should raise an error in case
+=C2=A0 =C2=A0 = -- pattern contains invalid characters.
+
+=C2=A0 =C2= =A0 local test_name =3D prefix .. "2." .. tostring(i)
+= =C2=A0 =C2=A0 local test_itself =3D "SELECT 'abc' LIKE 'ab= " .. tested_string .. "';"
+=C2=A0 =C2=A0 test= :do_catchsql_test(test_name, test_itself,
+=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 {1, &= quot;LIKE or GLOB pattern can only contain UTF-8 characters"})
+
+=C2=A0 =C2=A0 test_name =3D prefix .. "3." .. tos= tring(i)
+=C2=A0 =C2=A0 test_itself =3D "SELECT 'abc'= ; LIKE 'abc" .. tested_string .. "';"
+=C2= =A0 =C2=A0 test:do_catchsql_test(test_name, test_itself,
+=C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 {1, "LIKE or GLOB pattern can only contain UTF-8 characters= "})
+
+=C2=A0 =C2=A0 test_name =3D prefix .. "= ;4." .. tostring(i)
+=C2=A0 =C2=A0 test_itself =3D "SEL= ECT 'abc' LIKE 'ab" .. tested_string .. "c';"= ;
+=C2=A0 =C2=A0 test:do_catchsql_test(test_name, test_itself,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 {1, "LIKE or GLOB pattern can only contain UT= F-8 characters"})
+
+=C2=A0 =C2=A0 -- Just skippin= g if row value predicand contains invalid character.
+
= +=C2=A0 =C2=A0 test_name =3D prefix .. "5." .. tostring(i)
<= div>+=C2=A0 =C2=A0 test_itself =3D "SELECT 'ab" .. tested_str= ing .. "' LIKE 'abc';"
+=C2=A0 =C2=A0 test:= do_execsql_test(test_name, test_itself, {0})
+
+=C2=A0 = =C2=A0 test_name =3D prefix .. "6." .. tostring(i)
+=C2= =A0 =C2=A0 test_itself =3D "SELECT 'abc" .. tested_string .. = "' LIKE 'abc';"
+=C2=A0 =C2=A0 test:do_exec= sql_test(test_name, test_itself, {0})
+
+=C2=A0 =C2=A0 = test_name =3D prefix .. "7." .. tostring(i)
+=C2=A0 =C2= =A0 test_itself =3D "SELECT 'ab" .. tested_string .. "c&= #39; LIKE 'abc';"
+=C2=A0 =C2=A0 test:do_execsql_tes= t(test_name, test_itself, {0})
+end
+
+-- Val= id testcases.
+for i, tested_string in ipairs(valid_testcases) do=
+=C2=A0 =C2=A0 test_name =3D prefix .. "8." .. tostrin= g(i)
+=C2=A0 =C2=A0 local test_itself =3D "SELECT 'abc&#= 39; LIKE 'ab" .. tested_string .. "';"
+= =C2=A0 =C2=A0 test:do_execsql_test(test_name, test_itself, {0})
+=
+=C2=A0 =C2=A0 test_name =3D prefix .. "9." .. tostrin= g(i)
+=C2=A0 =C2=A0 test_itself =3D "SELECT 'abc' LI= KE 'abc" .. tested_string .. "';"
+=C2=A0 = =C2=A0 test:do_execsql_test(test_name, test_itself, {0})
+
<= div>+=C2=A0 =C2=A0 test_name =3D prefix .. "10." .. tostring(i)
+=C2=A0 =C2=A0 test_itself =3D "SELECT 'abc' LIKE = 9;ab" .. tested_string .. "c';"
+=C2=A0 =C2=A0= test:do_execsql_test(test_name, tes= t_itself, {0})
+
+=C2=A0 =C2=A0 test_name =3D prefix ..= "11." .. tostring(i)
+=C2=A0 =C2=A0 test_itself =3D &q= uot;SELECT 'ab" .. tested_string .. "' LIKE 'abc'= ;"
+=C2=A0 =C2=A0 test:do_execsql_test(test_name, test_itself, {0})
+
+=C2= =A0 =C2=A0 test_name =3D prefix .. "12." .. tostring(i)
+=C2=A0 =C2=A0 test_itself =3D "SELECT 'abc" .. tested_strin= g .. "' LIKE 'abc';"
+=C2=A0 =C2=A0 test:do= _execsql_test(test_name, test_itself, {0})
+
+=C2=A0 = =C2=A0 test_name =3D prefix .. "13." .. tostring(i)
+= =C2=A0 =C2=A0 test_itself =3D "SELECT 'ab" .. tested_string .= . "c' LIKE 'abc';"
+=C2=A0 =C2=A0 test:do_e= xecsql_test(test_name, test_itself, {0})
+end
+
+test:finish_test()

--000000000000bf266d0572639c61--