From: Alexander Turenko <alexander.turenko@tarantool.org> To: Nikita Tatunov <hollow653@gmail.com> Cc: tarantool-patches@freelists.org Subject: [tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue Date: Fri, 27 Jul 2018 16:06:01 +0300 [thread overview] Message-ID: <20180727130601.b2oby7dleapd5upg@tkn_work_nb> (raw) In-Reply-To: <CAEi+_aoW02p5fUH=HZEPoSa2ebMa9meT=Z6NB73kXCLmzqQ7xw@mail.gmail.com> Hi, See comments below. Don't sure we should return 'no match' situation when a left hand expression in not correct unicode string. How other DBs handle that? WBR, Alexander Turenko. > sql: LIKE & GLOB pattern comparison issue > > Currently function that compares pattern and string for GLOB & LIKE > operators doesn't work properly. It uses ICU reading function which > perhaps was working differently before and the implementation for the > comparison ending isn't paying attention to some special cases, hence > in those cases it works improperly. > It used non-ICU function before, consider commit 198d59ce. > With the patch applied an error will be returned in case there's > invalid UTF-8 symbol in pattern & pattern will not be matched > with the string that contains it. Some minor corrections to function > were made as well. > Nitpicking: 'error will be returned' is third situatuon, other then 'matched' and 'not matched'. > Сloses #3251 > Сloses #3334 > Part of #3572 > > 2. The thing you test is not related to a table and other columns. > > Please, convert the tests to the next format: {[[select '' like > > '_B';]], {1}]]}. > > To make it more readable, you can do it like `like_testcases` in > > `sql-tap/collation.test.lua`. > > > > Done. You are still perform comparisons with table columns. > -/* > - * For LIKE and GLOB matching on EBCDIC machines, assume that every > - * character is exactly one byte in size. Also, provde the Utf8Read() > - * macro for fast reading of the next character in the common case where > - * the next character is ASCII. > +/** > + * Providing there are symbols in string s this > + * macro returns UTF-8 code of character and > + * promotes pointer to the next symbol in the string. > + * Otherwise return code is SQL_END_OF_STRING. > */ > -#define Utf8Read(s, e) ucnv_getNextUChar(pUtf8conv, &s, e, &status) > +#define Utf8Read(s, e) (((s) < (e)) ?\ > + ucnv_getNextUChar(pUtf8conv, &(s), (e), &(status)) : 0) I suggest to add a whitespace before backslash to make it looks better. > @@ -643,51 +647,61 @@ static const struct compareInfo likeInfoAlt = { '%', > '_', 0, 0 }; > #define SQLITE_MATCH 0 > #define SQLITE_NOMATCH 1 > #define SQLITE_NOWILDCARDMATCH 2 > +#define SQL_PROHIBITED_PATTERN 3 Update comment before that has a reference to 'patternMatch' function. > * > * Globbing rules: > * > - * '*' Matches any sequence of zero or more characters. > + * '*' Matches any sequence of zero or more characters. Why? > * > - * '?' Matches exactly one character. > + * '?' Matches exactly one character. > * > - * [...] Matches one character from the enclosed list of > - * characters. > + * [...] Matches one character from the enclosed list of > + * characters. > * > - * [^...] Matches one character not in the enclosed list. > + * [^...] Matches one character not in the enclosed list. > * The same question. > + * @retval SQLITE_MATCH: Match. > + * SQLITE_NOMATCH: No match. > + * SQLITE_NOWILDCARDMATCH: No match in spite of having * > + * or % wildcards. > + * SQL_PROHIBITED_PATTERN Pattern contains invalid > + * symbol. Nitpicking: no colon after SQL_PROHIBITED_PATTERN. > */ > static int > -patternCompare(const char * pattern, /* The glob pattern */ > - const char * string, /* The string to compare against the glob */ > - const struct compareInfo *pInfo, /* Information about how to do > the compare */ > - UChar32 matchOther /* The escape char (LIKE) or '[' (GLOB) */ > - ) > +sql_utf8_pattern_compare(const char * pattern, > + const char * string, > + const struct compareInfo *pInfo, > + UChar32 matchOther) Don't get why indent is tabulation + 7 spaces. The function itself looks good except lines longer then 80 columns. > @@ -853,8 +903,7 @@ patternCompare(const char * pattern, /* The glob > pattern */ > int > sqlite3_strglob(const char *zGlobPattern, const char *zString) > { > - return patternCompare(zGlobPattern, zString, &globInfo, > - '['); > + return sql_utf8_pattern_compare(zGlobPattern, zString, &globInfo, '['); > } > > /* > @@ -864,7 +913,7 @@ sqlite3_strglob(const char *zGlobPattern, const char > *zString) > int > sqlite3_strlike(const char *zPattern, const char *zStr, unsigned int esc) > { > - return patternCompare(zPattern, zStr, &likeInfoNorm, esc); > + return sql_utf8_pattern_compare(zPattern, zStr, &likeInfoNorm, esc); > } > Should we add sqlite3_result_error is case of SQL_PROHIBITED_PATTERN for these two functions? > - {"LIKE", "like"}, > - {"GLOB", "glob"}, > +-- NOTE: This test needs refactoring after deletion of GLOB & > +-- type restrictions for LIKE. > +-- {"LIKE", "like"}, > +-- {"GLOB", "glob"}, I would add related issue number. > {"AND", "and"}, > {"OR", "or"}, > {"MATCH", "match"}, > @@ -96,7 +98,7 @@ operations = { > {"+", "-"}, > {"<<", ">>", "&", "|"}, > {"<", "<=", ">", ">="}, > - {"=", "==", "!=", "<>", "LIKE", "GLOB"}, --"MATCH", "REGEXP"}, > + {"=", "==", "!=", "<>"}, --"LIKE", "GLOB"}, "MATCH", "REGEXP"}, Leave comment before MATCH and REGEXP to avoid surprises after removing your comment. > --- /dev/null > +++ b/test/sql-tap/gh-3251-string-pattern-comparison.lua > @@ -0,0 +1,238 @@ > +#!/usr/bin/env tarantool > +test = require("sqltester") > +test:plan(106) > + > +local prefix = "like-test-" > + > +-- Unciode byte sequences. > + 1. Typo: unicode. 2. Extra empty line. > +-- Invalid testcases. > + > +for i, tested_string in ipairs(invalid_testcases) do > + local test_name = prefix .. "2." .. tostring(i) > + local test_itself = "SELECT 'abc' LIKE 'ab" .. tested_string .. "';" > + > +-- We should raise an error if pattern contains invalid characters. > + 1. Trailing whitespaces. 2. Why comments are not indented as the other code? 3. test_name and test_itself before the section comment looks strange for me here.
next prev parent reply other threads:[~2018-07-27 13:06 UTC|newest] Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-06-28 12:47 [tarantool-patches] " N.Tatunov 2018-06-28 12:54 ` [tarantool-patches] " Hollow111 2018-07-18 2:43 ` Alexander Turenko 2018-07-18 5:51 ` Alex Khatskevich 2018-07-18 15:24 ` Nikita Tatunov 2018-07-18 15:53 ` Alex Khatskevich 2018-07-18 15:57 ` Nikita Tatunov 2018-07-18 17:10 ` Alexander Turenko 2018-07-19 11:14 ` Nikita Tatunov 2018-07-19 11:56 ` Alex Khatskevich 2018-07-27 11:28 ` Nikita Tatunov 2018-07-27 13:06 ` Alexander Turenko [this message] 2018-07-27 19:11 ` Nikita Tatunov 2018-07-27 20:22 ` Alexander Turenko 2018-07-31 13:27 ` Nikita Tatunov 2018-07-31 13:47 ` Alexander Turenko 2018-08-01 10:35 ` Nikita Tatunov 2018-08-01 10:51 ` Nikita Tatunov 2018-08-01 13:56 ` Alex Khatskevich 2018-08-01 18:10 ` Nikita Tatunov 2018-08-01 18:14 ` Nikita Tatunov 2018-08-08 12:38 ` Alex Khatskevich
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20180727130601.b2oby7dleapd5upg@tkn_work_nb \ --to=alexander.turenko@tarantool.org \ --cc=hollow653@gmail.com \ --cc=tarantool-patches@freelists.org \ --subject='[tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox