From: Alexander Turenko <alexander.turenko@tarantool.org>
To: Nikita Tatunov <hollow653@gmail.com>
Cc: tarantool-patches@freelists.org
Subject: [tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue
Date: Fri, 27 Jul 2018 16:06:01 +0300 [thread overview]
Message-ID: <20180727130601.b2oby7dleapd5upg@tkn_work_nb> (raw)
In-Reply-To: <CAEi+_aoW02p5fUH=HZEPoSa2ebMa9meT=Z6NB73kXCLmzqQ7xw@mail.gmail.com>
Hi,
See comments below.
Don't sure we should return 'no match' situation when a left hand
expression in not correct unicode string. How other DBs handle that?
WBR, Alexander Turenko.
> sql: LIKE & GLOB pattern comparison issue
>
> Currently function that compares pattern and string for GLOB & LIKE
> operators doesn't work properly. It uses ICU reading function which
> perhaps was working differently before and the implementation for the
> comparison ending isn't paying attention to some special cases, hence
> in those cases it works improperly.
>
It used non-ICU function before, consider commit 198d59ce.
> With the patch applied an error will be returned in case there's
> invalid UTF-8 symbol in pattern & pattern will not be matched
> with the string that contains it. Some minor corrections to function
> were made as well.
>
Nitpicking: 'error will be returned' is third situatuon, other then
'matched' and 'not matched'.
> Сloses #3251
> Сloses #3334
> Part of #3572
> > 2. The thing you test is not related to a table and other columns.
> > Please, convert the tests to the next format: {[[select '' like
> > '_B';]], {1}]]}.
> > To make it more readable, you can do it like `like_testcases` in
> > `sql-tap/collation.test.lua`.
> >
>
> Done.
You are still perform comparisons with table columns.
> -/*
> - * For LIKE and GLOB matching on EBCDIC machines, assume that every
> - * character is exactly one byte in size. Also, provde the Utf8Read()
> - * macro for fast reading of the next character in the common case where
> - * the next character is ASCII.
> +/**
> + * Providing there are symbols in string s this
> + * macro returns UTF-8 code of character and
> + * promotes pointer to the next symbol in the string.
> + * Otherwise return code is SQL_END_OF_STRING.
> */
> -#define Utf8Read(s, e) ucnv_getNextUChar(pUtf8conv, &s, e, &status)
> +#define Utf8Read(s, e) (((s) < (e)) ?\
> + ucnv_getNextUChar(pUtf8conv, &(s), (e), &(status)) : 0)
I suggest to add a whitespace before backslash to make it looks better.
> @@ -643,51 +647,61 @@ static const struct compareInfo likeInfoAlt = { '%',
> '_', 0, 0 };
> #define SQLITE_MATCH 0
> #define SQLITE_NOMATCH 1
> #define SQLITE_NOWILDCARDMATCH 2
> +#define SQL_PROHIBITED_PATTERN 3
Update comment before that has a reference to 'patternMatch' function.
> *
> * Globbing rules:
> *
> - * '*' Matches any sequence of zero or more characters.
> + * '*' Matches any sequence of zero or more characters.
Why?
> *
> - * '?' Matches exactly one character.
> + * '?' Matches exactly one character.
> *
> - * [...] Matches one character from the enclosed list of
> - * characters.
> + * [...] Matches one character from the enclosed list of
> + * characters.
> *
> - * [^...] Matches one character not in the enclosed list.
> + * [^...] Matches one character not in the enclosed list.
> *
The same question.
> + * @retval SQLITE_MATCH: Match.
> + * SQLITE_NOMATCH: No match.
> + * SQLITE_NOWILDCARDMATCH: No match in spite of having *
> + * or % wildcards.
> + * SQL_PROHIBITED_PATTERN Pattern contains invalid
> + * symbol.
Nitpicking: no colon after SQL_PROHIBITED_PATTERN.
> */
> static int
> -patternCompare(const char * pattern, /* The glob pattern */
> - const char * string, /* The string to compare against the glob */
> - const struct compareInfo *pInfo, /* Information about how to do
> the compare */
> - UChar32 matchOther /* The escape char (LIKE) or '[' (GLOB) */
> - )
> +sql_utf8_pattern_compare(const char * pattern,
> + const char * string,
> + const struct compareInfo *pInfo,
> + UChar32 matchOther)
Don't get why indent is tabulation + 7 spaces.
The function itself looks good except lines longer then 80 columns.
> @@ -853,8 +903,7 @@ patternCompare(const char * pattern, /* The glob
> pattern */
> int
> sqlite3_strglob(const char *zGlobPattern, const char *zString)
> {
> - return patternCompare(zGlobPattern, zString, &globInfo,
> - '[');
> + return sql_utf8_pattern_compare(zGlobPattern, zString, &globInfo, '[');
> }
>
> /*
> @@ -864,7 +913,7 @@ sqlite3_strglob(const char *zGlobPattern, const char
> *zString)
> int
> sqlite3_strlike(const char *zPattern, const char *zStr, unsigned int esc)
> {
> - return patternCompare(zPattern, zStr, &likeInfoNorm, esc);
> + return sql_utf8_pattern_compare(zPattern, zStr, &likeInfoNorm, esc);
> }
>
Should we add sqlite3_result_error is case of SQL_PROHIBITED_PATTERN for
these two functions?
> - {"LIKE", "like"},
> - {"GLOB", "glob"},
> +-- NOTE: This test needs refactoring after deletion of GLOB &
> +-- type restrictions for LIKE.
> +-- {"LIKE", "like"},
> +-- {"GLOB", "glob"},
I would add related issue number.
> {"AND", "and"},
> {"OR", "or"},
> {"MATCH", "match"},
> @@ -96,7 +98,7 @@ operations = {
> {"+", "-"},
> {"<<", ">>", "&", "|"},
> {"<", "<=", ">", ">="},
> - {"=", "==", "!=", "<>", "LIKE", "GLOB"}, --"MATCH", "REGEXP"},
> + {"=", "==", "!=", "<>"}, --"LIKE", "GLOB"}, "MATCH", "REGEXP"},
Leave comment before MATCH and REGEXP to avoid surprises after removing
your comment.
> --- /dev/null
> +++ b/test/sql-tap/gh-3251-string-pattern-comparison.lua
> @@ -0,0 +1,238 @@
> +#!/usr/bin/env tarantool
> +test = require("sqltester")
> +test:plan(106)
> +
> +local prefix = "like-test-"
> +
> +-- Unciode byte sequences.
> +
1. Typo: unicode.
2. Extra empty line.
> +-- Invalid testcases.
> +
> +for i, tested_string in ipairs(invalid_testcases) do
> + local test_name = prefix .. "2." .. tostring(i)
> + local test_itself = "SELECT 'abc' LIKE 'ab" .. tested_string .. "';"
> +
> +-- We should raise an error if pattern contains invalid characters.
> +
1. Trailing whitespaces.
2. Why comments are not indented as the other code?
3. test_name and test_itself before the section comment looks strange
for me here.
next prev parent reply other threads:[~2018-07-27 13:06 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-28 12:47 [tarantool-patches] " N.Tatunov
2018-06-28 12:54 ` [tarantool-patches] " Hollow111
2018-07-18 2:43 ` Alexander Turenko
2018-07-18 5:51 ` Alex Khatskevich
2018-07-18 15:24 ` Nikita Tatunov
2018-07-18 15:53 ` Alex Khatskevich
2018-07-18 15:57 ` Nikita Tatunov
2018-07-18 17:10 ` Alexander Turenko
2018-07-19 11:14 ` Nikita Tatunov
2018-07-19 11:56 ` Alex Khatskevich
2018-07-27 11:28 ` Nikita Tatunov
2018-07-27 13:06 ` Alexander Turenko [this message]
2018-07-27 19:11 ` Nikita Tatunov
2018-07-27 20:22 ` Alexander Turenko
2018-07-31 13:27 ` Nikita Tatunov
2018-07-31 13:47 ` Alexander Turenko
2018-08-01 10:35 ` Nikita Tatunov
2018-08-01 10:51 ` Nikita Tatunov
2018-08-01 13:56 ` Alex Khatskevich
2018-08-01 18:10 ` Nikita Tatunov
2018-08-01 18:14 ` Nikita Tatunov
2018-08-08 12:38 ` Alex Khatskevich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180727130601.b2oby7dleapd5upg@tkn_work_nb \
--to=alexander.turenko@tarantool.org \
--cc=hollow653@gmail.com \
--cc=tarantool-patches@freelists.org \
--subject='[tarantool-patches] Re: [PATCH] sql: LIKE & GLOB pattern comparison issue' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox