i Nikita!
I don't have objections against this patch in general, just several
minor comments and side notes.
Alexey, as I see you are familiar with the code. Can you give a review
for the patch?
WBR, Alexander Turenko.
On Thu, Jun 28, 2018 at 03:47:16PM +0300, N.Tatunov wrote:
> Currently function that compares pattern and
> string for GLOB & LIKE operators doesn't work properly.
> It uses ICU reading function which perhaps was working
> differently before and the implementation for the
> comparison ending isn't paying attention to some
> special cases, hence in those cases it works improperly.
> Now the checks for comparison should work fine.
>
> Сloses: #3251
> Сloses: #3334
Use 'Closes #xxxx' form. Don't sure the form with a colon will work.
Fixed it.
Utf8Read macro comment now completely irrelevant to its definition.
Compare to sqlite3:
584 /*
585 ** For LIKE and GLOB matching on EBCDIC machines, assume that every
586 ** character is exactly one byte in size. Also, provde the Utf8Read()
587 ** macro for fast reading of the next character in the common case where
588 ** the next character is ASCII.
589 */
590 #if defined(SQLITE_EBCDIC)
591 # define sqlite3Utf8Read(A) (*((*A)++))
592 # define Utf8Read(A) (*(A++))
593 #else
594 # define Utf8Read(A) (A[0]<0x80?*(A++):sqlite3Utf8Read(&A))
595 #endif
Yeah, you're right. Fixed it.
> ---
> src/box/sql/func.c | 25 ++++----
> test/sql-tap/like1.test.lua | 152 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 165 insertions(+), 12 deletions(-)
> create mode 100755 test/sql-tap/like1.test.lua
>
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index c06e3bd..dcbd7e0 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -643,6 +643,7 @@ static const struct compareInfo likeInfoAlt = { '%', '_', 0, 0 };
> #define SQLITE_MATCH 0
> #define SQLITE_NOMATCH 1
> #define SQLITE_NOWILDCARDMATCH 2
> +#define SQL_NO_SYMBOLS_LEFT 65535
>
0xffff looks better.
Isn't actual with the fix.
I would use name like ICU_ERROR. By the way, it can mean not only end of
a string, but also other errors. It is okay to give 'not matched' in the
case, I think.
Hmm, we can misinterpret some error and give 'matched' result, I
guess... maybe we really should check start < end in the macro to
distinguish this cases. Don't sure about the test case. As I see from
ICU source code it can be some internal buffer overflow error. We can do
the check like so:
-#define Utf8Read(s, e) ucnv_getNextUChar(pUtf8conv, &s, e, &status)
+#define Utf8Read(s, e) (((s) < (e)) ? ucnv_getNextUChar(pUtf8conv, &(s), (e), \
+ &(status)) : 0)
Yes, nice idea, I wouldn't guess it, thank you for advice.
Therefore changed SQL_NO_SYMBOLS_LEFT to SQL_END_OF_STRING,
I think it is more understandable.
-#define SQL_NO_SYMBOLS_LEFT 65535
+#define SQL_NO_SYMBOLS_LEFT 0
> /*
> * Compare two UTF-8 strings for equality where the first string is
> @@ -698,29 +699,28 @@ patternCompare(const char * pattern, /* The glob pattern */
> const char * string_end = string + strlen(string);
> UErrorCode status = U_ZERO_ERROR;
>
> - while (pattern < pattern_end){
> - c = Utf8Read(pattern, pattern_end);
> + while ((c = Utf8Read(pattern, pattern_end)) != SQL_NO_SYMBOLS_LEFT) {
Here and everywhere below we lean on the fact that ucnv_getNextUChar
compares pointers and that is so:
1860 s=*source;
1861 if(sourceLimit<s) {
1862 *err=U_ILLEGAL_ARGUMENT_ERROR;
1863 return 0xffff;
1864 }
By the way, the patch reverts partially implemented approach to
preliminary check start < end introduced in 198d59ce93. I think it is
okay too, because some checks were missed and now they exist again.
> diff --git a/test/sql-tap/like1.test.lua b/test/sql-tap/like1.test.lua
> new file mode 100755
> index 0000000..42b4d43
> --- /dev/null
> +++ b/test/sql-tap/like1.test.lua
> @@ -0,0 +1,152 @@
> +#!/usr/bin/env tarantool
> +test = require("sqltester")
> +test:plan(13)
> +
Maybe case with % at the end? I tried, that works, but it would be good
to have a regression test.
Yeah, sure.