* [tarantool-patches] [PATCH] sql: LIKE/LENGTH process '\0' @ 2019-01-29 9:56 Ivan Koptelov 2019-01-29 16:35 ` [tarantool-patches] " n.pettik 2019-03-04 15:30 ` Kirill Yukhin 0 siblings, 2 replies; 22+ messages in thread From: Ivan Koptelov @ 2019-01-29 9:56 UTC (permalink / raw) To: tarantool-patches [-- Attachment #1: Type: text/html, Size: 8208 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-01-29 9:56 [tarantool-patches] [PATCH] sql: LIKE/LENGTH process '\0' Ivan Koptelov @ 2019-01-29 16:35 ` n.pettik 2019-02-04 12:34 ` Ivan Koptelov 2019-03-04 15:30 ` Kirill Yukhin 1 sibling, 1 reply; 22+ messages in thread From: n.pettik @ 2019-01-29 16:35 UTC (permalink / raw) To: tarantool-patches; +Cc: Ivan Koptelov > Fixes LIKE and LENGTH functions. '\0' now treated as Nit: is treated. > a usual symbol. Strings with '\0' are now processed > entirely. Consider examples: > > LENGTH(CHAR(65,00,65)) == 3 > LIKE(CHAR(65,00,65), CHAR(65,00,66)) == False Also, I see that smth wrong with text in this mail again. > > Closes #3542 Please, check other functions working with strings. For instance, now I see that replace ignores \0 as well: tarantool> box.sql.execute("select replace(CHAR(65,00,66), char(0), char(1))") --- - - ["A\0B"] ... > @@ -150,9 +150,13 @@ lengthFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) > const unsigned char *z = sqlite3_value_text(argv[0]); > if (z == 0) > return; > + Nit: don’t abuse empty lines. > len = 0; > - while (*z) { > + size_t byte_len = sqlite3_value_bytes(argv[0]); > + const unsigned char *prev_z; > + for (size_t cnt = 0; cnt < byte_len; cnt += (z - prev_z)) { Out of 80. > diff --git a/test/sql-tap/gh-3542-like-len-null-term.test.lua b/test/sql-tap/gh-3542-like-len-null-term.test.lua > new file mode 100755 > index 000000000..e9ea9ea30 > --- /dev/null > +++ b/test/sql-tap/gh-3542-like-len-null-term.test.lua > @@ -0,0 +1,97 @@ > +#!/usr/bin/env tarantool > +test = require("sqltester") > +test:plan(14) > + > +-- gh-3542 - LIKE/LENGTH do not scan if '\0' is encountered. > +-- This test ensures that LIKE and LENGTH functions does NOT stop > +-- string processing if '\0' is encountered. I’d rather put these tests to sql-tap/func.test.lua ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-01-29 16:35 ` [tarantool-patches] " n.pettik @ 2019-02-04 12:34 ` Ivan Koptelov 2019-02-05 13:50 ` n.pettik 0 siblings, 1 reply; 22+ messages in thread From: Ivan Koptelov @ 2019-02-04 12:34 UTC (permalink / raw) To: n.pettik, tarantool-patches [-- Attachment #1: Type: text/html, Size: 16665 bytes --]
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-04 12:34 ` Ivan Koptelov @ 2019-02-05 13:50 ` n.pettik 2019-02-07 15:14 ` i.koptelov 0 siblings, 1 reply; 22+ messages in thread From: n.pettik @ 2019-02-05 13:50 UTC (permalink / raw) To: tarantool-patches; +Cc: Ivan Koptelov [-- Attachment #1: Type: text/plain, Size: 4616 bytes --] > On 29/01/2019 19:35, n.pettik wrote: >>> Fixes LIKE and LENGTH functions. '\0' now treated as >> Nit: is treated. > Fixed. >>> a usual symbol. Strings with '\0' are now processed >>> entirely. Consider examples: >>> >>> LENGTH(CHAR(65,00,65)) == 3 >>> LIKE(CHAR(65,00,65), CHAR(65,00,66)) == False >> Also, I see that smth wrong with text in this mail again > I hope now the mail text is ok. Not quite. It is still highlighted in some way. Have no idea. > src/box/sql/func.c | 88 +++++++++++++----- > src/box/sql/vdbeInt.h | 2 +- > test/sql-tap/func.test.lua | 220 ++++++++++++++++++++++++++++++++++++++++++++- > 3 files changed, 284 insertions(+), 26 deletions(-) > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index e46b162d9..2978af983 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -128,6 +128,30 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv) > sqlite3_result_text(context, z, -1, SQLITE_STATIC); > } > > +/** > + * Return number of chars in the given string. > + * > + * Number of chars != byte size of string because some characters > + * are encoded with more than one byte. Also note that all > + * characters from 'str' to 'str + byte_len' would be counted, > + * even if there is a '\0' somewhere between them. > + * @param str String to be counted. > + * @param byte_len Byte length of given string. > + * @return Return what? > + */ > +static int > +count_chars(const unsigned char *str, size_t byte_len) Quite poor naming. I would call it utf8_str_len or smth with utf8 prefix. Mb it is worth to put it some utils source file. 
Also, consider using native U8_NEXT function from utf8.c, instead of custom SQLITE_SKIP_UTF8. It may be not so fast but safer I suppose. I don't insist though. > +{ What if str is NULL? Add at least an assertion. > + int n_chars = 0; > + const unsigned char *prev_z; > + for (size_t cnt = 0; cnt < byte_len; cnt += (str - prev_z)) { > + n_chars++; > + prev_z = str; > + SQLITE_SKIP_UTF8(str); > + } > + return n_chars; > +} You can rewrite this function in a simpler way without using SQLITE macroses. Read this topic: https://stackoverflow.com/questions/3911536/utf-8-unicode-whats-with-0xc0-and-0x80/3911566#3911566 It is quite useful. You may borrow implementation from there. > + > /* > * Implementation of the length() function > */ > @@ -150,11 +174,7 @@ lengthFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) > const unsigned char *z = sqlite3_value_text(argv[0]); > if (z == 0) > return; > - len = 0; > - while (*z) { > - len++; > - SQLITE_SKIP_UTF8(z); > - } > + len = count_chars(z, sqlite3_value_bytes(argv[0])); > sqlite3_result_int(context, len); > break; > } > @@ -340,11 +360,8 @@ substrFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) > if (z == 0) > return; > len = 0; > - if (p1 < 0) { > - for (z2 = z; *z2; len++) { > - SQLITE_SKIP_UTF8(z2); > - } > - } > + if (p1 < 0) > + len = count_chars(z, sqlite3_value_bytes(argv[0])); > } > #ifdef SQLITE_SUBSTR_COMPATIBILITY > /* If SUBSTR_COMPATIBILITY is defined then substr(X,0,N) work the same as > @@ -388,12 +405,21 @@ substrFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) > } > assert(p1 >= 0 && p2 >= 0); > if (p0type != SQLITE_BLOB) { > - while (*z && p1) { > + /* > + * In the code below 'cnt' and 'n_chars' is > + * used because '\0' is not supposed to be > + * end-of-string symbol. > + */ > + int n_chars = count_chars(z, sqlite3_value_bytes(argv[0])); I’d better call it char_count or symbol_count or char_count. 
> diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua > index b7de1d955..8c712bd5e 100755 > --- a/test/sql-tap/func.test.lua > +++ b/test/sql-tap/func.test.lua > +-- REPLACE > +test:do_execsql_test( > + "func-62", > + "SELECT REPLACE(CHAR(00,65,00,65), CHAR(00), CHAR(65)) LIKE 'AAAA';", > + {1}) > + > +test:do_execsql_test( > + "func-63", > + "SELECT REPLACE(CHAR(00,65,00,65), CHAR(65), CHAR(00)) \ > + LIKE CHAR(00,00,00,00);", > + {1}) > + > +-- SUBSTR > +test:do_execsql_test( > + "func-64", > + "SELECT SUBSTR(CHAR(65,00,66,67), 3, 2) LIKE CHAR(66, 67);", > + {1}) > + > +test:do_execsql_test( > + "func-65", > + "SELECT SUBSTR(CHAR(00,00,00,65), 1, 4) LIKE CHAR(00,00,00,65);", > + {1}) > + Just wondering: why do you use LIKE function almost in all tests? [-- Attachment #2: Type: text/html, Size: 11216 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-05 13:50 ` n.pettik @ 2019-02-07 15:14 ` i.koptelov 2019-02-11 13:15 ` n.pettik 0 siblings, 1 reply; 22+ messages in thread From: i.koptelov @ 2019-02-07 15:14 UTC (permalink / raw) To: tarantool-patches; +Cc: n.pettik > On 5 Feb 2019, at 16:50, n.pettik <korablev@tarantool.org> wrote: > > >> On 29/01/2019 19:35, n.pettik wrote: >>>> Fixes LIKE and LENGTH functions. '\0' now treated as >>>> >>> Nit: is treated. >> Fixed. >>>> a usual symbol. Strings with '\0' are now processed >>>> entirely. Consider examples: >>>> >>>> LENGTH(CHAR(65,00,65)) == 3 >>>> LIKE(CHAR(65,00,65), CHAR(65,00,66)) == False >>>> >>> Also, I see that smth wrong with text in this mail again >> I hope now the mail text is ok. > > Not quite. It is still highlighted in some way. Have no idea. Oh, I am really sorry. Does it look ok now? (I changed the mailing client) >> src/box/sql/func.c | 88 +++++++++++++----- >> src/box/sql/vdbeInt.h | 2 +- >> test/sql-tap/func.test.lua | 220 ++++++++++++++++++++++++++++++++++++++++++++- >> 3 files changed, 284 insertions(+), 26 deletions(-) >> >> diff --git a/src/box/sql/func.c b/src/box/sql/func.c >> index e46b162d9..2978af983 100644 >> --- a/src/box/sql/func.c >> +++ b/src/box/sql/func.c >> @@ -128,6 +128,30 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv) >> sqlite3_result_text(context, z, -1, SQLITE_STATIC); >> } >> >> +/** >> + * Return number of chars in the given string. >> + * >> + * Number of chars != byte size of string because some characters >> + * are encoded with more than one byte. Also note that all >> + * characters from 'str' to 'str + byte_len' would be counted, >> + * even if there is a '\0' somewhere between them. >> + * @param str String to be counted. >> + * @param byte_len Byte length of given string. >> + * @return >> > Return what? Fixed. 
>> + */ >> +static int >> +count_chars(const unsigned char *str, size_t byte_len) >> > Quite poor naming. I would call it utf8_str_len or > smth with utf8 prefix. Mb it is worth to put it some utils source file. > Also, consider using native U8_NEXT function from utf8.c, > instead of custom SQLITE_SKIP_UTF8. It may be not so fast > but safer I suppose. I don't insist though. >> +{ > What if str is NULL? Add at least an assertion. >> + int n_chars = 0; >> + const unsigned char *prev_z; >> + for (size_t cnt = 0; cnt < byte_len; cnt += (str - prev_z)) { >> + n_chars++; >> + prev_z = str; >> + SQLITE_SKIP_UTF8(str); >> + } >> + return n_chars; >> +} >> > You can rewrite this function in a simpler way without using SQLITE macroses. > Read this topic: https://stackoverflow.com/questions/3911536/utf-8-unicode-whats-with-0xc0-and-0x80/3911566#3911566 > It is quite useful. You may borrow implementation from there. Both usage of U8_NEXT and the solution from stack overflow causes sql-tap/badutf1 test to fail, while func.test (and other tests) are OK. In other words, SQLITE_SKIP_UTF8 and U8_NEXT proceeds Invalid byte sequences differently. As far as I understand, the purpose of the test is to check that malformed UTF-8 would not cause a crash. So, I just used U8_NEXT and fixed the test. (Since the function is used only in func.c, I prefer to keep it there and don’t move to separate utils file) diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 2978af983..8ad75adb8 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -129,27 +129,29 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv) } /** - * Return number of chars in the given string. + * Return number of symbols in the given string. * - * Number of chars != byte size of string because some characters + * Number of symbols != byte size of string because some symbols * are encoded with more than one byte. 
Also note that all - * characters from 'str' to 'str + byte_len' would be counted, + * symbols from 'str' to 'str + byte_len' would be counted, * even if there is a '\0' somewhere between them. * @param str String to be counted. * @param byte_len Byte length of given string. - * @return + * @return number of symbols in the given string. */ static int -count_chars(const unsigned char *str, size_t byte_len) +char_count(const unsigned char *str, size_t byte_len) { - int n_chars = 0; - const unsigned char *prev_z; - for (size_t cnt = 0; cnt < byte_len; cnt += (str - prev_z)) { - n_chars++; - prev_z = str; - SQLITE_SKIP_UTF8(str); + int symbol_len = 0; + int offset = 0; + UChar32 res; + while (offset < (int) byte_len) { + U8_NEXT(str, offset, (int) byte_len, res) + if (res < 0) + break; + symbol_len++; } - return n_chars; + return symbol_len; } /* @@ -174,7 +176,7 @@ lengthFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) const unsigned char *z = sqlite3_value_text(argv[0]); if (z == 0) return; - len = count_chars(z, sqlite3_value_bytes(argv[0])); + len = char_count(z, sqlite3_value_bytes(argv[0])); sqlite3_result_int(context, len); break; } @@ -361,7 +363,7 @@ substrFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) return; len = 0; if (p1 < 0) - len = count_chars(z, sqlite3_value_bytes(argv[0])); + len = char_count(z, sqlite3_value_bytes(argv[0])); } #ifdef SQLITE_SUBSTR_COMPATIBILITY /* If SUBSTR_COMPATIBILITY is defined then substr(X,0,N) work the same as @@ -410,7 +412,7 @@ substrFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) * used because '\0' is not supposed to be * end-of-string symbol. 
*/ - int n_chars = count_chars(z, sqlite3_value_bytes(argv[0])); + int n_chars = char_count(z, sqlite3_value_bytes(argv[0])); int cnt = 0; while (cnt < n_chars && p1) { SQLITE_SKIP_UTF8(z); diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua index 534c762ba..0a90d1b17 100755 --- a/test/sql-tap/badutf1.test.lua +++ b/test/sql-tap/badutf1.test.lua @@ -215,7 +215,7 @@ test:do_test( return test:execsql2("SELECT length('\x80') AS x") end, { -- <badutf-3.1> - "X", 1 + "X", 0 -- </badutf-3.1> }) @@ -235,7 +235,7 @@ test:do_test( return test:execsql2("SELECT length('\x7f\x80\x81') AS x") end, { -- <badutf-3.3> - "X", 3 + "X", 1 -- </badutf-3.3> }) @@ -245,7 +245,7 @@ test:do_test( return test:execsql2("SELECT length('\x61\xc0') AS x") end, { -- <badutf-3.4> - "X", 2 + "X", 1 -- </badutf-3.4> }) @@ -255,7 +255,7 @@ test:do_test( return test:execsql2("SELECT length('\x61\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80') AS x") end, { -- <badutf-3.5> - "X", 2 + "X", 1 -- </badutf-3.5> }) @@ -265,7 +265,7 @@ test:do_test( return test:execsql2("SELECT length('\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80') AS x") end, { -- <badutf-3.6> - "X", 1 + "X", 0 -- </badutf-3.6> }) @@ -275,7 +275,7 @@ test:do_test( return test:execsql2("SELECT length('\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80') AS x") end, { -- <badutf-3.7> - "X", 10 + "X", 0 -- </badutf-3.7> }) @@ -285,7 +285,7 @@ test:do_test( return test:execsql2("SELECT length('\x80\x80\x80\x80\x80\xf0\x80\x80\x80\x80') AS x") end, { -- <badutf-3.8> - "X", 6 + "X", 0 -- </badutf-3.8> }) @@ -295,7 +295,7 @@ test:do_test( return test:execsql2("SELECT length('\x80\x80\x80\x80\x80\xf0\x80\x80\x80\xff') AS x") end, { -- <badutf-3.9> - "X", 7 + "X", 0 -- </badutf-3.9> }) >> + >> /* >> * Implementation of the length() function >> */ >> @@ -150,11 +174,7 @@ lengthFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) >> const unsigned char *z = sqlite3_value_text(argv[0]); >> if (z == 0) >> return; >> - len = 0; 
>> - while (*z) { >> - len++; >> - SQLITE_SKIP_UTF8(z); >> - } >> + len = count_chars(z, sqlite3_value_bytes(argv[0])); >> sqlite3_result_int(context, len); >> break; >> } >> @@ -340,11 +360,8 @@ substrFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) >> if (z == 0) >> return; >> len = 0; >> - if (p1 < 0) { >> - for (z2 = z; *z2; len++) { >> - SQLITE_SKIP_UTF8(z2); >> - } >> - } >> + if (p1 < 0) >> + len = count_chars(z, sqlite3_value_bytes(argv[0])); >> } >> #ifdef SQLITE_SUBSTR_COMPATIBILITY >> /* If SUBSTR_COMPATIBILITY is defined then substr(X,0,N) work the same as >> @@ -388,12 +405,21 @@ substrFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) >> } >> assert(p1 >= 0 && p2 >= 0); >> if (p0type != SQLITE_BLOB) { >> - while (*z && p1) { >> + /* >> + * In the code below 'cnt' and 'n_chars' is >> + * used because '\0' is not supposed to be >> + * end-of-string symbol. >> + */ >> + int n_chars = count_chars(z, sqlite3_value_bytes(argv[0])); >> > I’d better call it char_count or symbol_count or char_count. >> diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua >> index b7de1d955..8c712bd5e 100755 >> --- a/test/sql-tap/func.test.lua >> +++ b/test/sql-tap/func.test.lua >> +-- REPLACE >> +test:do_execsql_test( >> + "func-62", >> + "SELECT REPLACE(CHAR(00,65,00,65), CHAR(00), CHAR(65)) LIKE 'AAAA';", >> + {1}) >> + >> +test:do_execsql_test( >> + "func-63", >> + "SELECT REPLACE(CHAR(00,65,00,65), CHAR(65), CHAR(00)) \ >> + LIKE CHAR(00,00,00,00);", >> + {1}) >> + >> +-- SUBSTR >> +test:do_execsql_test( >> + "func-64", >> + "SELECT SUBSTR(CHAR(65,00,66,67), 3, 2) LIKE CHAR(66, 67);", >> + {1}) >> + >> +test:do_execsql_test( >> + "func-65", >> + "SELECT SUBSTR(CHAR(00,00,00,65), 1, 4) LIKE CHAR(00,00,00,65);", >> + {1}) >> + >> > Just wondering: why do you use LIKE function almost in all tests? > To compare actual result with expected one. 
But now I think It would be more readable like this: diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua index 8c712bd5e..7d1611fc2 100755 --- a/test/sql-tap/func.test.lua +++ b/test/sql-tap/func.test.lua @@ -2795,61 +2795,60 @@ test:do_execsql_test( -- REPLACE test:do_execsql_test( "func-62", - "SELECT REPLACE(CHAR(00,65,00,65), CHAR(00), CHAR(65)) LIKE 'AAAA';", - {1}) + "SELECT REPLACE(CHAR(00,65,00,65), CHAR(00), CHAR(65));", + {'AAAA'}) test:do_execsql_test( "func-63", - "SELECT REPLACE(CHAR(00,65,00,65), CHAR(65), CHAR(00)) \ - LIKE CHAR(00,00,00,00);", - {1}) + "SELECT REPLACE(CHAR(00,65,00,65), CHAR(65), CHAR(00));", + {string.char(00,00,00,00)}) -- SUBSTR test:do_execsql_test( "func-64", - "SELECT SUBSTR(CHAR(65,00,66,67), 3, 2) LIKE CHAR(66, 67);", - {1}) + "SELECT SUBSTR(CHAR(65,00,66,67), 3, 2);", + {string.char(66,67)}) test:do_execsql_test( "func-65", - "SELECT SUBSTR(CHAR(00,00,00,65), 1, 4) LIKE CHAR(00,00,00,65);", - {1}) + "SELECT SUBSTR(CHAR(00,00,00,65), 1, 4);", + {string.char(00,00,00,65)}) -- UPPER test:do_execsql_test( "func-66", - "SELECT UPPER(CHAR(00,97,00,98,00)) LIKE CHAR(00,65,00,66,00);", - {1}) + "SELECT UPPER(CHAR(00,97,00,98,00));", + {string.char(00,65,00,66,00)}) -- LOWER test:do_execsql_test( "func-67", - "SELECT LOWER(CHAR(00,65,00,66,00)) LIKE CHAR(00,97,00,98,00);", - {1}) + "SELECT LOWER(CHAR(00,65,00,66,00));", + {string.char(00,97,00,98,00)}) -- HEX test:do_execsql_test( "func-68", - "SELECT HEX(CHAR(00,65,00,65,00)) LIKE ('0041004100');", - {1}) + "SELECT HEX(CHAR(00,65,00,65,00));", + {'0041004100'}) -- TRIM test:do_execsql_test( "func-69", - "SELECT TRIM(CHAR(32,00,32,00,32)) LIKE (CHAR(00,32,00));", - {1}) + "SELECT TRIM(CHAR(32,00,32,00,32));", + {string.char(00,32,00)}) -- LTRIM test:do_execsql_test( "func-70", - "SELECT LTRIM(CHAR(32,00,32,00,32)) LIKE (CHAR(00,32,00,32));", - {1}) + "SELECT LTRIM(CHAR(32,00,32,00,32));", + {string.char(00,32,00,32)}) -- RTRIM test:do_execsql_test( "func-71", - 
"SELECT RTRIM(CHAR(32,00,32,00,32)) LIKE (CHAR(32,00,32,00));", - {1}) + "SELECT RTRIM(CHAR(32,00,32,00,32));", + {string.char(32,00,32,00)}) -- GROUP_CONCAT test:do_execsql_test( ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-07 15:14 ` i.koptelov @ 2019-02-11 13:15 ` n.pettik 2019-02-13 15:46 ` i.koptelov 0 siblings, 1 reply; 22+ messages in thread From: n.pettik @ 2019-02-11 13:15 UTC (permalink / raw) To: tarantool-patches; +Cc: Ivan Koptelov > On 7 Feb 2019, at 18:14, i.koptelov <ivan.koptelov@tarantool.org> wrote: >> On 5 Feb 2019, at 16:50, n.pettik <korablev@tarantool.org> wrote: >>> On 29/01/2019 19:35, n.pettik wrote: >>>>> Fixes LIKE and LENGTH functions. '\0' now treated as >>>>> >>>> Nit: is treated. >>> Fixed. >>>>> a usual symbol. Strings with '\0' are now processed >>>>> entirely. Consider examples: >>>>> >>>>> LENGTH(CHAR(65,00,65)) == 3 >>>>> LIKE(CHAR(65,00,65), CHAR(65,00,66)) == False >>>>> >>>> Also, I see that smth wrong with text in this mail again >>> I hope now the mail text is ok. >> >> Not quite. It is still highlighted in some way. Have no idea. > Oh, I am really sorry. Does it look ok now? (I changed the mailing client) Yes, it does. >>> src/box/sql/func.c | 88 +++++++++++++----- >>> src/box/sql/vdbeInt.h | 2 +- >>> test/sql-tap/func.test.lua | 220 ++++++++++++++++++++++++++++++++++++++++++++- >>> 3 files changed, 284 insertions(+), 26 deletions(-) >>> >>> diff --git a/src/box/sql/func.c b/src/box/sql/func.c >>> index e46b162d9..2978af983 100644 >>> --- a/src/box/sql/func.c >>> +++ b/src/box/sql/func.c >>> @@ -128,6 +128,30 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv) >>> sqlite3_result_text(context, z, -1, SQLITE_STATIC); >>> } >>> >>> +/** >>> + * Return number of chars in the given string. >>> + * >>> + * Number of chars != byte size of string because some characters >>> + * are encoded with more than one byte. Also note that all >>> + * characters from 'str' to 'str + byte_len' would be counted, >>> + * even if there is a '\0' somewhere between them. >>> + * @param str String to be counted. >>> + * @param byte_len Byte length of given string. 
>>> + * @return >>> >> Return what? > Fixed. >>> + */ >>> +static int >>> +count_chars(const unsigned char *str, size_t byte_len) >>> >> Quite poor naming. I would call it utf8_str_len or >> smth with utf8 prefix. Mb it is worth to put it some utils source file. >> Also, consider using native U8_NEXT function from utf8.c, >> instead of custom SQLITE_SKIP_UTF8. It may be not so fast >> but safer I suppose. I don't insist though. >>> +{ >> What if str is NULL? Add at least an assertion. >>> + int n_chars = 0; >>> + const unsigned char *prev_z; >>> + for (size_t cnt = 0; cnt < byte_len; cnt += (str - prev_z)) { >>> + n_chars++; >>> + prev_z = str; >>> + SQLITE_SKIP_UTF8(str); >>> + } >>> + return n_chars; >>> +} >>> >> You can rewrite this function in a simpler way without using SQLITE macroses. >> Read this topic: https://stackoverflow.com/questions/3911536/utf-8-unicode-whats-with-0xc0-and-0x80/3911566#3911566 >> It is quite useful. You may borrow implementation from there. > Both usage of U8_NEXT and the solution from stack overflow causes sql-tap/badutf1 test to fail, > while func.test (and other tests) are OK. In other words, SQLITE_SKIP_UTF8 and U8_NEXT > proceeds Invalid byte sequences differently. > As far as I understand, the purpose of the test is to check that malformed UTF-8 would not cause a crash. > So, I just used U8_NEXT and fixed the test. > > (Since the function is used only in func.c, I prefer to keep it there and don’t move to separate utils file) > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 2978af983..8ad75adb8 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -129,27 +129,29 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv) > } > > /** > - * Return number of chars in the given string. > + * Return number of symbols in the given string. 
> * > - * Number of chars != byte size of string because some characters > + * Number of symbols != byte size of string because some symbols > * are encoded with more than one byte. Also note that all > - * characters from 'str' to 'str + byte_len' would be counted, > + * symbols from 'str' to 'str + byte_len' would be counted, > * even if there is a '\0' somewhere between them. > * @param str String to be counted. > * @param byte_len Byte length of given string. > - * @return > + * @return number of symbols in the given string. > */ > static int > -count_chars(const unsigned char *str, size_t byte_len) > +char_count(const unsigned char *str, size_t byte_len) As I pointed in previous review, use utf8_ prefix: utf8_strlen, utf8_str_char_count or sort of. Why unsigned char? > { > - int n_chars = 0; > - const unsigned char *prev_z; > - for (size_t cnt = 0; cnt < byte_len; cnt += (str - prev_z)) { > - n_chars++; > - prev_z = str; > - SQLITE_SKIP_UTF8(str); > + int symbol_len = 0; Name is confusing: it sounds like a length of a single symbol. symbol_count would be better. > + int offset = 0; > + UChar32 res; > + while (offset < (int) byte_len) { > + U8_NEXT(str, offset, (int) byte_len, res) Instead of two explicit casts to int, lets make byte_len be of type int. > + if (res < 0) > + break; Hm, so if a sequence contains non-utf8 symbols, you simply cut the string. Not sure it is solid approach. Could you please check how other DBs behave (do they throw an error? Or ignore invalid symbols?) in similar situation and what standard says. 
> + symbol_len++; > } > - return n_chars; > + return symbol_len; > } > > diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua > index 534c762ba..0a90d1b17 100755 > --- a/test/sql-tap/badutf1.test.lua > +++ b/test/sql-tap/badutf1.test.lua > @@ -215,7 +215,7 @@ test:do_test( > return test:execsql2("SELECT length('\x80') AS x") > end, { > -- <badutf-3.1> > - "X", 1 > + "X", 0 Why this and the rest of tests below has been changed? I guess because now they contain invalid utf8 symbols. > -- </badutf-3.1> > }) > > @@ -235,7 +235,7 @@ test:do_test( > return test:execsql2("SELECT length('\x7f\x80\x81') AS x") > end, { > -- <badutf-3.3> > - "X", 3 > + "X", 1 But wait, I execute tarantool> SELECT length('\x7f\x80\x81') AS x --- - - [12] ... Looks extremely strangle. What do these tests check at all? Why we can’t use simple sqlexecute or smth? This test suite is so broken... >>> @@ -388,12 +405,21 @@ substrFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) >>> } >>> assert(p1 >= 0 && p2 >= 0); >>> if (p0type != SQLITE_BLOB) { >>> - while (*z && p1) { >>> + /* >>> + * In the code below 'cnt' and 'n_chars' is >>> + * used because '\0' is not supposed to be >>> + * end-of-string symbol. >>> + */ >>> + int n_chars = count_chars(z, sqlite3_value_bytes(argv[0])); >>> >> I’d better call it char_count or symbol_count or char_count. 
> >>> diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua >>> index b7de1d955..8c712bd5e 100755 >>> --- a/test/sql-tap/func.test.lua >>> +++ b/test/sql-tap/func.test.lua >>> +-- REPLACE >>> +test:do_execsql_test( >>> + "func-62", >>> + "SELECT REPLACE(CHAR(00,65,00,65), CHAR(00), CHAR(65)) LIKE 'AAAA';", >>> + {1}) >>> + >>> +test:do_execsql_test( >>> + "func-63", >>> + "SELECT REPLACE(CHAR(00,65,00,65), CHAR(65), CHAR(00)) \ >>> + LIKE CHAR(00,00,00,00);", >>> + {1}) >>> + >>> +-- SUBSTR >>> +test:do_execsql_test( >>> + "func-64", >>> + "SELECT SUBSTR(CHAR(65,00,66,67), 3, 2) LIKE CHAR(66, 67);", >>> + {1}) >>> + >>> +test:do_execsql_test( >>> + "func-65", >>> + "SELECT SUBSTR(CHAR(00,00,00,65), 1, 4) LIKE CHAR(00,00,00,65);", >>> + {1}) >>> + >>> >> Just wondering: why do you use LIKE function almost in all tests? >> > To compare actual result with expected one. But now I think It would be more readable like this: Moreover, in case LIKE fails for some reason, these tests will fail as well. Meanwhile, their purpose to test not LIKE function. Glad you fixed that. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-11 13:15 ` n.pettik @ 2019-02-13 15:46 ` i.koptelov 2019-02-14 12:57 ` n.pettik 0 siblings, 1 reply; 22+ messages in thread From: i.koptelov @ 2019-02-13 15:46 UTC (permalink / raw) To: tarantool-patches; +Cc: n.pettik > On 11 Feb 2019, at 16:15, n.pettik <korablev@tarantool.org> wrote: > >> >> On 7 Feb 2019, at 18:14, i.koptelov <ivan.koptelov@tarantool.org> wrote: >>> On 5 Feb 2019, at 16:50, n.pettik <korablev@tarantool.org> wrote: >>>> On 29/01/2019 19:35, n.pettik wrote: >>>>>> Fixes LIKE and LENGTH functions. '\0' now treated as >>>>>> >>>>> Nit: is treated. >>>> Fixed. >>>>>> a usual symbol. Strings with '\0' are now processed >>>>>> entirely. Consider examples: >>>>>> >>>>>> LENGTH(CHAR(65,00,65)) == 3 >>>>>> LIKE(CHAR(65,00,65), CHAR(65,00,66)) == False >>>>>> >>>>> Also, I see that smth wrong with text in this mail again >>>> I hope now the mail text is ok. >>> >>> Not quite. It is still highlighted in some way. Have no idea. >> Oh, I am really sorry. Does it look ok now? (I changed the mailing client) > > Yes, it does. > >>>> src/box/sql/func.c | 88 +++++++++++++----- >>>> src/box/sql/vdbeInt.h | 2 +- >>>> test/sql-tap/func.test.lua | 220 ++++++++++++++++++++++++++++++++++++++++++++- >>>> 3 files changed, 284 insertions(+), 26 deletions(-) >>>> >>>> diff --git a/src/box/sql/func.c b/src/box/sql/func.c >>>> index e46b162d9..2978af983 100644 >>>> --- a/src/box/sql/func.c >>>> +++ b/src/box/sql/func.c >>>> @@ -128,6 +128,30 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv) >>>> sqlite3_result_text(context, z, -1, SQLITE_STATIC); >>>> } >>>> >>>> +/** >>>> + * Return number of chars in the given string. >>>> + * >>>> + * Number of chars != byte size of string because some characters >>>> + * are encoded with more than one byte. 
Also note that all >>>> + * characters from 'str' to 'str + byte_len' would be counted, >>>> + * even if there is a '\0' somewhere between them. >>>> + * @param str String to be counted. >>>> + * @param byte_len Byte length of given string. >>>> + * @return >>>> >>> Return what? >> Fixed. >>>> + */ >>>> +static int >>>> +count_chars(const unsigned char *str, size_t byte_len) >>>> >>> Quite poor naming. I would call it utf8_str_len or >>> smth with utf8 prefix. Mb it is worth to put it some utils source file. >>> Also, consider using native U8_NEXT function from utf8.c, >>> instead of custom SQLITE_SKIP_UTF8. It may be not so fast >>> but safer I suppose. I don't insist though. >>>> +{ >>> What if str is NULL? Add at least an assertion. >>>> + int n_chars = 0; >>>> + const unsigned char *prev_z; >>>> + for (size_t cnt = 0; cnt < byte_len; cnt += (str - prev_z)) { >>>> + n_chars++; >>>> + prev_z = str; >>>> + SQLITE_SKIP_UTF8(str); >>>> + } >>>> + return n_chars; >>>> +} >>>> >>> You can rewrite this function in a simpler way without using SQLITE macroses. >>> Read this topic: https://stackoverflow.com/questions/3911536/utf-8-unicode-whats-with-0xc0-and-0x80/3911566#3911566 >>> It is quite useful. You may borrow implementation from there. >> Both usage of U8_NEXT and the solution from stack overflow causes sql-tap/badutf1 test to fail, >> while func.test (and other tests) are OK. In other words, SQLITE_SKIP_UTF8 and U8_NEXT >> proceeds Invalid byte sequences differently. >> As far as I understand, the purpose of the test is to check that malformed UTF-8 would not cause a crash. >> So, I just used U8_NEXT and fixed the test. 
>> >> (Since the function is used only in func.c, I prefer to keep it there and don’t move to separate utils file) >> >> diff --git a/src/box/sql/func.c b/src/box/sql/func.c >> index 2978af983..8ad75adb8 100644 >> --- a/src/box/sql/func.c >> +++ b/src/box/sql/func.c >> @@ -129,27 +129,29 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv) >> } >> >> /** >> - * Return number of chars in the given string. >> + * Return number of symbols in the given string. >> * >> - * Number of chars != byte size of string because some characters >> + * Number of symbols != byte size of string because some symbols >> * are encoded with more than one byte. Also note that all >> - * characters from 'str' to 'str + byte_len' would be counted, >> + * symbols from 'str' to 'str + byte_len' would be counted, >> * even if there is a '\0' somewhere between them. >> * @param str String to be counted. >> * @param byte_len Byte length of given string. >> - * @return >> + * @return number of symbols in the given string. >> */ >> static int >> -count_chars(const unsigned char *str, size_t byte_len) >> +char_count(const unsigned char *str, size_t byte_len) > > As I pointed in previous review, use utf8_ prefix: > utf8_strlen, utf8_str_char_count or sort of. Sorry, overlooked this. Fixed now. diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 8ad75adb8..bab102988 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -140,18 +140,18 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv) * @return number of symbols in the given string. */ static int -char_count(const unsigned char *str, size_t byte_len) +utf8_char_count(const unsigned char *str, int byte_len) { > > Why unsigned char? I used unsigned char because sqlite3_value_text() returns result of this type. 
> >> { >> - int n_chars = 0; >> - const unsigned char *prev_z; >> - for (size_t cnt = 0; cnt < byte_len; cnt += (str - prev_z)) { >> - n_chars++; >> - prev_z = str; >> - SQLITE_SKIP_UTF8(str); >> + int symbol_len = 0; > > Name is confusing: it sounds like a length of a single symbol. > symbol_count would be better. Ok, thank you, fixed. > >> + int offset = 0; >> + UChar32 res; >> + while (offset < (int) byte_len) { >> + U8_NEXT(str, offset, (int) byte_len, res) > > Instead of two explicit casts to int, lets make byte_len > be of type int. Ok. diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 8ad75adb8..bab102988 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -140,18 +140,18 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv) * @return number of symbols in the given string. */ static int -char_count(const unsigned char *str, size_t byte_len) +utf8_char_count(const unsigned char *str, int byte_len) { - int symbol_len = 0; + int symbol_count = 0; int offset = 0; UChar32 res; - while (offset < (int) byte_len) { - U8_NEXT(str, offset, (int) byte_len, res) + while (offset < byte_len) { + U8_NEXT(str, offset, byte_len, res) if (res < 0) break; - symbol_len++; + symbol_count++; } - return symbol_len; + return symbol_count; } > >> + if (res < 0) >> + break; > > Hm, so if a sequence contains non-utf8 symbols, > you simply cut the string. Not sure it is solid approach. > Could you please check how other DBs behave (do they throw > an error? Or ignore invalid symbols?) in similar situation and > what standard says. I’ve tested statements with LENGTH from the test badutf1 in different DBs. PostgreSQL raised an error "invalid byte sequence for encoding “UTF8”. MySQL, DB2 and MSSQL behaved all the same way: count each invalid byte as a symbol. For example: 0x80, 0x7f, 0x81 are all invalid first bytes from UTF8 point of view, 0xC03F is bad two byte seq. where first byte is ok and second is broken, 0xE0800F is bad three byte seq. 
where the first two bytes are ok, but the third is broken. (this is MySQL SQL) select length(X'80'); 1 select length(concat(X'7f', X'80', X'81')); 3 select length(concat(X'61', X'c0')); 2 select length(concat(X'61', X'c0', X'80', X'80', X'80', X'80', X'80', X'80', X'80', X'80', X'80', X'80')); 12 select length(X'C03F'); 2 select length(concat(X'C03F', 'a')); 3 select length(X'E0800F'); 3 select length(concat(X'E0800F', 'a')); 4 I have not found anything in the standard about dealing with invalid UTF8 sequences. Even before the patch, the test gave results different from those of all other DBs. I propose we behave the way all other DBs do, as I describe above. Is it OK? > >> + symbol_len++; >> } >> - return n_chars; >> + return symbol_len; >> } >> >> diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua >> index 534c762ba..0a90d1b17 100755 >> --- a/test/sql-tap/badutf1.test.lua >> +++ b/test/sql-tap/badutf1.test.lua >> @@ -215,7 +215,7 @@ test:do_test( >> return test:execsql2("SELECT length('\x80') AS x") >> end, { >> -- <badutf-3.1> >> - "X", 1 >> + "X", 0 > > Why have this and the rest of the tests below been changed? > I guess because now they contain invalid utf8 symbols. They contained invalid utf8 symbols before. The tests are changed because I changed the way we handle malformed utf8 strings. > >> -- </badutf-3.1> >> }) >> >> @@ -235,7 +235,7 @@ test:do_test( >> return test:execsql2("SELECT length('\x7f\x80\x81') AS x") >> end, { >> -- <badutf-3.3> >> - "X", 3 >> + "X", 1 > > But wait, I execute > tarantool> SELECT length('\x7f\x80\x81') AS x > --- > - - [12] > ... > Maybe you got this result because ‘\x7f’ is treated by tarantool sql > as a simple string with 4 characters? > Looks extremely strange. What do these tests check at all? > Why can’t we use simple sqlexecute or smth? This test suite > is so broken... I think these tests check that the DB does not crash if malformed utf8 is encountered. 
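For reference, the numbers in the MySQL examples quoted earlier in this message can be reproduced by a counter that validates each sequence and counts every invalid byte as one symbol. This is a hypothetical sketch for experimentation, not the patch code, and it only approximates MySQL's real validation (e.g. it does not reject overlong forms such as E0 A0-less 3-byte leads beyond what the generic continuation check catches):

```c
#include <stdint.h>

/*
 * Hypothetical sketch: count UTF-8 symbols the way the MySQL
 * examples suggest. A sequence with a valid lead byte
 * (0xc2..0xdf, 0xe0..0xef, 0xf0..0xf4) followed by the right
 * number of continuation bytes (0x80..0xbf) counts as one symbol;
 * any other byte, including '\0', counts as one symbol on its own.
 */
static int
utf8_len_count_invalid_bytes(const uint8_t *str, int byte_len)
{
	int count = 0;
	for (int i = 0; i < byte_len; count++) {
		uint8_t b = str[i];
		int trail;
		if (b >= 0xc2 && b <= 0xdf)
			trail = 1;
		else if (b >= 0xe0 && b <= 0xef)
			trail = 2;
		else if (b >= 0xf0 && b <= 0xf4)
			trail = 3;
		else {
			/* ASCII (incl. '\0') or invalid byte. */
			i++;
			continue;
		}
		if (i + trail >= byte_len) {
			/* Truncated sequence: count lead byte alone. */
			i++;
			continue;
		}
		int ok = 1;
		for (int j = 1; j <= trail; j++)
			if ((str[i + j] & 0xc0) != 0x80)
				ok = 0;
		i += ok ? 1 + trail : 1;
	}
	return count;
}
```

On the inputs above this yields the same counts as the MySQL session: X'80' gives 1, X'61 C0' gives 2, X'C03F' gives 2, X'E0800F' gives 3, and so on, while a string with an embedded '\0' such as CHAR(65,00,65) still counts 3 symbols.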
> >>>> @@ -388,12 +405,21 @@ substrFunc(sqlite3_context * context, int argc, sqlite3_value ** argv) >>>> } >>>> assert(p1 >= 0 && p2 >= 0); >>>> if (p0type != SQLITE_BLOB) { >>>> - while (*z && p1) { >>>> + /* >>>> + * In the code below 'cnt' and 'n_chars' is >>>> + * used because '\0' is not supposed to be >>>> + * end-of-string symbol. >>>> + */ >>>> + int n_chars = count_chars(z, sqlite3_value_bytes(argv[0])); >>>> >>> I’d better call it char_count or symbol_count or char_count. >> >>>> diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua >>>> index b7de1d955..8c712bd5e 100755 >>>> --- a/test/sql-tap/func.test.lua >>>> +++ b/test/sql-tap/func.test.lua >>>> +-- REPLACE >>>> +test:do_execsql_test( >>>> + "func-62", >>>> + "SELECT REPLACE(CHAR(00,65,00,65), CHAR(00), CHAR(65)) LIKE 'AAAA';", >>>> + {1}) >>>> + >>>> +test:do_execsql_test( >>>> + "func-63", >>>> + "SELECT REPLACE(CHAR(00,65,00,65), CHAR(65), CHAR(00)) \ >>>> + LIKE CHAR(00,00,00,00);", >>>> + {1}) >>>> + >>>> +-- SUBSTR >>>> +test:do_execsql_test( >>>> + "func-64", >>>> + "SELECT SUBSTR(CHAR(65,00,66,67), 3, 2) LIKE CHAR(66, 67);", >>>> + {1}) >>>> + >>>> +test:do_execsql_test( >>>> + "func-65", >>>> + "SELECT SUBSTR(CHAR(00,00,00,65), 1, 4) LIKE CHAR(00,00,00,65);", >>>> + {1}) >>>> + >>>> >>> Just wondering: why do you use LIKE function almost in all tests? >>> >> To compare actual result with expected one. But now I think It would be more readable like this: > > Moreover, in case LIKE fails for some reason, these tests will > fail as well. Meanwhile, their purpose to test not LIKE function. > Glad you fixed that. Thanks. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-13 15:46 ` i.koptelov @ 2019-02-14 12:57 ` n.pettik 2019-02-20 13:54 ` i.koptelov 0 siblings, 1 reply; 22+ messages in thread From: n.pettik @ 2019-02-14 12:57 UTC (permalink / raw) To: tarantool-patches; +Cc: Ivan Koptelov I see some strange file on your branch: https://github.com/tarantool/tarantool/commit/0a34e978c8023dcb4f99965d93203a99adf21520 I guess it’s an artefact from your investigation. If you want to share some results/proposals, just attach them to this letter or write to the dev list. > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 8ad75adb8..bab102988 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -140,18 +140,18 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv) > * @return number of symbols in the given string. > */ > static int > -char_count(const unsigned char *str, size_t byte_len) > +utf8_char_count(const unsigned char *str, int byte_len) > { >> >> Why unsigned char? > I used unsigned char because sqlite3_value_text() returns a result > of this type. Ok, let it be. > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 8ad75adb8..bab102988 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -140,18 +140,18 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv) > * @return number of symbols in the given string. 
> */ > static int > -char_count(const unsigned char *str, size_t byte_len) > +utf8_char_count(const unsigned char *str, int byte_len) > { > - int symbol_len = 0; > + int symbol_count = 0; > int offset = 0; > UChar32 res; > - while (offset < (int) byte_len) { > - U8_NEXT(str, offset, (int) byte_len, res) > + while (offset < byte_len) { > + U8_NEXT(str, offset, byte_len, res) > if (res < 0) > break; > - symbol_len++; > + symbol_count++; > } > - return symbol_len; > + return symbol_count; > } > >> >>> + if (res < 0) >>> + break; >> >> Hm, so if a sequence contains non-utf8 symbols, >> you simply cut the string. Not sure it is solid approach. >> Could you please check how other DBs behave (do they throw >> an error? Or ignore invalid symbols?) in similar situation and >> what standard says. > I’ve tested statements with LENGTH from the test badutf1 in different > DBs. PostgreSQL raised an error "invalid byte sequence for encoding “UTF8”. > MySQL, DB2 and MSSQL behaved all the same way: count each invalid byte as > a symbol. For example: > > 0x80, 0x7f, 0x81 are all invalid first bytes from > UTF8 point of view, 0xC03F is bad two byte seq. where > first byte is ok and second is broken, 0xE0800F is > bad three byte seq. where first two bytes are ok, but the > third is broken. > > (this is MySQL SQL) > > select length(X'80'); 1 > select length(concat(X'7f', X'80', X'81')); 3 > select length(concat(X'61', X'c0')); 2 > select length(concat(X'61', X'c0', X'80', X'80', X'80', > X'80', X'80', X'80', X'80', X'80', X'80', X'80')); 12 > > select length(X'C03F'); 2 > select length(concat(X'C03F', 'a')); 3 > select length(X'E0800F'); 3 > select length(concat(X'E0800F', 'a')); 4 > > I have not found anything in standard about dealing with > invalid UTF8 sequences. > > Even before the patch test gave results different from what > all others DBs. > > I propose to behave all other DBs do, as I describe above. > Is it ok? PostgreSQL makes me feel uncertain concerning this question. 
Anyway, I still believe that it’s better to raise an error. I propose you ask other members of the core team for their opinion. Additionally, you could ask the author of the original issue and our expert P. Gulutzan. >>> + symbol_len++; >>> } >>> - return n_chars; >>> + return symbol_len; >>> } >>> >>> diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua >>> index 534c762ba..0a90d1b17 100755 >>> --- a/test/sql-tap/badutf1.test.lua >>> +++ b/test/sql-tap/badutf1.test.lua >>> @@ -215,7 +215,7 @@ test:do_test( >>> return test:execsql2("SELECT length('\x80') AS x") >>> end, { >>> -- <badutf-3.1> >>> - "X", 1 >>> + "X", 0 >> >> Why have this and the rest of the tests below been changed? >> I guess because now they contain invalid utf8 symbols. > They contained invalid utf8 symbols before. The tests are changed because > I changed the way we handle malformed utf8 strings. Ok, but at least underline this fact in the commit message. >>> -- </badutf-3.1> >>> }) >>> >>> @@ -235,7 +235,7 @@ test:do_test( >>> return test:execsql2("SELECT length('\x7f\x80\x81') AS x") >>> end, { >>> -- <badutf-3.3> >>> - "X", 3 >>> + "X", 1 >> >> But wait, I execute >> tarantool> SELECT length('\x7f\x80\x81') AS x >> --- >> - - [12] >> ... >> > Maybe you got this result because ‘\x7f’ is treated by tarantool sql > as a simple string with 4 characters? And why is it treated in another way in the tests? >> Looks extremely strange. What do these tests check at all? >> Why can’t we use simple sqlexecute or smth? This test suite >> is so broken... > I think these tests check that the DB does not crash if malformed utf8 is > encountered. I mean, why are the results from a console run different from the test results in our suite? I still can’t understand that. Now I execute this: tarantool> box.sql.execute('select length("\x7f\x80\x81")’) And got a strange error: --- - error: !!binary bm8gc3VjaCBjb2x1bW46IH+AgQ== ... ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-14 12:57 ` n.pettik @ 2019-02-20 13:54 ` i.koptelov 2019-02-20 15:47 ` i.koptelov 0 siblings, 1 reply; 22+ messages in thread From: i.koptelov @ 2019-02-20 13:54 UTC (permalink / raw) To: tarantool-patches; +Cc: n.pettik > On 14 Feb 2019, at 15:57, n.pettik <korablev@tarantool.org> wrote: > > I see some strange file on your branch: > > https://github.com/tarantool/tarantool/commit/0a34e978c8023dcb4f99965d93203a99adf21520 > > I guess it’s an artefact from your investigation. > If you want to share some results/proposals, just > attach them to the this letter or write to dev list. Sorry, I committed it by mistake. Removed now. > >> diff --git a/src/box/sql/func.c b/src/box/sql/func.c >> index 8ad75adb8..bab102988 100644 >> --- a/src/box/sql/func.c >> +++ b/src/box/sql/func.c >> @@ -140,18 +140,18 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv) >> * @return number of symbols in the given string. >> */ >> static int >> -char_count(const unsigned char *str, size_t byte_len) >> +utf8_char_count(const unsigned char *str, int byte_len) >> { >>> >>> Why unsigned char? >> I used unsigned char because sqlite3_value_text() returns result >> of this type. > > Ok, let it be. > >> diff --git a/src/box/sql/func.c b/src/box/sql/func.c >> index 8ad75adb8..bab102988 100644 >> --- a/src/box/sql/func.c >> +++ b/src/box/sql/func.c >> @@ -140,18 +140,18 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv) >> * @return number of symbols in the given string. 
>> */ >> static int >> -char_count(const unsigned char *str, size_t byte_len) >> +utf8_char_count(const unsigned char *str, int byte_len) >> { >> - int symbol_len = 0; >> + int symbol_count = 0; >> int offset = 0; >> UChar32 res; >> - while (offset < (int) byte_len) { >> - U8_NEXT(str, offset, (int) byte_len, res) >> + while (offset < byte_len) { >> + U8_NEXT(str, offset, byte_len, res) >> if (res < 0) >> break; >> - symbol_len++; >> + symbol_count++; >> } >> - return symbol_len; >> + return symbol_count; >> } >> >>> >>>> + if (res < 0) >>>> + break; >>> >>> Hm, so if a sequence contains non-utf8 symbols, >>> you simply cut the string. Not sure it is solid approach. >>> Could you please check how other DBs behave (do they throw >>> an error? Or ignore invalid symbols?) in similar situation and >>> what standard says. >> I’ve tested statements with LENGTH from the test badutf1 in different >> DBs. PostgreSQL raised an error "invalid byte sequence for encoding “UTF8”. >> MySQL, DB2 and MSSQL behaved all the same way: count each invalid byte as >> a symbol. For example: >> >> 0x80, 0x7f, 0x81 are all invalid first bytes from >> UTF8 point of view, 0xC03F is bad two byte seq. where >> first byte is ok and second is broken, 0xE0800F is >> bad three byte seq. where first two bytes are ok, but the >> third is broken. >> >> (this is MySQL SQL) >> >> select length(X'80'); 1 >> select length(concat(X'7f', X'80', X'81')); 3 >> select length(concat(X'61', X'c0')); 2 >> select length(concat(X'61', X'c0', X'80', X'80', X'80', >> X'80', X'80', X'80', X'80', X'80', X'80', X'80')); 12 >> >> select length(X'C03F'); 2 >> select length(concat(X'C03F', 'a')); 3 >> select length(X'E0800F'); 3 >> select length(concat(X'E0800F', 'a')); 4 >> >> I have not found anything in standard about dealing with >> invalid UTF8 sequences. >> >> Even before the patch test gave results different from what >> all others DBs. >> >> I propose to behave all other DBs do, as I describe above. >> Is it ok? 
> > PostgreSQL makes me feel uncertain concerning this question. > Anyway, I still believe that it’s better to raise an error. > I propose you ask other members of the core team for their opinion. > Additionally, you could ask the author of the original issue and our expert P. Gulutzan. After discussions, we decided to use the following algorithm: starting from the first byte in the string, we determine what kind of byte it is: the first byte of a 1-, 2-, 3- or 4-byte sequence. Then we skip the corresponding number of bytes and increment the symbol count (e.g. 2 bytes for a 2-byte sequence). If the current byte is not a valid first byte of any sequence, then we skip one byte and increment the symbol count. diff --git a/src/box/sql/func.c b/src/box/sql/func.c index bab102988..8406eb42f 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -135,6 +135,11 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv) * are encoded with more than one byte. Also note that all * symbols from 'str' to 'str + byte_len' would be counted, * even if there is a '\0' somewhere between them. + * + * This function is implemented to be fast and indifferent to + * correctness of the string being processed. If the input string + * has even one invalid utf-8 sequence, then the resulting length + * could be arbitrary within these bounds (0 <= len <= byte_len). * @param str String to be counted. * @param byte_len Byte length of given string. * @return number of symbols in the given string. 
@@ -143,12 +148,17 @@ static int utf8_char_count(const unsigned char *str, int byte_len) { int symbol_count = 0; - int offset = 0; - UChar32 res; - while (offset < byte_len) { - U8_NEXT(str, offset, byte_len, res) - if (res < 0) - break; + for (int i = 0; i < byte_len;) { + if ((str[i] & 0x80) == 0) + i += 1; + else if ((str[i] & 0xe0) == 0xc0) + i += 2; + else if ((str[i] & 0xf0) == 0xe0) + i += 3; + else if ((str[i] & 0xf8) == 0xf0) + i += 4; + else + i += 1; symbol_count++; } return symbol_count; > >>>> + symbol_len++; >>>> } >>>> - return n_chars; >>>> + return symbol_len; >>>> } >>>> >>>> diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua >>>> index 534c762ba..0a90d1b17 100755 >>>> --- a/test/sql-tap/badutf1.test.lua >>>> +++ b/test/sql-tap/badutf1.test.lua >>>> @@ -215,7 +215,7 @@ test:do_test( >>>> return test:execsql2("SELECT length('\x80') AS x") >>>> end, { >>>> -- <badutf-3.1> >>>> - "X", 1 >>>> + "X", 0 >>> >>> Why have this and the rest of the tests below been changed? >>> I guess because now they contain invalid utf8 symbols. >> They contained invalid utf8 symbols before. The tests are changed because >> I changed the way we handle malformed utf8 strings. > > Ok, but at least underline this fact in the commit message. Ok. > >>>> -- </badutf-3.1> >>>> }) >>>> >>>> @@ -235,7 +235,7 @@ test:do_test( >>>> return test:execsql2("SELECT length('\x7f\x80\x81') AS x") >>>> end, { >>>> -- <badutf-3.3> >>>> - "X", 3 >>>> + "X", 1 >>> >>> But wait, I execute >>> tarantool> SELECT length('\x7f\x80\x81') AS x >>> --- >>> - - [12] >>> ... >> Maybe you got this result because ‘\x7f’ is treated by tarantool sql >> as a simple string with 4 characters? > > And why is it treated in another way in the tests? 
When you use the SQL interface in the console and type SELECT LENGTH (‘\x7f\x80\x81’) AS x; things like ‘\x7f’ are not treated as hexadecimal literals, they are just treated as the symbols ‘\’, ‘x’, ’7’, ‘f’ (And if you want to use hexadecimal literals, you should use: > \set language sql > select length(X'7f' || X'80' || X'81' ); --- - - [3] ) > >>> Looks extremely strange. What do these tests check at all? >>> Why can’t we use simple sqlexecute or smth? This test suite >>> is so broken... >> I think these tests check that the DB does not crash if malformed utf8 is >> encountered. > > I mean, why are the results from a console run different from the test > results in our suite? I still can’t understand that. Now I execute this: > tarantool> box.sql.execute('select length("\x7f\x80\x81")’) > > And got a strange error: > --- > - error: !!binary bm8gc3VjaCBjb2x1bW46IH+AgQ== > ... I would like to investigate it in the scope of a separate task, because it seems to be quite a time-consuming problem. In the end, we got these changes in badutf1.test.lua diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua index 534c762ba..2b9f9b209 100755 --- a/test/sql-tap/badutf1.test.lua +++ b/test/sql-tap/badutf1.test.lua @@ -255,7 +255,7 @@ test:do_test( return test:execsql2("SELECT length('\x61\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80') AS x") end, { -- <badutf-3.5> - "X", 2 + "X", 11 -- </badutf-3.5> }) @@ -265,7 +265,7 @@ test:do_test( return test:execsql2("SELECT length('\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80') AS x") end, { -- <badutf-3.6> - "X", 1 + "X", 10 -- </badutf-3.6> }) @@ -285,7 +285,7 @@ test:do_test( return test:execsql2("SELECT length('\x80\x80\x80\x80\x80\xf0\x80\x80\x80\x80') AS x") end, { -- <badutf-3.8> - "X", 6 + "X", 7 -- </badutf-3.8> }) ^ permalink raw reply [flat|nested] 22+ messages in thread
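For anyone who wants to check these expectations outside the test suite, the counting loop from the patch can be lifted into a standalone helper (same logic as the diff above, just taken out of func.c):

```c
/*
 * Standalone copy of the lead-byte skipping loop from the patch:
 * a valid lead byte skips its whole sequence, any other byte
 * (including '\0') advances by one and counts as one symbol.
 */
static int
utf8_char_count(const unsigned char *str, int byte_len)
{
	int symbol_count = 0;
	for (int i = 0; i < byte_len;) {
		if ((str[i] & 0x80) == 0)
			i += 1;			/* ASCII, incl. '\0'. */
		else if ((str[i] & 0xe0) == 0xc0)
			i += 2;			/* 2-byte sequence lead. */
		else if ((str[i] & 0xf0) == 0xe0)
			i += 3;			/* 3-byte sequence lead. */
		else if ((str[i] & 0xf8) == 0xf0)
			i += 4;			/* 4-byte sequence lead. */
		else
			i += 1;			/* Invalid byte. */
		symbol_count++;
	}
	return symbol_count;
}
```

On the badutf-3.5/3.6/3.8 strings above this yields 11, 10 and 7 respectively, matching the updated expectations, and it still counts through embedded '\0' bytes (CHAR(65,00,65) gives 3).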
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-20 13:54 ` i.koptelov @ 2019-02-20 15:47 ` i.koptelov 2019-02-20 16:04 ` n.pettik 0 siblings, 1 reply; 22+ messages in thread From: i.koptelov @ 2019-02-20 15:47 UTC (permalink / raw) To: tarantool-patches; +Cc: n.pettik Thanks to Alexander, I fixed my patch to use a function from icu to count the length of the string. Changes: diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 233ea2901..8ddb9780f 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -149,16 +149,7 @@ utf8_char_count(const unsigned char *str, int byte_len) { int symbol_count = 0; for (int i = 0; i < byte_len;) { - if ((str[i] & 0x80) == 0) - i += 1; - else if ((str[i] & 0xe0) == 0xc0) - i += 2; - else if ((str[i] & 0xf0) == 0xe0) - i += 3; - else if ((str[i] & 0xf8) == 0xf0) - i += 4; - else - i += 1; + U8_FWD_1_UNSAFE(str, i); symbol_count++; } return symbol_count; diff --git a/test/sql-tap/badutf1.test.lua b/test/sql-tap/badutf1.test.lua index 67c1071b3..d104efaa9 100755 --- a/test/sql-tap/badutf1.test.lua +++ b/test/sql-tap/badutf1.test.lua @@ -255,7 +255,7 @@ test:do_test( return test:execsql2("SELECT length('\x61\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80') AS x") end, { -- <badutf-3.5> - "X", 11 + "X", 12 -- </badutf-3.5> }) @@ -265,7 +265,7 @@ test:do_test( return test:execsql2("SELECT length('\xc0\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80') AS x") end, { -- <badutf-3.6> - "X", 10 + "X", 11 -- </badutf-3.6> }) ^ permalink raw reply [flat|nested] 22+ messages in thread
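The switch to U8_FWD_1_UNSAFE changes the counts because ICU's macro classifies lead bytes differently from the hand-written loop: 0xc0 and 0xc1, which can never start a valid sequence, advance by only one byte, since real 2-byte leads begin at 0xc2. A sketch of the macro's arithmetic written out as plain functions (based on the icu4c definitions quoted later in the thread; the behaviour may vary between ICU versions):

```c
#include <stdint.h>

/* Plain-function rendition of ICU's U8_COUNT_TRAIL_BYTES_UNSAFE:
 * 0x00..0xc1 -> 0, 0xc2..0xdf -> 1, 0xe0..0xef -> 2, 0xf0.. -> 3. */
static int
utf8_trail_bytes(uint8_t lead)
{
	return (lead >= 0xc2) + (lead >= 0xe0) + (lead >= 0xf0);
}

/* utf8_char_count from the patch, with U8_FWD_1_UNSAFE expanded. */
static int
utf8_char_count_icu(const unsigned char *str, int byte_len)
{
	int symbol_count = 0;
	for (int i = 0; i < byte_len;) {
		i += 1 + utf8_trail_bytes(str[i]);
		symbol_count++;
	}
	return symbol_count;
}
```

This is exactly why badutf-3.5 grows from 11 to 12 and badutf-3.6 from 10 to 11 in the diff above: the 0xc0 byte now counts as a standalone symbol instead of consuming one trail byte. badutf-3.8 stays at 7, because 0xf0 still skips three trail bytes.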
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-20 15:47 ` i.koptelov @ 2019-02-20 16:04 ` n.pettik 2019-02-20 18:08 ` Vladislav Shpilevoy 2019-02-20 19:24 ` i.koptelov 0 siblings, 2 replies; 22+ messages in thread From: n.pettik @ 2019-02-20 16:04 UTC (permalink / raw) To: tarantool-patches; +Cc: Ivan Koptelov > On 20 Feb 2019, at 18:47, i.koptelov <ivan.koptelov@tarantool.org> wrote: > > Thanks to Alexander, I fixed my patch to use a function > from icu to count the length of the string. > > Changes: > Look, each new implementation changes the results of certain tests again and again. Let's first define the exact behaviour of the length() function and then write a function which satisfies these requirements, not vice versa. Is this the final version? Moreover, since Konstantin suggests as fast an implementation as we can get, I propose to consider a sort of asm-written variant: .global ap_strlen_utf8_s ap_strlen_utf8_s: push %esi cld mov 8(%esp), %esi xor %ecx, %ecx loopa: dec %ecx loopb: lodsb shl $1, %al js loopa jc loopb jnz loopa mov %ecx, %eax not %eax pop %esi ret It is taken from http://canonical.org/~kragen/strlen-utf8 and the author claims it is quite fast (it seems it doesn’t handle \0, but we can patch it). I didn’t bench it, so I am not absolutely sure that it is ‘way faster’ than other implementations. > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 233ea2901..8ddb9780f 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -149,16 +149,7 @@ utf8_char_count(const unsigned char *str, int byte_len) > { > int symbol_count = 0; > for (int i = 0; i < byte_len;) { > - if ((str[i] & 0x80) == 0) > - i += 1; > - else if ((str[i] & 0xe0) == 0xc0) > - i += 2; > - else if ((str[i] & 0xf0) == 0xe0) > - i += 3; > - else if ((str[i] & 0xf8) == 0xf0) > - i += 4; > - else > - i += 1; > + U8_FWD_1_UNSAFE(str, i); This function does not handle the string in the way we’ve discussed. 
Furthermore, the description says that it “assumes well-formed UTF-8”, which in our case is not true. So who knows what may happen if we pass a malformed byte sequence. I am not even mentioning that the behaviour of this function on invalid inputs may change later. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-20 16:04 ` n.pettik @ 2019-02-20 18:08 ` Vladislav Shpilevoy 2019-02-20 19:24 ` i.koptelov 1 sibling, 0 replies; 22+ messages in thread From: Vladislav Shpilevoy @ 2019-02-20 18:08 UTC (permalink / raw) To: tarantool-patches, n.pettik; +Cc: Ivan Koptelov On 20/02/2019 19:04, n.pettik wrote: > > >> On 20 Feb 2019, at 18:47, i.koptelov <ivan.koptelov@tarantool.org> wrote: >> >> Thanks to Alexander, I fixed my patch to use a function >> from icu to count the length of the string. >> >> Changes: >> > > Look, each next implementation again and again changes > results of certain tests. Lets firstly define exact behaviour of > length() function and then write function which will satisfy these > requirements, not vice versa. Is this the final version? > Moreover, since Konstantin suggest as fast implementation > as we can, I propose to consider sort of asm written variant: > > .global ap_strlen_utf8_s > ap_strlen_utf8_s: > push %esi > cld > mov 8(%esp), %esi > xor %ecx, %ecx > loopa: dec %ecx > loopb: lodsb > shl $1, %al > js loopa > jc loopb > jnz loopa > mov %ecx, %eax > not %eax > pop %esi > ret > > > It is taken from http://canonical.org/~kragen/strlen-utf8 > and author claims that quite fast (seems like it doesn’t > handle \0, but we can patch it). I didn’t bench it, so I am > not absolutely sure that it ‘way faster’ than other implementations. https://github.com/Gerold103/tarantool-memes/blob/master/further%20from%20god.jpg ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-20 16:04 ` n.pettik 2019-02-20 18:08 ` Vladislav Shpilevoy @ 2019-02-20 19:24 ` i.koptelov 2019-02-22 12:59 ` n.pettik 1 sibling, 1 reply; 22+ messages in thread From: i.koptelov @ 2019-02-20 19:24 UTC (permalink / raw) To: tarantool-patches; +Cc: n.pettik >> On 20 Feb 2019, at 18:47, i.koptelov <ivan.koptelov@tarantool.org> wrote: >> >> Thanks to Alexander, I fixed my patch to use a function >> from icu to count the length of the string. >> >> Changes: >> > > Look, each new implementation changes the results of certain tests > again and again. Let's first define the exact behaviour of > the length() function and then write a function which satisfies these > requirements, not vice versa. Is this the final version? I thought that these changes in the ‘badutf’ tests are OK because we came to an agreement that we don’t care about the results of LENGTH() on invalid strings. > Moreover, since Konstantin suggests as fast an implementation > as we can get, I propose to consider a sort of asm-written variant: > > .global ap_strlen_utf8_s > ap_strlen_utf8_s: > push %esi > cld > mov 8(%esp), %esi > xor %ecx, %ecx > loopa: dec %ecx > loopb: lodsb > shl $1, %al > js loopa > jc loopb > jnz loopa > mov %ecx, %eax > not %eax > pop %esi > ret > > > It is taken from http://canonical.org/~kragen/strlen-utf8 > and the author claims it is quite fast (it seems it doesn’t > handle \0, but we can patch it). I didn’t bench it, so I am > not absolutely sure that it is ‘way faster’ than other implementations. I’ve also come across this solution, but I considered it to be kind of overkill. 
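For comparison, the trick the assembly above relies on is simply to count every byte that is not a UTF-8 continuation byte (10xxxxxx). A hedged C rendition, assuming well-formed, NUL-terminated input — so, unlike the counted-length helper in the patch, it stops at '\0':

```c
#include <stddef.h>

/*
 * C rendition of the strlen-utf8 assembly trick: a character starts
 * at every byte whose top two bits are not 10, so counting those
 * bytes gives the symbol count. Assumes well-formed UTF-8 and stops
 * at '\0', unlike the counted-length helper in the patch.
 */
static size_t
utf8_strlen(const char *s)
{
	size_t count = 0;
	for (; *s != '\0'; s++) {
		if (((unsigned char)*s & 0xc0) != 0x80)
			count++;
	}
	return count;
}
```

Switching it to a (str, byte_len) interface, as the patch does, would be the "\0 patch" mentioned above.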
> >> diff --git a/src/box/sql/func.c b/src/box/sql/func.c >> index 233ea2901..8ddb9780f 100644 >> --- a/src/box/sql/func.c >> +++ b/src/box/sql/func.c >> @@ -149,16 +149,7 @@ utf8_char_count(const unsigned char *str, int byte_len) >> { >> int symbol_count = 0; >> for (int i = 0; i < byte_len;) { >> - if ((str[i] & 0x80) == 0) >> - i += 1; >> - else if ((str[i] & 0xe0) == 0xc0) >> - i += 2; >> - else if ((str[i] & 0xf0) == 0xe0) >> - i += 3; >> - else if ((str[i] & 0xf8) == 0xf0) >> - i += 4; >> - else >> - i += 1; >> + U8_FWD_1_UNSAFE(str, i); > > This function does not handle the string in the way we’ve discussed. Because it always does three comparisons? #define U8_COUNT_TRAIL_BYTES_UNSAFE(leadByte) \ (((uint8_t)(leadByte)>=0xc2)+((uint8_t)(leadByte)>=0xe0)+((uint8_t)(leadByte)>=0xf0)) #define U8_FWD_1_UNSAFE(s, i) { \ (i)+=1+U8_COUNT_TRAIL_BYTES_UNSAFE((s)[i]); \ } > Furthermore, the description says that it “assumes well-formed UTF-8”, > which in our case is not true. So who knows what may happen if we pass > a malformed byte sequence. I am not even mentioning that the behaviour of > this function on invalid inputs may change later. In its current implementation U8_FWD_1_UNSAFE satisfies our needs safely. The returned symbol count would never exceed byte_len. static int utf8_char_count(const unsigned char *str, int byte_len) { int symbol_count = 0; for (int i = 0; i < byte_len;) { U8_FWD_1_UNSAFE(str, i); symbol_count++; } return symbol_count; } I agree that it is a bad idea to rely on lib behaviour which may change later. So maybe I should just inline these one-line macros? Or use my own implementation, since it’s more efficient (but less beautiful). ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-20 19:24 ` i.koptelov @ 2019-02-22 12:59 ` n.pettik 2019-02-25 11:09 ` i.koptelov 0 siblings, 1 reply; 22+ messages in thread From: n.pettik @ 2019-02-22 12:59 UTC (permalink / raw) To: tarantool-patches; +Cc: Ivan Koptelov [-- Attachment #1: Type: text/plain, Size: 1660 bytes --] > On 20 Feb 2019, at 22:24, i.koptelov <ivan.koptelov@tarantool.org> wrote: > >>> >>> On 20 Feb 2019, at 18:47, i.koptelov <ivan.koptelov@tarantool.org <mailto:ivan.koptelov@tarantool.org>> wrote: >>> >>> Thanks to Alexander, I fixed my patch to use a function >>> from icu to count the length of the string. >>> >>> Changes: Travis has failed. Please, make sure it is OK before sending the patch. It doesn’t fail on my local (Mac) machine, so I guess this failure appears only on Linux systems. >> Furthermore, the description says that it “assumes well-formed UTF-8”, >> which in our case is not true. So who knows what may happen if we pass >> a malformed byte sequence. I am not even mentioning that the behaviour of >> this function on invalid inputs may change later. > > In its current implementation U8_FWD_1_UNSAFE satisfies our needs safely. The returned > symbol count would never exceed byte_len. > > static int > utf8_char_count(const unsigned char *str, int byte_len) > { > int symbol_count = 0; > for (int i = 0; i < byte_len;) { > U8_FWD_1_UNSAFE(str, i); > symbol_count++; > } > return symbol_count; > } > > I agree that it is a bad idea to rely on lib behaviour which may > change later. So maybe I should just inline these one-line macros? > Or use my own implementation, since it’s more efficient (but less beautiful) Nevermind, let's keep it as is. My only real worry is that in other places SQL_SKIP_UTF8 is used instead. It handles only two-byte utf8 symbols, while U8_FWD_1_UNSAFE() accounts for three- and four-byte symbols. Can we use the same pattern everywhere? 
[-- Attachment #2: Type: text/html, Size: 17493 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-22 12:59 ` n.pettik @ 2019-02-25 11:09 ` i.koptelov 2019-02-25 15:10 ` n.pettik 0 siblings, 1 reply; 22+ messages in thread From: i.koptelov @ 2019-02-25 11:09 UTC (permalink / raw) To: tarantool-patches; +Cc: n.pettik > On 22 Feb 2019, at 15:59, n.pettik <korablev@tarantool.org> wrote: > > > >> On 20 Feb 2019, at 22:24, i.koptelov <ivan.koptelov@tarantool.org> wrote: >> >>>> >>>> On 20 Feb 2019, at 18:47, i.koptelov <ivan.koptelov@tarantool.org> wrote: >>>> >>>> Thanks to Alexander, I fixed my patch to use a function >>>> from icu to count the length of the string. >>>> >>>> Changes: > > Travis has failed. Please, make sure it is OK before sending the patch. > It doesn’t fail on my local (Mac) machine, so I guess this failure appears > only on Linux systems. The problem is with the badutf test (the LENGTH tests). I’ve tried to reproduce the problem on my machine (using Docker with Ubuntu), but with no success. It seems that different versions of the icu4c lib provide different behavior of U8_FWD_1_UNSAFE. I propose to just inline these two lines (which we need) into some util function. The logic of these lines seems quite simple and obvious (after you read about utf8 on Wikipedia), so I see no problem. #define U8_COUNT_TRAIL_BYTES_UNSAFE(leadByte) \ (((uint8_t)(leadByte)>=0xc2)+((uint8_t)(leadByte)>=0xe0)+((uint8_t)(leadByte)>=0xf0)) #define U8_FWD_1_UNSAFE(s, i) { \ (i)+=1+U8_COUNT_TRAIL_BYTES_UNSAFE((s)[i]); \ } > >>> Furthermore, the description says that it “assumes well-formed UTF-8”, >>> which in our case is not true. So who knows what may happen if we pass >>> a malformed byte sequence. I am not even mentioning that the behaviour of >>> this function on invalid inputs may change later. >> >> In its current implementation U8_FWD_1_UNSAFE satisfies our needs safely. The returned >> symbol count would never exceed byte_len. 
>>
>> static int
>> utf8_char_count(const unsigned char *str, int byte_len)
>> {
>> 	int symbol_count = 0;
>> 	for (int i = 0; i < byte_len;) {
>> 		U8_FWD_1_UNSAFE(str, i);
>> 		symbol_count++;
>> 	}
>> 	return symbol_count;
>> }
>>
>> I agree that it is a bad idea to rely on lib behaviour which may
>> change later. So maybe I would just inline these one-line macros?
>> Or use my own implementation, since it’s more efficient (but less beautiful)
>
> Nevermind, let's keep it as is.
> I really worry only about the fact that in other places SQL_SKIP_UTF8
> is used instead. It handles only two-byte utf8 symbols, meanwhile
> U8_FWD_1_UNSAFE() accounts for three- and four-byte symbols.
> Can we use everywhere the same pattern?

Yes, I think, we can.

^ permalink raw reply	[flat|nested] 22+ messages in thread
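The two icu-style macros and the counting loop quoted above are small enough to check in isolation. The following standalone sketch is reimplemented here from the quoted snippets (it is not the tarantool source itself); it shows that counting by lead byte makes an embedded '\0' an ordinary one-byte symbol, which is exactly the LENGTH(CHAR(65,00,65)) == 3 behavior the patch is after:

```c
#include <stdint.h>

/* Trail-byte count for a UTF-8 lead byte: 0 for ASCII (including 0x00),
 * 1 for 0xc2..0xdf, 2 for 0xe0..0xef, 3 for 0xf0 and above. */
#define U8_COUNT_TRAIL_BYTES_UNSAFE(leadByte) \
	(((uint8_t)(leadByte) >= 0xc2) + ((uint8_t)(leadByte) >= 0xe0) + \
	 ((uint8_t)(leadByte) >= 0xf0))

/* Advance offset i to the next code point boundary. */
#define U8_FWD_1_UNSAFE(s, i) { \
	(i) += 1 + U8_COUNT_TRAIL_BYTES_UNSAFE((s)[i]); \
}

/* Count code points in the first byte_len bytes, '\0' included. */
int
utf8_char_count(const unsigned char *str, int byte_len)
{
	int symbol_count = 0;
	for (int i = 0; i < byte_len;) {
		U8_FWD_1_UNSAFE(str, i);
		symbol_count++;
	}
	return symbol_count;
}
```

With byte_len taken from sql_value_bytes() rather than strlen(), the loop never stops at the NUL byte: the bytes {0x41, 0x00, 0x41} count as three symbols, while a two-byte sequence such as 0xC3 0x84 ('Ä') still counts as one.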
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0'
  2019-02-25 11:09 ` i.koptelov
@ 2019-02-25 15:10 ` n.pettik
  2019-02-26 13:33 ` i.koptelov
  0 siblings, 1 reply; 22+ messages in thread
From: n.pettik @ 2019-02-25 15:10 UTC (permalink / raw)
To: tarantool-patches; +Cc: Ivan Koptelov

[-- Attachment #1: Type: text/plain, Size: 3112 bytes --]

> On 25 Feb 2019, at 14:09, i.koptelov <ivan.koptelov@tarantool.org> wrote:
>> On 22 Feb 2019, at 15:59, n.pettik <korablev@tarantool.org> wrote:
>>> On 20 Feb 2019, at 22:24, i.koptelov <ivan.koptelov@tarantool.org> wrote:
>>>>>
>>>>> On 20 Feb 2019, at 18:47, i.koptelov <ivan.koptelov@tarantool.org> wrote:
>>>>>
>>>>> Thanks to Alexander, I fixed my patch to use a function
>>>>> from icu to count the length of the string.
>>>>>
>>>>> Changes:
>>
>> Travis has failed. Please, make sure it is OK before sending the patch.
>> It doesn’t fail on my local (Mac) machine, so I guess this fail appears
>> only on Linux system.
> The problem is with badutf test (LENGTH tests).
> I’ve tried to reproduce the problem on my machine (using Docker with Ubuntu),
> but with no success. It seems that different versions of icu4c lib
> provide different behavior of U8_FWD_1_UNSAFE.
> I propose to just inline these two lines (which we need) into
> some util function. Logic of these lines seems to be quite simple
> and obvious (after you read about utf8 on wikipedia), so I see no
> problem.
>
> #define U8_COUNT_TRAIL_BYTES_UNSAFE(leadByte) \
> (((uint8_t)(leadByte)>=0xc2)+((uint8_t)(leadByte)>=0xe0)+((uint8_t)(leadByte)>=0xf0))
>
> #define U8_FWD_1_UNSAFE(s, i) { \
> (i)+=1+U8_COUNT_TRAIL_BYTES_UNSAFE((s)[i]); \
> }

That’s what I was talking about. But using the macros with the same
name as in the utf library doesn’t look like a good pattern. Yep, you
can use define guards like:

#ifdef U8_COUNT_TRAIL_BYTES_UNSAFE
#undef U8_COUNT_TRAIL_BYTES_UNSAFE
#endif
#define U8_COUNT_TRAIL_BYTES_UNSAFE

But I’d rather just give it another name.
Hence, taking into account comment below, we are going to substitute SQL_SKIP_UTF8() with implementation borrowed from icu library. >>>> Furthermore, description says that it “assumes well-formed UTF-8”, >>>> which in our case is not true. So who knows what may happen if we pass >>>> malformed byte sequence. I am not even saying that behaviour of >>>> this function on invalid inputs may change later. >>> >>> In it's current implementation U8_FWD_1_UNSAFE satisfy our needs safely. Returned >>> symbol length would never exceed byte_len. >>> >>> static int >>> utf8_char_count(const unsigned char *str, int byte_len) >>> { >>> int symbol_count = 0; >>> for (int i = 0; i < byte_len;) { >>> U8_FWD_1_UNSAFE(str, i); >>> symbol_count++; >>> } >>> return symbol_count; >>> } >>> >>> I agree that it is a bad idea to relay on lib behaviour which may >>> change lately. So maybe I would just inline these one line macros? >>> Or use my own implementation, since it’s more efficient (but less beautiful) >> >> Nevermind, let's keep it as is. >> I really worry only about the fact that in other places SQL_SKIP_UTF8 >> is used instead. It handles only two-bytes utf8 symbols, meanwhile >> U8_FWD_1_UNSAFE() accounts three and four bytes length symbols. >> Can we use everywhere the same pattern? > Yes, I think, we can. Ok, then will be waiting for updates. [-- Attachment #2: Type: text/html, Size: 16152 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
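The guard pattern discussed above can be exercised in a standalone file. This sketch fakes a pre-existing icu definition and then shadows it behind an #undef guard; the names mirror the discussion, but the file is illustrative only, and the eventual rename (SQL_UTF8_COUNT_TRAIL_BYTES_UNSAFE, adopted in the next message) sidesteps the shadowing question entirely:

```c
#include <stdint.h>

/* Pretend the icu header already defined the macro. */
#define U8_COUNT_TRAIL_BYTES_UNSAFE(leadByte) 0

/* Guarded redefinition: without the #undef, compilers warn (or error
 * under -Werror) about redefining the macro with a different body. */
#ifdef U8_COUNT_TRAIL_BYTES_UNSAFE
#undef U8_COUNT_TRAIL_BYTES_UNSAFE
#endif
#define U8_COUNT_TRAIL_BYTES_UNSAFE(leadByte) \
	(((uint8_t)(leadByte) >= 0xc2) + ((uint8_t)(leadByte) >= 0xe0) + \
	 ((uint8_t)(leadByte) >= 0xf0))

/* Wrapper so the redefined macro can be tested as a function. */
int
trail_bytes(uint8_t lead)
{
	return U8_COUNT_TRAIL_BYTES_UNSAFE(lead);
}
```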
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-25 15:10 ` n.pettik @ 2019-02-26 13:33 ` i.koptelov 2019-02-26 17:50 ` n.pettik 0 siblings, 1 reply; 22+ messages in thread From: i.koptelov @ 2019-02-26 13:33 UTC (permalink / raw) To: tarantool-patches; +Cc: n.pettik > On 25 Feb 2019, at 18:10, n.pettik <korablev@tarantool.org> wrote: > > That’s I was talking about. But using the macros with the same > name as in utf library doesn’t look like a good pattern. Yep, you > can use define guards like: > > #ifdef U8_COUNT_TRAIL_BYTES_UNSAFE > #undef U8_COUNT_TRAIL_BYTES_UNSAFE > #endif > #define U8_COUNT_TRAIL_BYTES_UNSAFE > > But I’d rather just give it another name. > Hence, taking into account comment below, > we are going to substitute SQL_SKIP_UTF8() with > implementation borrowed from icu library. I changed the names to SQL_UTF8_FWD_1_UNSAFE and SQL_UTF8_COUNT_TRAIL_BYTES_UNSAFE. > >>>>> Furthermore, description says that it “assumes well-formed UTF-8”, >>>>> which in our case is not true. So who knows what may happen if we pass >>>>> malformed byte sequence. I am not even saying that behaviour of >>>>> this function on invalid inputs may change later. >>>> >>>> In it's current implementation U8_FWD_1_UNSAFE satisfy our needs safely. Returned >>>> symbol length would never exceed byte_len. >>>> >>>> static int >>>> utf8_char_count(const unsigned char *str, int byte_len) >>>> { >>>> int symbol_count = 0; >>>> for (int i = 0; i < byte_len;) { >>>> U8_FWD_1_UNSAFE(str, i); >>>> symbol_count++; >>>> } >>>> return symbol_count; >>>> } >>>> >>>> I agree that it is a bad idea to relay on lib behaviour which may >>>> change lately. So maybe I would just inline these one line macros? >>>> Or use my own implementation, since it’s more efficient (but less beautiful) >>> >>> Nevermind, let's keep it as is. >>> I really worry only about the fact that in other places SQL_SKIP_UTF8 >>> is used instead. 
It handles only two-bytes utf8 symbols, meanwhile >>> U8_FWD_1_UNSAFE() accounts three and four bytes length symbols. >>> Can we use everywhere the same pattern? >> Yes, I think, we can. > > Ok, then will be waiting for updates. So here are the updates: sqlUtf8CharLen is removed with utf8_char_count (which is just moved to utf.c). sqlUtf8CharLen used SQL_SKIP_UTF8 and itself was used only in one place. diff --git a/src/box/sql/utf.c b/src/box/sql/utf.c index 22e0de809..5a1862a52 100644 --- a/src/box/sql/utf.c +++ b/src/box/sql/utf.c @@ -157,30 +157,31 @@ sqlUtf8Read(const unsigned char **pz /* Pointer to string from which to read cha */ /* #define TRANSLATE_TRACE 1 */ -/* - * pZ is a UTF-8 encoded unicode string. If nByte is less than zero, - * return the number of unicode characters in pZ up to (but not including) - * the first 0x00 byte. If nByte is not less than zero, return the - * number of unicode characters in the first nByte of pZ (or up to - * the first 0x00, whichever comes first). +/** + * Return number of symbols in the given string. + * + * Number of symbols != byte size of string because some symbols + * are encoded with more than one byte. Also note that all + * symbols from 'str' to 'str + byte_len' would be counted, + * even if there is a '\0' somewhere between them. + * + * This function is implemented to be fast and indifferent to + * correctness of string being processed. If input string has + * even one invalid utf-8 sequence, then the resulting length + * could be arbitary in these boundaries (0 < len < byte_len). + * @param str String to be counted. + * @param byte_len Byte length of given string. + * @return number of symbols in the given string. 
*/ int -sqlUtf8CharLen(const char *zIn, int nByte) +utf8_char_count(const unsigned char *str, int byte_len) { - int r = 0; - const u8 *z = (const u8 *)zIn; - const u8 *zTerm; - if (nByte >= 0) { - zTerm = &z[nByte]; - } else { - zTerm = (const u8 *)(-1); - } - assert(z <= zTerm); - while (*z != 0 && z < zTerm) { - SQL_SKIP_UTF8(z); - r++; + int symbol_count = 0; + for (int i = 0; i < byte_len;) { + SQL_UTF8_FWD_1_UNSAFE(str, i, byte_len); + symbol_count++; } - return r; + return symbol_count; } I inlined SQL_UTF8_FWD_1_UNSAFE and SQL_UTF8_COUNT_TRAIL_BYTES_UNSAFE from icu library, because different versions of the library provide slightly different behavior of UTF8_FWD_1_UNSAFE. It caused sql-tap/badutf test to fail on some platforms, while on the others it was working. diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h index 2830ab639..859450e88 100644 --- a/src/box/sql/sqlInt.h +++ b/src/box/sql/sqlInt.h @@ -3071,16 +3071,6 @@ struct TreeView { }; #endif /* SQL_DEBUG */ -/* - * Assuming zIn points to the first byte of a UTF-8 character, - * advance zIn to point to the first byte of the next UTF-8 character. - */ -#define SQL_SKIP_UTF8(zIn) { \ - if( (*(zIn++))>=0xc0 ){ \ - while( (*zIn & 0xc0)==0x80 ){ zIn++; } \ - } \ -} - /* * The sql_*_BKPT macros are substitutes for the error codes with * the same name but without the _BKPT suffix. These macros invoke @@ -4142,11 +4132,51 @@ void sql_drop_foreign_key(struct Parse *parse_context, struct SrcList *table, struct Token *constraint); +/** + * Counts the trail bytes for a UTF-8 lead byte of a valid UTF-8 + * sequence. Returns 0 for 0..0xc1. Undefined for 0xf5..0xff. + * leadByte might be evaluated multiple times. + * + * @param leadByte The first byte of a UTF-8 sequence. + * Must be 0..0xff. 
+ */ +#define SQL_UTF8_COUNT_TRAIL_BYTES_UNSAFE(leadByte) \ + (((uint8_t)(leadByte) >= 0xc2) + ((uint8_t)(leadByte) >= 0xe0) + \ + ((uint8_t)(leadByte) >= 0xf0)) + +/** + * Advance the string offset from one code point boundary to the + * next. (Post-incrementing iteration.) + * "Unsafe" macro, assumes well-formed UTF-8. + * + * After the whole string is traversed, (str + i) points to the + * position right after the last element of the string (*). + * + * If resulting offset > byte_size then resulting offset is set + * to byte_size. This is to provide (*) in cases where it might + * be violated. + * + * SQL_UTF8_FWD_1_UNSAFE sometimes is used get the size of utf-8 + * character subsequence and we don't want to get summary size + * which exceeds total string size (in bytes). Consider example: + * + * '0xE0' - this is invalid utf-8 string because it consists only + * of first byte of 3 byte sequence. After traverse, the + * offset == 2 and we set it to 1, to keep (*). + * + * @param s const uint8_t * string. + * @param i string offset. + * @param byte_size byte size of the string. + */ +#define SQL_UTF8_FWD_1_UNSAFE(str, i, byte_size) \ + (i) += 1 + SQL_UTF8_COUNT_TRAIL_BYTES_UNSAFE((str)[i]); \ + (i) = (i) <= (byte_size) ? (i) : (byte_size); + void sqlDetach(Parse *, Expr *); int sqlAtoF(const char *z, double *, int); int sqlGetInt32(const char *, int *); int sqlAtoi(const char *); -int sqlUtf8CharLen(const char *pData, int nByte); +int utf8_char_count(const unsigned char *str, int byte_len); u32 sqlUtf8Read(const u8 **); LogEst sqlLogEst(u64); LogEst sqlLogEstAdd(LogEst, LogEst); Just removed SQL_SKIP_UTF8 with SQL_UTF8_FWD_1_UNSAFE. diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 8ddb9780f..5352ca6b8 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -128,33 +128,6 @@ typeofFunc(sql_context * context, int NotUsed, sql_value ** argv) sql_result_text(context, z, -1, SQL_STATIC); } -/** - * Return number of symbols in the given string. 
- * - * Number of symbols != byte size of string because some symbols - * are encoded with more than one byte. Also note that all - * symbols from 'str' to 'str + byte_len' would be counted, - * even if there is a '\0' somewhere between them. - * - * This function is implemented to be fast and indifferent to - * correctness of string being processed. If input string has - * even one invalid utf-8 sequence, then the resulting length - * could be arbitary in these boundaries (0 < len < byte_len). - * @param str String to be counted. - * @param byte_len Byte length of given string. - * @return number of symbols in the given string. - */ -static int -utf8_char_count(const unsigned char *str, int byte_len) -{ - int symbol_count = 0; - for (int i = 0; i < byte_len;) { - U8_FWD_1_UNSAFE(str, i); - symbol_count++; - } - return symbol_count; -} - /* * Implementation of the length() function */ @@ -413,17 +386,22 @@ substrFunc(sql_context * context, int argc, sql_value ** argv) * used because '\0' is not supposed to be * end-of-string symbol. 
*/ - int n_chars = utf8_char_count(z, sql_value_bytes(argv[0])); + int byte_size = sql_value_bytes(argv[0]); + int n_chars = utf8_char_count(z, byte_size); int cnt = 0; + int i = 0; while (cnt < n_chars && p1) { - SQL_SKIP_UTF8(z); + SQL_UTF8_FWD_1_UNSAFE(z, i, byte_size); cnt++; p1--; } + z += i; + i = 0; for (z2 = z; cnt < n_chars && p2; p2--) { - SQL_SKIP_UTF8(z2); + SQL_UTF8_FWD_1_UNSAFE(z2, i, byte_size); cnt++; } + z2 += i; sql_result_text64(context, (char *)z, z2 - z, SQL_TRANSIENT); } else { @@ -890,7 +868,7 @@ likeFunc(sql_context *context, int argc, sql_value **argv) return; const char *const err_msg = "ESCAPE expression must be a single character"; - if (sqlUtf8CharLen((char *)zEsc, -1) != 1) { + if (utf8_char_count(zEsc, sql_value_bytes(argv[2])) != 1) { sql_result_error(context, err_msg, -1); return; } @@ -1268,8 +1246,6 @@ trimFunc(sql_context * context, int argc, sql_value ** argv) } else { const unsigned char *z = zCharSet; int trim_set_sz = sql_value_bytes(argv[1]); - int handled_bytes_cnt = trim_set_sz; - nChar = 0; /* * Count the number of UTF-8 characters passing * through the entire char set, but not up @@ -1277,12 +1253,7 @@ trimFunc(sql_context * context, int argc, sql_value ** argv) * to handle trimming set containing such * characters. 
*/ - while(handled_bytes_cnt > 0) { - const unsigned char *prev_byte = z; - SQL_SKIP_UTF8(z); - handled_bytes_cnt -= (z - prev_byte); - nChar++; - } + nChar = utf8_char_count(z, trim_set_sz); if (nChar > 0) { azChar = contextMalloc(context, @@ -1292,12 +1263,13 @@ trimFunc(sql_context * context, int argc, sql_value ** argv) } aLen = (unsigned char *)&azChar[nChar]; z = zCharSet; + i = 0; nChar = 0; - handled_bytes_cnt = trim_set_sz; + int handled_bytes_cnt = trim_set_sz; while(handled_bytes_cnt > 0) { - azChar[nChar] = (unsigned char *)z; - SQL_SKIP_UTF8(z); - aLen[nChar] = (u8) (z - azChar[nChar]); + azChar[nChar] = (unsigned char *)(z + i); + SQL_UTF8_FWD_1_UNSAFE(z, i, trim_set_sz); + aLen[nChar] = (u8) (z + i - azChar[nChar]); handled_bytes_cnt -= aLen[nChar]; nChar++; } ^ permalink raw reply [flat|nested] 22+ messages in thread
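The clamping in SQL_UTF8_FWD_1_UNSAFE is the part of the patch worth probing on malformed input. Below is a minimal standalone model of the macro (wrapped in do/while here for statement safety; the patch itself emits two bare statements), showing that the offset can never run past byte_size even when the buffer ends in a truncated lead byte:

```c
#include <stdint.h>

#define SQL_UTF8_COUNT_TRAIL_BYTES_UNSAFE(leadByte) \
	(((uint8_t)(leadByte) >= 0xc2) + ((uint8_t)(leadByte) >= 0xe0) + \
	 ((uint8_t)(leadByte) >= 0xf0))

/* Advance one code point, clamping the offset to byte_size so a
 * truncated multi-byte sequence at the end cannot overrun. */
#define SQL_UTF8_FWD_1_UNSAFE(str, i, byte_size) \
	do { \
		(i) += 1 + SQL_UTF8_COUNT_TRAIL_BYTES_UNSAFE((str)[i]); \
		(i) = (i) <= (byte_size) ? (i) : (byte_size); \
	} while (0)

/* Count code points in a possibly invalid buffer; the loop always
 * terminates because the offset grows by at least one byte. */
int
clamped_char_count(const unsigned char *str, int byte_size)
{
	int n = 0;
	for (int i = 0; i < byte_size;) {
		SQL_UTF8_FWD_1_UNSAFE(str, i, byte_size);
		n++;
	}
	return n;
}
```

A lone 0xE0 claims two trail bytes, so the offset jumps past the end and is clamped back to byte_size: the count stays within the buffer and the scan stops, which is the bounded behavior the patch comment describes.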
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-26 13:33 ` i.koptelov @ 2019-02-26 17:50 ` n.pettik 2019-02-26 18:44 ` i.koptelov 0 siblings, 1 reply; 22+ messages in thread From: n.pettik @ 2019-02-26 17:50 UTC (permalink / raw) To: tarantool-patches; +Cc: Ivan Koptelov [-- Attachment #1: Type: text/plain, Size: 1041 bytes --] > On 26 Feb 2019, at 16:33, i.koptelov <ivan.koptelov@tarantool.org> wrote: >> On 25 Feb 2019, at 18:10, n.pettik <korablev@tarantool.org> wrote: >> >> That’s I was talking about. But using the macros with the same >> name as in utf library doesn’t look like a good pattern. Yep, you >> can use define guards like: >> >> #ifdef U8_COUNT_TRAIL_BYTES_UNSAFE >> #undef U8_COUNT_TRAIL_BYTES_UNSAFE >> #endif >> #define U8_COUNT_TRAIL_BYTES_UNSAFE >> >> But I’d rather just give it another name. >> Hence, taking into account comment below, >> we are going to substitute SQL_SKIP_UTF8() with >> implementation borrowed from icu library. > I changed the names to SQL_UTF8_FWD_1_UNSAFE and > SQL_UTF8_COUNT_TRAIL_BYTES_UNSAFE. Won’t bother you with portion of minor comments. I’ve pushed them, take a look. If they are OK, just squash (but don’t squash last commit) them and patch will be OK as well. Also, I’ve fixed a bit commit message. Add please doc request which includes user-visible changes. [-- Attachment #2: Type: text/html, Size: 3195 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-26 17:50 ` n.pettik @ 2019-02-26 18:44 ` i.koptelov 2019-02-26 20:16 ` Vladislav Shpilevoy 0 siblings, 1 reply; 22+ messages in thread From: i.koptelov @ 2019-02-26 18:44 UTC (permalink / raw) To: tarantool-patches; +Cc: n.pettik > On 26 Feb 2019, at 20:50, n.pettik <korablev@tarantool.org> wrote: > > Won’t bother you with portion of minor comments. > I’ve pushed them, take a look. If they are OK, just > squash (but don’t squash last commit) them and > patch will be OK as well. > > Also, I’ve fixed a bit commit message. Add please > doc request which includes user-visible changes. Thank you for the fixes! They are OK for me. Squashed them. Added doc request into the commit message. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-26 18:44 ` i.koptelov @ 2019-02-26 20:16 ` Vladislav Shpilevoy 2019-03-04 11:59 ` i.koptelov 0 siblings, 1 reply; 22+ messages in thread From: Vladislav Shpilevoy @ 2019-02-26 20:16 UTC (permalink / raw) To: tarantool-patches, i.koptelov; +Cc: n.pettik On 26/02/2019 21:44, i.koptelov wrote: > > >> On 26 Feb 2019, at 20:50, n.pettik <korablev@tarantool.org> wrote: >> >> Won’t bother you with portion of minor comments. >> I’ve pushed them, take a look. If they are OK, just >> squash (but don’t squash last commit) them and >> patch will be OK as well. >> >> Also, I’ve fixed a bit commit message. Add please >> doc request which includes user-visible changes. > Thank you for the fixes! They are OK for me. > Squashed them. Added doc request into the > commit message. > If under a doc request you mean that: https://github.com/tarantool/tarantool/commit/4704f531fc4af0e074410c552a4f3a9d9848c949 then it is incorrect. Please, look at tarantool/docbot for the proper syntax, and check your requests in the online table: https://tarantool-docbot.herokuapp.com If you do not see your request here, then it is incorrect. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-02-26 20:16 ` Vladislav Shpilevoy @ 2019-03-04 11:59 ` i.koptelov 0 siblings, 0 replies; 22+ messages in thread From: i.koptelov @ 2019-03-04 11:59 UTC (permalink / raw) To: tarantool-patches > On 26 Feb 2019, at 23:16, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote: > > > > On 26/02/2019 21:44, i.koptelov wrote: >>> On 26 Feb 2019, at 20:50, n.pettik <korablev@tarantool.org> wrote: >>> >>> Won’t bother you with portion of minor comments. >>> I’ve pushed them, take a look. If they are OK, just >>> squash (but don’t squash last commit) them and >>> patch will be OK as well. >>> >>> Also, I’ve fixed a bit commit message. Add please >>> doc request which includes user-visible changes. >> Thank you for the fixes! They are OK for me. >> Squashed them. Added doc request into the >> commit message. > > If under a doc request you mean that: https://github.com/tarantool/tarantool/commit/4704f531fc4af0e074410c552a4f3a9d9848c949 then it is incorrect. Please, look at > tarantool/docbot for the proper syntax, and check your requests in the online > table: https://tarantool-docbot.herokuapp.com > > If you do not see your request here, then it is incorrect. > Thank you, I’ve changed the commit message. Now I can see my doc request in the online table. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0' 2019-01-29 9:56 [tarantool-patches] [PATCH] sql: LIKE/LENGTH process '\0' Ivan Koptelov 2019-01-29 16:35 ` [tarantool-patches] " n.pettik @ 2019-03-04 15:30 ` Kirill Yukhin 1 sibling, 0 replies; 22+ messages in thread From: Kirill Yukhin @ 2019-03-04 15:30 UTC (permalink / raw) To: tarantool-patches Hello, On 29 Jan 12:56, Ivan Koptelov wrote: > Fixes LIKE and LENGTH functions. '\0' now treated as > a usual symbol. Strings with '\0' are now processed > entirely. Consider examples: > > LENGTH(CHAR(65,00,65)) == 3 > LIKE(CHAR(65,00,65), CHAR(65,00,66)) == False > > Closes #3542 > --- > Branch: [1]https://github.com/tarantool/tarantool/tree/sudobobo/gh-3542-LIKE/LEN-null-term > Issue: [2]https://github.com/tarantool/tarantool/issues/3542 I've checked in final patch and clean-up by Nikita into 2.1 branch. -- Regards, Kirill Yukhin ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2019-03-04 15:30 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-29  9:56 [tarantool-patches] [PATCH] sql: LIKE/LENGTH process '\0' Ivan Koptelov
2019-01-29 16:35 ` [tarantool-patches] " n.pettik
2019-02-04 12:34 ` Ivan Koptelov
2019-02-05 13:50 ` n.pettik
2019-02-07 15:14 ` i.koptelov
2019-02-11 13:15 ` n.pettik
2019-02-13 15:46 ` i.koptelov
2019-02-14 12:57 ` n.pettik
2019-02-20 13:54 ` i.koptelov
2019-02-20 15:47 ` i.koptelov
2019-02-20 16:04 ` n.pettik
2019-02-20 18:08 ` Vladislav Shpilevoy
2019-02-20 19:24 ` i.koptelov
2019-02-22 12:59 ` n.pettik
2019-02-25 11:09 ` i.koptelov
2019-02-25 15:10 ` n.pettik
2019-02-26 13:33 ` i.koptelov
2019-02-26 17:50 ` n.pettik
2019-02-26 18:44 ` i.koptelov
2019-02-26 20:16 ` Vladislav Shpilevoy
2019-03-04 11:59 ` i.koptelov
2019-03-04 15:30 ` Kirill Yukhin