From: "n.pettik" <korablev@tarantool.org>
To: tarantool-patches@freelists.org
Cc: Ivan Koptelov <ivan.koptelov@tarantool.org>
Subject: [tarantool-patches] Re: [PATCH] sql: LIKE/LENGTH process '\0'
Date: Tue, 5 Feb 2019 16:50:50 +0300
Message-ID: <07DBA796-6DD4-41DD-8438-104FE3AE05BB@tarantool.org>
In-Reply-To: <fd70561d-6a0e-70dc-4c20-bdcac764040a@tarantool.org>

> On 29/01/2019 19:35, n.pettik wrote:
>>> Fixes LIKE and LENGTH functions. '\0' now treated as
>> Nit: is treated.
> Fixed.
>>> a usual symbol. Strings with '\0' are now processed
>>> entirely. Consider examples:
>>>
>>> LENGTH(CHAR(65,00,65)) == 3
>>> LIKE(CHAR(65,00,65), CHAR(65,00,66)) == False
>> Also, I see that smth wrong with text in this mail again
> I hope now the mail text is ok.

Not quite. It is still highlighted in some way. I have no idea why.

>  src/box/sql/func.c         |  88 +++++++++++++-----
>  src/box/sql/vdbeInt.h      |   2 +-
>  test/sql-tap/func.test.lua | 220 ++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 284 insertions(+), 26 deletions(-)
>
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index e46b162d9..2978af983 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -128,6 +128,30 @@ typeofFunc(sqlite3_context * context, int NotUsed, sqlite3_value ** argv)
>  	sqlite3_result_text(context, z, -1, SQLITE_STATIC);
>  }
>
> +/**
> + * Return number of chars in the given string.
> + *
> + * Number of chars != byte size of string because some characters
> + * are encoded with more than one byte. Also note that all
> + * characters from 'str' to 'str + byte_len' would be counted,
> + * even if there is a '\0' somewhere between them.
> + * @param str String to be counted.
> + * @param byte_len Byte length of given string.
> + * @return

Return what?

> + */
> +static int
> +count_chars(const unsigned char *str, size_t byte_len)

Quite poor naming. I would call it utf8_str_len or smth with a utf8
prefix. Maybe it is worth putting it in some utils source file.

Also, consider using the native U8_NEXT function from utf8.c instead of
the custom SQLITE_SKIP_UTF8. It may not be as fast, but it is safer,
I suppose. I don't insist though.

> +{

What if str is NULL? Add at least an assertion.

> +	int n_chars = 0;
> +	const unsigned char *prev_z;
> +	for (size_t cnt = 0; cnt < byte_len; cnt += (str - prev_z)) {
> +		n_chars++;
> +		prev_z = str;
> +		SQLITE_SKIP_UTF8(str);
> +	}
> +	return n_chars;
> +}

You can rewrite this function in a simpler way without using SQLITE
macros. Read this topic:

https://stackoverflow.com/questions/3911536/utf-8-unicode-whats-with-0xc0-and-0x80/3911566#3911566

It is quite useful. You may borrow the implementation from there.

> +
>  /*
>   * Implementation of the length() function
>   */
> @@ -150,11 +174,7 @@ lengthFunc(sqlite3_context * context, int argc, sqlite3_value ** argv)
>  		const unsigned char *z = sqlite3_value_text(argv[0]);
>  		if (z == 0)
>  			return;
> -		len = 0;
> -		while (*z) {
> -			len++;
> -			SQLITE_SKIP_UTF8(z);
> -		}
> +		len = count_chars(z, sqlite3_value_bytes(argv[0]));
>  		sqlite3_result_int(context, len);
>  		break;
>  	}
> @@ -340,11 +360,8 @@ substrFunc(sqlite3_context * context, int argc, sqlite3_value ** argv)
>  		if (z == 0)
>  			return;
>  		len = 0;
> -		if (p1 < 0) {
> -			for (z2 = z; *z2; len++) {
> -				SQLITE_SKIP_UTF8(z2);
> -			}
> -		}
> +		if (p1 < 0)
> +			len = count_chars(z, sqlite3_value_bytes(argv[0]));
>  	}
>  #ifdef SQLITE_SUBSTR_COMPATIBILITY
>  	/* If SUBSTR_COMPATIBILITY is defined then substr(X,0,N) work the same as
> @@ -388,12 +405,21 @@ substrFunc(sqlite3_context * context, int argc, sqlite3_value ** argv)
>  	}
>  	assert(p1 >= 0 && p2 >= 0);
>  	if (p0type != SQLITE_BLOB) {
> -		while (*z && p1) {
> +		/*
> +		 * In the code below 'cnt' and 'n_chars' is
> +		 * used because '\0' is not supposed to be
> +		 * end-of-string symbol.
> +		 */
> +		int n_chars = count_chars(z, sqlite3_value_bytes(argv[0]));

I’d rather call it char_count or symbol_count.

> diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua
> index b7de1d955..8c712bd5e 100755
> --- a/test/sql-tap/func.test.lua
> +++ b/test/sql-tap/func.test.lua
> +-- REPLACE
> +test:do_execsql_test(
> +    "func-62",
> +    "SELECT REPLACE(CHAR(00,65,00,65), CHAR(00), CHAR(65)) LIKE 'AAAA';",
> +    {1})
> +
> +test:do_execsql_test(
> +    "func-63",
> +    "SELECT REPLACE(CHAR(00,65,00,65), CHAR(65), CHAR(00)) \
> +    LIKE CHAR(00,00,00,00);",
> +    {1})
> +
> +-- SUBSTR
> +test:do_execsql_test(
> +    "func-64",
> +    "SELECT SUBSTR(CHAR(65,00,66,67), 3, 2) LIKE CHAR(66, 67);",
> +    {1})
> +
> +test:do_execsql_test(
> +    "func-65",
> +    "SELECT SUBSTR(CHAR(00,00,00,65), 1, 4) LIKE CHAR(00,00,00,65);",
> +    {1})
> +

Just wondering: why do you use the LIKE function in almost all tests?
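The suggestion above, to count characters without SQLITE_SKIP_UTF8 and to guard against a NULL pointer, could look roughly like the sketch below. It is only an illustration under stated assumptions: the input is well-formed UTF-8, and the name utf8_str_len is taken from the naming proposed in the review, so it is hypothetical and not necessarily the function that ended up in the tree. The idea is the one from the linked Stack Overflow topic: every byte that is not a continuation byte (bit pattern 10xxxxxx) starts a new character, so an embedded '\0' is counted like any other symbol.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch only: utf8_str_len is a hypothetical name, not an existing
 * Tarantool function. Counts the characters in the first byte_len
 * bytes of str by counting every byte that is not a UTF-8
 * continuation byte (10xxxxxx). A '\0' byte is treated as an
 * ordinary one-byte character. Assumes well-formed UTF-8.
 */
static int
utf8_str_len(const unsigned char *str, size_t byte_len)
{
	/* A NULL pointer is acceptable only for an empty range. */
	assert(str != NULL || byte_len == 0);
	int n_chars = 0;
	for (size_t i = 0; i < byte_len; i++) {
		/* Lead bytes and ASCII bytes start a character. */
		if ((str[i] & 0xC0) != 0x80)
			n_chars++;
	}
	return n_chars;
}
```

For the three bytes 'A', '\0', 'A' this returns 3, which matches LENGTH(CHAR(65,00,65)) == 3 from the commit message. On malformed input the count may differ from what a loop over SQLITE_SKIP_UTF8 would produce, so the two are not drop-in equivalents in that case.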