From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> To: tarantool-patches@freelists.org, Kirill Yukhin <kyukhin@tarantool.org> Subject: [tarantool-patches] Re: [PATCH] sql: allow any space symbols to be a white space Date: Wed, 23 May 2018 13:29:38 +0300 [thread overview] Message-ID: <de0ebe1c-cdb6-e622-d02e-6a6a5163b679@tarantool.org> (raw) In-Reply-To: <20180523055408.6axbc53uucslcu4h@tarantool.org> Hello. Thanks for the fixes! I see, that you changed branch to kyukhin/gh-2371-utf8-spaces-check, right? And that you added a separate commit to link with ICU. But with no linking the first commit does not work. Maybe you should squash them? See 4 comments below. > diff --git a/src/box/sql/tokenize.c b/src/box/sql/tokenize.c > index c77aa9b..4c01066 100644 > --- a/src/box/sql/tokenize.c > +++ b/src/box/sql/tokenize.c > @@ -36,17 +36,21 @@ > * individual tokens and sends those tokens one-by-one over to the > * parser for analysis. > */ > -#include "sqliteInt.h" > #include <stdlib.h> > +#include <unicode/utf8.h> > +#include <unicode/uchar.h> > + > #include "say.h" > +#include "sqliteInt.h" > > /* Character classes for tokenizing > * > - * In the sqlite3GetToken() function, a switch() on aiClass[c] is implemented > - * using a lookup table, whereas a switch() directly on c uses a binary search. > - * The lookup table is much faster. To maximize speed, and to ensure that > - * a lookup table is used, all of the classes need to be small integers and > - * all of them need to be used within the switch. > + * In the sql_token() function, a switch() on sql_ascii[c] is 1. No sql_ascii. > @@ -167,360 +123,295 @@ const unsigned char ebcdicToAscii[] = { > */ > #include "keywordhash.h" > > -/* > - * If X is a character that can be used in an identifier then > - * IdChar(X) will be true. Otherwise it is false. > - * > - * For ASCII, any character with the high-order bit set is > - * allowed in an identifier. For 7-bit characters, > - * sqlite3IsIdChar[X] must be 1. > - * > - * For EBCDIC, the rules are more complex but have the same > - * end result. > +#define maybe_utf8(c) ((sqlite3CtypeMap[c] & 0x40) != 0) > + > +/** > + * Return true if current symbol is space. > * > - * Ticket #1066. the SQL standard does not allow '$' in the > - * middle of identifiers. But many SQL implementations do. > - * SQLite will allow '$' in identifiers for compatibility. > - * But the feature is undocumented. > + * @param z Input stream. > + * @retval True if current symbo1l space. 2. symbo1l -> symbol is. > diff --git a/test/sql-tap/gh-2371-utf8-spaces.test.lua b/test/sql-tap/gh-2371-utf8-spaces.test.lua 3. Can you do not create a new test file specially for the issue? I believe we will support unicode not only as white spaces, so maybe worth to create a more common file test/sql/unicode.test.lua. > new file mode 100755 > index 0000000..191cc1c > --- /dev/null > +++ b/test/sql-tap/gh-2371-utf8-spaces.test.lua > @@ -0,0 +1,37 @@ > +#!/usr/bin/env tarantool > +test = require("sqltester") > +test:plan(23 * 3) > + > +-- 23 entities > +local utf8_spaces = {"\u{0009}", "\u{000A}", "\u{000B}", "\u{000C}", "\u{000D}", > + "\u{0085}", "\u{1680}", "\u{2000}", "\u{2001}", "\u{2002}", > + "\u{2003}", "\u{2004}", "\u{2005}", "\u{2006}", "\u{2007}", > + "\u{2008}", "\u{2009}", "\u{200A}", "\u{2028}", "\u{2029}", > + "\u{202F}", "\u{205F}", "\u{3000}"} > +local spaces_cnt = 23 > + > +-- 1. Check UTF-8 single space > +for i, v in pairs(utf8_spaces) do > + test:do_execsql_test( > + "utf8-spaces-1."..i, > + "select" .. v .. "1", > + { 1 }) 4. Here and 2 same places below you use on the same line 4 spaces + 1 tab with 8 width. Please, use only spaces in .lua files. > +end > + > +-- 2. Check pair simple + UTF-8 space > +for i, v in pairs(utf8_spaces) do > + test:do_execsql_test( > + "utf8-spaces-2."..i, > + "select" .. v .. "1", > + { 1 }) > +end > + > +-- 3. Sequence of spaces > +for i, v in pairs(utf8_spaces) do > + test:do_execsql_test( > + "utf8-spaces-3."..i, > + "select" .. v .. " " .. utf8_spaces[spaces_cnt - i + 1] .. " 1", > + { 1 }) > +end > + > +test:finish_test() > >
next prev parent reply other threads:[~2018-05-23 10:29 UTC|newest] Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-05-22 15:51 [tarantool-patches] " Kirill Yukhin 2018-05-22 18:06 ` [tarantool-patches] " Vladislav Shpilevoy 2018-05-23 5:15 ` Kirill Yukhin 2018-05-23 5:54 ` Kirill Yukhin 2018-05-23 10:29 ` Vladislav Shpilevoy [this message] 2018-05-23 14:05 ` Kirill Yukhin 2018-05-24 11:09 ` Vladislav Shpilevoy 2018-05-24 14:23 ` Kirill Yukhin
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=de0ebe1c-cdb6-e622-d02e-6a6a5163b679@tarantool.org \ --to=v.shpilevoy@tarantool.org \ --cc=kyukhin@tarantool.org \ --cc=tarantool-patches@freelists.org \ --subject='[tarantool-patches] Re: [PATCH] sql: allow any space symbols to be a white space' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox