From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 3148120E1E for ; Wed, 23 May 2018 06:29:41 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id VghH9haq8yJL for ; Wed, 23 May 2018 06:29:41 -0400 (EDT) Received: from smtp42.i.mail.ru (smtp42.i.mail.ru [94.100.177.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 745FC20D47 for ; Wed, 23 May 2018 06:29:40 -0400 (EDT) Subject: [tarantool-patches] Re: [PATCH] sql: allow any space symbols to be a white space References: <20180523051500.3fx5uoetyzsjkwp5@tarantool.org> <20180523055408.6axbc53uucslcu4h@tarantool.org> From: Vladislav Shpilevoy Message-ID: Date: Wed, 23 May 2018 13:29:38 +0300 MIME-Version: 1.0 In-Reply-To: <20180523055408.6axbc53uucslcu4h@tarantool.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: tarantool-patches@freelists.org, Kirill Yukhin Hello. Thanks for the fixes! I see, that you changed branch to kyukhin/gh-2371-utf8-spaces-check, right? And that you added a separate commit to link with ICU. But with no linking the first commit does not work. Maybe you should squash them? See 4 comments below. > diff --git a/src/box/sql/tokenize.c b/src/box/sql/tokenize.c > index c77aa9b..4c01066 100644 > --- a/src/box/sql/tokenize.c > +++ b/src/box/sql/tokenize.c > @@ -36,17 +36,21 @@ > * individual tokens and sends those tokens one-by-one over to the > * parser for analysis. > */ > -#include "sqliteInt.h" > #include > +#include > +#include > + > #include "say.h" > +#include "sqliteInt.h" > > /* Character classes for tokenizing > * > - * In the sqlite3GetToken() function, a switch() on aiClass[c] is implemented > - * using a lookup table, whereas a switch() directly on c uses a binary search. > - * The lookup table is much faster. To maximize speed, and to ensure that > - * a lookup table is used, all of the classes need to be small integers and > - * all of them need to be used within the switch. > + * In the sql_token() function, a switch() on sql_ascii[c] is 1. No sql_ascii. > @@ -167,360 +123,295 @@ const unsigned char ebcdicToAscii[] = { > */ > #include "keywordhash.h" > > -/* > - * If X is a character that can be used in an identifier then > - * IdChar(X) will be true. Otherwise it is false. > - * > - * For ASCII, any character with the high-order bit set is > - * allowed in an identifier. For 7-bit characters, > - * sqlite3IsIdChar[X] must be 1. > - * > - * For EBCDIC, the rules are more complex but have the same > - * end result. > +#define maybe_utf8(c) ((sqlite3CtypeMap[c] & 0x40) != 0) > + > +/** > + * Return true if current symbol is space. > * > - * Ticket #1066. the SQL standard does not allow '$' in the > - * middle of identifiers. But many SQL implementations do. > - * SQLite will allow '$' in identifiers for compatibility. > - * But the feature is undocumented. > + * @param z Input stream. > + * @retval True if current symbo1l space. 2. symbo1l -> symbol is. > diff --git a/test/sql-tap/gh-2371-utf8-spaces.test.lua b/test/sql-tap/gh-2371-utf8-spaces.test.lua 3. Can you do not create a new test file specially for the issue? I believe we will support unicode not only as white spaces, so maybe worth to create a more common file test/sql/unicode.test.lua. > new file mode 100755 > index 0000000..191cc1c > --- /dev/null > +++ b/test/sql-tap/gh-2371-utf8-spaces.test.lua > @@ -0,0 +1,37 @@ > +#!/usr/bin/env tarantool > +test = require("sqltester") > +test:plan(23 * 3) > + > +-- 23 entities > +local utf8_spaces = {"\u{0009}", "\u{000A}", "\u{000B}", "\u{000C}", "\u{000D}", > + "\u{0085}", "\u{1680}", "\u{2000}", "\u{2001}", "\u{2002}", > + "\u{2003}", "\u{2004}", "\u{2005}", "\u{2006}", "\u{2007}", > + "\u{2008}", "\u{2009}", "\u{200A}", "\u{2028}", "\u{2029}", > + "\u{202F}", "\u{205F}", "\u{3000}"} > +local spaces_cnt = 23 > + > +-- 1. Check UTF-8 single space > +for i, v in pairs(utf8_spaces) do > + test:do_execsql_test( > + "utf8-spaces-1."..i, > + "select" .. v .. "1", > + { 1 }) 4. Here and 2 same places below you use on the same line 4 spaces + 1 tab with 8 width. Please, use only spaces in .lua files. > +end > + > +-- 2. Check pair simple + UTF-8 space > +for i, v in pairs(utf8_spaces) do > + test:do_execsql_test( > + "utf8-spaces-2."..i, > + "select" .. v .. "1", > + { 1 }) > +end > + > +-- 3. Sequence of spaces > +for i, v in pairs(utf8_spaces) do > + test:do_execsql_test( > + "utf8-spaces-3."..i, > + "select" .. v .. " " .. utf8_spaces[spaces_cnt - i + 1] .. " 1", > + { 1 }) > +end > + > +test:finish_test() > >