[tarantool-patches] Re: [PATCH] sql: allow any space symbols to be a white space

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Wed May 23 13:29:38 MSK 2018


Hello. Thanks for the fixes! I see, that you changed branch to
kyukhin/gh-2371-utf8-spaces-check, right?

And that you added a separate commit to link with ICU. But with no
linking the first commit does not work. Maybe you should squash them?

See 4 comments below.

> diff --git a/src/box/sql/tokenize.c b/src/box/sql/tokenize.c
> index c77aa9b..4c01066 100644
> --- a/src/box/sql/tokenize.c
> +++ b/src/box/sql/tokenize.c
> @@ -36,17 +36,21 @@
>    * individual tokens and sends those tokens one-by-one over to the
>    * parser for analysis.
>    */
> -#include "sqliteInt.h"
>   #include <stdlib.h>
> +#include <unicode/utf8.h>
> +#include <unicode/uchar.h>
> +
>   #include "say.h"
> +#include "sqliteInt.h"
>   
>   /* Character classes for tokenizing
>    *
> - * In the sqlite3GetToken() function, a switch() on aiClass[c] is implemented
> - * using a lookup table, whereas a switch() directly on c uses a binary search.
> - * The lookup table is much faster.  To maximize speed, and to ensure that
> - * a lookup table is used, all of the classes need to be small integers and
> - * all of them need to be used within the switch.
> + * In the sql_token() function, a switch() on sql_ascii[c] is

1. No sql_ascii.
> @@ -167,360 +123,295 @@ const unsigned char ebcdicToAscii[] = {
>    */
>   #include "keywordhash.h"
>   
> -/*
> - * If X is a character that can be used in an identifier then
> - * IdChar(X) will be true.  Otherwise it is false.
> - *
> - * For ASCII, any character with the high-order bit set is
> - * allowed in an identifier.  For 7-bit characters,
> - * sqlite3IsIdChar[X] must be 1.
> - *
> - * For EBCDIC, the rules are more complex but have the same
> - * end result.
> +#define maybe_utf8(c) ((sqlite3CtypeMap[c] & 0x40) != 0)
> +
> +/**
> + * Return true if current symbol is space.
>    *
> - * Ticket #1066.  the SQL standard does not allow '$' in the
> - * middle of identifiers.  But many SQL implementations do.
> - * SQLite will allow '$' in identifiers for compatibility.
> - * But the feature is undocumented.
> + * @param z Input stream.
> + * @retval True if current symbo1l space.

2. symbo1l -> symbol is.
> diff --git a/test/sql-tap/gh-2371-utf8-spaces.test.lua b/test/sql-tap/gh-2371-utf8-spaces.test.lua

3. Can you do not create a new test file specially for the issue? I believe we will
support unicode not only as white spaces, so maybe worth to create a more common
file test/sql/unicode.test.lua.

> new file mode 100755
> index 0000000..191cc1c
> --- /dev/null
> +++ b/test/sql-tap/gh-2371-utf8-spaces.test.lua
> @@ -0,0 +1,37 @@
> +#!/usr/bin/env tarantool
> +test = require("sqltester")
> +test:plan(23 * 3)
> +
> +-- 23 entities
> +local utf8_spaces = {"\u{0009}", "\u{000A}", "\u{000B}", "\u{000C}", "\u{000D}",
> +                     "\u{0085}", "\u{1680}", "\u{2000}", "\u{2001}", "\u{2002}",
> +                     "\u{2003}", "\u{2004}", "\u{2005}", "\u{2006}", "\u{2007}",
> +                     "\u{2008}", "\u{2009}", "\u{200A}", "\u{2028}", "\u{2029}",
> +                     "\u{202F}", "\u{205F}", "\u{3000}"}
> +local spaces_cnt = 23
> +
> +-- 1. Check UTF-8 single space
> +for i, v in pairs(utf8_spaces) do
> +    test:do_execsql_test(
> +    	"utf8-spaces-1."..i,
> +    	"select" .. v .. "1",
> +    	{ 1 })

4. Here and 2 same places below you use on the same line 4 spaces + 1 tab with 8 width.
Please, use only spaces in .lua files.

> +end
> +
> +-- 2. Check pair simple + UTF-8 space
> +for i, v in pairs(utf8_spaces) do
> +    test:do_execsql_test(
> +    	"utf8-spaces-2."..i,
> +    	"select" .. v .. "1",
> +    	{ 1 })
> +end
> +
> +-- 3. Sequence of spaces
> +for i, v in pairs(utf8_spaces) do
> +    test:do_execsql_test(
> +    	"utf8-spaces-3."..i,
> +    	"select" .. v .. " " .. utf8_spaces[spaces_cnt - i + 1] .. " 1",
> +    	{ 1 })
> +end
> +
> +test:finish_test()
> 
> 




More information about the Tarantool-patches mailing list