Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: tarantool-patches@freelists.org, Kirill Yukhin <kyukhin@tarantool.org>
Subject: [tarantool-patches] Re: [PATCH] sql: allow any space symbols to be a white space
Date: Wed, 23 May 2018 13:29:38 +0300	[thread overview]
Message-ID: <de0ebe1c-cdb6-e622-d02e-6a6a5163b679@tarantool.org> (raw)
In-Reply-To: <20180523055408.6axbc53uucslcu4h@tarantool.org>

Hello. Thanks for the fixes! I see, that you changed branch to
kyukhin/gh-2371-utf8-spaces-check, right?

And that you added a separate commit to link with ICU. But with no
linking the first commit does not work. Maybe you should squash them?

See 4 comments below.

> diff --git a/src/box/sql/tokenize.c b/src/box/sql/tokenize.c
> index c77aa9b..4c01066 100644
> --- a/src/box/sql/tokenize.c
> +++ b/src/box/sql/tokenize.c
> @@ -36,17 +36,21 @@
>    * individual tokens and sends those tokens one-by-one over to the
>    * parser for analysis.
>    */
> -#include "sqliteInt.h"
>   #include <stdlib.h>
> +#include <unicode/utf8.h>
> +#include <unicode/uchar.h>
> +
>   #include "say.h"
> +#include "sqliteInt.h"
>   
>   /* Character classes for tokenizing
>    *
> - * In the sqlite3GetToken() function, a switch() on aiClass[c] is implemented
> - * using a lookup table, whereas a switch() directly on c uses a binary search.
> - * The lookup table is much faster.  To maximize speed, and to ensure that
> - * a lookup table is used, all of the classes need to be small integers and
> - * all of them need to be used within the switch.
> + * In the sql_token() function, a switch() on sql_ascii[c] is

1. No sql_ascii.
> @@ -167,360 +123,295 @@ const unsigned char ebcdicToAscii[] = {
>    */
>   #include "keywordhash.h"
>   
> -/*
> - * If X is a character that can be used in an identifier then
> - * IdChar(X) will be true.  Otherwise it is false.
> - *
> - * For ASCII, any character with the high-order bit set is
> - * allowed in an identifier.  For 7-bit characters,
> - * sqlite3IsIdChar[X] must be 1.
> - *
> - * For EBCDIC, the rules are more complex but have the same
> - * end result.
> +#define maybe_utf8(c) ((sqlite3CtypeMap[c] & 0x40) != 0)
> +
> +/**
> + * Return true if current symbol is space.
>    *
> - * Ticket #1066.  the SQL standard does not allow '$' in the
> - * middle of identifiers.  But many SQL implementations do.
> - * SQLite will allow '$' in identifiers for compatibility.
> - * But the feature is undocumented.
> + * @param z Input stream.
> + * @retval True if current symbo1l space.

2. symbo1l -> symbol is.
> diff --git a/test/sql-tap/gh-2371-utf8-spaces.test.lua b/test/sql-tap/gh-2371-utf8-spaces.test.lua

3. Can you do not create a new test file specially for the issue? I believe we will
support unicode not only as white spaces, so maybe worth to create a more common
file test/sql/unicode.test.lua.

> new file mode 100755
> index 0000000..191cc1c
> --- /dev/null
> +++ b/test/sql-tap/gh-2371-utf8-spaces.test.lua
> @@ -0,0 +1,37 @@
> +#!/usr/bin/env tarantool
> +test = require("sqltester")
> +test:plan(23 * 3)
> +
> +-- 23 entities
> +local utf8_spaces = {"\u{0009}", "\u{000A}", "\u{000B}", "\u{000C}", "\u{000D}",
> +                     "\u{0085}", "\u{1680}", "\u{2000}", "\u{2001}", "\u{2002}",
> +                     "\u{2003}", "\u{2004}", "\u{2005}", "\u{2006}", "\u{2007}",
> +                     "\u{2008}", "\u{2009}", "\u{200A}", "\u{2028}", "\u{2029}",
> +                     "\u{202F}", "\u{205F}", "\u{3000}"}
> +local spaces_cnt = 23
> +
> +-- 1. Check UTF-8 single space
> +for i, v in pairs(utf8_spaces) do
> +    test:do_execsql_test(
> +    	"utf8-spaces-1."..i,
> +    	"select" .. v .. "1",
> +    	{ 1 })

4. Here and 2 same places below you use on the same line 4 spaces + 1 tab with 8 width.
Please, use only spaces in .lua files.

> +end
> +
> +-- 2. Check pair simple + UTF-8 space
> +for i, v in pairs(utf8_spaces) do
> +    test:do_execsql_test(
> +    	"utf8-spaces-2."..i,
> +    	"select" .. v .. "1",
> +    	{ 1 })
> +end
> +
> +-- 3. Sequence of spaces
> +for i, v in pairs(utf8_spaces) do
> +    test:do_execsql_test(
> +    	"utf8-spaces-3."..i,
> +    	"select" .. v .. " " .. utf8_spaces[spaces_cnt - i + 1] .. " 1",
> +    	{ 1 })
> +end
> +
> +test:finish_test()
> 
> 

  reply	other threads:[~2018-05-23 10:29 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-22 15:51 [tarantool-patches] " Kirill Yukhin
2018-05-22 18:06 ` [tarantool-patches] " Vladislav Shpilevoy
2018-05-23  5:15   ` Kirill Yukhin
2018-05-23  5:54     ` Kirill Yukhin
2018-05-23 10:29       ` Vladislav Shpilevoy [this message]
2018-05-23 14:05         ` Kirill Yukhin
2018-05-24 11:09           ` Vladislav Shpilevoy
2018-05-24 14:23             ` Kirill Yukhin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=de0ebe1c-cdb6-e622-d02e-6a6a5163b679@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=kyukhin@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='[tarantool-patches] Re: [PATCH] sql: allow any space symbols to be a white space' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox