[tarantool-patches] Re: [PATCH] sql: allow any space symbols to be a white space

From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: tarantool-patches@freelists.org, Kirill Yukhin <kyukhin@tarantool.org>
Subject: [tarantool-patches] Re: [PATCH] sql: allow any space symbols to be a white space
Date: Thu, 24 May 2018 14:09:22 +0300
Message-ID: <7cbf79d0-8242-4079-8641-1635a45e863d@tarantool.org>
In-Reply-To: <20180523140550.qqegk7wzv7ycc76m@tarantool.org>

LGTM.

On 23/05/2018 17:05, Kirill Yukhin wrote:
> Hi Vlad,
> Thanks for review! My comments inlined.
> 
> On 23 May 13:29, Vladislav Shpilevoy wrote:
>> Hello. Thanks for the fixes! I see that you changed the branch to
>> kyukhin/gh-2371-utf8-spaces-check, right?
> No, the branch is the same; that one was only for a Travis check.
> The original branch contains the updated patch, which in turn includes
> the update for CMakeLists.txt. I've removed the temporary branch.
> 
> The updated patch is at the bottom.
> 
>> I also see that you added a separate commit to link with ICU. But without
>> that linking the first commit does not work. Maybe you should squash them?
>>
>> See 4 comments below.
>>
>>> diff --git a/src/box/sql/tokenize.c b/src/box/sql/tokenize.c
>>> index c77aa9b..4c01066 100644
>>> --- a/src/box/sql/tokenize.c
>>> +++ b/src/box/sql/tokenize.c
>>> @@ -36,17 +36,21 @@
>>>     * individual tokens and sends those tokens one-by-one over to the
>>>     * parser for analysis.
>>>     */
>>> -#include "sqliteInt.h"
>>>    #include <stdlib.h>
>>> +#include <unicode/utf8.h>
>>> +#include <unicode/uchar.h>
>>> +
>>>    #include "say.h"
>>> +#include "sqliteInt.h"
>>>    /* Character classes for tokenizing
>>>     *
>>> - * In the sqlite3GetToken() function, a switch() on aiClass[c] is implemented
>>> - * using a lookup table, whereas a switch() directly on c uses a binary search.
>>> - * The lookup table is much faster.  To maximize speed, and to ensure that
>>> - * a lookup table is used, all of the classes need to be small integers and
>>> - * all of them need to be used within the switch.
>>> + * In the sql_token() function, a switch() on sql_ascii[c] is
>> 1. No sql_ascii.
> Fixed.
> 
>>> @@ -167,360 +123,295 @@ const unsigned char ebcdicToAscii[] = {
>>> - * Ticket #1066.  the SQL standard does not allow '$' in the
>>> - * middle of identifiers.  But many SQL implementations do.
>>> - * SQLite will allow '$' in identifiers for compatibility.
>>> - * But the feature is undocumented.
>>> + * @param z Input stream.
>>> + * @retval True if current symbo1l space.
>> 2. symbo1l -> symbol is.
> Fixed.
> 
>>> diff --git a/test/sql-tap/gh-2371-utf8-spaces.test.lua b/test/sql-tap/gh-2371-utf8-spaces.test.lua
>> 3. Could you avoid creating a new test file just for this issue? I believe we will
>> support Unicode for more than white space, so it may be worth creating a more general
>> file, test/sql/unicode.test.lua.
> I'd prefer TAP tests for the SQL front end. Renamed to unicode.test.lua.
> 
>>> new file mode 100755
>>> index 0000000..191cc1c
>>> --- /dev/null
>>> +++ b/test/sql-tap/gh-2371-utf8-spaces.test.lua
>>> +-- 1. Check UTF-8 single space
>>> +for i, v in pairs(utf8_spaces) do
>>> +    test:do_execsql_test(
>>> +    	"utf8-spaces-1."..i,
>>> +    	"select" .. v .. "1",
>>> +    	{ 1 })
>>
>> 4. Here and in 2 similar places below, the same line mixes 4 spaces with 1 tab (8 columns wide).
>> Please use only spaces in .lua files.
> Fixed.
> 
> --
> Regards, Kirill Yukhin
> 
> commit 46c40d785552d8cc652b31b48207032216bb067f
> Author: Kirill Yukhin <kyukhin@tarantool.org>
> Date:   Tue May 22 18:45:35 2018 +0300
> 
>      sql: allow any space symbols to be a white space
>      
>      ANSI SQL allows any of Unicode classes Zl, Zp or Zs to
>      act as a white space symbol. Allow this in the lexical
>      analyzer. Refactor the lexical analyzer routine to follow
>      Tarantool's coding style.
>      Also, remove dead encoding: EBCDIC.
>      
>      Closes #2371
> 
> diff --git a/extra/mkkeywordhash.c b/extra/mkkeywordhash.c
> index cf34831..9e0e24b 100644
> --- a/extra/mkkeywordhash.c
> +++ b/extra/mkkeywordhash.c
> @@ -611,12 +611,7 @@ int main(int argc, char **argv){
>     printf("      if( aLen[i]!=n ) continue;\n");
>     printf("      j = 0;\n");
>     printf("      zKW = &zText[aOffset[i]];\n");
> -  printf("#ifdef SQLITE_ASCII\n");
>     printf("      while( j<n && (z[j]&~0x20)==zKW[j] ){ j++; }\n");
> -  printf("#endif\n");
> -  printf("#ifdef SQLITE_EBCDIC\n");
> -  printf("      while( j<n && toupper(z[j])==zKW[j] ){ j++; }\n");
> -  printf("#endif\n");
>     printf("      if( j<n ) continue;\n");
>     for(i=0; i<nKeyword; i++){
>       printf("      testcase( i==%d ); /* %s */\n",
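
With the EBCDIC branch gone, the generated lookup keeps only the ASCII comparison `(z[j]&~0x20)==zKW[j]`. A minimal sketch of why clearing bit 0x20 is enough, assuming the keyword text is stored in upper-case ASCII and both buffers hold at least n bytes. The helper name kw_equal_nocase() is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: compare n bytes of
 * input z against an upper-case ASCII keyword kw, ignoring the
 * case of z. Clearing bit 0x20 maps 'a'..'z' onto 'A'..'Z' and
 * leaves 'A'..'Z' and '_' unchanged.
 */
static bool
kw_equal_nocase(const char *z, const char *kw, size_t n)
{
	assert(z != NULL && kw != NULL);
	size_t j = 0;
	while (j < n && (z[j] & ~0x20) == kw[j])
		j++;
	/* All n bytes matched only if the loop ran to the end. */
	return j == n;
}
```

For example, kw_equal_nocase("SeLeCt", "SELECT", 6) holds, while "selec1" does not match.
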
> diff --git a/src/box/sql/CMakeLists.txt b/src/box/sql/CMakeLists.txt
> index 2204191..8a83a0a 100644
> --- a/src/box/sql/CMakeLists.txt
> +++ b/src/box/sql/CMakeLists.txt
> @@ -78,6 +78,7 @@ add_library(sql STATIC
>   )
>   set_target_properties(sql PROPERTIES COMPILE_DEFINITIONS
>       "${TEST_DEFINITIONS}")
> +target_link_libraries(sql ${ICU_LIBRARIES})
>   
>   add_custom_target(generate_sql_files DEPENDS
>       parse.h
> diff --git a/src/box/sql/alter.c b/src/box/sql/alter.c
> index c9c8f9b..c9325c5 100644
> --- a/src/box/sql/alter.c
> +++ b/src/box/sql/alter.c
> @@ -351,7 +351,7 @@ rename_table(sqlite3 *db, const char *sql_stmt, const char *table_name,
>   
>   	int token;
>   	Token old_name;
> -	unsigned char const *csr = (unsigned const char *)sql_stmt;
> +	const char *csr = sql_stmt;
>   	int len = 0;
>   	char *new_sql_stmt;
>   	bool unused;
> @@ -374,7 +374,7 @@ rename_table(sqlite3 *db, const char *sql_stmt, const char *table_name,
>   		 */
>   		do {
>   			csr += len;
> -			len = sqlite3GetToken(csr, &token, &unused);
> +			len = sql_token(csr, &token, &unused);
>   		} while (token == TK_SPACE);
>   		assert(len > 0);
>   	} while (token != TK_LP && token != TK_USING);
> @@ -430,13 +430,12 @@ rename_parent_table(sqlite3 *db, const char *sql_stmt, const char *old_name,
>   	bool is_quoted;
>   
>   	for (csr = sql_stmt; *csr; csr = csr + n) {
> -		n = sqlite3GetToken((const unsigned char *)csr, &token, &unused);
> +		n = sql_token(csr, &token, &unused);
>   		if (token == TK_REFERENCES) {
>   			char *zParent;
>   			do {
>   				csr += n;
> -				n = sqlite3GetToken((const unsigned char *)csr,
> -						    &token, &unused);
> +				n = sql_token(csr, &token, &unused);
>   			} while (token == TK_SPACE);
>   			if (token == TK_ILLEGAL)
>   				break;
> @@ -482,7 +481,7 @@ rename_trigger(sqlite3 *db, char const *sql_stmt, char const *table_name,
>   	int token;
>   	Token tname;
>   	int dist = 3;
> -	unsigned char const *csr = (unsigned char const*)sql_stmt;
> +	char const *csr = (char const*)sql_stmt;
>   	int len = 0;
>   	char *new_sql_stmt;
>   	bool unused;
> @@ -505,7 +504,7 @@ rename_trigger(sqlite3 *db, char const *sql_stmt, char const *table_name,
>   		 */
>   		do {
>   			csr += len;
> -			len = sqlite3GetToken(csr, &token, &unused);
> +			len = sql_token(csr, &token, &unused);
>   		} while (token == TK_SPACE);
>   		assert(len > 0);
>   		/* Variable 'dist' stores the number of tokens read since the most
> diff --git a/src/box/sql/complete.c b/src/box/sql/complete.c
> index 092d4fb..74b057b 100644
> --- a/src/box/sql/complete.c
> +++ b/src/box/sql/complete.c
> @@ -40,18 +40,7 @@
>   #include "sqliteInt.h"
>   #ifndef SQLITE_OMIT_COMPLETE
>   
> -/*
> - * This is defined in tokenize.c.  We just have to import the definition.
> - */
> -#ifndef SQLITE_AMALGAMATION
> -#ifdef SQLITE_ASCII
>   #define IdChar(C)  ((sqlite3CtypeMap[(unsigned char)C]&0x46)!=0)
> -#endif
> -#ifdef SQLITE_EBCDIC
> -extern const char sqlite3IsEbcdicIdChar[];
> -#define IdChar(C)  (((c=C)>=0x42 && sqlite3IsEbcdicIdChar[c-0x40]))
> -#endif
> -#endif				/* SQLITE_AMALGAMATION */
>   
>   /*
>    * Token types used by the sqlite3_complete() routine.  See the header
> @@ -230,9 +219,6 @@ sqlite3_complete(const char *zSql)
>   				break;
>   			}
>   		default:{
> -#ifdef SQLITE_EBCDIC
> -				unsigned char c;
> -#endif
>   				if (IdChar((u8) * zSql)) {
>   					/* Keywords and unquoted identifiers */
>   					int nId;
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index dcac22c..c06e3bd 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -623,12 +623,7 @@ struct compareInfo {
>    * macro for fast reading of the next character in the common case where
>    * the next character is ASCII.
>    */
> -#if defined(SQLITE_EBCDIC)
> -#define sqlite3Utf8Read(A)        (*((*A)++))
> -#define Utf8Read(A)               (*(A++))
> -#else
>   #define Utf8Read(s, e)    ucnv_getNextUChar(pUtf8conv, &s, e, &status)
> -#endif
>   
>   static const struct compareInfo globInfo = { '*', '?', '[', 0 };
>   
> diff --git a/src/box/sql/global.c b/src/box/sql/global.c
> index cd6f9c4..8e53bcc 100644
> --- a/src/box/sql/global.c
> +++ b/src/box/sql/global.c
> @@ -43,7 +43,6 @@
>    * involved are nearly as big or bigger than SQLite itself.
>    */
>   const unsigned char sqlite3UpperToLower[] = {
> -#ifdef SQLITE_ASCII
>   	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
>   	18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
>   	36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
> @@ -70,25 +69,6 @@ const unsigned char sqlite3UpperToLower[] = {
>   	234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247,
>   	    248, 249, 250, 251,
>   	252, 253, 254, 255
> -#endif
> -#ifdef SQLITE_EBCDIC
> -	    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,	/* 0x */
> -	16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,	/* 1x */
> -	32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,	/* 2x */
> -	48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,	/* 3x */
> -	64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,	/* 4x */
> -	80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,	/* 5x */
> -	96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,	/* 6x */
> -	112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127,	/* 7x */
> -	128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,	/* 8x */
> -	144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159,	/* 9x */
> -	160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 140, 141, 142, 175,	/* Ax */
> -	176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,	/* Bx */
> -	192, 129, 130, 131, 132, 133, 134, 135, 136, 137, 202, 203, 204, 205, 206, 207,	/* Cx */
> -	208, 145, 146, 147, 148, 149, 150, 151, 152, 153, 218, 219, 220, 221, 222, 223,	/* Dx */
> -	224, 225, 162, 163, 164, 165, 166, 167, 168, 169, 234, 235, 236, 237, 238, 239,	/* Ex */
> -	240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255,	/* Fx */
> -#endif
>   };
>   
>   /*
> @@ -119,7 +99,6 @@ const unsigned char sqlite3UpperToLower[] = {
>    * non-ASCII UTF character. Hence the test for whether or not a character is
>    * part of an identifier is 0x46.
>    */
> -#ifdef SQLITE_ASCII
>   const unsigned char sqlite3CtypeMap[256] = {
>   	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,	/* 00..07    ........ */
>   	0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x00, 0x00,	/* 08..0f    ........ */
> @@ -157,7 +136,6 @@ const unsigned char sqlite3CtypeMap[256] = {
>   	0x40, 0x40, 0x40, 0x40, 0x40, 0x40, 0x40, 0x40,	/* f0..f7    ........ */
>   	0x40, 0x40, 0x40, 0x40, 0x40, 0x40, 0x40, 0x40	/* f8..ff    ........ */
>   };
> -#endif
>   
>   /* EVIDENCE-OF: R-02982-34736 In order to maintain full backwards
>    * compatibility for legacy applications, the URI filename capability is
> diff --git a/src/box/sql/sqliteInt.h b/src/box/sql/sqliteInt.h
> index b3db468..e6ffda4 100644
> --- a/src/box/sql/sqliteInt.h
> +++ b/src/box/sql/sqliteInt.h
> @@ -36,6 +36,8 @@
>   #ifndef SQLITEINT_H
>   #define SQLITEINT_H
>   
> +#define IdChar(C)  ((sqlite3CtypeMap[(unsigned char)C]&0x46)!=0)
> +
>   /* Special Comments:
>    *
>    * Some comments have special meaning to the tools that measure test
> @@ -1129,16 +1131,6 @@ sqlite3_bind_parameter_lindex(sqlite3_stmt * pStmt, const char *zName,
>   #define MAX(A,B) ((A)>(B)?(A):(B))
>   #endif
>   
> -/*
> - * Check to see if this machine uses EBCDIC.  (Yes, believe it or
> - * not, there are still machines out there that use EBCDIC.)
> - */
> -#if 'A' == '\301'
> -#define SQLITE_EBCDIC 1
> -#else
> -#define SQLITE_ASCII 1
> -#endif
> -
>   /*
>    * Integers of known sizes.  These typedefs might change for architectures
>    * where the sizes very.  Preprocessor macros are available so that the
> @@ -3368,21 +3360,11 @@ int sqlite3IoerrnomemError(int);
>   #define SQLITE_ENABLE_FTS3 1
>   #endif
>   
> -/*
> - * The ctype.h header is needed for non-ASCII systems.  It is also
> - * needed by FTS3 when FTS3 is included in the amalgamation.
> - */
> -#if !defined(SQLITE_ASCII) || \
> -    (defined(SQLITE_ENABLE_FTS3) && defined(SQLITE_AMALGAMATION))
> -#include <ctype.h>
> -#endif
> -
>   /*
>    * The following macros mimic the standard library functions toupper(),
>    * isspace(), isalnum(), isdigit() and isxdigit(), respectively. The
>    * sqlite versions only work for ASCII characters, regardless of locale.
>    */
> -#ifdef SQLITE_ASCII
>   #define sqlite3Toupper(x)  ((x)&~(sqlite3CtypeMap[(unsigned char)(x)]&0x20))
>   #define sqlite3Isspace(x)   (sqlite3CtypeMap[(unsigned char)(x)]&0x01)
>   #define sqlite3Isalnum(x)   (sqlite3CtypeMap[(unsigned char)(x)]&0x06)
> @@ -3391,16 +3373,6 @@ int sqlite3IoerrnomemError(int);
>   #define sqlite3Isxdigit(x)  (sqlite3CtypeMap[(unsigned char)(x)]&0x08)
>   #define sqlite3Tolower(x)   (sqlite3UpperToLower[(unsigned char)(x)])
>   #define sqlite3Isquote(x)   (sqlite3CtypeMap[(unsigned char)(x)]&0x80)
> -#else
> -#define sqlite3Toupper(x)   toupper((unsigned char)(x))
> -#define sqlite3Isspace(x)   isspace((unsigned char)(x))
> -#define sqlite3Isalnum(x)   isalnum((unsigned char)(x))
> -#define sqlite3Isalpha(x)   isalpha((unsigned char)(x))
> -#define sqlite3Isdigit(x)   isdigit((unsigned char)(x))
> -#define sqlite3Isxdigit(x)  isxdigit((unsigned char)(x))
> -#define sqlite3Tolower(x)   tolower((unsigned char)(x))
> -#define sqlite3Isquote(x)   ((x)=='"'||(x)=='\''||(x)=='['||(x)=='`')
> -#endif
>   
>   /*
>    * Internal function prototypes
> @@ -4164,7 +4136,18 @@ extern int sqlite3PendingByte;
>   #endif
>   void sqlite3Reindex(Parse *, Token *, Token *);
>   void sqlite3AlterRenameTable(Parse *, SrcList *, Token *);
> -int sqlite3GetToken(const unsigned char *, int *, bool *);
> +
> +/**
> + * Return the length (in bytes) of the token that begins at z[0].
> + * Store the token type in *type before returning.
> + *
> + * @param z Input stream.
> + * @param[out] type Detected type of token.
> + * @param[out] is_reserved True if reserved word.
> + */
> +int
> +sql_token(const char *z, int *type, bool *is_reserved);
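
The contract documented here (return the byte length, store the type) is what lets callers such as rename_table() walk a statement by adding the returned length to the cursor. A minimal sketch of that calling pattern, assuming a toy stand-in instead of the real tokenizer: toy_token(), count_tokens() and the TOY_* constants are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token types; the real TK_* values come from the parser. */
enum { TOY_SPACE, TOY_WORD, TOY_OTHER };

/*
 * Toy stand-in with the same contract as sql_token(): return the
 * byte length of the token at z[0] and store its type in *type.
 * It only knows ASCII spaces and alphanumeric words.
 */
static int
toy_token(const char *z, int *type, bool *is_reserved)
{
	assert(z != NULL && type != NULL && is_reserved != NULL);
	int i = 1;
	*is_reserved = false;
	if (isspace((unsigned char)z[0])) {
		while (isspace((unsigned char)z[i]))
			i++;
		*type = TOY_SPACE;
	} else if (isalnum((unsigned char)z[0])) {
		while (isalnum((unsigned char)z[i]))
			i++;
		*type = TOY_WORD;
	} else {
		*type = TOY_OTHER;
	}
	return i;
}

/*
 * Count the non-space tokens in a statement, advancing the cursor
 * by the returned length on every step.
 */
static int
count_tokens(const char *sql)
{
	int count = 0;
	int type;
	bool unused;
	while (*sql != '\0') {
		sql += toy_token(sql, &type, &unused);
		if (type != TOY_SPACE)
			count++;
	}
	return count;
}
```
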
> +
>   void sqlite3NestedParse(Parse *, const char *, ...);
>   void sqlite3ExpirePreparedStatements(sqlite3 *);
>   int sqlite3CodeSubselect(Parse *, Expr *, int);
> diff --git a/src/box/sql/tokenize.c b/src/box/sql/tokenize.c
> index c77aa9b..1766eef 100644
> --- a/src/box/sql/tokenize.c
> +++ b/src/box/sql/tokenize.c
> @@ -36,17 +36,21 @@
>    * individual tokens and sends those tokens one-by-one over to the
>    * parser for analysis.
>    */
> -#include "sqliteInt.h"
>   #include <stdlib.h>
> +#include <unicode/utf8.h>
> +#include <unicode/uchar.h>
> +
>   #include "say.h"
> +#include "sqliteInt.h"
>   
>   /* Character classes for tokenizing
>    *
> - * In the sqlite3GetToken() function, a switch() on aiClass[c] is implemented
> - * using a lookup table, whereas a switch() directly on c uses a binary search.
> - * The lookup table is much faster.  To maximize speed, and to ensure that
> - * a lookup table is used, all of the classes need to be small integers and
> - * all of them need to be used within the switch.
> + * In the sql_token() function, a switch() on sql_ascii_class[c]
> + * is implemented using a lookup table, whereas a switch()
> + * directly on c uses a binary search. The lookup table is much
> + * faster. To maximize speed, and to ensure that a lookup table is
> + * used, all of the classes need to be small integers and all of
> + * them need to be used within the switch.
>    */
>   #define CC_X          0		/* The letter 'x', or start of BLOB literal */
>   #define CC_KYWD       1		/* Alphabetics or '_'.  Usable in a keyword */
> @@ -77,10 +81,9 @@
>   #define CC_DOT       26		/* '.' */
>   #define CC_ILLEGAL   27		/* Illegal character */
>   
> -static const unsigned char aiClass[] = {
> -#ifdef SQLITE_ASCII
> +static const char sql_ascii_class[] = {
>   /*       x0  x1  x2  x3  x4  x5  x6  x7  x8 x9  xa xb  xc xd xe  xf */
> -/* 0x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 7,  7, 27, 7, 7, 27, 27,
> +/* 0x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 7,  7, 7, 7, 7, 27, 27,
>   /* 1x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
>   /* 2x */ 7, 15, 9, 5, 4, 22, 24, 8, 17, 18, 21, 20, 23, 11, 26, 16,
>   /* 3x */ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 19, 12, 14, 13, 6,
> @@ -96,63 +99,16 @@ static const unsigned char aiClass[] = {
>   /* Dx */ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
>   /* Ex */ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
>   /* Fx */ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
> -#endif
> -#ifdef SQLITE_EBCDIC
> -/*         x0  x1  x2  x3  x4  x5  x6  x7  x8  x9  xa  xb  xc  xd  xe  xf */
> -/* 0x */ 27, 27, 27, 27, 27, 7, 27, 27, 27, 27, 27, 27, 7, 7, 27,
> -	    27,
> -/* 1x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> -/* 2x */ 27, 27, 27, 27, 27, 7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> -/* 3x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> -/* 4x */ 7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 26, 12, 17, 20, 10,
> -/* 5x */ 24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15, 4, 21, 18, 19, 27,
> -/* 6x */ 11, 16, 27, 27, 27, 27, 27, 27, 27, 27, 27, 23, 22, 1, 13, 6,
> -/* 7x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 8, 5, 5, 5, 8, 14, 8,
> -/* 8x */ 27, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 27, 27, 27, 27, 27,
> -/* 9x */ 27, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 27, 27, 27, 27, 27,
> -/* Ax */ 27, 25, 1, 1, 1, 1, 1, 0, 1, 1, 27, 27, 27, 27, 27, 27,
> -/* Bx */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 9, 27, 27, 27, 27, 27,
> -/* Cx */ 27, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 27, 27, 27, 27, 27,
> -/* Dx */ 27, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 27, 27, 27, 27, 27,
> -/* Ex */ 27, 27, 1, 1, 1, 1, 1, 0, 1, 1, 27, 27, 27, 27, 27, 27,
> -/* Fx */ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 27, 27, 27, 27, 27, 27,
> -#endif
>   };
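
Since z is now a plain `const char *`, every index into the class table has to go through an unsigned char, otherwise bytes >= 0x80 could produce a negative index on platforms where char is signed. A minimal sketch of that lookup, assuming a much smaller set of classes than the real table: toy_class, toy_class_init(), toy_class_of() and the CLS_* constants are hypothetical. As in the Dx..Fx rows above, high bytes are treated as identifier characters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of character classes. */
enum { CLS_SPACE, CLS_ID, CLS_OTHER };

/* 256-entry table, filled once by toy_class_init(). */
static unsigned char toy_class[256];

static void
toy_class_init(void)
{
	for (int c = 0; c < 256; c++) {
		if (c == ' ' || (c >= '\t' && c <= '\r'))
			toy_class[c] = CLS_SPACE;
		else if (c >= 0x80 || c == '_' ||
			 (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
			toy_class[c] = CLS_ID;
		else
			toy_class[c] = CLS_OTHER;
	}
}

static int
toy_class_of(const char *z)
{
	assert(z != NULL);
	/* Cast first: a plain char may be negative for bytes >= 0x80. */
	return toy_class[*(const unsigned char *)z];
}
```

toy_class_init() has to be called once before the first lookup. A static const initializer, as in the patch, avoids that step at the cost of a hand-written table.
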
>   
> -/*
> - * The charMap() macro maps alphabetic characters (only) into their
> - * lower-case ASCII equivalent.  On ASCII machines, this is just
> - * an upper-to-lower case map.  On EBCDIC machines we also need
> - * to adjust the encoding.  The mapping is only valid for alphabetics
> - * which are the only characters for which this feature is used.
> +/**
> + * The charMap() macro maps alphabetic characters (only) into
> + * their lower-case ASCII equivalent.  On ASCII machines, this
> + * is just an upper-to-lower case map.
>    *
>    * Used by keywordhash.h
>    */
> -#ifdef SQLITE_ASCII
>   #define charMap(X) sqlite3UpperToLower[(unsigned char)X]
> -#endif
> -#ifdef SQLITE_EBCDIC
> -#define charMap(X) ebcdicToAscii[(unsigned char)X]
> -const unsigned char ebcdicToAscii[] = {
> -/* 0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F */
> -	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,	/* 0x */
> -	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,	/* 1x */
> -	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,	/* 2x */
> -	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,	/* 3x */
> -	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,	/* 4x */
> -	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,	/* 5x */
> -	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 95, 0, 0,	/* 6x */
> -	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,	/* 7x */
> -	0, 97, 98, 99, 100, 101, 102, 103, 104, 105, 0, 0, 0, 0, 0, 0,	/* 8x */
> -	0, 106, 107, 108, 109, 110, 111, 112, 113, 114, 0, 0, 0, 0, 0, 0,	/* 9x */
> -	0, 0, 115, 116, 117, 118, 119, 120, 121, 122, 0, 0, 0, 0, 0, 0,	/* Ax */
> -	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,	/* Bx */
> -	0, 97, 98, 99, 100, 101, 102, 103, 104, 105, 0, 0, 0, 0, 0, 0,	/* Cx */
> -	0, 106, 107, 108, 109, 110, 111, 112, 113, 114, 0, 0, 0, 0, 0, 0,	/* Dx */
> -	0, 0, 115, 116, 117, 118, 119, 120, 121, 122, 0, 0, 0, 0, 0, 0,	/* Ex */
> -	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,	/* Fx */
> -};
> -#endif
>   
>   /*
>    * The sqlite3KeywordCode function looks up an identifier to determine if
> @@ -167,360 +123,295 @@ const unsigned char ebcdicToAscii[] = {
>    */
>   #include "keywordhash.h"
>   
> -/*
> - * If X is a character that can be used in an identifier then
> - * IdChar(X) will be true.  Otherwise it is false.
> - *
> - * For ASCII, any character with the high-order bit set is
> - * allowed in an identifier.  For 7-bit characters,
> - * sqlite3IsIdChar[X] must be 1.
> - *
> - * For EBCDIC, the rules are more complex but have the same
> - * end result.
> +#define maybe_utf8(c) ((sqlite3CtypeMap[c] & 0x40) != 0)
> +
> +/**
> + * Return true if current symbol is space.
>    *
> - * Ticket #1066.  the SQL standard does not allow '$' in the
> - * middle of identifiers.  But many SQL implementations do.
> - * SQLite will allow '$' in identifiers for compatibility.
> - * But the feature is undocumented.
> + * @param z Input stream.
> + * @retval True if current symbol is space.
>    */
> -#ifdef SQLITE_ASCII
> -#define IdChar(C)  ((sqlite3CtypeMap[(unsigned char)C]&0x46)!=0)
> -#endif
> -#ifdef SQLITE_EBCDIC
> -const char sqlite3IsEbcdicIdChar[] = {
> -/* x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF */
> -	0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0,	/* 4x */
> -	0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0,	/* 5x */
> -	0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0,	/* 6x */
> -	0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,	/* 7x */
> -	0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0,	/* 8x */
> -	0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0,	/* 9x */
> -	1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,	/* Ax */
> -	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,	/* Bx */
> -	0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,	/* Cx */
> -	0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,	/* Dx */
> -	0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,	/* Ex */
> -	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0,	/* Fx */
> -};
> -
> -#define IdChar(C)  (((c=C)>=0x42 && sqlite3IsEbcdicIdChar[c-0x40]))
> -#endif
> +static inline bool
> +sql_is_space_char(const char *z)
> +{
> +	if (sqlite3Isspace(z[0]))
> +		return true;
> +	if (maybe_utf8(*(unsigned char*)z)) {
> +		UChar32 c;
> +		int unused = 0;
> +		U8_NEXT_UNSAFE(z, unused, c);
> +		if (u_isspace(c))
> +			return true;
> +	}
> +	return false;
> +}
>   
> -/*
> - * Return the length (in bytes) of the token that begins at z[0].
> - * Store the token type in *tokenType before returning.
> +/**
> + * Calculate length of continuous sequence of
> + * space symbols.
> + *
> + * @param z Input stream.
> + * @retval Number of bytes which constitute sequence of spaces.
> + *         Can be 0 if first symbol in stream is not space.
>    */
> +static inline int
> +sql_skip_spaces(const char *z)
> +{
> +	int idx = 0;
> +	while (true) {
> +		if (sqlite3Isspace(z[idx])) {
> +			idx += 1;
> +		} else if (maybe_utf8(*(unsigned char *)(z + idx))) {
> +			UChar32 c;
> +			int new_offset = idx;
> +			U8_NEXT_UNSAFE(z, new_offset, c);
> +			if (!u_isspace(c))
> +				break;
> +			idx = new_offset;
> +		} else {
> +			break;
> +		}
> +	}
> +	return idx;
> +}
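
For illustration only: a minimal sketch of the same skipping loop without ICU, so that it builds with the standard library alone. The names toy_skip_spaces(), utf8_decode() and is_space_cp() are hypothetical, and the hard-coded list (ASCII control spaces plus the Zs, Zl and Zp code points) is an assumption that only approximates what u_isspace() accepts. Unlike U8_NEXT_UNSAFE, the decoder checks continuation bytes, so it never reads past the terminating NUL, but it does not reject overlong encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence at z into *cp and return its length in
 * bytes. Malformed input yields U+FFFD with length 1.
 */
static int
utf8_decode(const char *z, uint32_t *cp)
{
	static const unsigned char lead_mask[] = { 0, 0x7F, 0x1F, 0x0F, 0x07 };
	const unsigned char *s = (const unsigned char *)z;
	int len = s[0] < 0x80 ? 1 :
		  (s[0] & 0xE0) == 0xC0 ? 2 :
		  (s[0] & 0xF0) == 0xE0 ? 3 :
		  (s[0] & 0xF8) == 0xF0 ? 4 : 0;
	if (len == 0) {
		*cp = 0xFFFD;
		return 1;
	}
	uint32_t c = s[0] & lead_mask[len];
	for (int k = 1; k < len; k++) {
		if ((s[k] & 0xC0) != 0x80) {
			*cp = 0xFFFD;
			return 1;
		}
		c = (c << 6) | (s[k] & 0x3F);
	}
	*cp = c;
	return len;
}

/* Hand-written list: ASCII control spaces plus Zs, Zl and Zp. */
static bool
is_space_cp(uint32_t c)
{
	return (c >= 0x09 && c <= 0x0D) || c == 0x20 || c == 0xA0 ||
	       c == 0x1680 || (c >= 0x2000 && c <= 0x200A) ||
	       c == 0x2028 || c == 0x2029 || c == 0x202F ||
	       c == 0x205F || c == 0x3000;
}

/* Return the number of bytes in the leading run of space symbols. */
static int
toy_skip_spaces(const char *z)
{
	assert(z != NULL);
	int idx = 0;
	for (;;) {
		uint32_t c;
		int len = utf8_decode(z + idx, &c);
		if (!is_space_cp(c))
			break;
		idx += len;
	}
	return idx;
}
```

With this sketch, an ASCII space, a tab, U+00A0 and U+2003 in front of a letter give a length of 7 bytes, and a string that starts with a non-space symbol gives 0.
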
> +
>   int
> -sqlite3GetToken(const unsigned char *z, int *tokenType, bool *is_reserved)
> +sql_token(const char *z, int *type, bool *is_reserved)
>   {
>   	*is_reserved = false;
> -	int i, c;
> -	switch (aiClass[*z]) {	/* Switch on the character-class of the first byte
> -				 * of the token. See the comment on the CC_ defines
> -				 * above.
> -				 */
> -	case CC_SPACE:{
> -			testcase(z[0] == ' ');
> -			testcase(z[0] == '\t');
> -			testcase(z[0] == '\n');
> -			testcase(z[0] == '\f');
> -			testcase(z[0] == '\r');
> -			for (i = 1; sqlite3Isspace(z[i]); i++) {
> +	int i, n;
> +	char c, delim;
> +	/* Switch on the character-class of the first byte
> +	 * of the token. See the comment on the CC_ defines
> +	 * above.
> +	 */
> +	switch (sql_ascii_class[*(unsigned char*)z]) {
> +	case CC_SPACE:
> +		i = 1 + sql_skip_spaces(z+1);
> +		*type = TK_SPACE;
> +		return i;
> +	case CC_MINUS:
> +		if (z[1] == '-') {
> +			for (i = 2; (c = z[i]) != 0 && c != '\n'; i++) {
>   			}
> -			*tokenType = TK_SPACE;
> +			*type = TK_SPACE;
>   			return i;
>   		}
> -	case CC_MINUS:{
> -			if (z[1] == '-') {
> -				for (i = 2; (c = z[i]) != 0 && c != '\n'; i++) {
> -				}
> -				*tokenType = TK_SPACE;	/* IMP: R-22934-25134
> -				*/
> -				return i;
> -			}
> -			*tokenType = TK_MINUS;
> +		*type = TK_MINUS;
> +		return 1;
> +	case CC_LP:
> +		*type = TK_LP;
> +		return 1;
> +	case CC_RP:
> +		*type = TK_RP;
> +		return 1;
> +	case CC_SEMI:
> +		*type = TK_SEMI;
> +		return 1;
> +	case CC_PLUS:
> +		*type = TK_PLUS;
> +		return 1;
> +	case CC_STAR:
> +		*type = TK_STAR;
> +		return 1;
> +	case CC_SLASH:
> +		if (z[1] != '*' || z[2] == 0) {
> +			*type = TK_SLASH;
>   			return 1;
>   		}
> -	case CC_LP:{
> -			*tokenType = TK_LP;
> -			return 1;
> +		for (i = 3, c = z[2];
> +		     (c != '*' || z[i] != '/') && (c = z[i]) != 0;
> +		     i++) {
>   		}
> -	case CC_RP:{
> -			*tokenType = TK_RP;
> +		if (c)
> +			i++;
> +		*type = TK_SPACE;
> +		return i;
> +	case CC_PERCENT:
> +		*type = TK_REM;
> +		return 1;
> +	case CC_EQ:
> +		*type = TK_EQ;
> +		return 1 + (z[1] == '=');
> +	case CC_LT:
> +		if ((c = z[1]) == '=') {
> +			*type = TK_LE;
> +			return 2;
> +		} else if (c == '>') {
> +			*type = TK_NE;
> +			return 2;
> +		} else if (c == '<') {
> +			*type = TK_LSHIFT;
> +			return 2;
> +		} else {
> +			*type = TK_LT;
>   			return 1;
>   		}
> -	case CC_SEMI:{
> -			*tokenType = TK_SEMI;
> +	case CC_GT:
> +		if ((c = z[1]) == '=') {
> +			*type = TK_GE;
> +			return 2;
> +		} else if (c == '>') {
> +			*type = TK_RSHIFT;
> +			return 2;
> +		} else {
> +			*type = TK_GT;
>   			return 1;
>   		}
> -	case CC_PLUS:{
> -			*tokenType = TK_PLUS;
> +	case CC_BANG:
> +		if (z[1] != '=') {
> +			*type = TK_ILLEGAL;
>   			return 1;
> +		} else {
> +			*type = TK_NE;
> +			return 2;
>   		}
> -	case CC_STAR:{
> -			*tokenType = TK_STAR;
> +	case CC_PIPE:
> +		if (z[1] != '|') {
> +			*type = TK_BITOR;
>   			return 1;
> +		} else {
> +			*type = TK_CONCAT;
> +			return 2;
>   		}
> -	case CC_SLASH:{
> -			if (z[1] != '*' || z[2] == 0) {
> -				*tokenType = TK_SLASH;
> -				return 1;
> -			}
> -			for (i = 3, c = z[2];
> -			     (c != '*' || z[i] != '/') && (c = z[i]) != 0;
> -			     i++) {
> +	case CC_COMMA:
> +		*type = TK_COMMA;
> +		return 1;
> +	case CC_AND:
> +		*type = TK_BITAND;
> +		return 1;
> +	case CC_TILDA:
> +		*type = TK_BITNOT;
> +		return 1;
> +	case CC_QUOTE:
> +	case CC_DQUOTE:
> +		delim = z[0];
> +		for (i = 1; (c = z[i]) != 0; i++) {
> +			if (c == delim) {
> +				if (z[i + 1] == delim)
> +					i++;
> +				else
> +					break;
>   			}
> -			if (c)
> -				i++;
> -			*tokenType = TK_SPACE;	/* IMP: R-22934-25134
> -			*/
> +		}
> +		if (c == '\'') {
> +			*type = TK_STRING;
> +			return i + 1;
> +		} else if (c != 0) {
> +			*type = TK_ID;
> +			return i + 1;
> +		} else {
> +			*type = TK_ILLEGAL;
>   			return i;
>   		}
> -	case CC_PERCENT:{
> -			*tokenType = TK_REM;
> +		FALLTHROUGH;
> +	case CC_DOT:
> +		if (!sqlite3Isdigit(z[1])) {
> +			*type = TK_DOT;
>   			return 1;
>   		}
> -	case CC_EQ:{
> -			*tokenType = TK_EQ;
> -			return 1 + (z[1] == '=');
> -		}
> -	case CC_LT:{
> -			if ((c = z[1]) == '=') {
> -				*tokenType = TK_LE;
> -				return 2;
> -			} else if (c == '>') {
> -				*tokenType = TK_NE;
> -				return 2;
> -			} else if (c == '<') {
> -				*tokenType = TK_LSHIFT;
> -				return 2;
> -			} else {
> -				*tokenType = TK_LT;
> -				return 1;
> +		/* If the next character is a digit, this is a
> +		 * floating point number that begins with ".".
> +		 * Fall thru into the next case.
> +		 */
> +		FALLTHROUGH;
> +	case CC_DIGIT:
> +		*type = TK_INTEGER;
> +		if (z[0] == '0' && (z[1] == 'x' || z[1] == 'X') &&
> +		    sqlite3Isxdigit(z[2])) {
> +			for (i = 3; sqlite3Isxdigit(z[i]); i++) {
>   			}
> +			return i;
>   		}
> -	case CC_GT:{
> -			if ((c = z[1]) == '=') {
> -				*tokenType = TK_GE;
> -				return 2;
> -			} else if (c == '>') {
> -				*tokenType = TK_RSHIFT;
> -				return 2;
> -			} else {
> -				*tokenType = TK_GT;
> -				return 1;
> -			}
> +		for (i = 0; sqlite3Isdigit(z[i]); i++) {
>   		}
> -	case CC_BANG:{
> -			if (z[1] != '=') {
> -				*tokenType = TK_ILLEGAL;
> -				return 1;
> -			} else {
> -				*tokenType = TK_NE;
> -				return 2;
> +		if (z[i] == '.') {
> +			while (sqlite3Isdigit(z[++i])) {
>   			}
> +			*type = TK_FLOAT;
>   		}
> -	case CC_PIPE:{
> -			if (z[1] != '|') {
> -				*tokenType = TK_BITOR;
> -				return 1;
> -			} else {
> -				*tokenType = TK_CONCAT;
> -				return 2;
> -			}
> +		if ((z[i] == 'e' || z[i] == 'E') &&
> +		    (sqlite3Isdigit(z[i + 1])
> +		     || ((z[i + 1] == '+' || z[i + 1] == '-') &&
> +			 sqlite3Isdigit(z[i + 2])))) {
> +			i += 2;
> +			while (sqlite3Isdigit(z[i]))
> +				i++;
> +			*type = TK_FLOAT;
>   		}
> -	case CC_COMMA:{
> -			*tokenType = TK_COMMA;
> -			return 1;
> +		if (IdChar(z[i])) {
> +			*type = TK_ILLEGAL;
> +			while (IdChar(z[++i])) {
> +			}
>   		}
> -	case CC_AND:{
> -			*tokenType = TK_BITAND;
> -			return 1;
> +		return i;
> +	case CC_VARNUM:
> +		*type = TK_VARIABLE;
> +		for (i = 1; sqlite3Isdigit(z[i]); i++) {
>   		}
> -	case CC_TILDA:{
> -			*tokenType = TK_BITNOT;
> -			return 1;
> +		return i;
> +	case CC_DOLLAR:
> +	case CC_VARALPHA:
> +		n = 0;
> +		*type = TK_VARIABLE;
> +		for (i = 1; (c = z[i]) != 0; i++) {
> +			if (IdChar(c))
> +				n++;
> +			else
> +				break;
>   		}
> -	case CC_QUOTE:
> -	case CC_DQUOTE:{
> -			int delim = z[0];
> -			testcase(delim == '\'');
> -			testcase(delim == '"');
> -			for (i = 1; (c = z[i]) != 0; i++) {
> -				if (c == delim) {
> -					if (z[i + 1] == delim) {
> -						i++;
> -					} else {
> -						break;
> -					}
> -				}
> -			}
> -			if (c == '\'') {
> -				*tokenType = TK_STRING;
> -				return i + 1;
> -			} else if (c != 0) {
> -				*tokenType = TK_ID;
> -				return i + 1;
> -			} else {
> -				*tokenType = TK_ILLEGAL;
> -				return i;
> -			}
> -			FALLTHROUGH;
> +		if (n == 0)
> +			*type = TK_ILLEGAL;
> +		return i;
> +	case CC_KYWD:
> +		for (i = 1; sql_ascii_class[*(unsigned char*)(z+i)] <= CC_KYWD;
> +		     i++) {
>   		}
> -	case CC_DOT:{
> -#ifndef SQLITE_OMIT_FLOATING_POINT
> -			if (!sqlite3Isdigit(z[1]))
> -#endif
> -			{
> -				*tokenType = TK_DOT;
> -				return 1;
> -			}
> -			/* If the next character is a digit, this is a floating point
> -			 * number that begins with ".".  Fall thru into the next case
> +		if (!sql_is_space_char(z + i) && IdChar(z[i])) {
> +			/* This token started out using characters
> +			 * that can appear in keywords, but z[i] is
> +			 * a character not allowed within keywords,
> +			 * so this must be an identifier instead.
>   			 */
> -			FALLTHROUGH;
> +			i++;
> +			break;
>   		}
> -	case CC_DIGIT:{
> -			testcase(z[0] == '0');
> -			testcase(z[0] == '1');
> -			testcase(z[0] == '2');
> -			testcase(z[0] == '3');
> -			testcase(z[0] == '4');
> -			testcase(z[0] == '5');
> -			testcase(z[0] == '6');
> -			testcase(z[0] == '7');
> -			testcase(z[0] == '8');
> -			testcase(z[0] == '9');
> -			*tokenType = TK_INTEGER;
> -#ifndef SQLITE_OMIT_HEX_INTEGER
> -			if (z[0] == '0' && (z[1] == 'x' || z[1] == 'X')
> -			    && sqlite3Isxdigit(z[2])) {
> -				for (i = 3; sqlite3Isxdigit(z[i]); i++) {
> -				}
> -				return i;
> +		*type = TK_ID;
> +		return keywordCode(z, i, type, is_reserved);
> +	case CC_X:
> +		if (z[1] == '\'') {
> +			*type = TK_BLOB;
> +			for (i = 2; sqlite3Isxdigit(z[i]); i++) {
>   			}
> -#endif
> -			for (i = 0; sqlite3Isdigit(z[i]); i++) {
> -			}
> -#ifndef SQLITE_OMIT_FLOATING_POINT
> -			if (z[i] == '.') {
> -				i++;
> -				while (sqlite3Isdigit(z[i])) {
> +			if (z[i] != '\'' || i % 2) {
> +				*type = TK_ILLEGAL;
> +				while (z[i] != 0 && z[i] != '\'')
>   					i++;
> -				}
> -				*tokenType = TK_FLOAT;
>   			}
> -			if ((z[i] == 'e' || z[i] == 'E') &&
> -			    (sqlite3Isdigit(z[i + 1])
> -			     || ((z[i + 1] == '+' || z[i + 1] == '-')
> -				 && sqlite3Isdigit(z[i + 2]))
> -			    )
> -			    ) {
> -				i += 2;
> -				while (sqlite3Isdigit(z[i])) {
> -					i++;
> -				}
> -				*tokenType = TK_FLOAT;
> -			}
> -#endif
> -			while (IdChar(z[i])) {
> -				*tokenType = TK_ILLEGAL;
> +			if (z[i] != 0)
>   				i++;
> -			}
> -			return i;
> -		}
> -	case CC_VARNUM:{
> -			*tokenType = TK_VARIABLE;
> -			for (i = 1; sqlite3Isdigit(z[i]); i++) {
> -			}
>   			return i;
>   		}
> -	case CC_DOLLAR:
> -	case CC_VARALPHA:{
> -			int n = 0;
> -			testcase(z[0] == '$');
> -			testcase(z[0] == '@');
> -			testcase(z[0] == ':');
> -			testcase(z[0] == '#');
> -			*tokenType = TK_VARIABLE;
> -			for (i = 1; (c = z[i]) != 0; i++) {
> -				if (IdChar(c)) {
> -					n++;
> -#ifndef SQLITE_OMIT_TCL_VARIABLE
> -				} else if (c == '(' && n > 0) {
> -					do {
> -						i++;
> -					} while ((c = z[i]) != 0
> -						 && !sqlite3Isspace(c)
> -						 && c != ')');
> -					if (c == ')') {
> -						i++;
> -					} else {
> -						*tokenType = TK_ILLEGAL;
> -					}
> -					break;
> -				} else if (c == ':' && z[i + 1] == ':') {
> -					i++;
> -#endif
> -				} else {
> -					break;
> -				}
> -			}
> -			if (n == 0)
> -				*tokenType = TK_ILLEGAL;
> -			return i;
> -		}
> -	case CC_KYWD:{
> -			for (i = 1; aiClass[z[i]] <= CC_KYWD; i++) {
> -			}
> -			if (IdChar(z[i])) {
> -				/* This token started out using characters that can appear in keywords,
> -				 * but z[i] is a character not allowed within keywords, so this must
> -				 * be an identifier instead
> -				 */
> -				i++;
> -				break;
> -			}
> -			*tokenType = TK_ID;
> -			return keywordCode((char *)z, i, tokenType, is_reserved);
> -		}
> -	case CC_X:{
> -#ifndef SQLITE_OMIT_BLOB_LITERAL
> -			testcase(z[0] == 'x');
> -			testcase(z[0] == 'X');
> -			if (z[1] == '\'') {
> -				*tokenType = TK_BLOB;
> -				for (i = 2; sqlite3Isxdigit(z[i]); i++) {
> -				}
> -				if (z[i] != '\'' || i % 2) {
> -					*tokenType = TK_ILLEGAL;
> -					while (z[i] && z[i] != '\'') {
> -						i++;
> -					}
> -				}
> -				if (z[i])
> -					i++;
> -				return i;
> -			}
> -#endif
> -			/* If it is not a BLOB literal, then it must be an ID, since no
> -			 * SQL keywords start with the letter 'x'.  Fall through
> -			 */
> -			FALLTHROUGH;
> -		}
> -	case CC_ID:{
> -			i = 1;
> -			break;
> -		}
> -	default:{
> -			*tokenType = TK_ILLEGAL;
> -			return 1;
> -		}
> +		/* If it is not a BLOB literal, then it must be an
> +		 * ID, since no SQL keywords start with the letter
> +		 * 'x'.  Fall through.
> +		 */
> +		FALLTHROUGH;
> +	case CC_ID:
> +		i = 1;
> +		break;
> +	default:
> +		*type = TK_ILLEGAL;
> +		return 1;
>   	}
> -	while (IdChar(z[i])) {
> -		i++;
> +	int spaces_len = sql_skip_spaces(z);
> +	if (spaces_len > 0) {
> +		*type = TK_SPACE;
> +		return spaces_len;
>   	}
> -	*tokenType = TK_ID;
> +	while (IdChar(z[i]))
> +		i++;
> +	*type = TK_ID;
>   	return i;
>   }
>   
> @@ -566,8 +457,8 @@ sqlite3RunParser(Parse * pParse, const char *zSql, char **pzErrMsg)
>   		if (zSql[i] != 0) {
>   			pParse->sLastToken.z = &zSql[i];
>   			pParse->sLastToken.n =
> -			    sqlite3GetToken((u8 *) & zSql[i], &tokenType,
> -					    &pParse->sLastToken.isReserved);
> +			    sql_token(&zSql[i], &tokenType,
> +				      &pParse->sLastToken.isReserved);
>   			i += pParse->sLastToken.n;
>   			if (i > mxSqlLen) {
>   				pParse->rc = SQLITE_TOOBIG;
> diff --git a/src/box/sql/util.c b/src/box/sql/util.c
> index 8c4e7b9..0c2a050 100644
> --- a/src/box/sql/util.c
> +++ b/src/box/sql/util.c
> @@ -1228,12 +1228,7 @@ sqlite3HexToInt(int h)
>   {
>   	assert((h >= '0' && h <= '9') || (h >= 'a' && h <= 'f')
>   	       || (h >= 'A' && h <= 'F'));
> -#ifdef SQLITE_ASCII
>   	h += 9 * (1 & (h >> 6));
> -#endif
> -#ifdef SQLITE_EBCDIC
> -	h += 9 * (1 & ~(h >> 4));
> -#endif
>   	return (u8) (h & 0xf);
>   }
>   
> diff --git a/src/box/sql/vdbetrace.c b/src/box/sql/vdbetrace.c
> index 8623e68..63e2311 100644
> --- a/src/box/sql/vdbetrace.c
> +++ b/src/box/sql/vdbetrace.c
> @@ -57,7 +57,7 @@ findNextHostParameter(const char *zSql, int *pnToken)
>   
>   	*pnToken = 0;
>   	while (zSql[0]) {
> -		n = sqlite3GetToken((u8 *) zSql, &tokenType, &unused);
> +		n = sql_token(zSql, &tokenType, &unused);
>   		assert(n > 0 && tokenType != TK_ILLEGAL);
>   		if (tokenType == TK_VARIABLE) {
>   			*pnToken = n;
> diff --git a/src/box/sql/whereexpr.c b/src/box/sql/whereexpr.c
> index 34a1f13..c3a8634 100644
> --- a/src/box/sql/whereexpr.c
> +++ b/src/box/sql/whereexpr.c
> @@ -256,10 +256,6 @@ isLikeOrGlob(Parse * pParse,	/* Parsing and code generating context */
>   	if (!sqlite3IsLikeFunction(db, pExpr, pnoCase, wc)) {
>   		return 0;
>   	}
> -#ifdef SQLITE_EBCDIC
> -	if (*pnoCase)
> -		return 0;
> -#endif
>   	pList = pExpr->x.pList;
>   	pLeft = pList->a[1].pExpr;
>   	if (pLeft->op != TK_COLUMN || sqlite3ExprAffinity(pLeft) != SQLITE_AFF_TEXT	/* Value might be numeric */
> diff --git a/test/sql-tap/e_expr.test.lua b/test/sql-tap/e_expr.test.lua
> index d0f6895..f7f3b15 100755
> --- a/test/sql-tap/e_expr.test.lua
> +++ b/test/sql-tap/e_expr.test.lua
> @@ -1,6 +1,6 @@
>   #!/usr/bin/env tarantool
>   test = require("sqltester")
> -test:plan(14750)
> +test:plan(14748)
>   
>   --!./tcltestrunner.lua
>   -- 2010 July 16
> @@ -1506,8 +1506,6 @@ local test_cases12 ={
>       {10, "?123"},
>       {11, "@hello"},
>       {12, ":world"},
> -    {13, "$tcl"},
> -    {14, "$tcl(array)"},
>   
>       {15, "cname"},
>       {16, "tblname.cname"},
> diff --git a/test/sql-tap/unicode.test.lua b/test/sql-tap/unicode.test.lua
> new file mode 100755
> index 0000000..1990739
> --- /dev/null
> +++ b/test/sql-tap/unicode.test.lua
> @@ -0,0 +1,37 @@
> +#!/usr/bin/env tarantool
> +test = require("sqltester")
> +test:plan(23 * 3)
> +
> +-- 23 entities
> +local utf8_spaces = {"\u{0009}", "\u{000A}", "\u{000B}", "\u{000C}", "\u{000D}",
> +                     "\u{0085}", "\u{1680}", "\u{2000}", "\u{2001}", "\u{2002}",
> +                     "\u{2003}", "\u{2004}", "\u{2005}", "\u{2006}", "\u{2007}",
> +                     "\u{2008}", "\u{2009}", "\u{200A}", "\u{2028}", "\u{2029}",
> +                     "\u{202F}", "\u{205F}", "\u{3000}"}
> +local spaces_cnt = 23
> +
> +-- 1. Check UTF-8 single space
> +for i, v in pairs(utf8_spaces) do
> +    test:do_execsql_test(
> +        "utf8-spaces-1."..i,
> +        "select" .. v .. "1",
> +        { 1 })
> +end
> +
> +-- 2. Check pair simple + UTF-8 space
> +for i, v in pairs(utf8_spaces) do
> +    test:do_execsql_test(
> +        "utf8-spaces-2."..i,
> +        "select " .. v .. "1",
> +        { 1 })
> +end
> +
> +-- 3. Sequence of spaces
> +for i, v in pairs(utf8_spaces) do
> +    test:do_execsql_test(
> +        "utf8-spaces-3."..i,
> +        "select" .. v .. " " .. utf8_spaces[spaces_cnt - i + 1] .. " 1",
> +        { 1 })
> +end
> +
> +test:finish_test()
> 
> 
>
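
For readers following the tokenizer change: the quoted sql_token() ends by
calling sql_skip_spaces(z) and returning TK_SPACE when the result is positive.
The sketch below illustrates that kind of scan without ICU. It is not the
patch's implementation: the names utf8_decode, is_space_cp and skip_spaces are
hypothetical, the code point set is simply the 23 entries from the quoted
unicode.test.lua plus the plain U+0020 space, and only 1- to 3-byte sequences
are decoded, which is enough for that set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence of up to 3 bytes starting at z.
 * Returns its length in bytes, or 0 at the terminating NUL and on
 * input this sketch does not handle (4-byte or malformed sequences).
 * Overlong encodings are not rejected here.
 */
int
utf8_decode(const char *z, uint32_t *cp)
{
	const unsigned char *s = (const unsigned char *)z;
	if (s[0] < 0x80) {
		*cp = s[0];
		return s[0] != 0;
	}
	if ((s[0] & 0xe0) == 0xc0 && (s[1] & 0xc0) == 0x80) {
		*cp = ((uint32_t)(s[0] & 0x1f) << 6) | (s[1] & 0x3f);
		return 2;
	}
	if ((s[0] & 0xf0) == 0xe0 && (s[1] & 0xc0) == 0x80 &&
	    (s[2] & 0xc0) == 0x80) {
		*cp = ((uint32_t)(s[0] & 0x0f) << 12) |
		      ((uint32_t)(s[1] & 0x3f) << 6) | (s[2] & 0x3f);
		return 3;
	}
	return 0;
}

/* Hard-coded set: the code points listed in the test, plus U+0020. */
int
is_space_cp(uint32_t cp)
{
	return cp == 0x20 || (cp >= 0x09 && cp <= 0x0d) || cp == 0x85 ||
	       cp == 0x1680 || (cp >= 0x2000 && cp <= 0x200a) ||
	       cp == 0x2028 || cp == 0x2029 || cp == 0x202f ||
	       cp == 0x205f || cp == 0x3000;
}

/* Length in bytes of the leading run of white space in z. */
int
skip_spaces(const char *z)
{
	assert(z != NULL);
	int total = 0;
	uint32_t cp;
	int len;
	while ((len = utf8_decode(z + total, &cp)) > 0 && is_space_cp(cp))
		total += len;
	return total;
}
```

A caller would use it the same way the quoted code uses spaces_len: a positive
result is the byte length of a white space token, and zero means the input
starts with something else.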

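On the util.c hunk: with the EBCDIC branch gone, sqlite3HexToInt() keeps only
the ASCII arithmetic. A stand-alone copy of that arithmetic, under the
hypothetical name hex_to_int, shows why one shift and one mask are enough. In
ASCII, bit 6 is clear for '0'..'9' (0x30..0x39) and set for 'A'..'F' and
'a'..'f' (0x41.., 0x61..), so adding 9 to letters only makes the low nibble
equal to the digit value.

```c
#include <assert.h>

/* Illustrative copy of the ASCII-only branch; hex_to_int is not a name from the patch. */
unsigned char
hex_to_int(int h)
{
	assert((h >= '0' && h <= '9') || (h >= 'a' && h <= 'f') ||
	       (h >= 'A' && h <= 'F'));
	/* (h >> 6) & 1 is 1 for letters and 0 for decimal digits. */
	h += 9 * (1 & (h >> 6));
	/* The low nibble now holds the value 0..15. */
	return (unsigned char)(h & 0xf);
}
```

For example, 'a' is 0x61; adding 9 gives 0x6a, and the low nibble is 10.
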
Thread overview: 8+ messages
2018-05-22 15:51 [tarantool-patches] " Kirill Yukhin
2018-05-22 18:06 ` [tarantool-patches] " Vladislav Shpilevoy
2018-05-23  5:15   ` Kirill Yukhin
2018-05-23  5:54     ` Kirill Yukhin
2018-05-23 10:29       ` Vladislav Shpilevoy
2018-05-23 14:05         ` Kirill Yukhin
2018-05-24 11:09           ` Vladislav Shpilevoy [this message]
2018-05-24 14:23             ` Kirill Yukhin
