From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: tarantool-patches@freelists.org, Kirill Yukhin <kyukhin@tarantool.org>
Subject: [tarantool-patches] Re: [PATCH] sql: allow any space symbols to be a white space
Date: Thu, 24 May 2018 14:09:22 +0300
Message-ID: <7cbf79d0-8242-4079-8641-1635a45e863d@tarantool.org>
In-Reply-To: <20180523140550.qqegk7wzv7ycc76m@tarantool.org>
LGTM.
On 23/05/2018 17:05, Kirill Yukhin wrote:
> Hi Vlad,
> Thanks for review! My comments inlined.
>
> On 23 May 13:29, Vladislav Shpilevoy wrote:
>> Hello. Thanks for the fixes! I see that you changed the branch to
>> kyukhin/gh-2371-utf8-spaces-check, right?
> Nope, the branch is the same. That one was only for checking on Travis.
> The original branch contains the updated patch, which in turn contains
> the update for CMakeLists.txt. I've removed the temporary branch.
>
> The updated patch is at the bottom.
>
>> And that you added a separate commit to link with ICU. But without
>> linking, the first commit does not work. Maybe you should squash them?
>>
>> See 4 comments below.
>>
>>> diff --git a/src/box/sql/tokenize.c b/src/box/sql/tokenize.c
>>> index c77aa9b..4c01066 100644
>>> --- a/src/box/sql/tokenize.c
>>> +++ b/src/box/sql/tokenize.c
>>> @@ -36,17 +36,21 @@
>>> * individual tokens and sends those tokens one-by-one over to the
>>> * parser for analysis.
>>> */
>>> -#include "sqliteInt.h"
>>> #include <stdlib.h>
>>> +#include <unicode/utf8.h>
>>> +#include <unicode/uchar.h>
>>> +
>>> #include "say.h"
>>> +#include "sqliteInt.h"
>>> /* Character classes for tokenizing
>>> *
>>> - * In the sqlite3GetToken() function, a switch() on aiClass[c] is implemented
>>> - * using a lookup table, whereas a switch() directly on c uses a binary search.
>>> - * The lookup table is much faster. To maximize speed, and to ensure that
>>> - * a lookup table is used, all of the classes need to be small integers and
>>> - * all of them need to be used within the switch.
>>> + * In the sql_token() function, a switch() on sql_ascii[c] is
>> 1. No sql_ascii.
> Fixed.
>
>>> @@ -167,360 +123,295 @@ const unsigned char ebcdicToAscii[] = {
>>> - * Ticket #1066. the SQL standard does not allow '$' in the
>>> - * middle of identifiers. But many SQL implementations do.
>>> - * SQLite will allow '$' in identifiers for compatibility.
>>> - * But the feature is undocumented.
>>> + * @param z Input stream.
>>> + * @retval True if current symbo1l space.
>> 2. symbo1l -> symbol is.
> Fixed.
>
>>> diff --git a/test/sql-tap/gh-2371-utf8-spaces.test.lua b/test/sql-tap/gh-2371-utf8-spaces.test.lua
>> 3. Could you avoid creating a new test file specifically for this issue? I believe
>> we will support Unicode for more than white space, so it may be worth creating a
>> more general file, test/sql/unicode.test.lua.
> I'd prefer TAP tests for SQL FE. Renamed to unicode.test.lua.
>
>>> new file mode 100755
>>> index 0000000..191cc1c
>>> --- /dev/null
>>> +++ b/test/sql-tap/gh-2371-utf8-spaces.test.lua
>>> +-- 1. Check UTF-8 single space
>>> +for i, v in pairs(utf8_spaces) do
>>> + test:do_execsql_test(
>>> + "utf8-spaces-1."..i,
>>> + "select" .. v .. "1",
>>> + { 1 })
>>
>> 4. Here and in 2 similar places below, the same line mixes 4 spaces with 1 tab of width 8.
>> Please use only spaces in .lua files.
> Fixed.
>
> --
> Regards, Kirill Yukhin
>
> commit 46c40d785552d8cc652b31b48207032216bb067f
> Author: Kirill Yukhin <kyukhin@tarantool.org>
> Date: Tue May 22 18:45:35 2018 +0300
>
> sql: allow any space symbols to be a white space
>
> ANSI SQL allows any of Unicode classes Zl, Zp or Zs to
> act as white space symbol. Allow this in lexical analyzer.
> Refactor lexical analyzer routine to follow Tarantool's
> coding style.
> Also, remove dead encoding: EBCDIC.
>
> Closes #2371
>
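The idea in the commit message can be sketched outside the patch. What follows is only an illustration, not the code from this patch: it assumes no ICU and instead hard-codes the code points that, to my knowledge, make up the Zs, Zl and Zp categories, plus the ASCII white space bytes. Note that u_isspace() in the patch covers a somewhat wider set. The names decode_utf8(), is_sql_space() and skip_spaces() are hypothetical. Unlike U8_NEXT_UNSAFE, the decoder here stops on truncated sequences and bad continuation bytes (it does not reject overlong forms).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one UTF-8 sequence starting at s.
 * Returns the number of bytes consumed, or 0 if the lead byte is
 * invalid or a continuation byte is missing (a NUL terminator is
 * never a continuation byte, so the string end is not overrun).
 */
static int
decode_utf8(const char *s, uint32_t *cp)
{
	const unsigned char *p = (const unsigned char *)s;
	int len;
	if (p[0] < 0x80) {
		*cp = p[0];
		return 1;
	} else if ((p[0] & 0xE0) == 0xC0) {
		*cp = p[0] & 0x1F;
		len = 2;
	} else if ((p[0] & 0xF0) == 0xE0) {
		*cp = p[0] & 0x0F;
		len = 3;
	} else if ((p[0] & 0xF8) == 0xF0) {
		*cp = p[0] & 0x07;
		len = 4;
	} else {
		return 0;
	}
	for (int i = 1; i < len; i++) {
		if ((p[i] & 0xC0) != 0x80)
			return 0;
		*cp = (*cp << 6) | (p[i] & 0x3F);
	}
	return len;
}

/*
 * Assumption: ASCII white space (TAB..CR, SPACE) plus a hard-coded
 * list of the Zs, Zl and Zp code points.
 */
static bool
is_sql_space(uint32_t cp)
{
	if (cp == 0x20 || (cp >= 0x09 && cp <= 0x0D))
		return true;
	return cp == 0x00A0 || cp == 0x1680 ||
	       (cp >= 0x2000 && cp <= 0x200A) ||
	       cp == 0x2028 ||	/* Zl: line separator */
	       cp == 0x2029 ||	/* Zp: paragraph separator */
	       cp == 0x202F || cp == 0x205F || cp == 0x3000;
}

/* Number of leading bytes of z that form a run of space symbols. */
static size_t
skip_spaces(const char *z)
{
	assert(z != NULL);
	size_t idx = 0;
	for (;;) {
		uint32_t cp;
		int len = decode_utf8(z + idx, &cp);
		if (len == 0 || !is_sql_space(cp))
			break;
		idx += (size_t)len;
	}
	return idx;
}
```

With these helpers, skip_spaces("\xC2\xA0" "1") consumes the two bytes of U+00A0 and stops at the digit, which mirrors what the new test expects from "select" .. v .. "1".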
> diff --git a/extra/mkkeywordhash.c b/extra/mkkeywordhash.c
> index cf34831..9e0e24b 100644
> --- a/extra/mkkeywordhash.c
> +++ b/extra/mkkeywordhash.c
> @@ -611,12 +611,7 @@ int main(int argc, char **argv){
> printf(" if( aLen[i]!=n ) continue;\n");
> printf(" j = 0;\n");
> printf(" zKW = &zText[aOffset[i]];\n");
> - printf("#ifdef SQLITE_ASCII\n");
> printf(" while( j<n && (z[j]&~0x20)==zKW[j] ){ j++; }\n");
> - printf("#endif\n");
> - printf("#ifdef SQLITE_EBCDIC\n");
> - printf(" while( j<n && toupper(z[j])==zKW[j] ){ j++; }\n");
> - printf("#endif\n");
> printf(" if( j<n ) continue;\n");
> for(i=0; i<nKeyword; i++){
> printf(" testcase( i==%d ); /* %s */\n",
> diff --git a/src/box/sql/CMakeLists.txt b/src/box/sql/CMakeLists.txt
> index 2204191..8a83a0a 100644
> --- a/src/box/sql/CMakeLists.txt
> +++ b/src/box/sql/CMakeLists.txt
> @@ -78,6 +78,7 @@ add_library(sql STATIC
> )
> set_target_properties(sql PROPERTIES COMPILE_DEFINITIONS
> "${TEST_DEFINITIONS}")
> +target_link_libraries(sql ${ICU_LIBRARIES})
>
> add_custom_target(generate_sql_files DEPENDS
> parse.h
> diff --git a/src/box/sql/alter.c b/src/box/sql/alter.c
> index c9c8f9b..c9325c5 100644
> --- a/src/box/sql/alter.c
> +++ b/src/box/sql/alter.c
> @@ -351,7 +351,7 @@ rename_table(sqlite3 *db, const char *sql_stmt, const char *table_name,
>
> int token;
> Token old_name;
> - unsigned char const *csr = (unsigned const char *)sql_stmt;
> + const char *csr = sql_stmt;
> int len = 0;
> char *new_sql_stmt;
> bool unused;
> @@ -374,7 +374,7 @@ rename_table(sqlite3 *db, const char *sql_stmt, const char *table_name,
> */
> do {
> csr += len;
> - len = sqlite3GetToken(csr, &token, &unused);
> + len = sql_token(csr, &token, &unused);
> } while (token == TK_SPACE);
> assert(len > 0);
> } while (token != TK_LP && token != TK_USING);
> @@ -430,13 +430,12 @@ rename_parent_table(sqlite3 *db, const char *sql_stmt, const char *old_name,
> bool is_quoted;
>
> for (csr = sql_stmt; *csr; csr = csr + n) {
> - n = sqlite3GetToken((const unsigned char *)csr, &token, &unused);
> + n = sql_token(csr, &token, &unused);
> if (token == TK_REFERENCES) {
> char *zParent;
> do {
> csr += n;
> - n = sqlite3GetToken((const unsigned char *)csr,
> - &token, &unused);
> + n = sql_token(csr, &token, &unused);
> } while (token == TK_SPACE);
> if (token == TK_ILLEGAL)
> break;
> @@ -482,7 +481,7 @@ rename_trigger(sqlite3 *db, char const *sql_stmt, char const *table_name,
> int token;
> Token tname;
> int dist = 3;
> - unsigned char const *csr = (unsigned char const*)sql_stmt;
> + char const *csr = (char const*)sql_stmt;
> int len = 0;
> char *new_sql_stmt;
> bool unused;
> @@ -505,7 +504,7 @@ rename_trigger(sqlite3 *db, char const *sql_stmt, char const *table_name,
> */
> do {
> csr += len;
> - len = sqlite3GetToken(csr, &token, &unused);
> + len = sql_token(csr, &token, &unused);
> } while (token == TK_SPACE);
> assert(len > 0);
> /* Variable 'dist' stores the number of tokens read since the most
> diff --git a/src/box/sql/complete.c b/src/box/sql/complete.c
> index 092d4fb..74b057b 100644
> --- a/src/box/sql/complete.c
> +++ b/src/box/sql/complete.c
> @@ -40,18 +40,7 @@
> #include "sqliteInt.h"
> #ifndef SQLITE_OMIT_COMPLETE
>
> -/*
> - * This is defined in tokenize.c. We just have to import the definition.
> - */
> -#ifndef SQLITE_AMALGAMATION
> -#ifdef SQLITE_ASCII
> #define IdChar(C) ((sqlite3CtypeMap[(unsigned char)C]&0x46)!=0)
> -#endif
> -#ifdef SQLITE_EBCDIC
> -extern const char sqlite3IsEbcdicIdChar[];
> -#define IdChar(C) (((c=C)>=0x42 && sqlite3IsEbcdicIdChar[c-0x40]))
> -#endif
> -#endif /* SQLITE_AMALGAMATION */
>
> /*
> * Token types used by the sqlite3_complete() routine. See the header
> @@ -230,9 +219,6 @@ sqlite3_complete(const char *zSql)
> break;
> }
> default:{
> -#ifdef SQLITE_EBCDIC
> - unsigned char c;
> -#endif
> if (IdChar((u8) * zSql)) {
> /* Keywords and unquoted identifiers */
> int nId;
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index dcac22c..c06e3bd 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -623,12 +623,7 @@ struct compareInfo {
> * macro for fast reading of the next character in the common case where
> * the next character is ASCII.
> */
> -#if defined(SQLITE_EBCDIC)
> -#define sqlite3Utf8Read(A) (*((*A)++))
> -#define Utf8Read(A) (*(A++))
> -#else
> #define Utf8Read(s, e) ucnv_getNextUChar(pUtf8conv, &s, e, &status)
> -#endif
>
> static const struct compareInfo globInfo = { '*', '?', '[', 0 };
>
> diff --git a/src/box/sql/global.c b/src/box/sql/global.c
> index cd6f9c4..8e53bcc 100644
> --- a/src/box/sql/global.c
> +++ b/src/box/sql/global.c
> @@ -43,7 +43,6 @@
> * involved are nearly as big or bigger than SQLite itself.
> */
> const unsigned char sqlite3UpperToLower[] = {
> -#ifdef SQLITE_ASCII
> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
> 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
> 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
> @@ -70,25 +69,6 @@ const unsigned char sqlite3UpperToLower[] = {
> 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247,
> 248, 249, 250, 251,
> 252, 253, 254, 255
> -#endif
> -#ifdef SQLITE_EBCDIC
> - 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, /* 0x */
> - 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, /* 1x */
> - 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, /* 2x */
> - 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, /* 3x */
> - 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, /* 4x */
> - 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, /* 5x */
> - 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, /* 6x */
> - 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, /* 7x */
> - 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, /* 8x */
> - 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, /* 9x */
> - 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 140, 141, 142, 175, /* Ax */
> - 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, /* Bx */
> - 192, 129, 130, 131, 132, 133, 134, 135, 136, 137, 202, 203, 204, 205, 206, 207, /* Cx */
> - 208, 145, 146, 147, 148, 149, 150, 151, 152, 153, 218, 219, 220, 221, 222, 223, /* Dx */
> - 224, 225, 162, 163, 164, 165, 166, 167, 168, 169, 234, 235, 236, 237, 238, 239, /* Ex */
> - 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, /* Fx */
> -#endif
> };
>
> /*
> @@ -119,7 +99,6 @@ const unsigned char sqlite3UpperToLower[] = {
> * non-ASCII UTF character. Hence the test for whether or not a character is
> * part of an identifier is 0x46.
> */
> -#ifdef SQLITE_ASCII
> const unsigned char sqlite3CtypeMap[256] = {
> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 00..07 ........ */
> 0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x00, 0x00, /* 08..0f ........ */
> @@ -157,7 +136,6 @@ const unsigned char sqlite3CtypeMap[256] = {
> 0x40, 0x40, 0x40, 0x40, 0x40, 0x40, 0x40, 0x40, /* f0..f7 ........ */
> 0x40, 0x40, 0x40, 0x40, 0x40, 0x40, 0x40, 0x40 /* f8..ff ........ */
> };
> -#endif
>
> /* EVIDENCE-OF: R-02982-34736 In order to maintain full backwards
> * compatibility for legacy applications, the URI filename capability is
> diff --git a/src/box/sql/sqliteInt.h b/src/box/sql/sqliteInt.h
> index b3db468..e6ffda4 100644
> --- a/src/box/sql/sqliteInt.h
> +++ b/src/box/sql/sqliteInt.h
> @@ -36,6 +36,8 @@
> #ifndef SQLITEINT_H
> #define SQLITEINT_H
>
> +#define IdChar(C) ((sqlite3CtypeMap[(unsigned char)C]&0x46)!=0)
> +
> /* Special Comments:
> *
> * Some comments have special meaning to the tools that measure test
> @@ -1129,16 +1131,6 @@ sqlite3_bind_parameter_lindex(sqlite3_stmt * pStmt, const char *zName,
> #define MAX(A,B) ((A)>(B)?(A):(B))
> #endif
>
> -/*
> - * Check to see if this machine uses EBCDIC. (Yes, believe it or
> - * not, there are still machines out there that use EBCDIC.)
> - */
> -#if 'A' == '\301'
> -#define SQLITE_EBCDIC 1
> -#else
> -#define SQLITE_ASCII 1
> -#endif
> -
> /*
> * Integers of known sizes. These typedefs might change for architectures
> * where the sizes very. Preprocessor macros are available so that the
> @@ -3368,21 +3360,11 @@ int sqlite3IoerrnomemError(int);
> #define SQLITE_ENABLE_FTS3 1
> #endif
>
> -/*
> - * The ctype.h header is needed for non-ASCII systems. It is also
> - * needed by FTS3 when FTS3 is included in the amalgamation.
> - */
> -#if !defined(SQLITE_ASCII) || \
> - (defined(SQLITE_ENABLE_FTS3) && defined(SQLITE_AMALGAMATION))
> -#include <ctype.h>
> -#endif
> -
> /*
> * The following macros mimic the standard library functions toupper(),
> * isspace(), isalnum(), isdigit() and isxdigit(), respectively. The
> * sqlite versions only work for ASCII characters, regardless of locale.
> */
> -#ifdef SQLITE_ASCII
> #define sqlite3Toupper(x) ((x)&~(sqlite3CtypeMap[(unsigned char)(x)]&0x20))
> #define sqlite3Isspace(x) (sqlite3CtypeMap[(unsigned char)(x)]&0x01)
> #define sqlite3Isalnum(x) (sqlite3CtypeMap[(unsigned char)(x)]&0x06)
> @@ -3391,16 +3373,6 @@ int sqlite3IoerrnomemError(int);
> #define sqlite3Isxdigit(x) (sqlite3CtypeMap[(unsigned char)(x)]&0x08)
> #define sqlite3Tolower(x) (sqlite3UpperToLower[(unsigned char)(x)])
> #define sqlite3Isquote(x) (sqlite3CtypeMap[(unsigned char)(x)]&0x80)
> -#else
> -#define sqlite3Toupper(x) toupper((unsigned char)(x))
> -#define sqlite3Isspace(x) isspace((unsigned char)(x))
> -#define sqlite3Isalnum(x) isalnum((unsigned char)(x))
> -#define sqlite3Isalpha(x) isalpha((unsigned char)(x))
> -#define sqlite3Isdigit(x) isdigit((unsigned char)(x))
> -#define sqlite3Isxdigit(x) isxdigit((unsigned char)(x))
> -#define sqlite3Tolower(x) tolower((unsigned char)(x))
> -#define sqlite3Isquote(x) ((x)=='"'||(x)=='\''||(x)=='['||(x)=='`')
> -#endif
>
> /*
> * Internal function prototypes
> @@ -4164,7 +4136,18 @@ extern int sqlite3PendingByte;
> #endif
> void sqlite3Reindex(Parse *, Token *, Token *);
> void sqlite3AlterRenameTable(Parse *, SrcList *, Token *);
> -int sqlite3GetToken(const unsigned char *, int *, bool *);
> +
> +/**
> + * Return the length (in bytes) of the token that begins at z[0].
> + * Store the token type in *type before returning.
> + *
> + * @param z Input stream.
> + * @param[out] type Detected type of token.
> + * @param[out] is_reserved True if reserved word.
> + */
> +int
> +sql_token(const char *z, int *type, bool *is_reserved);
> +
> void sqlite3NestedParse(Parse *, const char *, ...);
> void sqlite3ExpirePreparedStatements(sqlite3 *);
> int sqlite3CodeSubselect(Parse *, Expr *, int);
> diff --git a/src/box/sql/tokenize.c b/src/box/sql/tokenize.c
> index c77aa9b..1766eef 100644
> --- a/src/box/sql/tokenize.c
> +++ b/src/box/sql/tokenize.c
> @@ -36,17 +36,21 @@
> * individual tokens and sends those tokens one-by-one over to the
> * parser for analysis.
> */
> -#include "sqliteInt.h"
> #include <stdlib.h>
> +#include <unicode/utf8.h>
> +#include <unicode/uchar.h>
> +
> #include "say.h"
> +#include "sqliteInt.h"
>
> /* Character classes for tokenizing
> *
> - * In the sqlite3GetToken() function, a switch() on aiClass[c] is implemented
> - * using a lookup table, whereas a switch() directly on c uses a binary search.
> - * The lookup table is much faster. To maximize speed, and to ensure that
> - * a lookup table is used, all of the classes need to be small integers and
> - * all of them need to be used within the switch.
> + * In the sql_token() function, a switch() on sql_ascii_class[c]
> + * is implemented using a lookup table, whereas a switch()
> + * directly on c uses a binary search. The lookup table is much
> + * faster. To maximize speed, and to ensure that a lookup table is
> + * used, all of the classes need to be small integers and all of
> + * them need to be used within the switch.
> */
> #define CC_X 0 /* The letter 'x', or start of BLOB literal */
> #define CC_KYWD 1 /* Alphabetics or '_'. Usable in a keyword */
> @@ -77,10 +81,9 @@
> #define CC_DOT 26 /* '.' */
> #define CC_ILLEGAL 27 /* Illegal character */
>
> -static const unsigned char aiClass[] = {
> -#ifdef SQLITE_ASCII
> +static const char sql_ascii_class[] = {
> /* x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xa xb xc xd xe xf */
> -/* 0x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 7, 7, 27, 7, 7, 27, 27,
> +/* 0x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 7, 7, 7, 7, 7, 27, 27,
> /* 1x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> /* 2x */ 7, 15, 9, 5, 4, 22, 24, 8, 17, 18, 21, 20, 23, 11, 26, 16,
> /* 3x */ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 19, 12, 14, 13, 6,
> @@ -96,63 +99,16 @@ static const unsigned char aiClass[] = {
> /* Dx */ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
> /* Ex */ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
> /* Fx */ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
> -#endif
> -#ifdef SQLITE_EBCDIC
> -/* x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xa xb xc xd xe xf */
> -/* 0x */ 27, 27, 27, 27, 27, 7, 27, 27, 27, 27, 27, 27, 7, 7, 27,
> - 27,
> -/* 1x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> -/* 2x */ 27, 27, 27, 27, 27, 7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> -/* 3x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> -/* 4x */ 7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 26, 12, 17, 20, 10,
> -/* 5x */ 24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15, 4, 21, 18, 19, 27,
> -/* 6x */ 11, 16, 27, 27, 27, 27, 27, 27, 27, 27, 27, 23, 22, 1, 13, 6,
> -/* 7x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 8, 5, 5, 5, 8, 14, 8,
> -/* 8x */ 27, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 27, 27, 27, 27, 27,
> -/* 9x */ 27, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 27, 27, 27, 27, 27,
> -/* Ax */ 27, 25, 1, 1, 1, 1, 1, 0, 1, 1, 27, 27, 27, 27, 27, 27,
> -/* Bx */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 9, 27, 27, 27, 27, 27,
> -/* Cx */ 27, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 27, 27, 27, 27, 27,
> -/* Dx */ 27, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 27, 27, 27, 27, 27,
> -/* Ex */ 27, 27, 1, 1, 1, 1, 1, 0, 1, 1, 27, 27, 27, 27, 27, 27,
> -/* Fx */ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 27, 27, 27, 27, 27, 27,
> -#endif
> };
>
> -/*
> - * The charMap() macro maps alphabetic characters (only) into their
> - * lower-case ASCII equivalent. On ASCII machines, this is just
> - * an upper-to-lower case map. On EBCDIC machines we also need
> - * to adjust the encoding. The mapping is only valid for alphabetics
> - * which are the only characters for which this feature is used.
> +/**
> + * The charMap() macro maps alphabetic characters (only) into
> + * their lower-case ASCII equivalent. On ASCII machines, this
> + * is just an upper-to-lower case map.
> *
> * Used by keywordhash.h
> */
> -#ifdef SQLITE_ASCII
> #define charMap(X) sqlite3UpperToLower[(unsigned char)X]
> -#endif
> -#ifdef SQLITE_EBCDIC
> -#define charMap(X) ebcdicToAscii[(unsigned char)X]
> -const unsigned char ebcdicToAscii[] = {
> -/* 0 1 2 3 4 5 6 7 8 9 A B C D E F */
> - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x */
> - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1x */
> - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 2x */
> - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 3x */
> - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 4x */
> - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 5x */
> - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 95, 0, 0, /* 6x */
> - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 7x */
> - 0, 97, 98, 99, 100, 101, 102, 103, 104, 105, 0, 0, 0, 0, 0, 0, /* 8x */
> - 0, 106, 107, 108, 109, 110, 111, 112, 113, 114, 0, 0, 0, 0, 0, 0, /* 9x */
> - 0, 0, 115, 116, 117, 118, 119, 120, 121, 122, 0, 0, 0, 0, 0, 0, /* Ax */
> - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* Bx */
> - 0, 97, 98, 99, 100, 101, 102, 103, 104, 105, 0, 0, 0, 0, 0, 0, /* Cx */
> - 0, 106, 107, 108, 109, 110, 111, 112, 113, 114, 0, 0, 0, 0, 0, 0, /* Dx */
> - 0, 0, 115, 116, 117, 118, 119, 120, 121, 122, 0, 0, 0, 0, 0, 0, /* Ex */
> - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* Fx */
> -};
> -#endif
>
> /*
> * The sqlite3KeywordCode function looks up an identifier to determine if
> @@ -167,360 +123,295 @@ const unsigned char ebcdicToAscii[] = {
> */
> #include "keywordhash.h"
>
> -/*
> - * If X is a character that can be used in an identifier then
> - * IdChar(X) will be true. Otherwise it is false.
> - *
> - * For ASCII, any character with the high-order bit set is
> - * allowed in an identifier. For 7-bit characters,
> - * sqlite3IsIdChar[X] must be 1.
> - *
> - * For EBCDIC, the rules are more complex but have the same
> - * end result.
> +#define maybe_utf8(c) ((sqlite3CtypeMap[c] & 0x40) != 0)
> +
> +/**
> + * Return true if current symbol is space.
> *
> - * Ticket #1066. the SQL standard does not allow '$' in the
> - * middle of identifiers. But many SQL implementations do.
> - * SQLite will allow '$' in identifiers for compatibility.
> - * But the feature is undocumented.
> + * @param z Input stream.
> + * @retval True if current symbol is space.
> */
> -#ifdef SQLITE_ASCII
> -#define IdChar(C) ((sqlite3CtypeMap[(unsigned char)C]&0x46)!=0)
> -#endif
> -#ifdef SQLITE_EBCDIC
> -const char sqlite3IsEbcdicIdChar[] = {
> -/* x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF */
> - 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, /* 4x */
> - 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, /* 5x */
> - 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, /* 6x */
> - 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, /* 7x */
> - 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, /* 8x */
> - 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, /* 9x */
> - 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, /* Ax */
> - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* Bx */
> - 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, /* Cx */
> - 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, /* Dx */
> - 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, /* Ex */
> - 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, /* Fx */
> -};
> -
> -#define IdChar(C) (((c=C)>=0x42 && sqlite3IsEbcdicIdChar[c-0x40]))
> -#endif
> +static inline bool
> +sql_is_space_char(const char *z)
> +{
> + if (sqlite3Isspace(z[0]))
> + return true;
> + if (maybe_utf8(*(unsigned char*)z)) {
> + UChar32 c;
> + int unused = 0;
> + U8_NEXT_UNSAFE(z, unused, c);
> + if (u_isspace(c))
> + return true;
> + }
> + return false;
> +}
>
> -/*
> - * Return the length (in bytes) of the token that begins at z[0].
> - * Store the token type in *tokenType before returning.
> +/**
> + * Calculate length of continuous sequence of
> + * space symbols.
> + *
> + * @param z Input stream.
> + * @retval Number of bytes which constitute sequence of spaces.
> + * Can be 0 if first symbol in stream is not space.
> */
> +static inline int
> +sql_skip_spaces(const char *z)
> +{
> + int idx = 0;
> + while (true) {
> + if (sqlite3Isspace(z[idx])) {
> + idx += 1;
> + } else if (maybe_utf8(*(unsigned char *)(z + idx))) {
> + UChar32 c;
> + int new_offset = idx;
> + U8_NEXT_UNSAFE(z, new_offset, c);
> + if (!u_isspace(c))
> + break;
> + idx = new_offset;
> + } else {
> + break;
> + }
> + }
> + return idx;
> +}
> +
> int
> -sqlite3GetToken(const unsigned char *z, int *tokenType, bool *is_reserved)
> +sql_token(const char *z, int *type, bool *is_reserved)
> {
> *is_reserved = false;
> - int i, c;
> - switch (aiClass[*z]) { /* Switch on the character-class of the first byte
> - * of the token. See the comment on the CC_ defines
> - * above.
> - */
> - case CC_SPACE:{
> - testcase(z[0] == ' ');
> - testcase(z[0] == '\t');
> - testcase(z[0] == '\n');
> - testcase(z[0] == '\f');
> - testcase(z[0] == '\r');
> - for (i = 1; sqlite3Isspace(z[i]); i++) {
> + int i, n;
> + char c, delim;
> + /* Switch on the character-class of the first byte
> + * of the token. See the comment on the CC_ defines
> + * above.
> + */
> + switch (sql_ascii_class[*(unsigned char*)z]) {
> + case CC_SPACE:
> + i = 1 + sql_skip_spaces(z+1);
> + *type = TK_SPACE;
> + return i;
> + case CC_MINUS:
> + if (z[1] == '-') {
> + for (i = 2; (c = z[i]) != 0 && c != '\n'; i++) {
> }
> - *tokenType = TK_SPACE;
> + *type = TK_SPACE;
> return i;
> }
> - case CC_MINUS:{
> - if (z[1] == '-') {
> - for (i = 2; (c = z[i]) != 0 && c != '\n'; i++) {
> - }
> - *tokenType = TK_SPACE; /* IMP: R-22934-25134
> - */
> - return i;
> - }
> - *tokenType = TK_MINUS;
> + *type = TK_MINUS;
> + return 1;
> + case CC_LP:
> + *type = TK_LP;
> + return 1;
> + case CC_RP:
> + *type = TK_RP;
> + return 1;
> + case CC_SEMI:
> + *type = TK_SEMI;
> + return 1;
> + case CC_PLUS:
> + *type = TK_PLUS;
> + return 1;
> + case CC_STAR:
> + *type = TK_STAR;
> + return 1;
> + case CC_SLASH:
> + if (z[1] != '*' || z[2] == 0) {
> + *type = TK_SLASH;
> return 1;
> }
> - case CC_LP:{
> - *tokenType = TK_LP;
> - return 1;
> + for (i = 3, c = z[2];
> + (c != '*' || z[i] != '/') && (c = z[i]) != 0;
> + i++) {
> }
> - case CC_RP:{
> - *tokenType = TK_RP;
> + if (c)
> + i++;
> + *type = TK_SPACE;
> + return i;
> + case CC_PERCENT:
> + *type = TK_REM;
> + return 1;
> + case CC_EQ:
> + *type = TK_EQ;
> + return 1 + (z[1] == '=');
> + case CC_LT:
> + if ((c = z[1]) == '=') {
> + *type = TK_LE;
> + return 2;
> + } else if (c == '>') {
> + *type = TK_NE;
> + return 2;
> + } else if (c == '<') {
> + *type = TK_LSHIFT;
> + return 2;
> + } else {
> + *type = TK_LT;
> return 1;
> }
> - case CC_SEMI:{
> - *tokenType = TK_SEMI;
> + case CC_GT:
> + if ((c = z[1]) == '=') {
> + *type = TK_GE;
> + return 2;
> + } else if (c == '>') {
> + *type = TK_RSHIFT;
> + return 2;
> + } else {
> + *type = TK_GT;
> return 1;
> }
> - case CC_PLUS:{
> - *tokenType = TK_PLUS;
> + case CC_BANG:
> + if (z[1] != '=') {
> + *type = TK_ILLEGAL;
> return 1;
> + } else {
> + *type = TK_NE;
> + return 2;
> }
> - case CC_STAR:{
> - *tokenType = TK_STAR;
> + case CC_PIPE:
> + if (z[1] != '|') {
> + *type = TK_BITOR;
> return 1;
> + } else {
> + *type = TK_CONCAT;
> + return 2;
> }
> - case CC_SLASH:{
> - if (z[1] != '*' || z[2] == 0) {
> - *tokenType = TK_SLASH;
> - return 1;
> - }
> - for (i = 3, c = z[2];
> - (c != '*' || z[i] != '/') && (c = z[i]) != 0;
> - i++) {
> + case CC_COMMA:
> + *type = TK_COMMA;
> + return 1;
> + case CC_AND:
> + *type = TK_BITAND;
> + return 1;
> + case CC_TILDA:
> + *type = TK_BITNOT;
> + return 1;
> + case CC_QUOTE:
> + case CC_DQUOTE:
> + delim = z[0];
> + for (i = 1; (c = z[i]) != 0; i++) {
> + if (c == delim) {
> + if (z[i + 1] == delim)
> + i++;
> + else
> + break;
> }
> - if (c)
> - i++;
> - *tokenType = TK_SPACE; /* IMP: R-22934-25134
> - */
> + }
> + if (c == '\'') {
> + *type = TK_STRING;
> + return i + 1;
> + } else if (c != 0) {
> + *type = TK_ID;
> + return i + 1;
> + } else {
> + *type = TK_ILLEGAL;
> return i;
> }
> - case CC_PERCENT:{
> - *tokenType = TK_REM;
> + FALLTHROUGH;
> + case CC_DOT:
> + if (!sqlite3Isdigit(z[1])) {
> + *type = TK_DOT;
> return 1;
> }
> - case CC_EQ:{
> - *tokenType = TK_EQ;
> - return 1 + (z[1] == '=');
> - }
> - case CC_LT:{
> - if ((c = z[1]) == '=') {
> - *tokenType = TK_LE;
> - return 2;
> - } else if (c == '>') {
> - *tokenType = TK_NE;
> - return 2;
> - } else if (c == '<') {
> - *tokenType = TK_LSHIFT;
> - return 2;
> - } else {
> - *tokenType = TK_LT;
> - return 1;
> + /* If the next character is a digit, this is a
> + * floating point number that begins with ".".
> + * Fall thru into the next case.
> + */
> + FALLTHROUGH;
> + case CC_DIGIT:
> + *type = TK_INTEGER;
> + if (z[0] == '0' && (z[1] == 'x' || z[1] == 'X') &&
> + sqlite3Isxdigit(z[2])) {
> + for (i = 3; sqlite3Isxdigit(z[i]); i++) {
> }
> + return i;
> }
> - case CC_GT:{
> - if ((c = z[1]) == '=') {
> - *tokenType = TK_GE;
> - return 2;
> - } else if (c == '>') {
> - *tokenType = TK_RSHIFT;
> - return 2;
> - } else {
> - *tokenType = TK_GT;
> - return 1;
> - }
> + for (i = 0; sqlite3Isdigit(z[i]); i++) {
> }
> - case CC_BANG:{
> - if (z[1] != '=') {
> - *tokenType = TK_ILLEGAL;
> - return 1;
> - } else {
> - *tokenType = TK_NE;
> - return 2;
> + if (z[i] == '.') {
> + while (sqlite3Isdigit(z[++i])) {
> }
> + *type = TK_FLOAT;
> }
> - case CC_PIPE:{
> - if (z[1] != '|') {
> - *tokenType = TK_BITOR;
> - return 1;
> - } else {
> - *tokenType = TK_CONCAT;
> - return 2;
> - }
> + if ((z[i] == 'e' || z[i] == 'E') &&
> + (sqlite3Isdigit(z[i + 1])
> + || ((z[i + 1] == '+' || z[i + 1] == '-') &&
> + sqlite3Isdigit(z[i + 2])))) {
> + i += 2;
> + while (sqlite3Isdigit(z[i]))
> + i++;
> + *type = TK_FLOAT;
> }
> - case CC_COMMA:{
> - *tokenType = TK_COMMA;
> - return 1;
> + if (IdChar(z[i])) {
> + *type = TK_ILLEGAL;
> + while (IdChar(z[++i])) {
> + }
> }
> - case CC_AND:{
> - *tokenType = TK_BITAND;
> - return 1;
> + return i;
> + case CC_VARNUM:
> + *type = TK_VARIABLE;
> + for (i = 1; sqlite3Isdigit(z[i]); i++) {
> }
> - case CC_TILDA:{
> - *tokenType = TK_BITNOT;
> - return 1;
> + return i;
> + case CC_DOLLAR:
> + case CC_VARALPHA:
> + n = 0;
> + *type = TK_VARIABLE;
> + for (i = 1; (c = z[i]) != 0; i++) {
> + if (IdChar(c))
> + n++;
> + else
> + break;
> }
> - case CC_QUOTE:
> - case CC_DQUOTE:{
> - int delim = z[0];
> - testcase(delim == '\'');
> - testcase(delim == '"');
> - for (i = 1; (c = z[i]) != 0; i++) {
> - if (c == delim) {
> - if (z[i + 1] == delim) {
> - i++;
> - } else {
> - break;
> - }
> - }
> - }
> - if (c == '\'') {
> - *tokenType = TK_STRING;
> - return i + 1;
> - } else if (c != 0) {
> - *tokenType = TK_ID;
> - return i + 1;
> - } else {
> - *tokenType = TK_ILLEGAL;
> - return i;
> - }
> - FALLTHROUGH;
> + if (n == 0)
> + *type = TK_ILLEGAL;
> + return i;
> + case CC_KYWD:
> + for (i = 1; sql_ascii_class[*(unsigned char*)(z+i)] <= CC_KYWD;
> + i++) {
> }
> - case CC_DOT:{
> -#ifndef SQLITE_OMIT_FLOATING_POINT
> - if (!sqlite3Isdigit(z[1]))
> -#endif
> - {
> - *tokenType = TK_DOT;
> - return 1;
> - }
> - /* If the next character is a digit, this is a floating point
> - * number that begins with ".". Fall thru into the next case
> + if (!sql_is_space_char(z + i) && IdChar(z[i])) {
> + /* This token started out using characters
> + * that can appear in keywords, but z[i] is
> + * a character not allowed within keywords,
> + * so this must be an identifier instead.
> */
> - FALLTHROUGH;
> + i++;
> + break;
> }
> - case CC_DIGIT:{
> - testcase(z[0] == '0');
> - testcase(z[0] == '1');
> - testcase(z[0] == '2');
> - testcase(z[0] == '3');
> - testcase(z[0] == '4');
> - testcase(z[0] == '5');
> - testcase(z[0] == '6');
> - testcase(z[0] == '7');
> - testcase(z[0] == '8');
> - testcase(z[0] == '9');
> - *tokenType = TK_INTEGER;
> -#ifndef SQLITE_OMIT_HEX_INTEGER
> - if (z[0] == '0' && (z[1] == 'x' || z[1] == 'X')
> - && sqlite3Isxdigit(z[2])) {
> - for (i = 3; sqlite3Isxdigit(z[i]); i++) {
> - }
> - return i;
> + *type = TK_ID;
> + return keywordCode(z, i, type, is_reserved);
> + case CC_X:
> + if (z[1] == '\'') {
> + *type = TK_BLOB;
> + for (i = 2; sqlite3Isxdigit(z[i]); i++) {
> }
> -#endif
> - for (i = 0; sqlite3Isdigit(z[i]); i++) {
> - }
> -#ifndef SQLITE_OMIT_FLOATING_POINT
> - if (z[i] == '.') {
> - i++;
> - while (sqlite3Isdigit(z[i])) {
> + if (z[i] != '\'' || i % 2) {
> + *type = TK_ILLEGAL;
> + while (z[i] != 0 && z[i] != '\'')
> i++;
> - }
> - *tokenType = TK_FLOAT;
> }
> - if ((z[i] == 'e' || z[i] == 'E') &&
> - (sqlite3Isdigit(z[i + 1])
> - || ((z[i + 1] == '+' || z[i + 1] == '-')
> - && sqlite3Isdigit(z[i + 2]))
> - )
> - ) {
> - i += 2;
> - while (sqlite3Isdigit(z[i])) {
> - i++;
> - }
> - *tokenType = TK_FLOAT;
> - }
> -#endif
> - while (IdChar(z[i])) {
> - *tokenType = TK_ILLEGAL;
> + if (z[i] != 0)
> i++;
> - }
> - return i;
> - }
> - case CC_VARNUM:{
> - *tokenType = TK_VARIABLE;
> - for (i = 1; sqlite3Isdigit(z[i]); i++) {
> - }
> return i;
> }
> - case CC_DOLLAR:
> - case CC_VARALPHA:{
> - int n = 0;
> - testcase(z[0] == '$');
> - testcase(z[0] == '@');
> - testcase(z[0] == ':');
> - testcase(z[0] == '#');
> - *tokenType = TK_VARIABLE;
> - for (i = 1; (c = z[i]) != 0; i++) {
> - if (IdChar(c)) {
> - n++;
> -#ifndef SQLITE_OMIT_TCL_VARIABLE
> - } else if (c == '(' && n > 0) {
> - do {
> - i++;
> - } while ((c = z[i]) != 0
> - && !sqlite3Isspace(c)
> - && c != ')');
> - if (c == ')') {
> - i++;
> - } else {
> - *tokenType = TK_ILLEGAL;
> - }
> - break;
> - } else if (c == ':' && z[i + 1] == ':') {
> - i++;
> -#endif
> - } else {
> - break;
> - }
> - }
> - if (n == 0)
> - *tokenType = TK_ILLEGAL;
> - return i;
> - }
> - case CC_KYWD:{
> - for (i = 1; aiClass[z[i]] <= CC_KYWD; i++) {
> - }
> - if (IdChar(z[i])) {
> - /* This token started out using characters that can appear in keywords,
> - * but z[i] is a character not allowed within keywords, so this must
> - * be an identifier instead
> - */
> - i++;
> - break;
> - }
> - *tokenType = TK_ID;
> - return keywordCode((char *)z, i, tokenType, is_reserved);
> - }
> - case CC_X:{
> -#ifndef SQLITE_OMIT_BLOB_LITERAL
> - testcase(z[0] == 'x');
> - testcase(z[0] == 'X');
> - if (z[1] == '\'') {
> - *tokenType = TK_BLOB;
> - for (i = 2; sqlite3Isxdigit(z[i]); i++) {
> - }
> - if (z[i] != '\'' || i % 2) {
> - *tokenType = TK_ILLEGAL;
> - while (z[i] && z[i] != '\'') {
> - i++;
> - }
> - }
> - if (z[i])
> - i++;
> - return i;
> - }
> -#endif
> - /* If it is not a BLOB literal, then it must be an ID, since no
> - * SQL keywords start with the letter 'x'. Fall through
> - */
> - FALLTHROUGH;
> - }
> - case CC_ID:{
> - i = 1;
> - break;
> - }
> - default:{
> - *tokenType = TK_ILLEGAL;
> - return 1;
> - }
> + /* If it is not a BLOB literal, then it must be an
> + * ID, since no SQL keywords start with the letter
> + * 'x'. Fall through.
> + */
> + FALLTHROUGH;
> + case CC_ID:
> + i = 1;
> + break;
> + default:
> + *type = TK_ILLEGAL;
> + return 1;
> }
> - while (IdChar(z[i])) {
> - i++;
> + int spaces_len = sql_skip_spaces(z);
> + if (spaces_len > 0) {
> + *type = TK_SPACE;
> + return spaces_len;
> }
> - *tokenType = TK_ID;
> + while (IdChar(z[i]))
> + i++;
> + *type = TK_ID;
> return i;
> }
>
> @@ -566,8 +457,8 @@ sqlite3RunParser(Parse * pParse, const char *zSql, char **pzErrMsg)
> if (zSql[i] != 0) {
> pParse->sLastToken.z = &zSql[i];
> pParse->sLastToken.n =
> - sqlite3GetToken((u8 *) & zSql[i], &tokenType,
> - &pParse->sLastToken.isReserved);
> + sql_token(&zSql[i], &tokenType,
> + &pParse->sLastToken.isReserved);
> i += pParse->sLastToken.n;
> if (i > mxSqlLen) {
> pParse->rc = SQLITE_TOOBIG;
> diff --git a/src/box/sql/util.c b/src/box/sql/util.c
> index 8c4e7b9..0c2a050 100644
> --- a/src/box/sql/util.c
> +++ b/src/box/sql/util.c
> @@ -1228,12 +1228,7 @@ sqlite3HexToInt(int h)
> {
> assert((h >= '0' && h <= '9') || (h >= 'a' && h <= 'f')
> || (h >= 'A' && h <= 'F'));
> -#ifdef SQLITE_ASCII
> h += 9 * (1 & (h >> 6));
> -#endif
> -#ifdef SQLITE_EBCDIC
> - h += 9 * (1 & ~(h >> 4));
> -#endif
> return (u8) (h & 0xf);
> }
>
> diff --git a/src/box/sql/vdbetrace.c b/src/box/sql/vdbetrace.c
> index 8623e68..63e2311 100644
> --- a/src/box/sql/vdbetrace.c
> +++ b/src/box/sql/vdbetrace.c
> @@ -57,7 +57,7 @@ findNextHostParameter(const char *zSql, int *pnToken)
>
> *pnToken = 0;
> while (zSql[0]) {
> - n = sqlite3GetToken((u8 *) zSql, &tokenType, &unused);
> + n = sql_token(zSql, &tokenType, &unused);
> assert(n > 0 && tokenType != TK_ILLEGAL);
> if (tokenType == TK_VARIABLE) {
> *pnToken = n;
> diff --git a/src/box/sql/whereexpr.c b/src/box/sql/whereexpr.c
> index 34a1f13..c3a8634 100644
> --- a/src/box/sql/whereexpr.c
> +++ b/src/box/sql/whereexpr.c
> @@ -256,10 +256,6 @@ isLikeOrGlob(Parse * pParse, /* Parsing and code generating context */
> if (!sqlite3IsLikeFunction(db, pExpr, pnoCase, wc)) {
> return 0;
> }
> -#ifdef SQLITE_EBCDIC
> - if (*pnoCase)
> - return 0;
> -#endif
> pList = pExpr->x.pList;
> pLeft = pList->a[1].pExpr;
> if (pLeft->op != TK_COLUMN || sqlite3ExprAffinity(pLeft) != SQLITE_AFF_TEXT /* Value might be numeric */
> diff --git a/test/sql-tap/e_expr.test.lua b/test/sql-tap/e_expr.test.lua
> index d0f6895..f7f3b15 100755
> --- a/test/sql-tap/e_expr.test.lua
> +++ b/test/sql-tap/e_expr.test.lua
> @@ -1,6 +1,6 @@
> #!/usr/bin/env tarantool
> test = require("sqltester")
> -test:plan(14750)
> +test:plan(14748)
>
> --!./tcltestrunner.lua
> -- 2010 July 16
> @@ -1506,8 +1506,6 @@ local test_cases12 ={
> {10, "?123"},
> {11, "@hello"},
> {12, ":world"},
> - {13, "$tcl"},
> - {14, "$tcl(array)"},
>
> {15, "cname"},
> {16, "tblname.cname"},
> diff --git a/test/sql-tap/unicode.test.lua b/test/sql-tap/unicode.test.lua
> new file mode 100755
> index 0000000..1990739
> --- /dev/null
> +++ b/test/sql-tap/unicode.test.lua
> @@ -0,0 +1,37 @@
> +#!/usr/bin/env tarantool
> +test = require("sqltester")
> +test:plan(23 * 3)
> +
> +-- 23 entities
> +local utf8_spaces = {"\u{0009}", "\u{000A}", "\u{000B}", "\u{000C}", "\u{000D}",
> + "\u{0085}", "\u{1680}", "\u{2000}", "\u{2001}", "\u{2002}",
> + "\u{2003}", "\u{2004}", "\u{2005}", "\u{2006}", "\u{2007}",
> + "\u{2008}", "\u{2009}", "\u{200A}", "\u{2028}", "\u{2029}",
> + "\u{202F}", "\u{205F}", "\u{3000}"}
> +local spaces_cnt = 23
> +
> +-- 1. Check UTF-8 single space
> +for i, v in pairs(utf8_spaces) do
> + test:do_execsql_test(
> + "utf8-spaces-1."..i,
> + "select" .. v .. "1",
> + { 1 })
> +end
> +
> +-- 2. Check pair simple + UTF-8 space
> +for i, v in pairs(utf8_spaces) do
> + test:do_execsql_test(
> + "utf8-spaces-2."..i,
> + "select" .. v .. "1",
> + { 1 })
> +end
> +
> +-- 3. Sequence of spaces
> +for i, v in pairs(utf8_spaces) do
> + test:do_execsql_test(
> + "utf8-spaces-3."..i,
> + "select" .. v .. " " .. utf8_spaces[spaces_cnt - i + 1] .. " 1",
> + { 1 })
> +end
> +
> +test:finish_test()
>
>
>
Thread overview: 8+ messages
2018-05-22 15:51 [tarantool-patches] " Kirill Yukhin
2018-05-22 18:06 ` [tarantool-patches] " Vladislav Shpilevoy
2018-05-23 5:15 ` Kirill Yukhin
2018-05-23 5:54 ` Kirill Yukhin
2018-05-23 10:29 ` Vladislav Shpilevoy
2018-05-23 14:05 ` Kirill Yukhin
2018-05-24 11:09 ` Vladislav Shpilevoy [this message]
2018-05-24 14:23 ` Kirill Yukhin