Subject: [tarantool-patches] Re: [PATCH] sql: allow any space symbols to be a white space
From: Vladislav Shpilevoy
Date: Thu, 24 May 2018 14:09:22 +0300
To: tarantool-patches@freelists.org, Kirill Yukhin
Message-ID: <7cbf79d0-8242-4079-8641-1635a45e863d@tarantool.org>
In-Reply-To: <20180523140550.qqegk7wzv7ycc76m@tarantool.org>

LGTM.

On 23/05/2018 17:05, Kirill Yukhin wrote:
> Hi Vlad,
> Thanks for the review! My comments are inlined.
>
> On 23 May 13:29, Vladislav Shpilevoy wrote:
>> Hello. Thanks for the fixes! I see that you changed the branch to
>> kyukhin/gh-2371-utf8-spaces-check, right?
> Nope, branch is the same. This one was for Travis checking.
> The original branch contains the updated patch, which in turn contains
> the update for CMakeLists.txt. I've removed this temporary branch.
>
> Updated patch is at the bottom.
>
>> And that you added a separate commit to link with ICU. But without
>> linking, the first commit does not work. Maybe you should squash them?
>>
>> See 4 comments below.
>>
>>> diff --git a/src/box/sql/tokenize.c b/src/box/sql/tokenize.c
>>> index c77aa9b..4c01066 100644
>>> --- a/src/box/sql/tokenize.c
>>> +++ b/src/box/sql/tokenize.c
>>> @@ -36,17 +36,21 @@
>>> * individual tokens and sends those tokens one-by-one over to the
>>> * parser for analysis.
>>> */
>>> -#include "sqliteInt.h"
>>> #include
>>> +#include
>>> +#include
>>> +
>>> #include "say.h"
>>> +#include "sqliteInt.h"
>>> /* Character classes for tokenizing
>>> *
>>> - * In the sqlite3GetToken() function, a switch() on aiClass[c] is implemented
>>> - * using a lookup table, whereas a switch() directly on c uses a binary search.
>>> - * The lookup table is much faster. To maximize speed, and to ensure that
>>> - * a lookup table is used, all of the classes need to be small integers and
>>> - * all of them need to be used within the switch.
>>> + * In the sql_token() function, a switch() on sql_ascii[c] is
>> 1. No sql_ascii.
> Fixed.
>
>>> @@ -167,360 +123,295 @@ const unsigned char ebcdicToAscii[] = {
>>> - * Ticket #1066. the SQL standard does not allow '$' in the
>>> - * middle of identifiers. But many SQL implementations do.
>>> - * SQLite will allow '$' in identifiers for compatibility.
>>> - * But the feature is undocumented.
>>> + * @param z Input stream.
>>> + * @retval True if current symbo1l space.
>> 2. symbo1l -> symbol is.
> Fixed.
>
>>> diff --git a/test/sql-tap/gh-2371-utf8-spaces.test.lua b/test/sql-tap/gh-2371-utf8-spaces.test.lua
>> 3. Could you avoid creating a new test file specifically for this issue? I believe we will
>> support Unicode not only as white space, so it may be worth creating a more general
>> file, test/sql/unicode.test.lua.
> I'd prefer TAP tests for SQL FE. Renamed to unicode.test.lua.
>
>>> new file mode 100755
>>> index 0000000..191cc1c
>>> --- /dev/null
>>> +++ b/test/sql-tap/gh-2371-utf8-spaces.test.lua
>>> +-- 1. Check UTF-8 single space
>>> +for i, v in pairs(utf8_spaces) do
>>> + test:do_execsql_test(
>>> + "utf8-spaces-1."..i,
>>> + "select" .. v .. "1",
>>> + { 1 })
>>
>> 4. Here and in 2 similar places below you use 4 spaces + 1 tab (8 wide) on the same line.
>> Please use only spaces in .lua files.
> Fixed.
>
> --
> Regards, Kirill Yukhin
>
> commit 46c40d785552d8cc652b31b48207032216bb067f
> Author: Kirill Yukhin
> Date: Tue May 22 18:45:35 2018 +0300
>
> sql: allow any space symbols to be a white space
>
> ANSI SQL allows any of Unicode classes Zl, Zp or Zs to
> act as white space symbol. Allow this in lexical analyzer.
> Refactor lexical analyzer routine to follow Tarantool's
> coding style.
> Also, remove dead encoding: EBCDIC.
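
Inline note on the commit message above: a minimal, self-contained C sketch of the kind of whitespace scan it describes. This is not the patch's code. The patch relies on ICU (U8_NEXT_UNSAFE and u_isspace), while the sketch below hand-decodes UTF-8 and checks a hard-coded list of separator code points plus ASCII white space. As far as I know, ICU's u_isspace() accepts a somewhat wider set than this list. All demo_* names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence starting at s. Store the code point
 * in *cp and return the number of bytes consumed. An invalid
 * lead byte, or a lead byte without its continuation bytes,
 * yields U+FFFD and a length of 1. Overlong forms are not
 * rejected, so this is not a validating decoder.
 */
static size_t
demo_utf8_decode(const char *s, uint32_t *cp)
{
	const unsigned char *p = (const unsigned char *)s;
	size_t len;
	if (p[0] < 0x80) {
		*cp = p[0];
		return 1;
	} else if ((p[0] & 0xE0) == 0xC0) {
		*cp = p[0] & 0x1F;
		len = 2;
	} else if ((p[0] & 0xF0) == 0xE0) {
		*cp = p[0] & 0x0F;
		len = 3;
	} else if ((p[0] & 0xF8) == 0xF0) {
		*cp = p[0] & 0x07;
		len = 4;
	} else {
		*cp = 0xFFFD;
		return 1;
	}
	/* A NUL terminator is not a continuation byte, so the
	 * loop never reads past the end of the string. */
	for (size_t i = 1; i < len; i++) {
		if ((p[i] & 0xC0) != 0x80) {
			*cp = 0xFFFD;
			return 1;
		}
		*cp = (*cp << 6) | (p[i] & 0x3F);
	}
	return len;
}

/*
 * Hard-coded stand-in for a Unicode property lookup: ASCII
 * white space (space, \t, \n, \v, \f, \r) plus the Zs, Zl and
 * Zp code points known to the author of this sketch.
 */
static bool
demo_is_space(uint32_t cp)
{
	if (cp == ' ' || (cp >= '\t' && cp <= '\r'))
		return true;
	return cp == 0x00A0 || cp == 0x1680 ||
	       (cp >= 0x2000 && cp <= 0x200A) ||
	       cp == 0x2028 || cp == 0x2029 ||
	       cp == 0x202F || cp == 0x205F || cp == 0x3000;
}

/*
 * Return the number of bytes occupied by the run of white
 * space at the start of z; 0 if z does not start with one.
 */
static size_t
demo_skip_spaces(const char *z)
{
	assert(z != NULL);
	size_t idx = 0;
	for (;;) {
		uint32_t cp;
		size_t n = demo_utf8_decode(z + idx, &cp);
		if (!demo_is_space(cp))
			break;
		idx += n;
	}
	return idx;
}
```

The scan returns a byte count rather than a character count, because the tokenizer in the patch reports token lengths in bytes. A real implementation would use a Unicode library for the property lookup instead of a fixed list, since the list has to track the Unicode version.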
> > Closes #2371 > > diff --git a/extra/mkkeywordhash.c b/extra/mkkeywordhash.c > index cf34831..9e0e24b 100644 > --- a/extra/mkkeywordhash.c > +++ b/extra/mkkeywordhash.c > @@ -611,12 +611,7 @@ int main(int argc, char **argv){ > printf(" if( aLen[i]!=n ) continue;\n"); > printf(" j = 0;\n"); > printf(" zKW = &zText[aOffset[i]];\n"); > - printf("#ifdef SQLITE_ASCII\n"); > printf(" while( j - printf("#endif\n"); > - printf("#ifdef SQLITE_EBCDIC\n"); > - printf(" while( j - printf("#endif\n"); > printf(" if( j for(i=0; i printf(" testcase( i==%d ); /* %s */\n", > diff --git a/src/box/sql/CMakeLists.txt b/src/box/sql/CMakeLists.txt > index 2204191..8a83a0a 100644 > --- a/src/box/sql/CMakeLists.txt > +++ b/src/box/sql/CMakeLists.txt > @@ -78,6 +78,7 @@ add_library(sql STATIC > ) > set_target_properties(sql PROPERTIES COMPILE_DEFINITIONS > "${TEST_DEFINITIONS}") > +target_link_libraries(sql ${ICU_LIBRARIES}) > > add_custom_target(generate_sql_files DEPENDS > parse.h > diff --git a/src/box/sql/alter.c b/src/box/sql/alter.c > index c9c8f9b..c9325c5 100644 > --- a/src/box/sql/alter.c > +++ b/src/box/sql/alter.c > @@ -351,7 +351,7 @@ rename_table(sqlite3 *db, const char *sql_stmt, const char *table_name, > > int token; > Token old_name; > - unsigned char const *csr = (unsigned const char *)sql_stmt; > + const char *csr = sql_stmt; > int len = 0; > char *new_sql_stmt; > bool unused; > @@ -374,7 +374,7 @@ rename_table(sqlite3 *db, const char *sql_stmt, const char *table_name, > */ > do { > csr += len; > - len = sqlite3GetToken(csr, &token, &unused); > + len = sql_token(csr, &token, &unused); > } while (token == TK_SPACE); > assert(len > 0); > } while (token != TK_LP && token != TK_USING); > @@ -430,13 +430,12 @@ rename_parent_table(sqlite3 *db, const char *sql_stmt, const char *old_name, > bool is_quoted; > > for (csr = sql_stmt; *csr; csr = csr + n) { > - n = sqlite3GetToken((const unsigned char *)csr, &token, &unused); > + n = sql_token(csr, &token, &unused); > if (token == 
TK_REFERENCES) { > char *zParent; > do { > csr += n; > - n = sqlite3GetToken((const unsigned char *)csr, > - &token, &unused); > + n = sql_token(csr, &token, &unused); > } while (token == TK_SPACE); > if (token == TK_ILLEGAL) > break; > @@ -482,7 +481,7 @@ rename_trigger(sqlite3 *db, char const *sql_stmt, char const *table_name, > int token; > Token tname; > int dist = 3; > - unsigned char const *csr = (unsigned char const*)sql_stmt; > + char const *csr = (char const*)sql_stmt; > int len = 0; > char *new_sql_stmt; > bool unused; > @@ -505,7 +504,7 @@ rename_trigger(sqlite3 *db, char const *sql_stmt, char const *table_name, > */ > do { > csr += len; > - len = sqlite3GetToken(csr, &token, &unused); > + len = sql_token(csr, &token, &unused); > } while (token == TK_SPACE); > assert(len > 0); > /* Variable 'dist' stores the number of tokens read since the most > diff --git a/src/box/sql/complete.c b/src/box/sql/complete.c > index 092d4fb..74b057b 100644 > --- a/src/box/sql/complete.c > +++ b/src/box/sql/complete.c > @@ -40,18 +40,7 @@ > #include "sqliteInt.h" > #ifndef SQLITE_OMIT_COMPLETE > > -/* > - * This is defined in tokenize.c. We just have to import the definition. > - */ > -#ifndef SQLITE_AMALGAMATION > -#ifdef SQLITE_ASCII > #define IdChar(C) ((sqlite3CtypeMap[(unsigned char)C]&0x46)!=0) > -#endif > -#ifdef SQLITE_EBCDIC > -extern const char sqlite3IsEbcdicIdChar[]; > -#define IdChar(C) (((c=C)>=0x42 && sqlite3IsEbcdicIdChar[c-0x40])) > -#endif > -#endif /* SQLITE_AMALGAMATION */ > > /* > * Token types used by the sqlite3_complete() routine. 
See the header > @@ -230,9 +219,6 @@ sqlite3_complete(const char *zSql) > break; > } > default:{ > -#ifdef SQLITE_EBCDIC > - unsigned char c; > -#endif > if (IdChar((u8) * zSql)) { > /* Keywords and unquoted identifiers */ > int nId; > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index dcac22c..c06e3bd 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -623,12 +623,7 @@ struct compareInfo { > * macro for fast reading of the next character in the common case where > * the next character is ASCII. > */ > -#if defined(SQLITE_EBCDIC) > -#define sqlite3Utf8Read(A) (*((*A)++)) > -#define Utf8Read(A) (*(A++)) > -#else > #define Utf8Read(s, e) ucnv_getNextUChar(pUtf8conv, &s, e, &status) > -#endif > > static const struct compareInfo globInfo = { '*', '?', '[', 0 }; > > diff --git a/src/box/sql/global.c b/src/box/sql/global.c > index cd6f9c4..8e53bcc 100644 > --- a/src/box/sql/global.c > +++ b/src/box/sql/global.c > @@ -43,7 +43,6 @@ > * involved are nearly as big or bigger than SQLite itself. 
> */ > const unsigned char sqlite3UpperToLower[] = { > -#ifdef SQLITE_ASCII > 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, > 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, > 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, > @@ -70,25 +69,6 @@ const unsigned char sqlite3UpperToLower[] = { > 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, > 248, 249, 250, 251, > 252, 253, 254, 255 > -#endif > -#ifdef SQLITE_EBCDIC > - 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, /* 0x */ > - 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, /* 1x */ > - 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, /* 2x */ > - 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, /* 3x */ > - 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, /* 4x */ > - 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, /* 5x */ > - 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, /* 6x */ > - 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, /* 7x */ > - 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, /* 8x */ > - 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, /* 9x */ > - 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 140, 141, 142, 175, /* Ax */ > - 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, /* Bx */ > - 192, 129, 130, 131, 132, 133, 134, 135, 136, 137, 202, 203, 204, 205, 206, 207, /* Cx */ > - 208, 145, 146, 147, 148, 149, 150, 151, 152, 153, 218, 219, 220, 221, 222, 223, /* Dx */ > - 224, 225, 162, 163, 164, 165, 166, 167, 168, 169, 234, 235, 236, 237, 238, 239, /* Ex */ > - 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, /* Fx */ > -#endif > }; > > /* > @@ -119,7 +99,6 @@ const unsigned char sqlite3UpperToLower[] = { > * non-ASCII UTF character. 
Hence the test for whether or not a character is > * part of an identifier is 0x46. > */ > -#ifdef SQLITE_ASCII > const unsigned char sqlite3CtypeMap[256] = { > 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 00..07 ........ */ > 0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x00, 0x00, /* 08..0f ........ */ > @@ -157,7 +136,6 @@ const unsigned char sqlite3CtypeMap[256] = { > 0x40, 0x40, 0x40, 0x40, 0x40, 0x40, 0x40, 0x40, /* f0..f7 ........ */ > 0x40, 0x40, 0x40, 0x40, 0x40, 0x40, 0x40, 0x40 /* f8..ff ........ */ > }; > -#endif > > /* EVIDENCE-OF: R-02982-34736 In order to maintain full backwards > * compatibility for legacy applications, the URI filename capability is > diff --git a/src/box/sql/sqliteInt.h b/src/box/sql/sqliteInt.h > index b3db468..e6ffda4 100644 > --- a/src/box/sql/sqliteInt.h > +++ b/src/box/sql/sqliteInt.h > @@ -36,6 +36,8 @@ > #ifndef SQLITEINT_H > #define SQLITEINT_H > > +#define IdChar(C) ((sqlite3CtypeMap[(unsigned char)C]&0x46)!=0) > + > /* Special Comments: > * > * Some comments have special meaning to the tools that measure test > @@ -1129,16 +1131,6 @@ sqlite3_bind_parameter_lindex(sqlite3_stmt * pStmt, const char *zName, > #define MAX(A,B) ((A)>(B)?(A):(B)) > #endif > > -/* > - * Check to see if this machine uses EBCDIC. (Yes, believe it or > - * not, there are still machines out there that use EBCDIC.) > - */ > -#if 'A' == '\301' > -#define SQLITE_EBCDIC 1 > -#else > -#define SQLITE_ASCII 1 > -#endif > - > /* > * Integers of known sizes. These typedefs might change for architectures > * where the sizes very. Preprocessor macros are available so that the > @@ -3368,21 +3360,11 @@ int sqlite3IoerrnomemError(int); > #define SQLITE_ENABLE_FTS3 1 > #endif > > -/* > - * The ctype.h header is needed for non-ASCII systems. It is also > - * needed by FTS3 when FTS3 is included in the amalgamation. 
> - */ > -#if !defined(SQLITE_ASCII) || \ > - (defined(SQLITE_ENABLE_FTS3) && defined(SQLITE_AMALGAMATION)) > -#include > -#endif > - > /* > * The following macros mimic the standard library functions toupper(), > * isspace(), isalnum(), isdigit() and isxdigit(), respectively. The > * sqlite versions only work for ASCII characters, regardless of locale. > */ > -#ifdef SQLITE_ASCII > #define sqlite3Toupper(x) ((x)&~(sqlite3CtypeMap[(unsigned char)(x)]&0x20)) > #define sqlite3Isspace(x) (sqlite3CtypeMap[(unsigned char)(x)]&0x01) > #define sqlite3Isalnum(x) (sqlite3CtypeMap[(unsigned char)(x)]&0x06) > @@ -3391,16 +3373,6 @@ int sqlite3IoerrnomemError(int); > #define sqlite3Isxdigit(x) (sqlite3CtypeMap[(unsigned char)(x)]&0x08) > #define sqlite3Tolower(x) (sqlite3UpperToLower[(unsigned char)(x)]) > #define sqlite3Isquote(x) (sqlite3CtypeMap[(unsigned char)(x)]&0x80) > -#else > -#define sqlite3Toupper(x) toupper((unsigned char)(x)) > -#define sqlite3Isspace(x) isspace((unsigned char)(x)) > -#define sqlite3Isalnum(x) isalnum((unsigned char)(x)) > -#define sqlite3Isalpha(x) isalpha((unsigned char)(x)) > -#define sqlite3Isdigit(x) isdigit((unsigned char)(x)) > -#define sqlite3Isxdigit(x) isxdigit((unsigned char)(x)) > -#define sqlite3Tolower(x) tolower((unsigned char)(x)) > -#define sqlite3Isquote(x) ((x)=='"'||(x)=='\''||(x)=='['||(x)=='`') > -#endif > > /* > * Internal function prototypes > @@ -4164,7 +4136,18 @@ extern int sqlite3PendingByte; > #endif > void sqlite3Reindex(Parse *, Token *, Token *); > void sqlite3AlterRenameTable(Parse *, SrcList *, Token *); > -int sqlite3GetToken(const unsigned char *, int *, bool *); > + > +/** > + * Return the length (in bytes) of the token that begins at z[0]. > + * Store the token type in *type before returning. > + * > + * @param z Input stream. > + * @param[out] type Detected type of token. > + * @param[out] is_reserved True if reserved word. 
> + */ > +int > +sql_token(const char *z, int *type, bool *is_reserved); > + > void sqlite3NestedParse(Parse *, const char *, ...); > void sqlite3ExpirePreparedStatements(sqlite3 *); > int sqlite3CodeSubselect(Parse *, Expr *, int); > diff --git a/src/box/sql/tokenize.c b/src/box/sql/tokenize.c > index c77aa9b..1766eef 100644 > --- a/src/box/sql/tokenize.c > +++ b/src/box/sql/tokenize.c > @@ -36,17 +36,21 @@ > * individual tokens and sends those tokens one-by-one over to the > * parser for analysis. > */ > -#include "sqliteInt.h" > #include > +#include > +#include > + > #include "say.h" > +#include "sqliteInt.h" > > /* Character classes for tokenizing > * > - * In the sqlite3GetToken() function, a switch() on aiClass[c] is implemented > - * using a lookup table, whereas a switch() directly on c uses a binary search. > - * The lookup table is much faster. To maximize speed, and to ensure that > - * a lookup table is used, all of the classes need to be small integers and > - * all of them need to be used within the switch. > + * In the sql_token() function, a switch() on sql_ascii_class[c] > + * is implemented using a lookup table, whereas a switch() > + * directly on c uses a binary search. The lookup table is much > + * faster. To maximize speed, and to ensure that a lookup table is > + * used, all of the classes need to be small integers and all of > + * them need to be used within the switch. > */ > #define CC_X 0 /* The letter 'x', or start of BLOB literal */ > #define CC_KYWD 1 /* Alphabetics or '_'. Usable in a keyword */ > @@ -77,10 +81,9 @@ > #define CC_DOT 26 /* '.' 
*/ > #define CC_ILLEGAL 27 /* Illegal character */ > > -static const unsigned char aiClass[] = { > -#ifdef SQLITE_ASCII > +static const char sql_ascii_class[] = { > /* x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xa xb xc xd xe xf */ > -/* 0x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 7, 7, 27, 7, 7, 27, 27, > +/* 0x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 7, 7, 7, 7, 7, 27, 27, > /* 1x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, > /* 2x */ 7, 15, 9, 5, 4, 22, 24, 8, 17, 18, 21, 20, 23, 11, 26, 16, > /* 3x */ 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 19, 12, 14, 13, 6, > @@ -96,63 +99,16 @@ static const unsigned char aiClass[] = { > /* Dx */ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, > /* Ex */ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, > /* Fx */ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 > -#endif > -#ifdef SQLITE_EBCDIC > -/* x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xa xb xc xd xe xf */ > -/* 0x */ 27, 27, 27, 27, 27, 7, 27, 27, 27, 27, 27, 27, 7, 7, 27, > - 27, > -/* 1x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, > -/* 2x */ 27, 27, 27, 27, 27, 7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, > -/* 3x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, > -/* 4x */ 7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 26, 12, 17, 20, 10, > -/* 5x */ 24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15, 4, 21, 18, 19, 27, > -/* 6x */ 11, 16, 27, 27, 27, 27, 27, 27, 27, 27, 27, 23, 22, 1, 13, 6, > -/* 7x */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 8, 5, 5, 5, 8, 14, 8, > -/* 8x */ 27, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 27, 27, 27, 27, 27, > -/* 9x */ 27, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 27, 27, 27, 27, 27, > -/* Ax */ 27, 25, 1, 1, 1, 1, 1, 0, 1, 1, 27, 27, 27, 27, 27, 27, > -/* Bx */ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 9, 27, 27, 27, 27, 27, > -/* Cx */ 27, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 27, 27, 27, 27, 27, > -/* Dx */ 27, 1, 1, 1, 1, 1, 1, 1, 1, 1, 27, 27, 27, 27, 27, 27, > -/* Ex */ 27, 27, 1, 1, 1, 1, 1, 0, 1, 1, 27, 27, 27, 27, 27, 27, > -/* Fx */ 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 27, 27, 27, 27, 27, 27, > -#endif > }; > > -/* > - * The charMap() macro maps alphabetic characters (only) into their > - * lower-case ASCII equivalent. On ASCII machines, this is just > - * an upper-to-lower case map. On EBCDIC machines we also need > - * to adjust the encoding. The mapping is only valid for alphabetics > - * which are the only characters for which this feature is used. > +/** > + * The charMap() macro maps alphabetic characters (only) into > + * their lower-case ASCII equivalent. On ASCII machines, this > + * is just an upper-to-lower case map. > * > * Used by keywordhash.h > */ > -#ifdef SQLITE_ASCII > #define charMap(X) sqlite3UpperToLower[(unsigned char)X] > -#endif > -#ifdef SQLITE_EBCDIC > -#define charMap(X) ebcdicToAscii[(unsigned char)X] > -const unsigned char ebcdicToAscii[] = { > -/* 0 1 2 3 4 5 6 7 8 9 A B C D E F */ > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x */ > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1x */ > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 2x */ > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 3x */ > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 4x */ > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 5x */ > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 95, 0, 0, /* 6x */ > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 7x */ > - 0, 97, 98, 99, 100, 101, 102, 103, 104, 105, 0, 0, 0, 0, 0, 0, /* 8x */ > - 0, 106, 107, 108, 109, 110, 111, 112, 113, 114, 0, 0, 0, 0, 0, 0, /* 9x */ > - 0, 0, 115, 116, 117, 118, 119, 120, 121, 122, 0, 0, 0, 0, 0, 0, /* Ax */ > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* Bx */ > - 0, 97, 98, 99, 100, 101, 102, 103, 104, 105, 0, 0, 0, 0, 0, 0, /* Cx */ > - 0, 106, 107, 108, 109, 110, 111, 112, 113, 114, 0, 0, 0, 0, 0, 0, /* Dx */ > - 0, 0, 115, 116, 117, 118, 119, 120, 121, 122, 0, 0, 0, 0, 0, 0, /* Ex */ > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* Fx */ > -}; > -#endif > > /* > * The 
sqlite3KeywordCode function looks up an identifier to determine if > @@ -167,360 +123,295 @@ const unsigned char ebcdicToAscii[] = { > */ > #include "keywordhash.h" > > -/* > - * If X is a character that can be used in an identifier then > - * IdChar(X) will be true. Otherwise it is false. > - * > - * For ASCII, any character with the high-order bit set is > - * allowed in an identifier. For 7-bit characters, > - * sqlite3IsIdChar[X] must be 1. > - * > - * For EBCDIC, the rules are more complex but have the same > - * end result. > +#define maybe_utf8(c) ((sqlite3CtypeMap[c] & 0x40) != 0) > + > +/** > + * Return true if current symbol is space. > * > - * Ticket #1066. the SQL standard does not allow '$' in the > - * middle of identifiers. But many SQL implementations do. > - * SQLite will allow '$' in identifiers for compatibility. > - * But the feature is undocumented. > + * @param z Input stream. > + * @retval True if current symbol space. > */ > -#ifdef SQLITE_ASCII > -#define IdChar(C) ((sqlite3CtypeMap[(unsigned char)C]&0x46)!=0) > -#endif > -#ifdef SQLITE_EBCDIC > -const char sqlite3IsEbcdicIdChar[] = { > -/* x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF */ > - 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, /* 4x */ > - 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, /* 5x */ > - 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, /* 6x */ > - 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, /* 7x */ > - 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, /* 8x */ > - 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, /* 9x */ > - 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, /* Ax */ > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* Bx */ > - 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, /* Cx */ > - 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, /* Dx */ > - 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, /* Ex */ > - 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, /* Fx */ > -}; > - > -#define IdChar(C) (((c=C)>=0x42 && sqlite3IsEbcdicIdChar[c-0x40])) > 
-#endif > +static inline bool > +sql_is_space_char(const char *z) > +{ > + if (sqlite3Isspace(z[0])) > + return true; > + if (maybe_utf8(*(unsigned char*)z)) { > + UChar32 c; > + int unused = 0; > + U8_NEXT_UNSAFE(z, unused, c); > + if (u_isspace(c)) > + return true; > + } > + return false; > +} > > -/* > - * Return the length (in bytes) of the token that begins at z[0]. > - * Store the token type in *tokenType before returning. > +/** > + * Calculate length of continuous sequence of > + * space symbols. > + * > + * @param z Input stream. > + * @retval Number of bytes which constitute sequence of spaces. > + * Can be 0 if first symbol in stram is not space. > */ > +static inline int > +sql_skip_spaces(const char *z) > +{ > + int idx = 0; > + while (true) { > + if (sqlite3Isspace(z[idx])) { > + idx += 1; > + } else if (maybe_utf8(*(unsigned char *)(z + idx))) { > + UChar32 c; > + int new_offset = idx; > + U8_NEXT_UNSAFE(z, new_offset, c); > + if (!u_isspace(c)) > + break; > + idx = new_offset; > + } else { > + break; > + } > + } > + return idx; > +} > + > int > -sqlite3GetToken(const unsigned char *z, int *tokenType, bool *is_reserved) > +sql_token(const char *z, int *type, bool *is_reserved) > { > *is_reserved = false; > - int i, c; > - switch (aiClass[*z]) { /* Switch on the character-class of the first byte > - * of the token. See the comment on the CC_ defines > - * above. > - */ > - case CC_SPACE:{ > - testcase(z[0] == ' '); > - testcase(z[0] == '\t'); > - testcase(z[0] == '\n'); > - testcase(z[0] == '\f'); > - testcase(z[0] == '\r'); > - for (i = 1; sqlite3Isspace(z[i]); i++) { > + int i, n; > + char c, delim; > + /* Switch on the character-class of the first byte > + * of the token. See the comment on the CC_ defines > + * above. 
> + */ > + switch (sql_ascii_class[*(unsigned char*)z]) { > + case CC_SPACE: > + i = 1 + sql_skip_spaces(z+1); > + *type = TK_SPACE; > + return i; > + case CC_MINUS: > + if (z[1] == '-') { > + for (i = 2; (c = z[i]) != 0 && c != '\n'; i++) { > } > - *tokenType = TK_SPACE; > + *type = TK_SPACE; > return i; > } > - case CC_MINUS:{ > - if (z[1] == '-') { > - for (i = 2; (c = z[i]) != 0 && c != '\n'; i++) { > - } > - *tokenType = TK_SPACE; /* IMP: R-22934-25134 > - */ > - return i; > - } > - *tokenType = TK_MINUS; > + *type = TK_MINUS; > + return 1; > + case CC_LP: > + *type = TK_LP; > + return 1; > + case CC_RP: > + *type = TK_RP; > + return 1; > + case CC_SEMI: > + *type = TK_SEMI; > + return 1; > + case CC_PLUS: > + *type = TK_PLUS; > + return 1; > + case CC_STAR: > + *type = TK_STAR; > + return 1; > + case CC_SLASH: > + if (z[1] != '*' || z[2] == 0) { > + *type = TK_SLASH; > return 1; > } > - case CC_LP:{ > - *tokenType = TK_LP; > - return 1; > + for (i = 3, c = z[2]; > + (c != '*' || z[i] != '/') && (c = z[i]) != 0; > + i++) { > } > - case CC_RP:{ > - *tokenType = TK_RP; > + if (c) > + i++; > + *type = TK_SPACE; > + return i; > + case CC_PERCENT: > + *type = TK_REM; > + return 1; > + case CC_EQ: > + *type = TK_EQ; > + return 1 + (z[1] == '='); > + case CC_LT: > + if ((c = z[1]) == '=') { > + *type = TK_LE; > + return 2; > + } else if (c == '>') { > + *type = TK_NE; > + return 2; > + } else if (c == '<') { > + *type = TK_LSHIFT; > + return 2; > + } else { > + *type = TK_LT; > return 1; > } > - case CC_SEMI:{ > - *tokenType = TK_SEMI; > + case CC_GT: > + if ((c = z[1]) == '=') { > + *type = TK_GE; > + return 2; > + } else if (c == '>') { > + *type = TK_RSHIFT; > + return 2; > + } else { > + *type = TK_GT; > return 1; > } > - case CC_PLUS:{ > - *tokenType = TK_PLUS; > + case CC_BANG: > + if (z[1] != '=') { > + *type = TK_ILLEGAL; > return 1; > + } else { > + *type = TK_NE; > + return 2; > } > - case CC_STAR:{ > - *tokenType = TK_STAR; > + case CC_PIPE: > + if (z[1] 
!= '|') { > + *type = TK_BITOR; > return 1; > + } else { > + *type = TK_CONCAT; > + return 2; > } > - case CC_SLASH:{ > - if (z[1] != '*' || z[2] == 0) { > - *tokenType = TK_SLASH; > - return 1; > - } > - for (i = 3, c = z[2]; > - (c != '*' || z[i] != '/') && (c = z[i]) != 0; > - i++) { > + case CC_COMMA: > + *type = TK_COMMA; > + return 1; > + case CC_AND: > + *type = TK_BITAND; > + return 1; > + case CC_TILDA: > + *type = TK_BITNOT; > + return 1; > + case CC_QUOTE: > + case CC_DQUOTE: > + delim = z[0]; > + for (i = 1; (c = z[i]) != 0; i++) { > + if (c == delim) { > + if (z[i + 1] == delim) > + i++; > + else > + break; > } > - if (c) > - i++; > - *tokenType = TK_SPACE; /* IMP: R-22934-25134 > - */ > + } > + if (c == '\'') { > + *type = TK_STRING; > + return i + 1; > + } else if (c != 0) { > + *type = TK_ID; > + return i + 1; > + } else { > + *type = TK_ILLEGAL; > return i; > } > - case CC_PERCENT:{ > - *tokenType = TK_REM; > + FALLTHROUGH; > + case CC_DOT: > + if (!sqlite3Isdigit(z[1])) { > + *type = TK_DOT; > return 1; > } > - case CC_EQ:{ > - *tokenType = TK_EQ; > - return 1 + (z[1] == '='); > - } > - case CC_LT:{ > - if ((c = z[1]) == '=') { > - *tokenType = TK_LE; > - return 2; > - } else if (c == '>') { > - *tokenType = TK_NE; > - return 2; > - } else if (c == '<') { > - *tokenType = TK_LSHIFT; > - return 2; > - } else { > - *tokenType = TK_LT; > - return 1; > + /* If the next character is a digit, this is a > + * floating point number that begins with ".". > + * Fall thru into the next case. 
> + */ > + FALLTHROUGH; > + case CC_DIGIT: > + *type = TK_INTEGER; > + if (z[0] == '0' && (z[1] == 'x' || z[1] == 'X') && > + sqlite3Isxdigit(z[2])) { > + for (i = 3; sqlite3Isxdigit(z[i]); i++) { > } > + return i; > } > - case CC_GT:{ > - if ((c = z[1]) == '=') { > - *tokenType = TK_GE; > - return 2; > - } else if (c == '>') { > - *tokenType = TK_RSHIFT; > - return 2; > - } else { > - *tokenType = TK_GT; > - return 1; > - } > + for (i = 0; sqlite3Isdigit(z[i]); i++) { > } > - case CC_BANG:{ > - if (z[1] != '=') { > - *tokenType = TK_ILLEGAL; > - return 1; > - } else { > - *tokenType = TK_NE; > - return 2; > + if (z[i] == '.') { > + while (sqlite3Isdigit(z[++i])) { > } > + *type = TK_FLOAT; > } > - case CC_PIPE:{ > - if (z[1] != '|') { > - *tokenType = TK_BITOR; > - return 1; > - } else { > - *tokenType = TK_CONCAT; > - return 2; > - } > + if ((z[i] == 'e' || z[i] == 'E') && > + (sqlite3Isdigit(z[i + 1]) > + || ((z[i + 1] == '+' || z[i + 1] == '-') && > + sqlite3Isdigit(z[i + 2])))) { > + i += 2; > + while (sqlite3Isdigit(z[i])) > + i++; > + *type = TK_FLOAT; > } > - case CC_COMMA:{ > - *tokenType = TK_COMMA; > - return 1; > + if (IdChar(z[i])) { > + *type = TK_ILLEGAL; > + while (IdChar(z[++i])) { > + } > } > - case CC_AND:{ > - *tokenType = TK_BITAND; > - return 1; > + return i; > + case CC_VARNUM: > + *type = TK_VARIABLE; > + for (i = 1; sqlite3Isdigit(z[i]); i++) { > } > - case CC_TILDA:{ > - *tokenType = TK_BITNOT; > - return 1; > + return i; > + case CC_DOLLAR: > + case CC_VARALPHA: > + n = 0; > + *type = TK_VARIABLE; > + for (i = 1; (c = z[i]) != 0; i++) { > + if (IdChar(c)) > + n++; > + else > + break; > } > - case CC_QUOTE: > - case CC_DQUOTE:{ > - int delim = z[0]; > - testcase(delim == '\''); > - testcase(delim == '"'); > - for (i = 1; (c = z[i]) != 0; i++) { > - if (c == delim) { > - if (z[i + 1] == delim) { > - i++; > - } else { > - break; > - } > - } > - } > - if (c == '\'') { > - *tokenType = TK_STRING; > - return i + 1; > - } else if (c != 0) { > - 
*tokenType = TK_ID;
> -				return i + 1;
> -			} else {
> -				*tokenType = TK_ILLEGAL;
> -				return i;
> -			}
> -			FALLTHROUGH;
> +		if (n == 0)
> +			*type = TK_ILLEGAL;
> +		return i;
> +	case CC_KYWD:
> +		for (i = 1; sql_ascii_class[*(unsigned char*)(z+i)] <= CC_KYWD;
> +		     i++) {
>  		}
> -	case CC_DOT:{
> -#ifndef SQLITE_OMIT_FLOATING_POINT
> -			if (!sqlite3Isdigit(z[1]))
> -#endif
> -			{
> -				*tokenType = TK_DOT;
> -				return 1;
> -			}
> -			/* If the next character is a digit, this is a floating point
> -			 * number that begins with ".". Fall thru into the next case
> +		if (!sql_is_space_char(z + i) && IdChar(z[i])) {
> +			/* This token started out using characters
> +			 * that can appear in keywords, but z[i] is
> +			 * a character not allowed within keywords,
> +			 * so this must be an identifier instead.
>  			 */
> -			FALLTHROUGH;
> +			i++;
> +			break;
>  		}
> -	case CC_DIGIT:{
> -			testcase(z[0] == '0');
> -			testcase(z[0] == '1');
> -			testcase(z[0] == '2');
> -			testcase(z[0] == '3');
> -			testcase(z[0] == '4');
> -			testcase(z[0] == '5');
> -			testcase(z[0] == '6');
> -			testcase(z[0] == '7');
> -			testcase(z[0] == '8');
> -			testcase(z[0] == '9');
> -			*tokenType = TK_INTEGER;
> -#ifndef SQLITE_OMIT_HEX_INTEGER
> -			if (z[0] == '0' && (z[1] == 'x' || z[1] == 'X')
> -			    && sqlite3Isxdigit(z[2])) {
> -				for (i = 3; sqlite3Isxdigit(z[i]); i++) {
> -				}
> -				return i;
> +		*type = TK_ID;
> +		return keywordCode(z, i, type, is_reserved);
> +	case CC_X:
> +		if (z[1] == '\'') {
> +			*type = TK_BLOB;
> +			for (i = 2; sqlite3Isxdigit(z[i]); i++) {
>  			}
> -#endif
> -			for (i = 0; sqlite3Isdigit(z[i]); i++) {
> -			}
> -#ifndef SQLITE_OMIT_FLOATING_POINT
> -			if (z[i] == '.') {
> -				i++;
> -				while (sqlite3Isdigit(z[i])) {
> +			if (z[i] != '\'' || i % 2) {
> +				*type = TK_ILLEGAL;
> +				while (z[i] != 0 && z[i] != '\'')
>  					i++;
> -				}
> -				*tokenType = TK_FLOAT;
>  			}
> -			if ((z[i] == 'e' || z[i] == 'E') &&
> -			    (sqlite3Isdigit(z[i + 1])
> -			     || ((z[i + 1] == '+' || z[i + 1] == '-')
> -				 && sqlite3Isdigit(z[i + 2]))
> -			    )
> -			    ) {
> -				i += 2;
> -				while (sqlite3Isdigit(z[i])) {
> -					i++;
> -				}
> -				*tokenType = TK_FLOAT;
> -			}
> -#endif
> -			while (IdChar(z[i])) {
> -				*tokenType = TK_ILLEGAL;
> +			if (z[i] != 0)
>  				i++;
> -			}
> -			return i;
> -		}
> -	case CC_VARNUM:{
> -			*tokenType = TK_VARIABLE;
> -			for (i = 1; sqlite3Isdigit(z[i]); i++) {
> -			}
>  			return i;
>  		}
> -	case CC_DOLLAR:
> -	case CC_VARALPHA:{
> -			int n = 0;
> -			testcase(z[0] == '$');
> -			testcase(z[0] == '@');
> -			testcase(z[0] == ':');
> -			testcase(z[0] == '#');
> -			*tokenType = TK_VARIABLE;
> -			for (i = 1; (c = z[i]) != 0; i++) {
> -				if (IdChar(c)) {
> -					n++;
> -#ifndef SQLITE_OMIT_TCL_VARIABLE
> -				} else if (c == '(' && n > 0) {
> -					do {
> -						i++;
> -					} while ((c = z[i]) != 0
> -						 && !sqlite3Isspace(c)
> -						 && c != ')');
> -					if (c == ')') {
> -						i++;
> -					} else {
> -						*tokenType = TK_ILLEGAL;
> -					}
> -					break;
> -				} else if (c == ':' && z[i + 1] == ':') {
> -					i++;
> -#endif
> -				} else {
> -					break;
> -				}
> -			}
> -			if (n == 0)
> -				*tokenType = TK_ILLEGAL;
> -			return i;
> -		}
> -	case CC_KYWD:{
> -			for (i = 1; aiClass[z[i]] <= CC_KYWD; i++) {
> -			}
> -			if (IdChar(z[i])) {
> -				/* This token started out using characters that can appear in keywords,
> -				 * but z[i] is a character not allowed within keywords, so this must
> -				 * be an identifier instead
> -				 */
> -				i++;
> -				break;
> -			}
> -			*tokenType = TK_ID;
> -			return keywordCode((char *)z, i, tokenType, is_reserved);
> -		}
> -	case CC_X:{
> -#ifndef SQLITE_OMIT_BLOB_LITERAL
> -			testcase(z[0] == 'x');
> -			testcase(z[0] == 'X');
> -			if (z[1] == '\'') {
> -				*tokenType = TK_BLOB;
> -				for (i = 2; sqlite3Isxdigit(z[i]); i++) {
> -				}
> -				if (z[i] != '\'' || i % 2) {
> -					*tokenType = TK_ILLEGAL;
> -					while (z[i] && z[i] != '\'') {
> -						i++;
> -					}
> -				}
> -				if (z[i])
> -					i++;
> -				return i;
> -			}
> -#endif
> -			/* If it is not a BLOB literal, then it must be an ID, since no
> -			 * SQL keywords start with the letter 'x'. Fall through
> -			 */
> -			FALLTHROUGH;
> -		}
> -	case CC_ID:{
> -			i = 1;
> -			break;
> -		}
> -	default:{
> -			*tokenType = TK_ILLEGAL;
> -			return 1;
> -		}
> +		/* If it is not a BLOB literal, then it must be an
> +		 * ID, since no SQL keywords start with the letter
> +		 * 'x'. Fall through.
> +		 */
> +		FALLTHROUGH;
> +	case CC_ID:
> +		i = 1;
> +		break;
> +	default:
> +		*type = TK_ILLEGAL;
> +		return 1;
>  	}
> -	while (IdChar(z[i])) {
> -		i++;
> +	int spaces_len = sql_skip_spaces(z);
> +	if (spaces_len > 0) {
> +		*type = TK_SPACE;
> +		return spaces_len;
>  	}
> -	*tokenType = TK_ID;
> +	while (IdChar(z[i]))
> +		i++;
> +	*type = TK_ID;
>  	return i;
>  }
>
> @@ -566,8 +457,8 @@ sqlite3RunParser(Parse * pParse, const char *zSql, char **pzErrMsg)
>  		if (zSql[i] != 0) {
>  			pParse->sLastToken.z = &zSql[i];
>  			pParse->sLastToken.n =
> -			    sqlite3GetToken((u8 *) & zSql[i], &tokenType,
> -					    &pParse->sLastToken.isReserved);
> +			    sql_token(&zSql[i], &tokenType,
> +				      &pParse->sLastToken.isReserved);
>  			i += pParse->sLastToken.n;
>  			if (i > mxSqlLen) {
>  				pParse->rc = SQLITE_TOOBIG;
> diff --git a/src/box/sql/util.c b/src/box/sql/util.c
> index 8c4e7b9..0c2a050 100644
> --- a/src/box/sql/util.c
> +++ b/src/box/sql/util.c
> @@ -1228,12 +1228,7 @@ sqlite3HexToInt(int h)
>  {
>  	assert((h >= '0' && h <= '9') || (h >= 'a' && h <= 'f')
>  	       || (h >= 'A' && h <= 'F'));
> -#ifdef SQLITE_ASCII
>  	h += 9 * (1 & (h >> 6));
> -#endif
> -#ifdef SQLITE_EBCDIC
> -	h += 9 * (1 & ~(h >> 4));
> -#endif
>  	return (u8) (h & 0xf);
>  }
>
> diff --git a/src/box/sql/vdbetrace.c b/src/box/sql/vdbetrace.c
> index 8623e68..63e2311 100644
> --- a/src/box/sql/vdbetrace.c
> +++ b/src/box/sql/vdbetrace.c
> @@ -57,7 +57,7 @@ findNextHostParameter(const char *zSql, int *pnToken)
>
>  	*pnToken = 0;
>  	while (zSql[0]) {
> -		n = sqlite3GetToken((u8 *) zSql, &tokenType, &unused);
> +		n = sql_token(zSql, &tokenType, &unused);
>  		assert(n > 0 && tokenType != TK_ILLEGAL);
>  		if (tokenType == TK_VARIABLE) {
>  			*pnToken = n;
> diff --git a/src/box/sql/whereexpr.c b/src/box/sql/whereexpr.c
> index 34a1f13..c3a8634 100644
> --- a/src/box/sql/whereexpr.c
> +++ b/src/box/sql/whereexpr.c
> @@ -256,10 +256,6 @@ isLikeOrGlob(Parse * pParse, /* Parsing and code generating context */
>  	if (!sqlite3IsLikeFunction(db, pExpr, pnoCase, wc)) {
>  		return 0;
>  	}
> -#ifdef SQLITE_EBCDIC
> -	if (*pnoCase)
> -		return 0;
> -#endif
>  	pList = pExpr->x.pList;
>  	pLeft = pList->a[1].pExpr;
>  	if (pLeft->op != TK_COLUMN || sqlite3ExprAffinity(pLeft) != SQLITE_AFF_TEXT /* Value might be numeric */
> diff --git a/test/sql-tap/e_expr.test.lua b/test/sql-tap/e_expr.test.lua
> index d0f6895..f7f3b15 100755
> --- a/test/sql-tap/e_expr.test.lua
> +++ b/test/sql-tap/e_expr.test.lua
> @@ -1,6 +1,6 @@
>  #!/usr/bin/env tarantool
>  test = require("sqltester")
> -test:plan(14750)
> +test:plan(14748)
>
>  --!./tcltestrunner.lua
>  -- 2010 July 16
> @@ -1506,8 +1506,6 @@ local test_cases12 ={
>      {10, "?123"},
>      {11, "@hello"},
>      {12, ":world"},
> -    {13, "$tcl"},
> -    {14, "$tcl(array)"},
>
>      {15, "cname"},
>      {16, "tblname.cname"},
> diff --git a/test/sql-tap/unicode.test.lua b/test/sql-tap/unicode.test.lua
> new file mode 100755
> index 0000000..1990739
> --- /dev/null
> +++ b/test/sql-tap/unicode.test.lua
> @@ -0,0 +1,37 @@
> +#!/usr/bin/env tarantool
> +test = require("sqltester")
> +test:plan(23 * 3)
> +
> +-- 23 entities
> +local utf8_spaces = {"\u{0009}", "\u{000A}", "\u{000B}", "\u{000C}", "\u{000D}",
> +                     "\u{0085}", "\u{1680}", "\u{2000}", "\u{2001}", "\u{2002}",
> +                     "\u{2003}", "\u{2004}", "\u{2005}", "\u{2006}", "\u{2007}",
> +                     "\u{2008}", "\u{2009}", "\u{200A}", "\u{2028}", "\u{2029}",
> +                     "\u{202F}", "\u{205F}", "\u{3000}"}
> +local spaces_cnt = 23
> +
> +-- 1. Check UTF-8 single space
> +for i, v in pairs(utf8_spaces) do
> +    test:do_execsql_test(
> +        "utf8-spaces-1."..i,
> +        "select" .. v .. "1",
> +        { 1 })
> +end
> +
> +-- 2. Check pair simple + UTF-8 space
> +for i, v in pairs(utf8_spaces) do
> +    test:do_execsql_test(
> +        "utf8-spaces-2."..i,
> +        "select" .. v .. "1",
> +        { 1 })
> +end
> +
> +-- 3. Sequence of spaces
> +for i, v in pairs(utf8_spaces) do
> +    test:do_execsql_test(
> +        "utf8-spaces-3."..i,
> +        "select" .. v .. " " .. utf8_spaces[spaces_cnt - i + 1] .. " 1",
> +        { 1 })
> +end
> +
> +test:finish_test()
>
>
>
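For anyone reading the thread without the full patch at hand, here is a rough, self-contained sketch of the idea behind the TK_SPACE branch quoted above: measure the byte length of a leading run of whitespace, where a "space" may be a multi-byte UTF-8 sequence. This is not the code from the patch. The patch links with ICU, and the bodies of sql_skip_spaces() and sql_is_space_char() are not shown in the quoted hunks. The names utf8_decode(), is_space_point() and skip_spaces_sketch() are hypothetical, the decoder is deliberately minimal (it does not reject overlong encodings), and the code point table is simply the 23 entries from unicode.test.lua plus U+0020, which is an assumption on my side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Code points listed in test/sql-tap/unicode.test.lua, plus U+0020 (assumed). */
static const uint32_t space_points[] = {
	0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x1680,
	0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006, 0x2007,
	0x2008, 0x2009, 0x200A, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000,
};

/* Hypothetical helper: linear lookup in the table above. */
static int
is_space_point(uint32_t cp)
{
	size_t n = sizeof(space_points) / sizeof(space_points[0]);
	for (size_t i = 0; i < n; i++) {
		if (space_points[i] == cp)
			return 1;
	}
	return 0;
}

/*
 * Hypothetical helper: decode one UTF-8 sequence starting at z.
 * Returns the number of bytes consumed, or 0 at the terminating
 * zero byte or on a malformed or truncated sequence.
 */
static int
utf8_decode(const char *z, uint32_t *cp)
{
	const unsigned char *s = (const unsigned char *)z;
	int len;
	if (s[0] == 0)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	}
	if ((s[0] & 0xE0) == 0xC0) {
		len = 2;
		*cp = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3;
		*cp = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4;
		*cp = s[0] & 0x07;
	} else {
		return 0;
	}
	for (int i = 1; i < len; i++) {
		/* A zero terminator fails this check, so we never read past it. */
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		*cp = (*cp << 6) | (s[i] & 0x3F);
	}
	return len;
}

/*
 * Hypothetical stand-in for sql_skip_spaces(): byte length of the
 * leading whitespace run in a zero-terminated string.
 */
static int
skip_spaces_sketch(const char *z)
{
	assert(z != NULL);
	int total = 0;
	for (;;) {
		uint32_t cp;
		int len = utf8_decode(z + total, &cp);
		if (len == 0 || !is_space_point(cp))
			return total;
		total += len;
	}
}
```

With a helper of this shape the tokenizer can return the whole run as a single TK_SPACE token, the way the quoted hunk does with spaces_len. A real implementation would more likely defer to ICU's character properties than to a hard-coded table.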