From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 50870220EC for ; Mon, 1 Jul 2019 10:21:01 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id LKtNlRLNyaX1 for ; Mon, 1 Jul 2019 10:21:01 -0400 (EDT) Received: from smtp44.i.mail.ru (smtp44.i.mail.ru [94.100.177.104]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 5F9E621BF6 for ; Mon, 1 Jul 2019 10:21:00 -0400 (EDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.8\)) Subject: [tarantool-patches] Re: [PATCH 1/6] sql: refactor sql_atoi64() From: "n.pettik" In-Reply-To: <0ae708be-9670-e726-b7ea-9faec8c007ea@tarantool.org> Date: Mon, 1 Jul 2019 17:20:56 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <9898E449-3735-4744-83A9-47B1DC64A72C@tarantool.org> References: <15f86ff05673055364c6a6bd11de4568e4c3854f.1559919361.git.korablev@tarantool.org> <0ae708be-9670-e726-b7ea-9faec8c007ea@tarantool.org> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org Cc: Vladislav Shpilevoy >> diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c >> index ba7eea59d..158631416 100644 >> --- a/src/box/sql/expr.c >> +++ b/src/box/sql/expr.c> @@ -3325,29 +3326,53 @@ = expr_code_int(struct Parse *parse, struct Expr *expr, bool is_neg, >> if (is_neg) >> i =3D -i; >> sqlVdbeAddOp2(v, OP_Integer, i, mem); >> + return; >> + } >> + int64_t value; >> + const char *z =3D expr->u.zToken; >> + assert(z !=3D NULL); >> + const char *sign =3D is_neg ? "-" : ""; >> + if (z[0] =3D=3D '0' && (z[1] =3D=3D 'x' || z[1] =3D=3D 'X')) { >> + uint64_t u =3D 0; >> + int i, k; >> + for (i =3D 2; z[i] =3D=3D '0'; i++); >> + for (k =3D i; sqlIsxdigit(z[k]); k++) >> + u =3D u * 16 + sqlHexToInt(z[k]); >=20 > 1. Why not to use strtoll and strtol? They accept 'base' argument. > Set it to 16, and they will do the same, but probably faster. NP: diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c index 158631416..182950e9f 100644 --- a/src/box/sql/expr.c +++ b/src/box/sql/expr.c @@ -3333,13 +3333,16 @@ expr_code_int(struct Parse *parse, struct Expr = *expr, bool is_neg, assert(z !=3D NULL); const char *sign =3D is_neg ? "-" : ""; if (z[0] =3D=3D '0' && (z[1] =3D=3D 'x' || z[1] =3D=3D 'X')) { - uint64_t u =3D 0; - int i, k; - for (i =3D 2; z[i] =3D=3D '0'; i++); - for (k =3D i; sqlIsxdigit(z[k]); k++) - u =3D u * 16 + sqlHexToInt(z[k]); - memcpy(&value, &u, 8); - if (z[k] !=3D 0 || k - i > 16) { + errno =3D 0; + char *end =3D NULL; + if (is_neg) { + value =3D strtoll(z, &end, 16); + } else { + value =3D strtoull(z, &end, 16); + if (value > INT64_MAX) + goto int_overflow; + } + if (errno !=3D 0) { diag_set(ClientError, ER_HEX_LITERAL_MAX, sign, = z, strlen(z) - 2, 16); parse->is_aborted =3D true; >=20 >> + memcpy(&value, &u, 8); >> + if (z[k] !=3D 0 || k - i > 16) { >=20 > 2. You treat tail spaces as an error. Not sure if it is right. This code has been removed (see fix above). >=20 >> + diag_set(ClientError, ER_HEX_LITERAL_MAX, sign, = z, >> + strlen(z) - 2, 16); >> + parse->is_aborted =3D true; >> + return; >> + } >> } else { >> - int64_t value; >> - const char *z =3D expr->u.zToken; >> - assert(z !=3D NULL); >> - int c =3D sql_dec_or_hex_to_i64(z, &value); >> - if (c =3D=3D 1 || (c =3D=3D 2 && !is_neg) || >> - (is_neg && value =3D=3D SMALLEST_INT64)) { >> - const char *sign =3D is_neg ? "-" : ""; >> - if (sql_strnicmp(z, "0x", 2) =3D=3D 0) { >> - diag_set(ClientError, = ER_HEX_LITERAL_MAX, sign, >> - z, strlen(z) - 2, 16); >> - } else { >> - diag_set(ClientError, = ER_INT_LITERAL_MAX, sign, >> - z, INT64_MIN, INT64_MAX); >> - } >> + size_t len =3D strlen(z); >> + if (is_neg) { >> + /* >> + * UINT64_MAX is constant, so each integer >> + * literal certainly contains less digits. >> + * It makes no sense to continue processing >> + * string if it contains more characters. >> + */ >> + if (len > UINT64_MAX_DIGIT_COUNT) >> + goto int_overflow; >> + char *neg_z =3D tt_static_buf(); >> + neg_z[0] =3D '-'; >> + memcpy(neg_z + 1, z, len++); >> + neg_z[len]=3D '\0'; >> + z =3D neg_z; >> + } >=20 > 3. This looks like a crutch. Why do you need strlen() and number of > digits here? Firstly, sql_atoi64 already returns an error in case of > too long integer. Secondly, if you wanted to add '-', it could be done > much faster - make 'value =3D -value;' in case of 'neg' with a = preliminary > check that the value is not too big. Will it work? It will, but we have to allow parsing in sql_atoi() positive values up = to INT64_MAX + 1. So, several tests fail with this change = (sql-tap/func.test.lua and sql/integer-overflow.test.lua). It is not critical, since in the next = patch we are going to allow using integers up to UINT64_MAX. Hence, I don=E2=80=99t = think this deserves any workaround, so I simply fixed tests. Diff: --- a/src/box/errcode.h +++ b/src/box/errcode.h @@ -242,7 +242,7 @@ struct errcode_record { /*187 */_(ER_SQL_ANALYZE_ARGUMENT, "ANALYZE statement = argument %s is not a base table") \ /*188 */_(ER_SQL_COLUMN_COUNT_MAX, "Failed to create space = '%s': space column count %d exceeds the limit (%d)") \ /*189 */_(ER_HEX_LITERAL_MAX, "Hex literal %s%s length = %d exceeds the supported limit (%d)") \ - /*190 */_(ER_INT_LITERAL_MAX, "Integer literal %s = exceeds the supported range %lld - %lld") \ + /*190 */_(ER_INT_LITERAL_MAX, "Integer literal %s%s = exceeds the supported range %lld - %lld") \ /*191 */_(ER_SQL_PARSER_LIMIT, "%s %d exceeds the limit = (%d)") \ /*192 */_(ER_INDEX_DEF_UNSUPPORTED, "%s are prohibited in an = index definition") \ /*193 */_(ER_CK_DEF_UNSUPPORTED, "%s are prohibited in a = CHECK constraint definition") \ diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c index 158631416..4bb5ac180 100644 --- a/src/box/sql/expr.c +++ b/src/box/sql/expr.c @@ -3347,31 +3350,19 @@ expr_code_int(struct Parse *parse, struct Expr = *expr, bool is_neg, } } else { size_t len =3D strlen(z); - if (is_neg) { - /* - * UINT64_MAX is constant, so each integer - * literal certainly contains less digits. - * It makes no sense to continue processing - * string if it contains more characters. - */ - if (len > UINT64_MAX_DIGIT_COUNT) - goto int_overflow; - char *neg_z =3D tt_static_buf(); - neg_z[0] =3D '-'; - memcpy(neg_z + 1, z, len++); - neg_z[len]=3D '\0'; - z =3D neg_z; - } bool unused; - int rc =3D sql_atoi64(z, &value, &unused, len); - if (rc !=3D 0) { + if (sql_atoi64(z, &value, &unused, len) !=3D 0 || + (is_neg && (uint64_t) value > (uint64_t) INT64_MAX + = 1) || + (!is_neg && (uint64_t) value > INT64_MAX)) { int_overflow: - diag_set(ClientError, ER_INT_LITERAL_MAX, z, = INT64_MIN, - INT64_MAX); + diag_set(ClientError, ER_INT_LITERAL_MAX, sign, = z, + INT64_MIN, INT64_MAX); parse->is_aborted =3D true; return; } } + if (is_neg) + value =3D -value; sqlVdbeAddOp4Dup8(v, OP_Int64, 0, mem, 0, (u8 *)&value, = P4_INT64); } =20 diff --git a/src/box/sql/util.c b/src/box/sql/util.c index 2e7c298e1..a2340f001 100644 --- a/src/box/sql/util.c +++ b/src/box/sql/util.c @@ -611,7 +611,7 @@ sql_atoi64(const char *z, int64_t *val, bool = *is_neg, int length) *val =3D strtoll(z, &end, 10); } else { uint64_t u_val =3D strtoull(z, &end, 10); - if (u_val > INT64_MAX) + if (u_val > (uint64_t) INT64_MAX + 1) return -1; *val =3D u_val; } diff --git a/test/sql-tap/func.test.lua b/test/sql-tap/func.test.lua index 09b1cf967..6cb14f3a3 100755 --- a/test/sql-tap/func.test.lua +++ b/test/sql-tap/func.test.lua @@ -912,25 +912,25 @@ test:do_execsql_test( -- }) =20 -test:do_execsql_test( +test:do_catchsql_test( "func-8.7", [[ SELECT typeof(sum(x)) FROM (SELECT '9223372036' || '854775808' = AS x UNION ALL SELECT -9223372036854775807) ]], { -- - "number" + 1, "Failed to execute SQL statement: integer overflow" -- }) =20 -test:do_execsql_test( +test:do_catchsql_test( "func-8.8", [[ SELECT sum(x)>0.0 FROM (SELECT '9223372036' || '854775808' AS x UNION ALL SELECT -9223372036850000000) ]], { -- - 1 + 1, "Failed to execute SQL statement: integer overflow" -- }) =20 diff --git a/test/sql/integer-overflow.result = b/test/sql/integer-overflow.result index d6ca66ec9..a4944d1a0 100644 --- a/test/sql/integer-overflow.result +++ b/test/sql/integer-overflow.result @@ -48,7 +48,11 @@ box.execute('SELECT 9223372036854775808 - 1;') -- box.execute('SELECT CAST(\'9223372036854775808\' AS INTEGER);') --- -- error: 'Type mismatch: can not convert 9223372036854775808 to = integer' +- metadata: + - name: CAST('9223372036854775808' AS INTEGER) + type: integer + rows: + - [-9223372036854775808] ... -- Due to inexact represantation of large integers in terms of -- floating point numbers, numerics with value < INT64_MAX >> }> diff --git a/src/box/sql/util.c b/src/box/sql/util.c >> index d5c93f8aa..2e7c298e1 100644 >> --- a/src/box/sql/util.c >> +++ b/src/box/sql/util.c >> @@ -593,120 +593,42 @@ sqlAtoF(const char *z, double *pResult, int = length) >> int >> -sql_atoi64(const char *z, int64_t *val, int length) >> +sql_atoi64(const char *z, int64_t *val, bool *is_neg, int length) >> { >> - int incr =3D 1; >> - u64 u =3D 0; >> - int neg =3D 0; /* assume positive */ >> - int i; >> - int c =3D 0; >> - int nonNum =3D 0; /* True if input contains UTF16 = with high byte non-zero */ >> - const char *zStart; >> - const char *zEnd =3D z + length; >> - incr =3D 1; >> - while (z < zEnd && sqlIsspace(*z)) >> - z +=3D incr; >> - if (z < zEnd) { >> - if (*z =3D=3D '-') { >> - neg =3D 1; >> - z +=3D incr; >> - } else if (*z =3D=3D '+') { >> - z +=3D incr; >> - } >> - } >> - zStart =3D z; >> - /* Skip leading zeros. */ >> - while (z < zEnd && z[0] =3D=3D '0') { >> - z +=3D incr; >> - } >> - for (i =3D 0; &z[i] < zEnd && (c =3D z[i]) >=3D '0' && c <=3D = '9'; >> - i +=3D incr) { >> - u =3D u * 10 + c - '0'; >> - } >> - if (u > LARGEST_INT64) { >> - *val =3D neg ? SMALLEST_INT64 : LARGEST_INT64; >> - } else if (neg) { >> - *val =3D -(i64) u; >> + *is_neg =3D false; >> + const char *str_end =3D z + length; >> + for (; z < str_end && sqlIsspace(*z); z++) >> + /* invalid format */ >=20 > 4. Usually we start sentences from capital letters and finish > with dots. Refactored a bit whole function (almost applied your diff): diff --git a/src/box/sql/util.c b/src/box/sql/util.c index 2e7c298e1..826e4a673 100644 --- a/src/box/sql/util.c +++ b/src/box/sql/util.c @@ -38,6 +38,7 @@ */ #include "sqlInt.h" #include +#include #if HAVE_ISNAN || SQL_HAVE_ISNAN #include #endif @@ -598,26 +599,24 @@ sql_atoi64(const char *z, int64_t *val, bool = *is_neg, int length) { *is_neg =3D false; const char *str_end =3D z + length; - for (; z < str_end && sqlIsspace(*z); z++) - /* invalid format */ + for (; z < str_end && isspace(*z); z++); if (z >=3D str_end) return -1; if (*z =3D=3D '-') *is_neg =3D true; =20 - char* end =3D NULL; + char *end =3D NULL; errno =3D 0; - if (*is_neg) { + if (*z =3D=3D '-') { + *is_neg =3D true; *val =3D strtoll(z, &end, 10); } else { + *is_neg =3D false; uint64_t u_val =3D strtoull(z, &end, 10); - if (u_val > INT64_MAX) + if (u_val > (uint64_t) INT64_MAX + 1) return -1; *val =3D u_val; } - /* No digits were found, e.g. an empty string. */ - if (end =3D=3D z) - return -1; >> + if (z >=3D str_end) >> + return -1; >> + if (*z =3D=3D '-') >> + *is_neg =3D true; >> + >> + char* end =3D NULL; >> + errno =3D 0; >> + if (*is_neg) { >=20 > 5. I don't like this double check. 4 lines above you already > know if the value is negative, but you double check it here. Ok, see diff above. >> + *val =3D strtoll(z, &end, 10); >> } else { >> - *val =3D (i64) u; >> + uint64_t u_val =3D strtoull(z, &end, 10); >> + if (u_val > INT64_MAX) >> + return -1; >> + *val =3D u_val; >> } >> - if (&z[i] < zEnd || (i =3D=3D 0 && zStart =3D=3D z) || i > 19 * = incr || >> - nonNum) { >> - /* zNum is empty or contains non-numeric text or is = longer >> - * than 19 digits (thus guaranteeing that it is too = large) >> - */ >> - return 1; >> - } else if (i < 19 * incr) { >> - /* Less than 19 digits, so we know that it fits in 64 = bits */ >> - assert(u <=3D LARGEST_INT64); >> - return 0; >> - } else { >> - /* zNum is a 19-digit numbers. Compare it against = 9223372036854775808. */ >> - c =3D compare2pow63(z, incr); >> - if (c < 0) { >> - /* zNum is less than 9223372036854775808 so it = fits */ >> - assert(u <=3D LARGEST_INT64); >> - return 0; >> - } else if (c > 0) { >> - /* zNum is greater than 9223372036854775808 so = it overflows */ >> - return 1; >> - } else { >> - /* zNum is exactly 9223372036854775808. Fits if = negative. The >> - * special case 2 overflow if positive >> - */ >> - assert(u - 1 =3D=3D LARGEST_INT64); >> - return neg ? 0 : 2; >> - } >> - } >> -} >> + /* No digits were found, e.g. an empty string. */ >> + if (end =3D=3D z) >> + return -1; >=20 > 6. You have already checked emptiness above with > this condition: 'if (z >=3D str_end)=E2=80=99. Indeed, see diff above. >> + /* >> + * String has been parsed, but there's >> + * additional character(s) remained. >> + */ >> + if (*end !=3D '\0') >> + return -1; >=20 > 7. What if other symbols are spaces? So, you are speaking about situation like: SELECT CAST(=E2=80=9C123 =E2=80=9C AS INTEGER); According to ANSI cast is allowed in this case: =E2=80=A6 8) If TD is exact numeric, then =E2=80=A6 b) If SD is character string, then SV is replaced by SV with any leading = or trailing s removed. Hence, you are right, we should skip trailing space as well. Applied your diff: diff --git a/src/box/sql/util.c b/src/box/sql/util.c index 826e4a673..e73fe3f87 100644 --- a/src/box/sql/util.c +++ b/src/box/sql/util.c @@ -618,21 +618,18 @@ sql_atoi64(const char *z, int64_t *val, bool = *is_neg, int length) *val =3D u_val; } - /* - * String has been parsed, but there's - * additional character(s) remained. - */ - if (*end !=3D '\0') - return -1; /* Overflow and underflow errors. */ if (errno !=3D 0) return -1; + for (; *end !=3D 0; ++end) { + if (! isspace(*end)) + return -1; + } return 0; } >> + /* Overflow and underflow errors. */ >> + if (errno !=3D 0) >> + return -1; >=20 > 8. You didn't remove 'sql_atoi64() =3D=3D SQL_OK' in some > places. Thx, applied this part of your diff. > diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c > index 6ecdb26fc..11a92372b 100644 > --- a/src/box/sql/vdbe.c > +++ b/src/box/sql/vdbe.c > @@ -386,7 +386,7 @@ static u16 SQL_NOINLINE computeNumericType(Mem = *pMem) > if (sqlAtoF(pMem->z, &pMem->u.r, pMem->n)=3D=3D0) > return 0; > bool is_neg; > - if (sql_atoi64(pMem->z, (int64_t *)&pMem->u.i, &is_neg, = pMem->n)=3D=3DSQL_OK) > + if (sql_atoi64(pMem->z, &pMem->u.i, &is_neg, pMem->n) =3D=3D 0) This cast is required.. Otherwise on linux compilation error is raised: /tarantool/src/box/sql/vdbe.c:387:26: error: passing argument 2 of = =E2=80=98sql_atoi64=E2=80=99 from incompatible pointer type = [-Werror=3Dincompatible-pointer-types] if (sql_atoi64(pMem->z, &pMem->u.i, &is_neg, pMem->n) =3D=3D 0) ^ In file included from /tarantool/src/box/sql/vdbe.c:47:0: /tarantool/src/box/sql/sqlInt.h:4435:1: note: expected =E2=80=98int64_t = * {aka long int *}=E2=80=99 but argument is of type =E2=80=98i64 * {aka = long long int *}=E2=80=99 sql_atoi64(const char *z, int64_t *val, bool *is_neg, int length);