From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 4AF57228AD for ; Mon, 1 Jul 2019 10:21:31 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id io2ak0Ru8YQR for ; Mon, 1 Jul 2019 10:21:31 -0400 (EDT) Received: from smtp44.i.mail.ru (smtp44.i.mail.ru [94.100.177.104]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id E0350220EC for ; Mon, 1 Jul 2019 10:21:30 -0400 (EDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.8\)) Subject: [tarantool-patches] Re: [PATCH 3/6] sql: refactor arithmetic operations to support unsigned ints From: "n.pettik" In-Reply-To: <9233795e-a77c-565a-9bd5-3712499e7fce@tarantool.org> Date: Mon, 1 Jul 2019 17:21:29 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <53B73D9B-740B-469F-99DC-FF2FA14E16BA@tarantool.org> References: <7d8e776bc82cb9d25460c2104e687540526aa7cd.1559919361.git.korablev@tarantool.org> <9233795e-a77c-565a-9bd5-3712499e7fce@tarantool.org> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org Cc: Vladislav Shpilevoy >> diff --git a/src/box/sql/func.c b/src/box/sql/func.c >> index f4c1cbcca..457c9b92b 100644 >> --- a/src/box/sql/func.c >> +++ b/src/box/sql/func.c >> @@ -1620,9 +1622,13 @@ sum_step(struct sql_context *context, int = argc, sql_value **argv) >> p->cnt++; >> if (type =3D=3D MP_INT || type =3D=3D MP_UINT) { >> int64_t v =3D sql_value_int64(argv[0]); >> - p->rSum +=3D v; >> + if (type =3D=3D MP_INT) >> + p->rSum +=3D v; >> + else >> + p->rSum +=3D (uint64_t) v; >> if ((p->approx | p->overflow) =3D=3D 0 && >> - sqlAddInt64(&p->iSum, v) !=3D 0) { >> + sql_add_int(p->iSum, p->is_neg, v, v < 0, &p->iSum, >=20 > 1. As I understand, now sql_value_int64() returns an integer buffer, > and its result can't be compared with 0. How do you do it here with > 'v=E2=80=99? It is fixed in the next patch. Fix itself is obvious. Moved this change to current patch. diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 457c9b92b..77340c7fb 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -1626,8 +1626,9 @@ sum_step(struct sql_context *context, int argc, = sql_value **argv) p->rSum +=3D v; else p->rSum +=3D (uint64_t) v; + if ((p->approx | p->overflow) =3D=3D 0 && - sql_add_int(p->iSum, p->is_neg, v, v < 0, &p->iSum, + sql_add_int(p->iSum, p->is_neg, v, type =3D=3D = MP_INT, &p->iSum, &p->is_neg) !=3D 0) { p->overflow =3D 1; >> diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h >> index da17e24ed..c0e2ab699 100644 >> --- a/src/box/sql/sqlInt.h >> +++ b/src/box/sql/sqlInt.h >> @@ -4461,10 +4461,37 @@ Expr *sqlExprAddCollateString(Parse *, Expr = *, const char *); >> +int >> +sql_add_int(int64_t lhs, bool is_lhs_neg, int64_t rhs, bool = is_rhs_neg, >> + int64_t *res, bool *is_res_neg); >> + >> +int >> +sql_sub_int(int64_t lhs, bool is_lhs_neg, int64_t rhs, bool = is_rhs_neg, >> + int64_t *res, bool *is_res_neg); >> + >> +int >> +sql_mul_int(int64_t lhs, bool is_lhs_neg, int64_t rhs, bool = is_rhs_neg, >> + int64_t *res, bool *is_res_neg); >> + >> +int >> +sql_div_int(int64_t lhs, bool is_lhs_neg, int64_t rhs, bool = is_rhs_neg, >> + int64_t *res, bool *is_res_neg); >> + >> +int >> +sql_rem_int(int64_t lhs, bool is_lhs_neg, int64_t rhs, bool = is_rhs_neg, >> + int64_t *res, bool *is_res_neg); >=20 > 2. That function is usually called 'mod', not 'rem=E2=80=99. Opcode is called OP_Reminder, so to avoid confusions I guess it would be better to keep _rem_ name. If you insist I will rename it. >> diff --git a/src/box/sql/util.c b/src/box/sql/util.c >> index 2e7c298e1..ee6a83ad5 100644 >> --- a/src/box/sql/util.c >> +++ b/src/box/sql/util.c >> @@ -1194,89 +1194,196 @@ sqlSafetyCheckSickOrOk(sql * db) >> } >> } >>=20 >> -/* >> - * Attempt to add, substract, or multiply the 64-bit signed value iB = against >> - * the other 64-bit signed integer at *pA and store the result in = *pA. >> - * Return 0 on success. Or if the operation would have resulted in = an >> - * overflow, leave *pA unchanged and return 1. >> - */ >> int >> -sqlAddInt64(i64 * pA, i64 iB) >> +sql_add_int(int64_t lhs, bool is_lhs_neg, int64_t rhs, bool = is_rhs_neg, >> + int64_t *res, bool *is_res_neg) >> { >> - i64 iA =3D *pA; >> - testcase(iA =3D=3D 0); >> - testcase(iA =3D=3D 1); >> - testcase(iB =3D=3D -1); >> - testcase(iB =3D=3D 0); >> - if (iB >=3D 0) { >> - testcase(iA > 0 && LARGEST_INT64 - iA =3D=3D iB); >> - testcase(iA > 0 && LARGEST_INT64 - iA =3D=3D iB - 1); >> - if (iA > 0 && LARGEST_INT64 - iA < iB) >> - return 1; >> - } else { >> - testcase(iA < 0 && -(iA + LARGEST_INT64) =3D=3D iB + 1); >> - testcase(iA < 0 && -(iA + LARGEST_INT64) =3D=3D iB + 2); >> - if (iA < 0 && -(iA + LARGEST_INT64) > iB + 1) >> - return 1; >> + /* Addition of two negative integers. */ >> + if (is_lhs_neg && is_rhs_neg) { >> + assert(lhs < 0 && rhs < 0); >> + /* This is the same as (lhs + rhs) < INT64_MIN */ >> + if (-(lhs + INT64_MAX) > rhs + 1) >> + return -1; >=20 > 3. Why so complex? Why not 'lhs < INT64_MIN - rhs'? I did it, and the > tests pass. If they should not, then please, provide a test. Well, it is the same condition. Let=E2=80=99s use your variant: diff --git a/src/box/sql/util.c b/src/box/sql/util.c index 3cbcda04f..73d7b95b6 100644 --- a/src/box/sql/util.c +++ b/src/box/sql/util.c @@ -1199,7 +1199,7 @@ sql_add_int(int64_t lhs, bool is_lhs_neg, int64_t = rhs, bool is_rhs_neg, if (is_lhs_neg && is_rhs_neg) { assert(lhs < 0 && rhs < 0); /* This is the same as (lhs + rhs) < INT64_MIN */ - if (-(lhs + INT64_MAX) > rhs + 1) + if (lhs < INT64_MIN - rhs) return -1; *is_res_neg =3D true; *res =3D lhs + rhs; >> + /* >> + * Make sure we've got only one combination of >> + * positive and negative operands. >> + */ >> + if (is_lhs_neg) { >> + SWAP(is_lhs_neg, is_rhs_neg); >> + SWAP(lhs, rhs); >> + } >> + assert(! is_lhs_neg && is_rhs_neg); >=20 > 4. You have spent more code lines on the SWAP and its > explanation, than could on avoidance of this SWAP. In > addition, SWAP version is slower obviously. Ok, np, refactored: @@ -1216,16 +1216,8 @@ sql_add_int(int64_t lhs, bool is_lhs_neg, int64_t = rhs, bool is_rhs_neg, *res =3D lhs + rhs; return 0; } - /* - * Make sure we've got only one combination of - * positive and negative operands. - */ - if (is_lhs_neg) { - SWAP(is_lhs_neg, is_rhs_neg); - SWAP(lhs, rhs); - } - assert(! is_lhs_neg && is_rhs_neg); - *is_res_neg =3D (uint64_t)(-rhs) > (uint64_t) lhs; + *is_res_neg =3D is_rhs_neg ? (uint64_t)(-rhs) > (uint64_t) lhs : + (uint64_t)(-lhs) > (uint64_t) rhs; *res =3D lhs + rhs; return 0; } >=20 > - /* > - * Make sure we've got only one combination of > - * positive and negative operands. > - */ > - if (is_lhs_neg) { > - SWAP(is_lhs_neg, is_rhs_neg); > - SWAP(lhs, rhs); > - } > - assert(! is_lhs_neg && is_rhs_neg); > - *is_res_neg =3D (uint64_t)(-rhs) > (uint64_t) lhs; > + if (is_rhs_neg) > + *is_res_neg =3D (uint64_t)(-rhs) > (uint64_t) lhs; > + else > + *is_res_neg =3D (uint64_t)(-lhs) > (uint64_t) rhs; > *res =3D lhs + rhs; > return 0; >> int >> -sqlSubInt64(i64 * pA, i64 iB) >> +sql_sub_int(int64_t lhs, bool is_lhs_neg, int64_t rhs, bool = is_rhs_neg, >> + int64_t *res, bool *is_res_neg) >> { >> - testcase(iB =3D=3D SMALLEST_INT64 + 1); >> - if (iB =3D=3D SMALLEST_INT64) { >> - testcase((*pA) =3D=3D (-1)); >> - testcase((*pA) =3D=3D 0); >> - if ((*pA) >=3D 0) >> - return 1; >> - *pA -=3D iB; >> + if (!is_lhs_neg && !is_rhs_neg) { >> + uint64_t u_lhs =3D (uint64_t) lhs; >> + uint64_t u_rhs =3D (uint64_t) rhs; >> + /* >> + * Unsigned arithmetic has no overflows, so to >> + * check if lhs is less than rhs we can compare >> + * result of their subtraction with lhs. >=20 > 5. Why not to compare just the values? They are of the same > sign, use '<=E2=80=98. Thx, fixed: @@ -1237,19 +1229,15 @@ sql_sub_int(int64_t lhs, bool is_lhs_neg, = int64_t rhs, bool is_rhs_neg, if (!is_lhs_neg && !is_rhs_neg) { uint64_t u_lhs =3D (uint64_t) lhs; uint64_t u_rhs =3D (uint64_t) rhs; - /* - * Unsigned arithmetic has no overflows, so to - * check if lhs is less than rhs we can compare - * result of their subtraction with lhs. - */ - *is_res_neg =3D u_lhs < (uint64_t)(u_lhs - u_rhs); - if (! *is_res_neg) { + if (u_lhs >=3D u_rhs) { + *is_res_neg =3D false; *res =3D u_lhs - u_rhs; return 0; } if (u_rhs > (uint64_t) INT64_MAX + 1 && u_lhs < (uint64_t)(u_rhs - INT64_MAX - 1)) return -1; + *is_res_neg =3D true; *res =3D lhs - rhs; return 0; } >> + */ >> + *is_res_neg =3D u_lhs < (uint64_t)(u_lhs - u_rhs); >> + if (! *is_res_neg) { >> + *res =3D u_lhs - u_rhs; >> + return 0; >> + } >> + if (u_rhs > (uint64_t) INT64_MAX + 1 && >> + u_lhs < (uint64_t)(u_rhs - INT64_MAX - 1)) >=20 > 6. Too complex. You wanted to check u_lhs - u_rhs < INT64_MIN. > Then check it, in a one comparison. >=20 > u_lhs - u_rhs < INT64_MIN >=20 > But u_lhs is < u_rhs, so swap them and change the signs: >=20 > u_rhs - u_lhs > (uint64_t) INT64_MAX + 1 >=20 > No overflows are possible here. Thx! Applied this diff: - if (u_rhs > (uint64_t) INT64_MAX + 1 && - u_lhs < (uint64_t)(u_rhs - INT64_MAX - 1)) + if (u_rhs - u_lhs > (uint64_t) INT64_MAX + 1) return -1; >> int >> -sqlMulInt64(i64 * pA, i64 iB) >> +sql_mul_int(int64_t lhs, bool is_lhs_neg, int64_t rhs, bool = is_rhs_neg, >> + int64_t *res, bool *is_res_neg) >> { >> - i64 iA =3D *pA; >> - if (iB > 0) { >> - if (iA > LARGEST_INT64 / iB) >> - return 1; >> - if (iA < SMALLEST_INT64 / iB) >> - return 1; >> - } else if (iB < 0) { >> - if (iA > 0) { >> - if (iB < SMALLEST_INT64 / iA) >> - return 1; >> - } else if (iA < 0) { >> - if (iB =3D=3D SMALLEST_INT64) >> - return 1; >> - if (iA =3D=3D SMALLEST_INT64) >> - return 1; >> - if (-iA > LARGEST_INT64 / -iB) >> - return 1; >> - } >> + if (lhs =3D=3D 0 || rhs =3D=3D 0) { >> + *res =3D 0; >> + *is_res_neg =3D false; >> + return 0; >> + } >> + /* >> + * Multiplication of integers with same sign leads to >> + * unsigned result. >> + */ >> + if (! (is_lhs_neg ^ is_rhs_neg)) { >=20 > 7. Dear God, what is it? What about '!=3D' operator? Is it > disabled, or !^ is faster? Ok, replaced: @@ -1282,7 +1269,7 @@ sql_mul_int(int64_t lhs, bool is_lhs_neg, int64_t = rhs, bool is_rhs_neg, * Multiplication of integers with same sign leads to * unsigned result. */ - if (! (is_lhs_neg ^ is_rhs_neg)) { + if (is_lhs_neg =3D=3D is_rhs_neg) { uint64_t u_res =3D is_lhs_neg ? (uint64_t) (-lhs) * (uint64_t) (-rhs) : (uint64_t) lhs * (uint64_t) (rhs); >> + uint64_t u_res =3D is_lhs_neg ? >> + (uint64_t) (-lhs) * (uint64_t) (-rhs) : >> + (uint64_t) lhs * (uint64_t) (rhs); >> + /* >> + * Overflow detection is quite primitive due to >> + * the absence of overflow with unsigned values: >> + * lhs * rhs =3D=3D res --> rhs =3D=3D res / lhs; >> + * If this predicate is false, then result was >> + * reduced modulo UINT_MAX + 1. >> + */ >> + if ((is_lhs_neg && u_res / (uint64_t) (-lhs) !=3D >> + (uint64_t) (-rhs)) || >> + (!is_lhs_neg && u_res / (uint64_t) lhs !=3D = (uint64_t) rhs)) >> + return -1; >> + *is_res_neg =3D false; >> + *res =3D u_res; >> + return 0; >> + } >> + /* >> + * Make sure we've got only one combination of >> + * positive and negative operands. >> + */ >> + if (is_lhs_neg) { >> + SWAP(is_lhs_neg, is_rhs_neg); >> + SWAP(lhs, rhs); >> } >> - *pA =3D iA * iB; >> + assert(! is_lhs_neg && is_rhs_neg); >> + uint64_t u_rhs =3D (uint64_t) (-rhs); >> + uint64_t u_res =3D u_rhs * (uint64_t) lhs; >> + if (u_res / u_rhs !=3D (uint64_t) lhs || >> + u_rhs * (uint64_t) lhs > (uint64_t) INT64_MAX + 1) >=20 > 8. You have already calculated that multiplication, it is > u_res. @@ -1313,7 +1300,7 @@ sql_mul_int(int64_t lhs, bool is_lhs_neg, int64_t = rhs, bool is_rhs_neg, uint64_t u_rhs =3D (uint64_t) (-rhs); uint64_t u_res =3D u_rhs * (uint64_t) lhs; if (u_res / u_rhs !=3D (uint64_t) lhs || - u_rhs * (uint64_t) lhs > (uint64_t) INT64_MAX + 1) + u_res > (uint64_t) INT64_MAX + 1) return -1; *is_res_neg =3D true; *res =3D lhs * rhs; >> +int >> +sql_div_int(int64_t lhs, bool is_lhs_neg, int64_t rhs, bool = is_rhs_neg, >> + int64_t *res, bool *is_res_neg) >> +{ >> + if (lhs =3D=3D 0) { >> + *res =3D 0; >> + *is_res_neg =3D false; >> + return 0; >> + } >> + /* >> + * The only possible overflow situations is when operands >> + * of different signs and result turns out to be less >> + * than INT64_MIN. >> + */ >> + if (is_lhs_neg ^ is_rhs_neg) { >=20 > 9. =F0=9F=98=A2 @@ -1334,7 +1321,7 @@ sql_div_int(int64_t lhs, bool is_lhs_neg, int64_t = rhs, bool is_rhs_neg, * of different signs and result turns out to be less * than INT64_MIN. */ - if (is_lhs_neg ^ is_rhs_neg) { + if (is_lhs_neg !=3D is_rhs_neg) { uint64_t u_res =3D is_lhs_neg ? (uint64_t) (-lhs) / (uint64_t) rhs : (uint64_t) lhs / (uint64_t) (-rhs); >> + uint64_t u_res =3D is_lhs_neg ? >> + (uint64_t) (-lhs) / (uint64_t) rhs : >> + (uint64_t) lhs / (uint64_t) (-rhs); >> + if (u_res > (uint64_t) INT64_MAX + 1) >> + return -1; >> + *is_res_neg =3D u_res !=3D 0 ? true : false; >=20 > 10. Does not it look tautological to you? 'if then true else = false=E2=80=99. if (u_res > (uint64_t) INT64_MAX + 1) return -1; - *is_res_neg =3D u_res !=3D 0 ? true : false; + *is_res_neg =3D u_res !=3D 0; *res =3D -u_res; return 0; } >> + *res =3D -u_res; >> + return 0; >> + } >> + *is_res_neg =3D false; >> + /* >> + * Another one special case: INT64_MIN / -1 >> + * Signed division leads to program termination due >> + * to overflow. >> + */ >> + if (is_lhs_neg && lhs =3D=3D INT64_MIN && rhs =3D=3D -1) { >> + *res =3D (uint64_t) INT64_MAX + 1; >> + return 0; >> + } >> + *res =3D is_lhs_neg ? (uint64_t) (lhs / rhs) : >=20 > 11. Why do you need that cast? *res is int64_t, so it makes no > sense to cast the rvalue to uint64_t. Otherwise compilation error is raised (on linux target): /tarantool/src/box/sql/util.c: In function =E2=80=98sql_div_int=E2=80=99: /tarantool/src/box/sql/util.c:1342:34: error: signed and unsigned type = in conditional expression [-Werror=3Dsign-compare] *res =3D is_lhs_neg ? (lhs / rhs) : (uint64_t) lhs / (uint64_t) rhs; >> + (uint64_t) lhs / (uint64_t) rhs; >> return 0; >> } >>=20 >> -/* >> - * Compute the absolute value of a 32-bit signed integer, of = possible. Or >> - * if the integer has a value of -2147483648, return +2147483647 >> - */ >> int >> -sqlAbsInt32(int x) >> +sql_rem_int(int64_t lhs, bool is_lhs_neg, int64_t rhs, bool = is_rhs_neg, >> + int64_t *res, bool *is_res_neg) >> { >> - if (x >=3D 0) >> - return x; >> - if (x =3D=3D (int)0x80000000) >> - return 0x7fffffff; >> - return -x; >> + if (is_lhs_neg) { >> + uint64_t u_lhs =3D (uint64_t) (-lhs); >> + uint64_t u_res =3D u_lhs % (uint64_t) rhs; >=20 > 12. Why do you ignore sign of 'rhs'? You can't just cast it, > if it is negative. Yep, you are right, fixed: @@ -1365,7 +1351,9 @@ sql_rem_int(int64_t lhs, bool is_lhs_neg, int64_t = rhs, bool is_rhs_neg, { if (is_lhs_neg) { uint64_t u_lhs =3D (uint64_t) (-lhs); - uint64_t u_res =3D u_lhs % (uint64_t) rhs; + uint64_t u_rhs =3D is_rhs_neg ? (uint64_t) (-rhs) : + (uint64_t) rhs; + uint64_t u_res =3D u_lhs % u_rhs; if (u_res > (uint64_t) INT64_MAX + 1) return -1; *res =3D -u_res; >> diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c >> index d141397a0..9c28b9131 100644 >> --- a/src/box/sql/vdbe.c >> +++ b/src/box/sql/vdbe.c >> @@ -1621,27 +1621,48 @@ case OP_Remainder: { /* same as = TK_REM, in1, in2, out3 */ >> (type2 & (MEM_Int | MEM_UInt)) !=3D0) { >> iA =3D pIn1->u.i; >> iB =3D pIn2->u.i; >> + bool is_lhs_neg =3D pIn1->flags & MEM_Int; >> + bool is_rhs_neg =3D pIn2->flags & MEM_Int; >> + bool is_res_neg; >> bIntint =3D 1; >> switch( pOp->opcode) { >> - case OP_Add: if (sqlAddInt64(&iB,iA)) goto = integer_overflow; break; >> - case OP_Subtract: if (sqlSubInt64(&iB,iA)) goto = integer_overflow; break; >> - case OP_Multiply: if (sqlMulInt64(&iB,iA)) goto = integer_overflow; break; >> + case OP_Add: { >> + if (sql_add_int(iA, is_lhs_neg, iB, is_rhs_neg, >> + (int64_t *) &iB, &is_res_neg) !=3D= 0) >=20 > 13. Why do you need all these casts to (int64_t *)? These variables = already > are signed 64 bit integers. On linux (Travis targets) I get compilation errors since there they (i64 and int64_t) are considered to be different types.=20 >> @@ -5134,7 +5175,11 @@ case OP_OffsetLimit: { /* in1, out2, in3 */ >> assert(pIn1->flags & (MEM_Int | MEM_UInt)); >> assert(pIn3->flags & (MEM_Int | MEM_UInt)); >> x =3D pIn1->u.i; >> - if (x<=3D0 || sqlAddInt64(&x, pIn3->u.i > 0 ? pIn3->u.i : 0)) { >> + int64_t rhs =3D pIn3->flags & MEM_Int ? 0 : pIn3->u.u; >> + bool unused; >> + if ((x =3D=3D 0 || pIn1->flags & MEM_Int) || >> + sql_add_int(x, pIn1->flags & MEM_Int, rhs, false, >=20 > 14. If you get to this line, then (pIn1->flags & MEM_Int) is already > 0 and can be inlined. Wait, why? If x =3D=3D 0 then pIn1->flags =3D=3D MEM_UInt - we consider 0 as an unsigned value. >> + (int64_t *) &x, &unused) !=3D 0) { >> /* If the LIMIT is less than or equal to zero, loop = forever. This >> * is documented. But also, if the LIMIT+OFFSET exceeds = 2^63 then >> * also loop forever. This is undocumented. In fact, = one could argue >=20 > Consider my review fixes here and on the branch in a > separate commit. These changes and the comments are arguable, > probably I am wrong somewhere. Almost all applied (see diffs above).