From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 11DEB2319C for ; Thu, 1 Aug 2019 11:56:15 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id h8REQJSQDT1n for ; Thu, 1 Aug 2019 11:56:14 -0400 (EDT) Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 2E15520EF1 for ; Thu, 1 Aug 2019 11:56:14 -0400 (EDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\)) Subject: [tarantool-patches] Re: [PATCH 1/2] sql: extend struct Mem with field_type member From: "n.pettik" In-Reply-To: <351df50f-d096-45de-0d04-c9cee83d1289@tarantool.org> Date: Thu, 1 Aug 2019 18:56:11 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <9DC6E648-2C15-4C1C-A2B3-AEA174FA7CD3@tarantool.org> References: <8be21965-b8ef-4f96-8ce3-99c2efcecfc0@tarantool.org> <351df50f-d096-45de-0d04-c9cee83d1289@tarantool.org> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org Cc: Vladislav Shpilevoy > On 31 Jul 2019, at 19:59, Vladislav Shpilevoy = wrote: > Hi! Thanks for the fixes! See 5 comments below. > On 31/07/2019 11:14, n.pettik wrote: >>=20 >>>> diff --git a/src/box/sql/func.c b/src/box/sql/func.c >>>> index fcf147c96..f50df105d 100644 >>>> --- a/src/box/sql/func.c >>>> +++ b/src/box/sql/func.c >>>> @@ -107,6 +107,17 @@ typeofFunc(sql_context * context, int NotUsed, = sql_value ** argv) >>>> { >>>> const char *z =3D 0; >>>> UNUSED_PARAMETER(NotUsed); >>>> + enum field_type f_t =3D argv[0]->field_type; >>>=20 >>> 1. Perhaps, field_type is not the best name, because Mem is not = always >>> is a field. It can be just a value, not from a space. It is rather >>> just 'type=E2=80=99. >>=20 >> When memory represents smth different from value fetched from >> tuple, it is set to field_type_MAX. So I=E2=80=99d rather not rename = this member. >=20 > 1. Ok, now I got it. It works only for tuple fields, and is totally = disabled > for other values. Or not? See other comments. Sort of. The only addition to this rule is CAST operation to display appropriate type for NULL values. >>> 7. All these manual assignments of field_type are broken by design. >>> When you have to update field_type after each operation manually, >>> you will miss some places for sure. For example: >>>=20 >>> tarantool> box.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, a = INTEGER, b INTEGER)') >>> --- >>> - row_count: 1 >>> ... >>>=20 >>> tarantool> box.execute('INSERT INTO test VALUES (1, NULL, NULL)') >>> --- >>> - row_count: 1 >>> ... >>>=20 >>> tarantool> box.execute('SELECT typeof(a & b) FROM test') >>> --- >>> - metadata: >>> - name: typeof(a & b) >>> type: string >>> rows: >>> - ['null'] >>> ... >>>=20 >>> tarantool> box.execute('SELECT typeof(a), typeof(b), typeof(a & b) = FROM test') >>> --- >>> - metadata: >>> - name: typeof(a) >>> type: string >>> - name: typeof(b) >>> type: string >>> - name: typeof(a & b) >>> type: string >>> rows: >>> - ['integer', 'integer', 'integer'] >>> ... >>>=20 >>>=20 >>> Here in the last two queries 'typeof(a & b)' result *depends* on >>> other columns in result set. But it should be integer always, = shouldn't >>> it? >>=20 >> Yep, it should, I=E2=80=99ve fixed that: >>=20 >> diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c >> index b3b230ad0..f6eab8a0e 100644 >> --- a/src/box/sql/vdbe.c >> +++ b/src/box/sql/vdbe.c >> @@ -1842,6 +1842,8 @@ case OP_ShiftRight: { /* same as = TK_RSHIFT, in1, in2, out3 */ >> pOut =3D &aMem[pOp->p3]; >> if ((pIn1->flags | pIn2->flags) & MEM_Null) { >> sqlVdbeMemSetNull(pOut); >> + /* Force NULL be of type INTEGER. */ >> + pOut->field_type =3D FIELD_TYPE_INTEGER; >> break; >> } >> bool unused; >> @@ -2477,6 +2479,8 @@ case OP_BitNot: { /* same as = TK_BITNOT, in1, out2 */ >> pIn1 =3D &aMem[pOp->p1]; >> pOut =3D &aMem[pOp->p2]; >> sqlVdbeMemSetNull(pOut); >> + /* Force NULL be of type INTEGER. */ >> + pOut->field_type =3D FIELD_TYPE_INTEGER; > 2. >=20 > tarantool> box.execute('SELECT typeof(~NULL)') > --- > - metadata: > - name: typeof(~NULL) > type: string > rows: > - ['integer'] > ... >=20 > But it is not integer, as I think. I didn't select NULL from > an integer column. Here NULL was of type 'boolean'. And the > result should be 'boolean' too, because it is just NULL, not > a column NULL. The same about other bit operations. I guess it is exactly what I was asked to do. =E2=80=9CBitnot=E2=80=9D = operation never returns =E2=80=9Cboolean=E2=80=9D result - it=E2=80=99s argument = is checked to be of type integer. Hence, IMHO if it returned =E2=80=9Cboolean=E2=80=9D = even for NULL values it would make users wonder why it does so. Anyway, just in case I=E2=80=99m going to re-ask Konstantin and Peter is this what we want or not. > And the same about strings: >=20 > tarantool> box.execute("SELECT typeof(NULL || NULL)") > --- > - metadata: > - name: typeof(NULL || NULL) > type: string > rows: > - ['string'] > ... >=20 > Here NULLs were of type 'boolean', but output shows 'string=E2=80=99. The same logic is applied for concatenation, bitwise and arithmetic operations. >> diff --git a/test/sql/types.test.lua b/test/sql/types.test.lua >> index 489a5a91c..8a509da96 100644 >> --- a/test/sql/types.test.lua >> +++ b/test/sql/types.test.lua >> @@ -407,6 +407,12 @@ box.execute("CREATE TABLE t (id INT PRIMARY KEY, = a INT, s SCALAR);") >> box.execute("INSERT INTO t VALUES (1, 1, 1), (2, NULL, NULL);") >> box.execute("SELECT typeof(a), typeof(s) FROM t;") >>=20 >> +box.execute('CREATE TABLE t1 (id INTEGER PRIMARY KEY, a INTEGER, b = INTEGER)') >> +box.execute('INSERT INTO t1 VALUES (1, NULL, NULL);') >> +box.execute('SELECT typeof(a & b) FROM t1;') >> +box.execute('SELECT typeof(a), typeof(b), typeof(a & b) FROM t1') >> + >> box.execute("SELECT typeof(CAST(0 AS UNSIGNED));") >>=20 >> box.space.T:drop() >> +box.space.T1:drop() >>=20 >>> The same about all other places, where you call mem_set_* and don't >>> update field_type. >>=20 >> It=E2=80=99s okay, since updating current value in memory cell = doesn=E2=80=99t >> imply that field type should be updated as well. In most scenarios >> before setting new value to memory cell, we provide clean-up >> via memAboutToChange(), which in turn invalidates field type. >=20 > 3. memAboutToChange is no-op in Release build, so it does not = invalidate > anything in that mode. It means, that field_type now can be a garbage > from a previous value, if the memory cell was reused to assign = something > new. Oops, I moved field type resetting to out2prerelease() and added call of this function to several places: diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c index f6eab8a0e..77529d346 100644 --- a/src/box/sql/vdbe.c +++ b/src/box/sql/vdbe.c @@ -558,6 +558,7 @@ out2Prerelease(Vdbe *p, VdbeOp *pOp) pOut->flags =3D MEM_Int; return pOut; } + pOut->field_type =3D field_type_MAX; } =20 struct stailq * @@ -1781,6 +1782,7 @@ case OP_Function: { } #endif MemSetTypeFlag(pCtx->pOut, MEM_Null); + pCtx->pOut->field_type =3D field_type_MAX; pCtx->is_aborted =3D false; (*pCtx->pFunc->xSFunc)(pCtx, pCtx->argc, pCtx->argv);/* IMP: = R-24505-23230 */ =20 @@ -2227,6 +2229,7 @@ case OP_Ge: { /* same as TK_GE, jump, = in1, in3 */ =20 if (pOp->p5 & SQL_STOREP2) { pOut =3D &aMem[pOp->p2]; + pOut->field_type =3D field_type_MAX; iCompare =3D res; res2 =3D res2!=3D0; /* For this path res2 must be = exactly 0 or 1 */ if ((pOp->p5 & SQL_KEEPNULL)!=3D0) { @@ -2625,6 +2628,7 @@ case OP_Column: { pReg->z, = pReg->n); } else { sqlVdbeMemSetNull(pDest); + pDest->field_type =3D field_type_MAX; goto op_column_out; } } else { @@ -3684,7 +3688,7 @@ case OP_Sequence: { /* out2 */ * incremented by one. */ case OP_NextSequenceId: { - pOut =3D &aMem[pOp->p2]; + pOut =3D out2Prerelease(p, pOp); uint64_t id; tarantoolSqlNextSeqId(&id); id++; @@ -3717,7 +3721,7 @@ case OP_NextIdEphemeral: { diag_set(ClientError, ER_ROWID_OVERFLOW); goto abort_due_to_error; } - pOut =3D &aMem[pOp->p2]; + pOut =3D out2Prerelease(p, pOp); mem_set_u64(pOut, rowid); break; } @@ -3925,7 +3929,7 @@ case OP_RowData: { } #endif =20 - pOut =3D &aMem[pOp->p2]; + pOut =3D out2Prerelease(p, pOp); memAboutToChange(p, pOut); =20 assert(pOp->p1>=3D0 && pOp->p1nCursor); diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c index e5941de8e..bf020657f 100644 --- a/src/box/sql/vdbemem.c +++ b/src/box/sql/vdbemem.c @@ -894,7 +894,6 @@ sqlVdbeMemAboutToChange(Vdbe * pVdbe, Mem * pMem) for (i =3D 0, pX =3D pVdbe->aMem; i < pVdbe->nMem; i++, pX++) { if (pX->pScopy=46rom =3D=3D pMem) { pX->flags |=3D MEM_Undefined; - pX->field_type =3D field_type_MAX; pX->pScopy=46rom =3D 0; } } >>> I propose you to redesign how to update field_type, so as it would >>> be hard to forget to update it. >>>=20 >>> For instance, add a field_type argument to sqlVdbeMemSetNull() to = never >>> miss to set a real type for a NULL value. Make mem_set_int update = field_type >>> depending on the positivity flag. Make mem_set_u64 always set = field_type >>> UNSIGNED. Make sqlVdbeMemCast always set field_type to the target = type. >>=20 >> I=E2=80=99ve tried to refactor patch this way, but in most cases = sqlVdbeMemSetNull() >> would set default (field_type_MAX). As for mem_set_* family functions = - >> I can move assignment of field_type_MAX to the body, but does it make >> much sense? >>=20 >=20 > 4. Currently, as I said above, I don't see where you invalidate type = of a > memory cell before it takes a new value. Yes, field_type is = initialized > with field_type_MAX when the register array is just created, but where > the cell type is reset, when it takes a new value? >=20 > And one another possible bug: >=20 > box.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, a INTEGER)') > box.execute('INSERT INTO test VALUES (1, NULL)') > tarantool> box.execute('SELECT typeof(NOT a) FROM test') > --- > - metadata: > - name: typeof(NOT a) > type: string > rows: > - ['boolean'] > ... >=20 > As you can see, 'NOT' lost the original type - it was integer NULL, > not boolean NULL. Again, personally I do not consider this to be a bug (see a bit detailed explanation above). > Another place is vdbe.c:2627 - here you jump to op_column_out > bypassing field_type assignment. Why? Simply missed this place, fixed: @@ -2625,6 +2628,7 @@ case OP_Column: { pReg->z, = pReg->n); } else { sqlVdbeMemSetNull(pDest); + pDest->field_type =3D field_type_MAX; goto op_column_out; } } else { >>>> @@ -1909,6 +1924,7 @@ case OP_ShiftRight: { /* same as = TK_RSHIFT, in1, in2, out3 */ >>>> } >>>> } >>>> mem_set_i64(pOut, iA); >>>> + pOut->field_type =3D field_type_MAX; >>>> break; >>>> } >>>>=20 >>>> @@ -1988,6 +2004,13 @@ case OP_Cast: { /* in1 */ >>>> if (ExpandBlob(pIn1) !=3D 0) >>>> goto abort_due_to_error; >>>> rc =3D sqlVdbeMemCast(pIn1, pOp->p2); >>>> + /* >>>> + * SCALAR is not type itself, but rather an aggregation >>>> + * of types. Hence, cast to this type shouldn't change >>>> + * original type of argument. >>>> + */ >>>> + if (pOp->p2 !=3D FIELD_TYPE_SCALAR) >>>> + pIn1->field_type =3D pOp->p2; >>>=20 >>> 8. Why sqlVdbeMemCast does not update the type inside? Now >>> this function works on half - it updates flags with a new type, >>> sets a new value, but doesn't touch field_type. >>=20 >> Because it has several exit points and otherwise we have >> to add to each branch: >>=20 >> diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c >> index e5941de8e..7892af116 100644 >> --- a/src/box/sql/vdbemem.c >> +++ b/src/box/sql/vdbemem.c >> @@ -673,10 +673,12 @@ sqlVdbeMemCast(Mem * pMem, enum field_type = type) >> case FIELD_TYPE_BOOLEAN: >> if ((pMem->flags & MEM_Int) !=3D 0) { >> mem_set_bool(pMem, pMem->u.i); >> + pMem->field_type =3D type; >> return 0; >> } >> if ((pMem->flags & MEM_UInt) !=3D 0) { >> mem_set_bool(pMem, pMem->u.u); >> + pMem->field_type =3D type; >> return 0; >> } >> if ((pMem->flags & MEM_Real) !=3D 0) { >>=20 >> Etc. >>=20 >> What is more, this is the only path (OP_Cast) which calls >> sqlVdbeMemCast() function. So IMHO it is pretty OK. >>=20 >> If you want, I can attempt at refactoring sqlVdbeMemCast() >> so that make it feature only one exit point and move filed_type >> assignment to it. >>=20 >=20 > 5. Never mind. Too many changes. I=E2=80=99m not sure if it is a sarcasm or not...