From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 9FF3B25F84 for ; Wed, 31 Jul 2019 05:14:57 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id C_4eiWncaWhT for ; Wed, 31 Jul 2019 05:14:57 -0400 (EDT) Received: from smtp49.i.mail.ru (smtp49.i.mail.ru [94.100.177.109]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 383D525F82 for ; Wed, 31 Jul 2019 05:14:57 -0400 (EDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\)) Subject: [tarantool-patches] Re: [PATCH 1/2] sql: extend struct Mem with field_type member From: "n.pettik" In-Reply-To: <8be21965-b8ef-4f96-8ce3-99c2efcecfc0@tarantool.org> Date: Wed, 31 Jul 2019 12:14:54 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <8be21965-b8ef-4f96-8ce3-99c2efcecfc0@tarantool.org> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org Cc: Vladislav Shpilevoy >> diff --git a/src/box/sql/func.c b/src/box/sql/func.c >> index fcf147c96..f50df105d 100644 >> --- a/src/box/sql/func.c >> +++ b/src/box/sql/func.c >> @@ -107,6 +107,17 @@ typeofFunc(sql_context * context, int NotUsed, = sql_value ** argv) >> { >> const char *z =3D 0; >> UNUSED_PARAMETER(NotUsed); >> + enum field_type f_t =3D argv[0]->field_type; >=20 > 1. Perhaps, field_type is not the best name, because Mem is not always > is a field. It can be just a value, not from a space. It is rather > just 'type=E2=80=99. When memory represents smth different from value fetched from tuple, it is set to field_type_MAX. So I=E2=80=99d rather not rename = this member. >> + /* >> + * SCALAR is not a basic type, but rather an aggregation of >> + * types. Thus, ignore SCALAR field type and return msgpack >> + * format type. >> + */ >> + if (f_t !=3D field_type_MAX && f_t !=3D FIELD_TYPE_SCALAR) { >=20 > 2. Not sure if I understand, why do you need both _MAX and _SCALAR for > values with no strict type. Why not to set SCALAR instead of MAX = always? > MAX is not a type, so SCALAR IMO would be more appropriate when you > don't have a strict type. Or even ANY, because maps and arrays will be > supported. Because SCALAR and ANY are real field types. AFAIR we decided to introduce different comparison rules for values from SCALAR field and values from basic type fields. So I=E2=80=99m not sure that setting = SCALAR type by default is a good idea. >> diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c >> index 9142932e9..2c7a1c2da 100644 >> --- a/src/box/sql/vdbe.c >> +++ b/src/box/sql/vdbe.c >> @@ -1238,6 +1238,7 @@ case OP_SoftNull: { >> assert(pOp->p1>0 && pOp->p1<=3D(p->nMem+1 - p->nCursor)); >> pOut =3D &aMem[pOp->p1]; >> pOut->flags =3D (pOut->flags|MEM_Null)&~MEM_Undefined; >> + pOut->field_type =3D field_type_MAX; >> break; >=20 > 3. This opcode is unused and can be deleted. Ok, moved this change to a separate patch: Author: Nikita Pettik Date: Mon Jul 29 16:03:38 2019 +0300 sql: remove OP_SoftNull opcode =20 It's not used anymore and can be removed to reduce size of codebase. diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c index 9142932e9..12de10f37 100644 --- a/src/box/sql/vdbe.c +++ b/src/box/sql/vdbe.c @@ -1226,21 +1226,6 @@ case OP_Null: { /* out2 */ break; } =20 -/* Opcode: SoftNull P1 * * * * - * Synopsis: r[P1]=3DNULL - * - * Set register P1 to have the value NULL as seen by the OP_MakeRecord - * instruction, but do not free any string or blob memory associated = with - * the register, so that if the value was a string or blob that was - * previously copied using OP_SCopy, the copies will continue to be = valid. - */ -case OP_SoftNull: { - assert(pOp->p1>0 && pOp->p1<=3D(p->nMem+1 - p->nCursor)); - pOut =3D &aMem[pOp->p1]; - pOut->flags =3D (pOut->flags|MEM_Null)&~MEM_Undefined; - break; -} - /* Opcode: Blob P1 P2 P3 P4 * * Synopsis: r[P2]=3DP4 (len=3DP1, subtype=3DP3) * >> @@ -1278,6 +1279,11 @@ case OP_Variable: { /* out2 */ >> } >> pOut =3D out2Prerelease(p, pOp); >> sqlVdbeMemShallowCopy(pOut, pVar, MEM_Static); >> + /* >> + * ...ShallowCopy() may overwrite field_type with irrelevant >> + * value. So make sure that in the end field_type_MAX is set. >> + */ >=20 > 4. The only place where ShallowCopy can rewrite field_type is > memcpy(pTo, pFrom, MEMCELLSIZE). If after that field_type is = irrelevant, > then it was incorrect in pFrom, i.e. in pVar. On the summary: > why pVar may have incorrect field_type? Should not bind set a correct > type? Because it is not set during bind*() routines. Moved field_type_MAX assignment to vdbeUnbind() routine: diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c index fcc1701eb..20c1e39ec 100644 --- a/src/box/sql/vdbe.c +++ b/src/box/sql/vdbe.c @@ -1263,11 +1263,6 @@ case OP_Variable: { /* out2 */ } pOut =3D out2Prerelease(p, pOp); sqlVdbeMemShallowCopy(pOut, pVar, MEM_Static); - /* - * ...ShallowCopy() may overwrite field_type with irrelevant - * value. So make sure that in the end field_type_MAX is set. - */ - pOut->field_type =3D field_type_MAX; UPDATE_MAX_BLOBSIZE(pOut); break; } diff --git a/src/box/sql/vdbeapi.c b/src/box/sql/vdbeapi.c index c62cb7a1e..d470b6cc8 100644 --- a/src/box/sql/vdbeapi.c +++ b/src/box/sql/vdbeapi.c @@ -834,6 +834,7 @@ vdbeUnbind(Vdbe * p, int i) pVar =3D &p->aVar[i]; sqlVdbeMemRelease(pVar); pVar->flags =3D MEM_Null; + pVar->field_type =3D field_type_MAX; =20 /* If the bit corresponding to this variable in Vdbe.expmask is = set, then * binding a new value to this variable invalidates the current = query plan. >> @@ -1338,6 +1344,7 @@ case OP_Copy: { >> pIn1 =3D &aMem[pOp->p1]; >> pOut =3D &aMem[pOp->p2]; >> assert(pOut!=3DpIn1); >> + >> while( 1) { >=20 > 5. Unnecessary diff. Removed. >> sqlVdbeMemShallowCopy(pOut, pIn1, MEM_Ephem); >> Deephemeralize(pOut); >> @@ -1389,6 +1396,7 @@ case OP_IntCopy: { /* out2 */ >> assert((pIn1->flags & (MEM_Int | MEM_UInt)) !=3D 0); >> pOut =3D &aMem[pOp->p2]; >> mem_set_int(pOut, pIn1->u.i, (pIn1->flags & MEM_Int) !=3D 0); >> + pOut->field_type =3D field_type_MAX; >=20 > 6. Why mem_set_int does not set a correct field_type? Why should it do so? As its name says, field type refers to type of filed in space=E2=80=99s format. For value=E2=80=99s type (mp = format) we have =E2=80=9Cflags" property. > Or why > pOut->field_type can't be assigned to pIn->field_type? You actually > do it for OP_FCopy. The same about OP_Remainder. >=20 > When you assign field_type_MAX, and pIn1 was FIELD_TYPE_INTEGER > with an unsigned value, you will have typeof(pOut) unsigned - type > loss. >=20 > Additionally, as I see, OP_IntCopy is used only once, when a space > is being created, to copy sequence id. In such a case I think it > can be deleted and replaced with OP_SCopy. Actually, that=E2=80=99s why I don=E2=80=99t bother with type = manipulations. Replaced OP_IntCopy with OP_SCopy in a separate patch: Author: Nikita Pettik Date: Mon Jul 29 19:08:41 2019 +0300 sql: remove OP_IntCopy opcode =20 Its purpose is to copy integer value from one memory cell to = another. In other words, it is particular case of OP_SCopy instruction. Since = it is used only during creation of AUTOINCREMENT property, it seems to = be reasonable to replace it with a bit general OP_SCopy and erase OP_IntCopy at all reducing size of SQL codebase. diff --git a/src/box/sql/build.c b/src/box/sql/build.c index ada7b5859..a63d2d3be 100644 --- a/src/box/sql/build.c +++ b/src/box/sql/build.c @@ -1066,7 +1066,7 @@ emitNewSysSpaceSequenceRecord(Parse *pParse, int = reg_space_id, int reg_seq_id, sqlVdbeAddOp2(v, OP_SCopy, reg_space_id, first_col + 1); =20 /* 2. Sequence id */ - sqlVdbeAddOp2(v, OP_IntCopy, reg_seq_id, first_col + 2); + sqlVdbeAddOp2(v, OP_SCopy, reg_seq_id, first_col + 2); =20 /* 3. Autogenerated. */ sqlVdbeAddOp2(v, OP_Bool, true, first_col + 3); diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c index 12de10f37..88d1fa895 100644 --- a/src/box/sql/vdbe.c +++ b/src/box/sql/vdbe.c @@ -1361,22 +1361,6 @@ case OP_SCopy: { /* out2 */ break; } =20 -/* Opcode: IntCopy P1 P2 * * * - * Synopsis: r[P2]=3Dr[P1] - * - * Transfer the integer value held in register P1 into register P2. - * - * This is an optimized version of SCopy that works only for integer - * values. - */ -case OP_IntCopy: { /* out2 */ - pIn1 =3D &aMem[pOp->p1]; - assert((pIn1->flags & (MEM_Int | MEM_UInt)) !=3D 0); - pOut =3D &aMem[pOp->p2]; - mem_set_int(pOut, pIn1->u.i, (pIn1->flags & MEM_Int) !=3D 0); - break; -} - /* Opcode: ResultRow P1 P2 * * * * Synopsis: output=3Dr[P1@P2] * >=20 >> break; >> } >>=20 >> @@ -1491,6 +1499,8 @@ case OP_Concat: { /* same as = TK_CONCAT, in1, in2, out3 */ >> assert(pIn1!=3DpOut); >> if ((pIn1->flags | pIn2->flags) & MEM_Null) { >> sqlVdbeMemSetNull(pOut); >> + /* Force NULL be of type STRING. */ >> + pOut->field_type =3D FIELD_TYPE_STRING; >> break; >> } >> /* >> @@ -1535,6 +1545,7 @@ case OP_Concat: { /* same as = TK_CONCAT, in1, in2, out3 */ >> pOut->z[nByte+1] =3D 0; >> pOut->flags |=3D MEM_Term; >> pOut->n =3D (int)nByte; >> + pOut->field_type =3D field_type_MAX; >> UPDATE_MAX_BLOBSIZE(pOut); >> break; >> } >> @@ -1644,6 +1655,7 @@ case OP_Remainder: { /* same as = TK_REM, in1, in2, out3 */ >> } >> } >> mem_set_int(pOut, iB, is_res_neg); >> + pOut->field_type =3D field_type_MAX; >> } else { >> bIntint =3D 0; >> if (sqlVdbeRealValue(pIn1, &rA) !=3D 0) { >> @@ -1681,6 +1693,7 @@ case OP_Remainder: { /* same as = TK_REM, in1, in2, out3 */ >> } >> pOut->u.r =3D rB; >> MemSetTypeFlag(pOut, MEM_Real); >> + pOut->field_type =3D field_type_MAX; >> if (((type1|type2)&MEM_Real)=3D=3D0 && !bIntint) { >> mem_apply_integer_type(pOut); >> } >> @@ -1689,6 +1702,8 @@ case OP_Remainder: { /* same as = TK_REM, in1, in2, out3 */ >>=20 >> arithmetic_result_is_null: >> sqlVdbeMemSetNull(pOut); >> + /* Force NULL be of type NUMBER. */ >> + pOut->field_type =3D FIELD_TYPE_NUMBER; >> break; >=20 > 7. All these manual assignments of field_type are broken by design. > When you have to update field_type after each operation manually, > you will miss some places for sure. For example: >=20 > tarantool> box.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, a = INTEGER, b INTEGER)') > --- > - row_count: 1 > ... >=20 > tarantool> box.execute('INSERT INTO test VALUES (1, NULL, NULL)') > --- > - row_count: 1 > ... >=20 > tarantool> box.execute('SELECT typeof(a & b) FROM test') > --- > - metadata: > - name: typeof(a & b) > type: string > rows: > - ['null'] > ... >=20 > tarantool> box.execute('SELECT typeof(a), typeof(b), typeof(a & b) = FROM test') > --- > - metadata: > - name: typeof(a) > type: string > - name: typeof(b) > type: string > - name: typeof(a & b) > type: string > rows: > - ['integer', 'integer', 'integer'] > ... >=20 >=20 > Here in the last two queries 'typeof(a & b)' result *depends* on > other columns in result set. But it should be integer always, = shouldn't > it? Yep, it should, I=E2=80=99ve fixed that: diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c index b3b230ad0..f6eab8a0e 100644 --- a/src/box/sql/vdbe.c +++ b/src/box/sql/vdbe.c @@ -1842,6 +1842,8 @@ case OP_ShiftRight: { /* same as = TK_RSHIFT, in1, in2, out3 */ pOut =3D &aMem[pOp->p3]; if ((pIn1->flags | pIn2->flags) & MEM_Null) { sqlVdbeMemSetNull(pOut); + /* Force NULL be of type INTEGER. */ + pOut->field_type =3D FIELD_TYPE_INTEGER; break; } bool unused; @@ -2477,6 +2479,8 @@ case OP_BitNot: { /* same as = TK_BITNOT, in1, out2 */ pIn1 =3D &aMem[pOp->p1]; pOut =3D &aMem[pOp->p2]; sqlVdbeMemSetNull(pOut); + /* Force NULL be of type INTEGER. */ + pOut->field_type =3D FIELD_TYPE_INTEGER; if ((pIn1->flags & MEM_Null)=3D=3D0) { int64_t i; bool is_neg; diff --git a/test/sql/types.test.lua b/test/sql/types.test.lua index 489a5a91c..8a509da96 100644 --- a/test/sql/types.test.lua +++ b/test/sql/types.test.lua @@ -407,6 +407,12 @@ box.execute("CREATE TABLE t (id INT PRIMARY KEY, a = INT, s SCALAR);") box.execute("INSERT INTO t VALUES (1, 1, 1), (2, NULL, NULL);") box.execute("SELECT typeof(a), typeof(s) FROM t;") =20 +box.execute('CREATE TABLE t1 (id INTEGER PRIMARY KEY, a INTEGER, b = INTEGER)') +box.execute('INSERT INTO t1 VALUES (1, NULL, NULL);') +box.execute('SELECT typeof(a & b) FROM t1;') +box.execute('SELECT typeof(a), typeof(b), typeof(a & b) FROM t1') + box.execute("SELECT typeof(CAST(0 AS UNSIGNED));") =20 box.space.T:drop() +box.space.T1:drop() > The same about all other places, where you call mem_set_* and don't > update field_type. It=E2=80=99s okay, since updating current value in memory cell doesn=E2=80= =99t imply that field type should be updated as well. In most scenarios before setting new value to memory cell, we provide clean-up via memAboutToChange(), which in turn invalidates field type. > I propose you to redesign how to update field_type, so as it would > be hard to forget to update it. >=20 > For instance, add a field_type argument to sqlVdbeMemSetNull() to = never > miss to set a real type for a NULL value. Make mem_set_int update = field_type > depending on the positivity flag. Make mem_set_u64 always set = field_type > UNSIGNED. Make sqlVdbeMemCast always set field_type to the target = type. I=E2=80=99ve tried to refactor patch this way, but in most cases = sqlVdbeMemSetNull() would set default (field_type_MAX). As for mem_set_* family functions - I can move assignment of field_type_MAX to the body, but does it make much sense? >> @@ -1909,6 +1924,7 @@ case OP_ShiftRight: { /* same as = TK_RSHIFT, in1, in2, out3 */ >> } >> } >> mem_set_i64(pOut, iA); >> + pOut->field_type =3D field_type_MAX; >> break; >> } >>=20 >> @@ -1988,6 +2004,13 @@ case OP_Cast: { /* in1 */ >> if (ExpandBlob(pIn1) !=3D 0) >> goto abort_due_to_error; >> rc =3D sqlVdbeMemCast(pIn1, pOp->p2); >> + /* >> + * SCALAR is not type itself, but rather an aggregation >> + * of types. Hence, cast to this type shouldn't change >> + * original type of argument. >> + */ >> + if (pOp->p2 !=3D FIELD_TYPE_SCALAR) >> + pIn1->field_type =3D pOp->p2; >=20 > 8. Why sqlVdbeMemCast does not update the type inside? Now > this function works on half - it updates flags with a new type, > sets a new value, but doesn't touch field_type. Because it has several exit points and otherwise we have to add to each branch: diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c index e5941de8e..7892af116 100644 --- a/src/box/sql/vdbemem.c +++ b/src/box/sql/vdbemem.c @@ -673,10 +673,12 @@ sqlVdbeMemCast(Mem * pMem, enum field_type type) case FIELD_TYPE_BOOLEAN: if ((pMem->flags & MEM_Int) !=3D 0) { mem_set_bool(pMem, pMem->u.i); + pMem->field_type =3D type; return 0; } if ((pMem->flags & MEM_UInt) !=3D 0) { mem_set_bool(pMem, pMem->u.u); + pMem->field_type =3D type; return 0; } if ((pMem->flags & MEM_Real) !=3D 0) { Etc. What is more, this is the only path (OP_Cast) which calls sqlVdbeMemCast() function. So IMHO it is pretty OK. If you want, I can attempt at refactoring sqlVdbeMemCast() so that make it feature only one exit point and move filed_type assignment to it. >> UPDATE_MAX_BLOBSIZE(pIn1); >> if (rc =3D=3D 0) >> break; >> @@ -2094,6 +2117,8 @@ case OP_Ge: { /* same as TK_GE, = jump, in1, in3 */ >> pIn3 =3D &aMem[pOp->p3]; >> flags1 =3D pIn1->flags; >> flags3 =3D pIn3->flags; >> + enum field_type ft_p1 =3D pIn1->field_type; >> + enum field_type ft_p3 =3D pIn3->field_type; >> if ((flags1 | flags3)&MEM_Null) { >> /* One or both operands are NULL */ >> if (pOp->p5 & SQL_NULLEQ) { >> @@ -2232,8 +2257,10 @@ case OP_Ge: { /* same as TK_GE, = jump, in1, in3 */ >> /* Undo any changes made by mem_apply_type() to the input = registers. */ >> assert((pIn1->flags & MEM_Dyn) =3D=3D (flags1 & MEM_Dyn)); >> pIn1->flags =3D flags1; >> + pIn1->field_type =3D ft_p1; >> assert((pIn3->flags & MEM_Dyn) =3D=3D (flags3 & MEM_Dyn)); >> pIn3->flags =3D flags3; >> + pIn3->field_type =3D ft_p3; >>=20 >> if (pOp->p5 & SQL_STOREP2) { >> pOut =3D &aMem[pOp->p2]; >> @@ -2451,6 +2478,7 @@ case OP_Or: { /* same as TK_OR, = in1, in2, out3 */ >> } else { >> mem_set_bool(pOut, v1); >> } >> + pOut->field_type =3D field_type_MAX; >> break; >> } >>=20 >> @@ -2472,6 +2500,7 @@ case OP_Not: { /* same as = TK_NOT, in1, out2 */ >> goto abort_due_to_error; >> } >> mem_set_bool(pOut, ! pIn1->u.b); >> + pOut->field_type =3D field_type_MAX; >=20 > 9. So after mem_set_bool, if the cell was used for > something else before, it is in an invalid state, right? Why do you call it =E2=80=98invalid=E2=80=99? In this approach any = operation resets field type to default value to force typeof() look at actual mp format type. Since in this case field_type simply duplicates format type. The only exception is null values due to storing is_null flag alongside with format type.=20 > I don't think it should work so. Please, develop a > better way how to keep field_type up to date. >=20 > Ideally, if code doesn't touch type flags, it should not touch > field_type. Otherwise that place is broken, if you need to > do follow-up updates after each function, doing something > with types.