From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id DD986230F7 for ; Sun, 28 Jul 2019 11:20:01 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fWa_jAuC9nqf for ; Sun, 28 Jul 2019 11:20:01 -0400 (EDT) Received: from smtp17.mail.ru (smtp17.mail.ru [94.100.176.154]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id F22982307D for ; Sun, 28 Jul 2019 11:20:00 -0400 (EDT) Subject: [tarantool-patches] Re: [PATCH 1/2] sql: extend struct Mem with field_type member References: From: Vladislav Shpilevoy Message-ID: <8be21965-b8ef-4f96-8ce3-99c2efcecfc0@tarantool.org> Date: Sun, 28 Jul 2019 17:22:00 +0200 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org, Nikita Pettik Hi! Thanks for the patch! See 9 comments below. On 27/07/2019 20:45, Nikita Pettik wrote: > There's several reasons to do so. First one is to improve operability of > built-in function 'typeof()' which returns string containing argument's > type. Using only format (msgpack) type we can't determine real field > type of null values. Moreover, result of CAST should feature the same > type as it was specified in operation. For instance: > typeof(CAST(0 AS INTEGER)) should return "integer", not "unsigned". > > The second reason is different rules for comparison involving values of > SCALAR type. In Tarantool NoSQL it is allowed to compare values of > "incompatible" types like booleans and strings (using heuristics which > says that values of one type always are greater/less that values of > another type), in case they are stored in SCALAR field type. Without > storing actual field type in struct Mem it is obvious impossible to > achieve. > > Thus, now field_type member in struct Mem represents supposed field type > in case value is fetched from tuple or is a result of CAST operation. > > Closes #4148 > --- > src/box/sql/func.c | 11 +++++++++++ > src/box/sql/vdbe.c | 36 +++++++++++++++++++++++++++++++++++- > src/box/sql/vdbeInt.h | 11 +++++++++++ > src/box/sql/vdbeapi.c | 1 + > src/box/sql/vdbeaux.c | 1 + > src/box/sql/vdbemem.c | 3 +++ > src/box/sql/vdbesort.c | 6 ++++++ > test/sql-tap/cast.test.lua | 8 ++++---- > test/sql-tap/table.test.lua | 4 ++-- > test/sql/types.result | 35 ++++++++++++++++++++++++++++++++++- > test/sql/types.test.lua | 11 +++++++++++ > 11 files changed, 119 insertions(+), 8 deletions(-) > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index fcf147c96..f50df105d 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -107,6 +107,17 @@ typeofFunc(sql_context * context, int NotUsed, sql_value ** argv) > { > const char *z = 0; > UNUSED_PARAMETER(NotUsed); > + enum field_type f_t = argv[0]->field_type; 1. Perhaps, field_type is not the best name, because Mem is not always is a field. It can be just a value, not from a space. It is rather just 'type'. > + /* > + * SCALAR is not a basic type, but rather an aggregation of > + * types. Thus, ignore SCALAR field type and return msgpack > + * format type. > + */ > + if (f_t != field_type_MAX && f_t != FIELD_TYPE_SCALAR) { 2. Not sure if I understand, why do you need both _MAX and _SCALAR for values with no strict type. Why not to set SCALAR instead of MAX always? MAX is not a type, so SCALAR IMO would be more appropriate when you don't have a strict type. Or even ANY, because maps and arrays will be supported. > + sql_result_text(context, field_type_strs[argv[0]->field_type], > + -1, SQL_STATIC); > + return; > + } > switch (sql_value_type(argv[0])) { > case MP_INT: > case MP_UINT: > diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c > index 9142932e9..2c7a1c2da 100644 > --- a/src/box/sql/vdbe.c > +++ b/src/box/sql/vdbe.c > @@ -1238,6 +1238,7 @@ case OP_SoftNull: { > assert(pOp->p1>0 && pOp->p1<=(p->nMem+1 - p->nCursor)); > pOut = &aMem[pOp->p1]; > pOut->flags = (pOut->flags|MEM_Null)&~MEM_Undefined; > + pOut->field_type = field_type_MAX; > break; 3. This opcode is unused and can be deleted. > } > > @@ -1278,6 +1279,11 @@ case OP_Variable: { /* out2 */ > } > pOut = out2Prerelease(p, pOp); > sqlVdbeMemShallowCopy(pOut, pVar, MEM_Static); > + /* > + * ...ShallowCopy() may overwrite field_type with irrelevant > + * value. So make sure that in the end field_type_MAX is set. > + */ 4. The only place where ShallowCopy can rewrite field_type is memcpy(pTo, pFrom, MEMCELLSIZE). If after that field_type is irrelevant, then it was incorrect in pFrom, i.e. in pVar. On the summary: why pVar may have incorrect field_type? Should not bind set a correct type? > + pOut->field_type = field_type_MAX; > UPDATE_MAX_BLOBSIZE(pOut); > break; > } > @@ -1338,6 +1344,7 @@ case OP_Copy: { > pIn1 = &aMem[pOp->p1]; > pOut = &aMem[pOp->p2]; > assert(pOut!=pIn1); > + > while( 1) { 5. Unnecessary diff. > sqlVdbeMemShallowCopy(pOut, pIn1, MEM_Ephem); > Deephemeralize(pOut); > @@ -1389,6 +1396,7 @@ case OP_IntCopy: { /* out2 */ > assert((pIn1->flags & (MEM_Int | MEM_UInt)) != 0); > pOut = &aMem[pOp->p2]; > mem_set_int(pOut, pIn1->u.i, (pIn1->flags & MEM_Int) != 0); > + pOut->field_type = field_type_MAX; 6. Why mem_set_int does not set a correct field_type? Or why pOut->field_type can't be assigned to pIn->field_type? You actually do it for OP_FCopy. The same about OP_Remainder. When you assign field_type_MAX, and pIn1 was FIELD_TYPE_INTEGER with an unsigned value, you will have typeof(pOut) unsigned - type loss. Additionally, as I see, OP_IntCopy is used only once, when a space is being created, to copy sequence id. In such a case I think it can be deleted and replaced with OP_SCopy. > break; > } > > @@ -1491,6 +1499,8 @@ case OP_Concat: { /* same as TK_CONCAT, in1, in2, out3 */ > assert(pIn1!=pOut); > if ((pIn1->flags | pIn2->flags) & MEM_Null) { > sqlVdbeMemSetNull(pOut); > + /* Force NULL be of type STRING. */ > + pOut->field_type = FIELD_TYPE_STRING; > break; > } > /* > @@ -1535,6 +1545,7 @@ case OP_Concat: { /* same as TK_CONCAT, in1, in2, out3 */ > pOut->z[nByte+1] = 0; > pOut->flags |= MEM_Term; > pOut->n = (int)nByte; > + pOut->field_type = field_type_MAX; > UPDATE_MAX_BLOBSIZE(pOut); > break; > } > @@ -1644,6 +1655,7 @@ case OP_Remainder: { /* same as TK_REM, in1, in2, out3 */ > } > } > mem_set_int(pOut, iB, is_res_neg); > + pOut->field_type = field_type_MAX; > } else { > bIntint = 0; > if (sqlVdbeRealValue(pIn1, &rA) != 0) { > @@ -1681,6 +1693,7 @@ case OP_Remainder: { /* same as TK_REM, in1, in2, out3 */ > } > pOut->u.r = rB; > MemSetTypeFlag(pOut, MEM_Real); > + pOut->field_type = field_type_MAX; > if (((type1|type2)&MEM_Real)==0 && !bIntint) { > mem_apply_integer_type(pOut); > } > @@ -1689,6 +1702,8 @@ case OP_Remainder: { /* same as TK_REM, in1, in2, out3 */ > > arithmetic_result_is_null: > sqlVdbeMemSetNull(pOut); > + /* Force NULL be of type NUMBER. */ > + pOut->field_type = FIELD_TYPE_NUMBER; > break; 7. All these manual assignments of field_type are broken by design. When you have to update field_type after each operation manually, you will miss some places for sure. For example: tarantool> box.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)') --- - row_count: 1 ... tarantool> box.execute('INSERT INTO test VALUES (1, NULL, NULL)') --- - row_count: 1 ... tarantool> box.execute('SELECT typeof(a & b) FROM test') --- - metadata: - name: typeof(a & b) type: string rows: - ['null'] ... tarantool> box.execute('SELECT typeof(a), typeof(b), typeof(a & b) FROM test') --- - metadata: - name: typeof(a) type: string - name: typeof(b) type: string - name: typeof(a & b) type: string rows: - ['integer', 'integer', 'integer'] ... Here in the last two queries 'typeof(a & b)' result *depends* on other columns in result set. But it should be integer always, shouldn't it? The same about all other places, where you call mem_set_* and don't update field_type. I propose you to redesign how to update field_type, so as it would be hard to forget to update it. For instance, add a field_type argument to sqlVdbeMemSetNull() to never miss to set a real type for a NULL value. Make mem_set_int update field_type depending on the positivity flag. Make mem_set_u64 always set field_type UNSIGNED. Make sqlVdbeMemCast always set field_type to the target type. > > division_by_zero: > @@ -1909,6 +1924,7 @@ case OP_ShiftRight: { /* same as TK_RSHIFT, in1, in2, out3 */ > } > } > mem_set_i64(pOut, iA); > + pOut->field_type = field_type_MAX; > break; > } > > @@ -1988,6 +2004,13 @@ case OP_Cast: { /* in1 */ > if (ExpandBlob(pIn1) != 0) > goto abort_due_to_error; > rc = sqlVdbeMemCast(pIn1, pOp->p2); > + /* > + * SCALAR is not type itself, but rather an aggregation > + * of types. Hence, cast to this type shouldn't change > + * original type of argument. > + */ > + if (pOp->p2 != FIELD_TYPE_SCALAR) > + pIn1->field_type = pOp->p2; 8. Why sqlVdbeMemCast does not update the type inside? Now this function works on half - it updates flags with a new type, sets a new value, but doesn't touch field_type. > UPDATE_MAX_BLOBSIZE(pIn1); > if (rc == 0) > break; > @@ -2094,6 +2117,8 @@ case OP_Ge: { /* same as TK_GE, jump, in1, in3 */ > pIn3 = &aMem[pOp->p3]; > flags1 = pIn1->flags; > flags3 = pIn3->flags; > + enum field_type ft_p1 = pIn1->field_type; > + enum field_type ft_p3 = pIn3->field_type; > if ((flags1 | flags3)&MEM_Null) { > /* One or both operands are NULL */ > if (pOp->p5 & SQL_NULLEQ) { > @@ -2232,8 +2257,10 @@ case OP_Ge: { /* same as TK_GE, jump, in1, in3 */ > /* Undo any changes made by mem_apply_type() to the input registers. */ > assert((pIn1->flags & MEM_Dyn) == (flags1 & MEM_Dyn)); > pIn1->flags = flags1; > + pIn1->field_type = ft_p1; > assert((pIn3->flags & MEM_Dyn) == (flags3 & MEM_Dyn)); > pIn3->flags = flags3; > + pIn3->field_type = ft_p3; > > if (pOp->p5 & SQL_STOREP2) { > pOut = &aMem[pOp->p2]; > @@ -2451,6 +2478,7 @@ case OP_Or: { /* same as TK_OR, in1, in2, out3 */ > } else { > mem_set_bool(pOut, v1); > } > + pOut->field_type = field_type_MAX; > break; > } > > @@ -2472,6 +2500,7 @@ case OP_Not: { /* same as TK_NOT, in1, out2 */ > goto abort_due_to_error; > } > mem_set_bool(pOut, ! pIn1->u.b); > + pOut->field_type = field_type_MAX; 9. So after mem_set_bool, if the cell was used for something else before, it is in an invalid state, right? I don't think it should work so. Please, develop a better way how to keep field_type up to date. Ideally, if code doesn't touch type flags, it should not touch field_type. Otherwise that place is broken, if you need to do follow-up updates after each function, doing something with types.