[tarantool-patches] Re: [PATCH 1/2] sql: extend struct Mem with field_type member

Thu Aug 1 18:56:11 MSK 2019

> On 31 Jul 2019, at 19:59, Vladislav Shpilevoy <v.shpilevoy at tarantool.org> wrote:
> Hi! Thanks for the fixes! See 5 comments below.
> On 31/07/2019 11:14, n.pettik wrote:
>> 
>>>> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
>>>> index fcf147c96..f50df105d 100644
>>>> --- a/src/box/sql/func.c
>>>> +++ b/src/box/sql/func.c
>>>> @@ -107,6 +107,17 @@ typeofFunc(sql_context * context, int NotUsed, sql_value ** argv)
>>>> {
>>>> 	const char *z = 0;
>>>> 	UNUSED_PARAMETER(NotUsed);
>>>> +	enum field_type f_t = argv[0]->field_type;
>>> 
>>> 1. Perhaps, field_type is not the best name, because Mem is not always
>>> is a field. It can be just a value, not from a space. It is rather
>>> just 'type’.
>> 
>> When memory represents smth different from value fetched from
>> tuple, it is set to field_type_MAX. So I’d rather not rename this member.
> 
> 1. Ok, now I got it. It works only for tuple fields, and is totally disabled
> for other values. Or not? See other comments.

Sort of. The only addition to this rule is CAST operation to
display appropriate type for NULL values.

>>> 7. All these manual assignments of field_type are broken by design.
>>> When you have to update field_type after each operation manually,
>>> you will miss some places for sure. For example:
>>> 
>>> tarantool> box.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)')
>>> ---
>>> - row_count: 1
>>> ...
>>> 
>>> tarantool> box.execute('INSERT INTO test VALUES (1, NULL, NULL)')
>>> ---
>>> - row_count: 1
>>> ...
>>> 
>>> tarantool> box.execute('SELECT typeof(a & b) FROM test')
>>> ---
>>> - metadata:
>>> - name: typeof(a & b)
>>>   type: string
>>> rows:
>>> - ['null']
>>> ...
>>> 
>>> tarantool> box.execute('SELECT typeof(a), typeof(b), typeof(a & b) FROM test')
>>> ---
>>> - metadata:
>>> - name: typeof(a)
>>>   type: string
>>> - name: typeof(b)
>>>   type: string
>>> - name: typeof(a & b)
>>>   type: string
>>> rows:
>>> - ['integer', 'integer', 'integer']
>>> ...
>>> 
>>> 
>>> Here in the last two queries 'typeof(a & b)' result *depends* on
>>> other columns in result set. But it should be integer always, shouldn't
>>> it?
>> 
>> Yep, it should, I’ve fixed that:
>> 
>> diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
>> index b3b230ad0..f6eab8a0e 100644
>> --- a/src/box/sql/vdbe.c
>> +++ b/src/box/sql/vdbe.c
>> @@ -1842,6 +1842,8 @@ case OP_ShiftRight: {           /* same as TK_RSHIFT, in1, in2, out3 */
>>        pOut = &aMem[pOp->p3];
>>        if ((pIn1->flags | pIn2->flags) & MEM_Null) {
>>                sqlVdbeMemSetNull(pOut);
>> +               /* Force NULL be of type INTEGER. */
>> +               pOut->field_type = FIELD_TYPE_INTEGER;
>>                break;
>>        }
>>        bool unused;
>> @@ -2477,6 +2479,8 @@ case OP_BitNot: {             /* same as TK_BITNOT, in1, out2 */
>>        pIn1 = &aMem[pOp->p1];
>>        pOut = &aMem[pOp->p2];
>>        sqlVdbeMemSetNull(pOut);
>> +       /* Force NULL be of type INTEGER. */
>> +       pOut->field_type = FIELD_TYPE_INTEGER;
> 2.
> 
> tarantool> box.execute('SELECT typeof(~NULL)')
> ---
> - metadata:
>  - name: typeof(~NULL)
>    type: string
>  rows:
>  - ['integer']
> ...
> 
> But it is not integer, as I think. I didn't select NULL from
> an integer column. Here NULL was of type 'boolean'. And the
> result should be 'boolean' too, because it is just NULL, not
> a column NULL. The same about other bit operations.

I guess it is exactly what I was asked to do. “Bitnot” operation
never returns “boolean” result - it’s argument is checked to be
of type integer. Hence, IMHO if it returned “boolean” even for
NULL values it would make users wonder why it does so. Anyway,
just in case I’m going to re-ask Konstantin and Peter is this
what we want or not.

> And the same about strings:
> 
> tarantool> box.execute("SELECT typeof(NULL || NULL)")
> ---
> - metadata:
>  - name: typeof(NULL || NULL)
>    type: string
>  rows:
>  - ['string']
> ...
> 
> Here NULLs were of type 'boolean', but output shows 'string’.

The same logic is applied for concatenation, bitwise and
arithmetic operations.

>> diff --git a/test/sql/types.test.lua b/test/sql/types.test.lua
>> index 489a5a91c..8a509da96 100644
>> --- a/test/sql/types.test.lua
>> +++ b/test/sql/types.test.lua
>> @@ -407,6 +407,12 @@ box.execute("CREATE TABLE t (id INT PRIMARY KEY, a INT, s SCALAR);")
>> box.execute("INSERT INTO t VALUES (1, 1, 1), (2, NULL, NULL);")
>> box.execute("SELECT typeof(a), typeof(s) FROM t;")
>> 
>> +box.execute('CREATE TABLE t1 (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)')
>> +box.execute('INSERT INTO t1 VALUES (1, NULL, NULL);')
>> +box.execute('SELECT typeof(a & b) FROM t1;')
>> +box.execute('SELECT typeof(a), typeof(b), typeof(a & b) FROM t1')
>> +
>> box.execute("SELECT typeof(CAST(0 AS UNSIGNED));")
>> 
>> box.space.T:drop()
>> +box.space.T1:drop()
>> 
>>> The same about all other places, where you call mem_set_* and don't
>>> update field_type.
>> 
>> It’s okay, since updating current value in memory cell doesn’t
>> imply that field type should be updated as well. In most scenarios
>> before setting new value to memory cell, we provide clean-up
>> via memAboutToChange(), which in turn invalidates field type.
> 
> 3. memAboutToChange is no-op in Release build, so it does not invalidate
> anything in that mode. It means, that field_type now can be a garbage
> from a previous value, if the memory cell was reused to assign something
> new.

Oops, I moved field type resetting to out2prerelease() and added
call of this function to several places:

diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
index f6eab8a0e..77529d346 100644
--- a/src/box/sql/vdbe.c
+++ b/src/box/sql/vdbe.c
@@ -558,6 +558,7 @@ out2Prerelease(Vdbe *p, VdbeOp *pOp)
                pOut->flags = MEM_Int;
                return pOut;
        }
+       pOut->field_type = field_type_MAX;
 }
 
 struct stailq *
@@ -1781,6 +1782,7 @@ case OP_Function: {
        }
 #endif
        MemSetTypeFlag(pCtx->pOut, MEM_Null);
+       pCtx->pOut->field_type = field_type_MAX;
        pCtx->is_aborted = false;
        (*pCtx->pFunc->xSFunc)(pCtx, pCtx->argc, pCtx->argv);/* IMP: R-24505-23230 */
 
@@ -2227,6 +2229,7 @@ case OP_Ge: {             /* same as TK_GE, jump, in1, in3 */
 
        if (pOp->p5 & SQL_STOREP2) {
                pOut = &aMem[pOp->p2];
+               pOut->field_type = field_type_MAX;
                iCompare = res;
                res2 = res2!=0;  /* For this path res2 must be exactly 0 or 1 */
                if ((pOp->p5 & SQL_KEEPNULL)!=0) {
@@ -2625,6 +2628,7 @@ case OP_Column: {
                                                            pReg->z, pReg->n);
                        } else {
                                sqlVdbeMemSetNull(pDest);
+                               pDest->field_type = field_type_MAX;
                                goto op_column_out;
                        }
                } else {
@@ -3684,7 +3688,7 @@ case OP_Sequence: {           /* out2 */
  * incremented by one.
  */
 case OP_NextSequenceId: {
-       pOut = &aMem[pOp->p2];
+       pOut = out2Prerelease(p, pOp);
        uint64_t id;
        tarantoolSqlNextSeqId(&id);
        id++;
@@ -3717,7 +3721,7 @@ case OP_NextIdEphemeral: {
                diag_set(ClientError, ER_ROWID_OVERFLOW);
                goto abort_due_to_error;
        }
-       pOut = &aMem[pOp->p2];
+       pOut = out2Prerelease(p, pOp);
        mem_set_u64(pOut, rowid);
        break;
 }
@@ -3925,7 +3929,7 @@ case OP_RowData: {
        }
 #endif
 
-       pOut = &aMem[pOp->p2];
+       pOut = out2Prerelease(p, pOp);
        memAboutToChange(p, pOut);
 
        assert(pOp->p1>=0 && pOp->p1<p->nCursor);
diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c
index e5941de8e..bf020657f 100644
--- a/src/box/sql/vdbemem.c
+++ b/src/box/sql/vdbemem.c
@@ -894,7 +894,6 @@ sqlVdbeMemAboutToChange(Vdbe * pVdbe, Mem * pMem)
        for (i = 0, pX = pVdbe->aMem; i < pVdbe->nMem; i++, pX++) {
                if (pX->pScopyFrom == pMem) {
                        pX->flags |= MEM_Undefined;
-                       pX->field_type = field_type_MAX;
                        pX->pScopyFrom = 0;
                }
        }

>>> I propose you to redesign how to update field_type, so as it would
>>> be hard to forget to update it.
>>> 
>>> For instance, add a field_type argument to sqlVdbeMemSetNull() to never
>>> miss to set a real type for a NULL value. Make mem_set_int update field_type
>>> depending on the positivity flag. Make mem_set_u64 always set field_type
>>> UNSIGNED. Make sqlVdbeMemCast always set field_type to the target type.
>> 
>> I’ve tried to refactor patch this way, but in most cases sqlVdbeMemSetNull()
>> would set default (field_type_MAX). As for mem_set_* family functions -
>> I can move assignment of field_type_MAX to the body, but does it make
>> much sense?
>> 
> 
> 4. Currently, as I said above, I don't see where you invalidate type of a
> memory cell before it takes a new value. Yes, field_type is initialized
> with field_type_MAX when the register array is just created, but where
> the cell type is reset, when it takes a new value?
> 
> And one another possible bug:
> 
> box.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, a INTEGER)')
> box.execute('INSERT INTO test VALUES (1, NULL)')
> tarantool> box.execute('SELECT typeof(NOT a) FROM test')
> ---
> - metadata:
>  - name: typeof(NOT a)
>    type: string
>  rows:
>  - ['boolean']
> ...
> 
> As you can see, 'NOT' lost the original type - it was integer NULL,
> not boolean NULL.

Again, personally I do not consider this to be a bug
(see a bit detailed explanation above).

> Another place is vdbe.c:2627 - here you jump to op_column_out
> bypassing field_type assignment. Why?

Simply missed this place, fixed:

@@ -2625,6 +2628,7 @@ case OP_Column: {
                                                            pReg->z, pReg->n);
                        } else {
                                sqlVdbeMemSetNull(pDest);
+                               pDest->field_type = field_type_MAX;
                                goto op_column_out;
                        }
                } else {


>>>> @@ -1909,6 +1924,7 @@ case OP_ShiftRight: {           /* same as TK_RSHIFT, in1, in2, out3 */
>>>> 		}
>>>> 	}
>>>> 	mem_set_i64(pOut, iA);
>>>> +	pOut->field_type = field_type_MAX;
>>>> 	break;
>>>> }
>>>> 
>>>> @@ -1988,6 +2004,13 @@ case OP_Cast: {                  /* in1 */
>>>> 	if (ExpandBlob(pIn1) != 0)
>>>> 		goto abort_due_to_error;
>>>> 	rc = sqlVdbeMemCast(pIn1, pOp->p2);
>>>> +	/*
>>>> +	 * SCALAR is not type itself, but rather an aggregation
>>>> +	 * of types. Hence, cast to this type shouldn't change
>>>> +	 * original type of argument.
>>>> +	 */
>>>> +	if (pOp->p2 != FIELD_TYPE_SCALAR)
>>>> +		pIn1->field_type = pOp->p2;
>>> 
>>> 8. Why sqlVdbeMemCast does not update the type inside? Now
>>> this function works on half - it updates flags with a new type,
>>> sets a new value, but doesn't touch field_type.
>> 
>> Because it has several exit points and otherwise we have
>> to add to each branch:
>> 
>> diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c
>> index e5941de8e..7892af116 100644
>> --- a/src/box/sql/vdbemem.c
>> +++ b/src/box/sql/vdbemem.c
>> @@ -673,10 +673,12 @@ sqlVdbeMemCast(Mem * pMem, enum field_type type)
>>        case FIELD_TYPE_BOOLEAN:
>>                if ((pMem->flags & MEM_Int) != 0) {
>>                        mem_set_bool(pMem, pMem->u.i);
>> +                       pMem->field_type = type;
>>                        return 0;
>>                }
>>                if ((pMem->flags & MEM_UInt) != 0) {
>>                        mem_set_bool(pMem, pMem->u.u);
>> +                       pMem->field_type = type;
>>                        return 0;
>>                }
>>                if ((pMem->flags & MEM_Real) != 0) {
>> 
>> Etc.
>> 
>> What is more, this is the only path (OP_Cast) which calls
>> sqlVdbeMemCast() function. So IMHO it is pretty OK.
>> 
>> If you want, I can attempt at refactoring sqlVdbeMemCast()
>> so that make it feature only one exit point and move filed_type
>> assignment to it.
>> 
> 
> 5. Never mind. Too many changes.

I’m not sure if it is a sarcasm or not...