[tarantool-patches] Re: [PATCH 1/2] sql: extend struct Mem with field_type member

Wed Jul 31 12:14:54 MSK 2019

>> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
>> index fcf147c96..f50df105d 100644
>> --- a/src/box/sql/func.c
>> +++ b/src/box/sql/func.c
>> @@ -107,6 +107,17 @@ typeofFunc(sql_context * context, int NotUsed, sql_value ** argv)
>> {
>> 	const char *z = 0;
>> 	UNUSED_PARAMETER(NotUsed);
>> +	enum field_type f_t = argv[0]->field_type;
> 
> 1. Perhaps, field_type is not the best name, because Mem is not always
> is a field. It can be just a value, not from a space. It is rather
> just 'type’.

When memory represents smth different from value fetched from
tuple, it is set to field_type_MAX. So I’d rather not rename this member.

>> +	/*
>> +	 * SCALAR is not a basic type, but rather an aggregation of
>> +	 * types. Thus, ignore SCALAR field type and return msgpack
>> +	 * format type.
>> +	 */
>> +	if (f_t != field_type_MAX && f_t != FIELD_TYPE_SCALAR) {
> 
> 2. Not sure if I understand, why do you need both _MAX and _SCALAR for
> values with no strict type. Why not to set SCALAR instead of MAX always?
> MAX is not a type, so SCALAR IMO would be more appropriate when you
> don't have a strict type. Or even ANY, because maps and arrays will be
> supported.

Because SCALAR and ANY are real field types. AFAIR we decided
to introduce different comparison rules for values from SCALAR field
and values from basic type fields. So I’m not sure that setting SCALAR
type by default is a good idea.

>> diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
>> index 9142932e9..2c7a1c2da 100644
>> --- a/src/box/sql/vdbe.c
>> +++ b/src/box/sql/vdbe.c
>> @@ -1238,6 +1238,7 @@ case OP_SoftNull: {
>> 	assert(pOp->p1>0 && pOp->p1<=(p->nMem+1 - p->nCursor));
>> 	pOut = &aMem[pOp->p1];
>> 	pOut->flags = (pOut->flags|MEM_Null)&~MEM_Undefined;
>> +	pOut->field_type = field_type_MAX;
>> 	break;
> 
> 3. This opcode is unused and can be deleted.

Ok, moved this change to a separate patch:

Author: Nikita Pettik <korablev at tarantool.org>
Date:   Mon Jul 29 16:03:38 2019 +0300

    sql: remove OP_SoftNull opcode
    
    It's not used anymore and can be removed to reduce size of codebase.

diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
index 9142932e9..12de10f37 100644
--- a/src/box/sql/vdbe.c
+++ b/src/box/sql/vdbe.c
@@ -1226,21 +1226,6 @@ case OP_Null: {           /* out2 */
        break;
 }
 
-/* Opcode: SoftNull P1 * * * *
- * Synopsis: r[P1]=NULL
- *
- * Set register P1 to have the value NULL as seen by the OP_MakeRecord
- * instruction, but do not free any string or blob memory associated with
- * the register, so that if the value was a string or blob that was
- * previously copied using OP_SCopy, the copies will continue to be valid.
- */
-case OP_SoftNull: {
-       assert(pOp->p1>0 && pOp->p1<=(p->nMem+1 - p->nCursor));
-       pOut = &aMem[pOp->p1];
-       pOut->flags = (pOut->flags|MEM_Null)&~MEM_Undefined;
-       break;
-}
-
 /* Opcode: Blob P1 P2 P3 P4 *
  * Synopsis: r[P2]=P4 (len=P1, subtype=P3)
  *

>> @@ -1278,6 +1279,11 @@ case OP_Variable: {            /* out2 */
>> 	}
>> 	pOut = out2Prerelease(p, pOp);
>> 	sqlVdbeMemShallowCopy(pOut, pVar, MEM_Static);
>> +	/*
>> +	 * ...ShallowCopy() may overwrite field_type with irrelevant
>> +	 * value. So make sure that in the end field_type_MAX is set.
>> +	 */
> 
> 4. The only place where ShallowCopy can rewrite field_type is
> memcpy(pTo, pFrom, MEMCELLSIZE). If after that field_type is irrelevant,
> then it was incorrect in pFrom, i.e. in pVar. On the summary:
> why pVar may have incorrect field_type? Should not bind set a correct
> type?

Because it is not set during bind*() routines. Moved field_type_MAX
assignment to vdbeUnbind() routine:

diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
index fcc1701eb..20c1e39ec 100644
--- a/src/box/sql/vdbe.c
+++ b/src/box/sql/vdbe.c
@@ -1263,11 +1263,6 @@ case OP_Variable: {            /* out2 */
        }
        pOut = out2Prerelease(p, pOp);
        sqlVdbeMemShallowCopy(pOut, pVar, MEM_Static);
-       /*
-        * ...ShallowCopy() may overwrite field_type with irrelevant
-        * value. So make sure that in the end field_type_MAX is set.
-        */
-       pOut->field_type = field_type_MAX;
        UPDATE_MAX_BLOBSIZE(pOut);
        break;
 }
diff --git a/src/box/sql/vdbeapi.c b/src/box/sql/vdbeapi.c
index c62cb7a1e..d470b6cc8 100644
--- a/src/box/sql/vdbeapi.c
+++ b/src/box/sql/vdbeapi.c
@@ -834,6 +834,7 @@ vdbeUnbind(Vdbe * p, int i)
        pVar = &p->aVar[i];
        sqlVdbeMemRelease(pVar);
        pVar->flags = MEM_Null;
+       pVar->field_type = field_type_MAX;
 
        /* If the bit corresponding to this variable in Vdbe.expmask is set, then
         * binding a new value to this variable invalidates the current query plan.

>> @@ -1338,6 +1344,7 @@ case OP_Copy: {
>> 	pIn1 = &aMem[pOp->p1];
>> 	pOut = &aMem[pOp->p2];
>> 	assert(pOut!=pIn1);
>> +
>> 	while( 1) {
> 
> 5. Unnecessary diff.

Removed.

>> 		sqlVdbeMemShallowCopy(pOut, pIn1, MEM_Ephem);
>> 		Deephemeralize(pOut);
>> @@ -1389,6 +1396,7 @@ case OP_IntCopy: {            /* out2 */
>> 	assert((pIn1->flags & (MEM_Int | MEM_UInt)) != 0);
>> 	pOut = &aMem[pOp->p2];
>> 	mem_set_int(pOut, pIn1->u.i, (pIn1->flags & MEM_Int) != 0);
>> +	pOut->field_type = field_type_MAX;
> 
> 6. Why mem_set_int does not set a correct field_type?

Why should it do so? As its name says, field type refers
to type of filed in space’s format. For value’s type (mp format)
we have “flags" property.

> Or why
> pOut->field_type can't be assigned to pIn->field_type? You actually
> do it for OP_FCopy. The same about OP_Remainder.
> 
> When you assign field_type_MAX, and pIn1 was FIELD_TYPE_INTEGER
> with an unsigned value, you will have typeof(pOut) unsigned - type
> loss.
> 
> Additionally, as I see, OP_IntCopy is used only once, when a space
> is being created, to copy sequence id. In such a case I think it
> can be deleted and replaced with OP_SCopy.

Actually, that’s why I don’t bother with type manipulations.
Replaced OP_IntCopy with OP_SCopy in a separate patch:

Author: Nikita Pettik <korablev at tarantool.org>
Date:   Mon Jul 29 19:08:41 2019 +0300

    sql: remove OP_IntCopy opcode
    
    Its purpose is to copy integer value from one memory cell to another.
    In other words, it is particular case of OP_SCopy instruction. Since it
    is used only during creation of AUTOINCREMENT property, it seems to be
    reasonable to replace it with a bit general OP_SCopy and erase
    OP_IntCopy at all reducing size of SQL codebase.

diff --git a/src/box/sql/build.c b/src/box/sql/build.c
index ada7b5859..a63d2d3be 100644
--- a/src/box/sql/build.c
+++ b/src/box/sql/build.c
@@ -1066,7 +1066,7 @@ emitNewSysSpaceSequenceRecord(Parse *pParse, int reg_space_id, int reg_seq_id,
        sqlVdbeAddOp2(v, OP_SCopy, reg_space_id, first_col + 1);
        
        /* 2. Sequence id  */
-       sqlVdbeAddOp2(v, OP_IntCopy, reg_seq_id, first_col + 2);
+       sqlVdbeAddOp2(v, OP_SCopy, reg_seq_id, first_col + 2);
 
        /* 3. Autogenerated. */
        sqlVdbeAddOp2(v, OP_Bool, true, first_col + 3);
diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
index 12de10f37..88d1fa895 100644
--- a/src/box/sql/vdbe.c
+++ b/src/box/sql/vdbe.c
@@ -1361,22 +1361,6 @@ case OP_SCopy: {            /* out2 */
        break;
 }
 
-/* Opcode: IntCopy P1 P2 * * *
- * Synopsis: r[P2]=r[P1]
- *
- * Transfer the integer value held in register P1 into register P2.
- *
- * This is an optimized version of SCopy that works only for integer
- * values.
- */
-case OP_IntCopy: {            /* out2 */
-       pIn1 = &aMem[pOp->p1];
-       assert((pIn1->flags & (MEM_Int | MEM_UInt)) != 0);
-       pOut = &aMem[pOp->p2];
-       mem_set_int(pOut, pIn1->u.i, (pIn1->flags & MEM_Int) != 0);
-       break;
-}
-
 /* Opcode: ResultRow P1 P2 * * *
  * Synopsis: output=r[P1 at P2]
  *

> 
>> 	break;
>> }
>> 
>> @@ -1491,6 +1499,8 @@ case OP_Concat: {           /* same as TK_CONCAT, in1, in2, out3 */
>> 	assert(pIn1!=pOut);
>> 	if ((pIn1->flags | pIn2->flags) & MEM_Null) {
>> 		sqlVdbeMemSetNull(pOut);
>> +		/* Force NULL be of type STRING. */
>> +		pOut->field_type = FIELD_TYPE_STRING;
>> 		break;
>> 	}
>> 	/*
>> @@ -1535,6 +1545,7 @@ case OP_Concat: {           /* same as TK_CONCAT, in1, in2, out3 */
>> 	pOut->z[nByte+1] = 0;
>> 	pOut->flags |= MEM_Term;
>> 	pOut->n = (int)nByte;
>> +	pOut->field_type = field_type_MAX;
>> 	UPDATE_MAX_BLOBSIZE(pOut);
>> 	break;
>> }
>> @@ -1644,6 +1655,7 @@ case OP_Remainder: {           /* same as TK_REM, in1, in2, out3 */
>> 		}
>> 		}
>> 		mem_set_int(pOut, iB, is_res_neg);
>> +		pOut->field_type = field_type_MAX;
>> 	} else {
>> 		bIntint = 0;
>> 		if (sqlVdbeRealValue(pIn1, &rA) != 0) {
>> @@ -1681,6 +1693,7 @@ case OP_Remainder: {           /* same as TK_REM, in1, in2, out3 */
>> 		}
>> 		pOut->u.r = rB;
>> 		MemSetTypeFlag(pOut, MEM_Real);
>> +		pOut->field_type = field_type_MAX;
>> 		if (((type1|type2)&MEM_Real)==0 && !bIntint) {
>> 			mem_apply_integer_type(pOut);
>> 		}
>> @@ -1689,6 +1702,8 @@ case OP_Remainder: {           /* same as TK_REM, in1, in2, out3 */
>> 
>> arithmetic_result_is_null:
>> 	sqlVdbeMemSetNull(pOut);
>> +	/* Force NULL be of type NUMBER. */
>> +	pOut->field_type = FIELD_TYPE_NUMBER;
>> 	break;
> 
> 7. All these manual assignments of field_type are broken by design.
> When you have to update field_type after each operation manually,
> you will miss some places for sure. For example:
> 
> tarantool> box.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)')
> ---
> - row_count: 1
> ...
> 
> tarantool> box.execute('INSERT INTO test VALUES (1, NULL, NULL)')
> ---
> - row_count: 1
> ...
> 
> tarantool> box.execute('SELECT typeof(a & b) FROM test')
> ---
> - metadata:
>  - name: typeof(a & b)
>    type: string
>  rows:
>  - ['null']
> ...
> 
> tarantool> box.execute('SELECT typeof(a), typeof(b), typeof(a & b) FROM test')
> ---
> - metadata:
>  - name: typeof(a)
>    type: string
>  - name: typeof(b)
>    type: string
>  - name: typeof(a & b)
>    type: string
>  rows:
>  - ['integer', 'integer', 'integer']
> ...
> 
> 
> Here in the last two queries 'typeof(a & b)' result *depends* on
> other columns in result set. But it should be integer always, shouldn't
> it?

Yep, it should, I’ve fixed that:

diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
index b3b230ad0..f6eab8a0e 100644
--- a/src/box/sql/vdbe.c
+++ b/src/box/sql/vdbe.c
@@ -1842,6 +1842,8 @@ case OP_ShiftRight: {           /* same as TK_RSHIFT, in1, in2, out3 */
        pOut = &aMem[pOp->p3];
        if ((pIn1->flags | pIn2->flags) & MEM_Null) {
                sqlVdbeMemSetNull(pOut);
+               /* Force NULL be of type INTEGER. */
+               pOut->field_type = FIELD_TYPE_INTEGER;
                break;
        }
        bool unused;
@@ -2477,6 +2479,8 @@ case OP_BitNot: {             /* same as TK_BITNOT, in1, out2 */
        pIn1 = &aMem[pOp->p1];
        pOut = &aMem[pOp->p2];
        sqlVdbeMemSetNull(pOut);
+       /* Force NULL be of type INTEGER. */
+       pOut->field_type = FIELD_TYPE_INTEGER;
        if ((pIn1->flags & MEM_Null)==0) {
                int64_t i;
                bool is_neg;
diff --git a/test/sql/types.test.lua b/test/sql/types.test.lua
index 489a5a91c..8a509da96 100644
--- a/test/sql/types.test.lua
+++ b/test/sql/types.test.lua
@@ -407,6 +407,12 @@ box.execute("CREATE TABLE t (id INT PRIMARY KEY, a INT, s SCALAR);")
 box.execute("INSERT INTO t VALUES (1, 1, 1), (2, NULL, NULL);")
 box.execute("SELECT typeof(a), typeof(s) FROM t;")
 
+box.execute('CREATE TABLE t1 (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)')
+box.execute('INSERT INTO t1 VALUES (1, NULL, NULL);')
+box.execute('SELECT typeof(a & b) FROM t1;')
+box.execute('SELECT typeof(a), typeof(b), typeof(a & b) FROM t1')
+
 box.execute("SELECT typeof(CAST(0 AS UNSIGNED));")
 
 box.space.T:drop()
+box.space.T1:drop()

> The same about all other places, where you call mem_set_* and don't
> update field_type.

It’s okay, since updating current value in memory cell doesn’t
imply that field type should be updated as well. In most scenarios
before setting new value to memory cell, we provide clean-up
via memAboutToChange(), which in turn invalidates field type.

> I propose you to redesign how to update field_type, so as it would
> be hard to forget to update it.
> 
> For instance, add a field_type argument to sqlVdbeMemSetNull() to never
> miss to set a real type for a NULL value. Make mem_set_int update field_type
> depending on the positivity flag. Make mem_set_u64 always set field_type
> UNSIGNED. Make sqlVdbeMemCast always set field_type to the target type.

I’ve tried to refactor patch this way, but in most cases sqlVdbeMemSetNull()
would set default (field_type_MAX). As for mem_set_* family functions -
I can move assignment of field_type_MAX to the body, but does it make
much sense?

>> @@ -1909,6 +1924,7 @@ case OP_ShiftRight: {           /* same as TK_RSHIFT, in1, in2, out3 */
>> 		}
>> 	}
>> 	mem_set_i64(pOut, iA);
>> +	pOut->field_type = field_type_MAX;
>> 	break;
>> }
>> 
>> @@ -1988,6 +2004,13 @@ case OP_Cast: {                  /* in1 */
>> 	if (ExpandBlob(pIn1) != 0)
>> 		goto abort_due_to_error;
>> 	rc = sqlVdbeMemCast(pIn1, pOp->p2);
>> +	/*
>> +	 * SCALAR is not type itself, but rather an aggregation
>> +	 * of types. Hence, cast to this type shouldn't change
>> +	 * original type of argument.
>> +	 */
>> +	if (pOp->p2 != FIELD_TYPE_SCALAR)
>> +		pIn1->field_type = pOp->p2;
> 
> 8. Why sqlVdbeMemCast does not update the type inside? Now
> this function works on half - it updates flags with a new type,
> sets a new value, but doesn't touch field_type.

Because it has several exit points and otherwise we have
to add to each branch:

diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c
index e5941de8e..7892af116 100644
--- a/src/box/sql/vdbemem.c
+++ b/src/box/sql/vdbemem.c
@@ -673,10 +673,12 @@ sqlVdbeMemCast(Mem * pMem, enum field_type type)
        case FIELD_TYPE_BOOLEAN:
                if ((pMem->flags & MEM_Int) != 0) {
                        mem_set_bool(pMem, pMem->u.i);
+                       pMem->field_type = type;
                        return 0;
                }
                if ((pMem->flags & MEM_UInt) != 0) {
                        mem_set_bool(pMem, pMem->u.u);
+                       pMem->field_type = type;
                        return 0;
                }
                if ((pMem->flags & MEM_Real) != 0) {

Etc.

What is more, this is the only path (OP_Cast) which calls
sqlVdbeMemCast() function. So IMHO it is pretty OK.

If you want, I can attempt at refactoring sqlVdbeMemCast()
so that make it feature only one exit point and move filed_type
assignment to it.

>> 	UPDATE_MAX_BLOBSIZE(pIn1);
>> 	if (rc == 0)
>> 		break;
>> @@ -2094,6 +2117,8 @@ case OP_Ge: {             /* same as TK_GE, jump, in1, in3 */
>> 	pIn3 = &aMem[pOp->p3];
>> 	flags1 = pIn1->flags;
>> 	flags3 = pIn3->flags;
>> +	enum field_type ft_p1 = pIn1->field_type;
>> +	enum field_type ft_p3 = pIn3->field_type;
>> 	if ((flags1 | flags3)&MEM_Null) {
>> 		/* One or both operands are NULL */
>> 		if (pOp->p5 & SQL_NULLEQ) {
>> @@ -2232,8 +2257,10 @@ case OP_Ge: {             /* same as TK_GE, jump, in1, in3 */
>> 	/* Undo any changes made by mem_apply_type() to the input registers. */
>> 	assert((pIn1->flags & MEM_Dyn) == (flags1 & MEM_Dyn));
>> 	pIn1->flags = flags1;
>> +	pIn1->field_type = ft_p1;
>> 	assert((pIn3->flags & MEM_Dyn) == (flags3 & MEM_Dyn));
>> 	pIn3->flags = flags3;
>> +	pIn3->field_type = ft_p3;
>> 
>> 	if (pOp->p5 & SQL_STOREP2) {
>> 		pOut = &aMem[pOp->p2];
>> @@ -2451,6 +2478,7 @@ case OP_Or: {             /* same as TK_OR, in1, in2, out3 */
>> 	} else {
>> 		mem_set_bool(pOut, v1);
>> 	}
>> +	pOut->field_type = field_type_MAX;
>> 	break;
>> }
>> 
>> @@ -2472,6 +2500,7 @@ case OP_Not: {                /* same as TK_NOT, in1, out2 */
>> 			goto abort_due_to_error;
>> 		}
>> 		mem_set_bool(pOut, ! pIn1->u.b);
>> +		pOut->field_type = field_type_MAX;
> 
> 9. So after mem_set_bool, if the cell was used for
> something else before, it is in an invalid state, right?

Why do you call it ‘invalid’? In this approach any operation
resets field type to default value to force typeof() look
at actual mp format type. Since in this case field_type simply
duplicates format type. The only exception is null values due
to storing is_null flag alongside with format type. 

> I don't think it should work so. Please, develop a
> better way how to keep field_type up to date.
> 
> Ideally, if code doesn't touch type flags, it should not touch
> field_type. Otherwise that place is broken, if you need to
> do follow-up updates after each function, doing something
> with types.