[tarantool-patches] Re: [PATCH 1/2] sql: extend struct Mem with field_type member

Wed Jul 31 19:59:52 MSK 2019

Hi! Thanks for the fixes! See 5 comments below.

On 31/07/2019 11:14, n.pettik wrote:
> 
> 
>>> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
>>> index fcf147c96..f50df105d 100644
>>> --- a/src/box/sql/func.c
>>> +++ b/src/box/sql/func.c
>>> @@ -107,6 +107,17 @@ typeofFunc(sql_context * context, int NotUsed, sql_value ** argv)
>>> {
>>> 	const char *z = 0;
>>> 	UNUSED_PARAMETER(NotUsed);
>>> +	enum field_type f_t = argv[0]->field_type;
>>
>> 1. Perhaps, field_type is not the best name, because Mem is not always
>> is a field. It can be just a value, not from a space. It is rather
>> just 'type’.
> 
> When memory represents smth different from value fetched from
> tuple, it is set to field_type_MAX. So I’d rather not rename this member.

1. Ok, now I got it. It works only for tuple fields, and is totally disabled
for other values. Or not? See other comments.

>> 7. All these manual assignments of field_type are broken by design.
>> When you have to update field_type after each operation manually,
>> you will miss some places for sure. For example:
>>
>> tarantool> box.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)')
>> ---
>> - row_count: 1
>> ...
>>
>> tarantool> box.execute('INSERT INTO test VALUES (1, NULL, NULL)')
>> ---
>> - row_count: 1
>> ...
>>
>> tarantool> box.execute('SELECT typeof(a & b) FROM test')
>> ---
>> - metadata:
>>  - name: typeof(a & b)
>>    type: string
>>  rows:
>>  - ['null']
>> ...
>>
>> tarantool> box.execute('SELECT typeof(a), typeof(b), typeof(a & b) FROM test')
>> ---
>> - metadata:
>>  - name: typeof(a)
>>    type: string
>>  - name: typeof(b)
>>    type: string
>>  - name: typeof(a & b)
>>    type: string
>>  rows:
>>  - ['integer', 'integer', 'integer']
>> ...
>>
>>
>> Here in the last two queries 'typeof(a & b)' result *depends* on
>> other columns in result set. But it should be integer always, shouldn't
>> it?
> 
> Yep, it should, I’ve fixed that:
> 
> diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
> index b3b230ad0..f6eab8a0e 100644
> --- a/src/box/sql/vdbe.c
> +++ b/src/box/sql/vdbe.c
> @@ -1842,6 +1842,8 @@ case OP_ShiftRight: {           /* same as TK_RSHIFT, in1, in2, out3 */
>         pOut = &aMem[pOp->p3];
>         if ((pIn1->flags | pIn2->flags) & MEM_Null) {
>                 sqlVdbeMemSetNull(pOut);
> +               /* Force NULL be of type INTEGER. */
> +               pOut->field_type = FIELD_TYPE_INTEGER;
>                 break;
>         }
>         bool unused;
> @@ -2477,6 +2479,8 @@ case OP_BitNot: {             /* same as TK_BITNOT, in1, out2 */
>         pIn1 = &aMem[pOp->p1];
>         pOut = &aMem[pOp->p2];
>         sqlVdbeMemSetNull(pOut);
> +       /* Force NULL be of type INTEGER. */
> +       pOut->field_type = FIELD_TYPE_INTEGER;
2.

tarantool> box.execute('SELECT typeof(~NULL)')
---
- metadata:
  - name: typeof(~NULL)
    type: string
  rows:
  - ['integer']
...

But it is not integer, as I think. I didn't select NULL from
an integer column. Here NULL was of type 'boolean'. And the
result should be 'boolean' too, because it is just NULL, not
a column NULL. The same about other bit operations.

And the same about strings:

tarantool> box.execute("SELECT typeof(NULL || NULL)")
---
- metadata:
  - name: typeof(NULL || NULL)
    type: string
  rows:
  - ['string']
...

Here NULLs were of type 'boolean', but output shows 'string'.

>         if ((pIn1->flags & MEM_Null)==0) {
>                 int64_t i;
>                 bool is_neg;
> diff --git a/test/sql/types.test.lua b/test/sql/types.test.lua
> index 489a5a91c..8a509da96 100644
> --- a/test/sql/types.test.lua
> +++ b/test/sql/types.test.lua
> @@ -407,6 +407,12 @@ box.execute("CREATE TABLE t (id INT PRIMARY KEY, a INT, s SCALAR);")
>  box.execute("INSERT INTO t VALUES (1, 1, 1), (2, NULL, NULL);")
>  box.execute("SELECT typeof(a), typeof(s) FROM t;")
>  
> +box.execute('CREATE TABLE t1 (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)')
> +box.execute('INSERT INTO t1 VALUES (1, NULL, NULL);')
> +box.execute('SELECT typeof(a & b) FROM t1;')
> +box.execute('SELECT typeof(a), typeof(b), typeof(a & b) FROM t1')
> +
>  box.execute("SELECT typeof(CAST(0 AS UNSIGNED));")
>  
>  box.space.T:drop()
> +box.space.T1:drop()
> 
>> The same about all other places, where you call mem_set_* and don't
>> update field_type.
> 
> It’s okay, since updating current value in memory cell doesn’t
> imply that field type should be updated as well. In most scenarios
> before setting new value to memory cell, we provide clean-up
> via memAboutToChange(), which in turn invalidates field type.

3. memAboutToChange is no-op in Release build, so it does not invalidate
anything in that mode. It means, that field_type now can be a garbage
from a previous value, if the memory cell was reused to assign something
new.

>> I propose you to redesign how to update field_type, so as it would
>> be hard to forget to update it.
>>
>> For instance, add a field_type argument to sqlVdbeMemSetNull() to never
>> miss to set a real type for a NULL value. Make mem_set_int update field_type
>> depending on the positivity flag. Make mem_set_u64 always set field_type
>> UNSIGNED. Make sqlVdbeMemCast always set field_type to the target type.
> 
> I’ve tried to refactor patch this way, but in most cases sqlVdbeMemSetNull()
> would set default (field_type_MAX). As for mem_set_* family functions -
> I can move assignment of field_type_MAX to the body, but does it make
> much sense?
> 

4. Currently, as I said above, I don't see where you invalidate type of a
memory cell before it takes a new value. Yes, field_type is initialized
with field_type_MAX when the register array is just created, but where
the cell type is reset, when it takes a new value?

And one another possible bug:

box.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, a INTEGER)')
box.execute('INSERT INTO test VALUES (1, NULL)')
tarantool> box.execute('SELECT typeof(NOT a) FROM test')
---
- metadata:
  - name: typeof(NOT a)
    type: string
  rows:
  - ['boolean']
...

As you can see, 'NOT' lost the original type - it was integer NULL,
not boolean NULL.

Another place is vdbe.c:2627 - here you jump to op_column_out
bypassing field_type assignment. Why?

>>> @@ -1909,6 +1924,7 @@ case OP_ShiftRight: {           /* same as TK_RSHIFT, in1, in2, out3 */
>>> 		}
>>> 	}
>>> 	mem_set_i64(pOut, iA);
>>> +	pOut->field_type = field_type_MAX;
>>> 	break;
>>> }
>>>
>>> @@ -1988,6 +2004,13 @@ case OP_Cast: {                  /* in1 */
>>> 	if (ExpandBlob(pIn1) != 0)
>>> 		goto abort_due_to_error;
>>> 	rc = sqlVdbeMemCast(pIn1, pOp->p2);
>>> +	/*
>>> +	 * SCALAR is not type itself, but rather an aggregation
>>> +	 * of types. Hence, cast to this type shouldn't change
>>> +	 * original type of argument.
>>> +	 */
>>> +	if (pOp->p2 != FIELD_TYPE_SCALAR)
>>> +		pIn1->field_type = pOp->p2;
>>
>> 8. Why sqlVdbeMemCast does not update the type inside? Now
>> this function works on half - it updates flags with a new type,
>> sets a new value, but doesn't touch field_type.
> 
> Because it has several exit points and otherwise we have
> to add to each branch:
> 
> diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c
> index e5941de8e..7892af116 100644
> --- a/src/box/sql/vdbemem.c
> +++ b/src/box/sql/vdbemem.c
> @@ -673,10 +673,12 @@ sqlVdbeMemCast(Mem * pMem, enum field_type type)
>         case FIELD_TYPE_BOOLEAN:
>                 if ((pMem->flags & MEM_Int) != 0) {
>                         mem_set_bool(pMem, pMem->u.i);
> +                       pMem->field_type = type;
>                         return 0;
>                 }
>                 if ((pMem->flags & MEM_UInt) != 0) {
>                         mem_set_bool(pMem, pMem->u.u);
> +                       pMem->field_type = type;
>                         return 0;
>                 }
>                 if ((pMem->flags & MEM_Real) != 0) {
> 
> Etc.
> 
> What is more, this is the only path (OP_Cast) which calls
> sqlVdbeMemCast() function. So IMHO it is pretty OK.
> 
> If you want, I can attempt at refactoring sqlVdbeMemCast()
> so that make it feature only one exit point and move filed_type
> assignment to it.
> 

5. Never mind. Too many changes.