[tarantool-patches] Re: [PATCH 1/2] sql: extend struct Mem with field_type member

Sun Jul 28 18:22:00 MSK 2019

Hi! Thanks for the patch! See 9 comments below.

On 27/07/2019 20:45, Nikita Pettik wrote:
> There's several reasons to do so. First one is to improve operability of
> built-in function 'typeof()' which returns string containing argument's
> type. Using only format (msgpack) type we can't determine real field
> type of null values. Moreover, result of CAST should feature the same
> type as it was specified in operation. For instance:
> typeof(CAST(0 AS INTEGER)) should return "integer", not "unsigned".
> 
> The second reason is different rules for comparison involving values of
> SCALAR type. In Tarantool NoSQL it is allowed to compare values of
> "incompatible" types like booleans and strings (using heuristics which
> says that values of one type always are greater/less that values of
> another type), in case they are stored in SCALAR field type. Without
> storing actual field type in struct Mem it is obvious impossible to
> achieve.
> 
> Thus, now field_type member in struct Mem represents supposed field type
> in case value is fetched from tuple or is a result of CAST operation.
> 
> Closes #4148
> ---
>  src/box/sql/func.c          | 11 +++++++++++
>  src/box/sql/vdbe.c          | 36 +++++++++++++++++++++++++++++++++++-
>  src/box/sql/vdbeInt.h       | 11 +++++++++++
>  src/box/sql/vdbeapi.c       |  1 +
>  src/box/sql/vdbeaux.c       |  1 +
>  src/box/sql/vdbemem.c       |  3 +++
>  src/box/sql/vdbesort.c      |  6 ++++++
>  test/sql-tap/cast.test.lua  |  8 ++++----
>  test/sql-tap/table.test.lua |  4 ++--
>  test/sql/types.result       | 35 ++++++++++++++++++++++++++++++++++-
>  test/sql/types.test.lua     | 11 +++++++++++
>  11 files changed, 119 insertions(+), 8 deletions(-)
> 
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index fcf147c96..f50df105d 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -107,6 +107,17 @@ typeofFunc(sql_context * context, int NotUsed, sql_value ** argv)
>  {
>  	const char *z = 0;
>  	UNUSED_PARAMETER(NotUsed);
> +	enum field_type f_t = argv[0]->field_type;

1. Perhaps, field_type is not the best name, because Mem is not always
is a field. It can be just a value, not from a space. It is rather
just 'type'.

> +	/*
> +	 * SCALAR is not a basic type, but rather an aggregation of
> +	 * types. Thus, ignore SCALAR field type and return msgpack
> +	 * format type.
> +	 */
> +	if (f_t != field_type_MAX && f_t != FIELD_TYPE_SCALAR) {

2. Not sure if I understand, why do you need both _MAX and _SCALAR for
values with no strict type. Why not to set SCALAR instead of MAX always?
MAX is not a type, so SCALAR IMO would be more appropriate when you
don't have a strict type. Or even ANY, because maps and arrays will be
supported.

> +		sql_result_text(context, field_type_strs[argv[0]->field_type],
> +				-1, SQL_STATIC);
> +		return;
> +	}
>  	switch (sql_value_type(argv[0])) {
>  	case MP_INT:
>  	case MP_UINT:
> diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
> index 9142932e9..2c7a1c2da 100644
> --- a/src/box/sql/vdbe.c
> +++ b/src/box/sql/vdbe.c
> @@ -1238,6 +1238,7 @@ case OP_SoftNull: {
>  	assert(pOp->p1>0 && pOp->p1<=(p->nMem+1 - p->nCursor));
>  	pOut = &aMem[pOp->p1];
>  	pOut->flags = (pOut->flags|MEM_Null)&~MEM_Undefined;
> +	pOut->field_type = field_type_MAX;
>  	break;

3. This opcode is unused and can be deleted.

>  }
>  
> @@ -1278,6 +1279,11 @@ case OP_Variable: {            /* out2 */
>  	}
>  	pOut = out2Prerelease(p, pOp);
>  	sqlVdbeMemShallowCopy(pOut, pVar, MEM_Static);
> +	/*
> +	 * ...ShallowCopy() may overwrite field_type with irrelevant
> +	 * value. So make sure that in the end field_type_MAX is set.
> +	 */

4. The only place where ShallowCopy can rewrite field_type is
memcpy(pTo, pFrom, MEMCELLSIZE). If after that field_type is irrelevant,
then it was incorrect in pFrom, i.e. in pVar. On the summary:
why pVar may have incorrect field_type? Should not bind set a correct
type?

> +	pOut->field_type = field_type_MAX;
>  	UPDATE_MAX_BLOBSIZE(pOut);
>  	break;
>  }
> @@ -1338,6 +1344,7 @@ case OP_Copy: {
>  	pIn1 = &aMem[pOp->p1];
>  	pOut = &aMem[pOp->p2];
>  	assert(pOut!=pIn1);
> +
>  	while( 1) {

5. Unnecessary diff.

>  		sqlVdbeMemShallowCopy(pOut, pIn1, MEM_Ephem);
>  		Deephemeralize(pOut);
> @@ -1389,6 +1396,7 @@ case OP_IntCopy: {            /* out2 */
>  	assert((pIn1->flags & (MEM_Int | MEM_UInt)) != 0);
>  	pOut = &aMem[pOp->p2];
>  	mem_set_int(pOut, pIn1->u.i, (pIn1->flags & MEM_Int) != 0);
> +	pOut->field_type = field_type_MAX;

6. Why mem_set_int does not set a correct field_type? Or why
pOut->field_type can't be assigned to pIn->field_type? You actually
do it for OP_FCopy. The same about OP_Remainder.

When you assign field_type_MAX, and pIn1 was FIELD_TYPE_INTEGER
with an unsigned value, you will have typeof(pOut) unsigned - type
loss.

Additionally, as I see, OP_IntCopy is used only once, when a space
is being created, to copy sequence id. In such a case I think it
can be deleted and replaced with OP_SCopy.

>  	break;
>  }
>  
> @@ -1491,6 +1499,8 @@ case OP_Concat: {           /* same as TK_CONCAT, in1, in2, out3 */
>  	assert(pIn1!=pOut);
>  	if ((pIn1->flags | pIn2->flags) & MEM_Null) {
>  		sqlVdbeMemSetNull(pOut);
> +		/* Force NULL be of type STRING. */
> +		pOut->field_type = FIELD_TYPE_STRING;
>  		break;
>  	}
>  	/*
> @@ -1535,6 +1545,7 @@ case OP_Concat: {           /* same as TK_CONCAT, in1, in2, out3 */
>  	pOut->z[nByte+1] = 0;
>  	pOut->flags |= MEM_Term;
>  	pOut->n = (int)nByte;
> +	pOut->field_type = field_type_MAX;
>  	UPDATE_MAX_BLOBSIZE(pOut);
>  	break;
>  }
> @@ -1644,6 +1655,7 @@ case OP_Remainder: {           /* same as TK_REM, in1, in2, out3 */
>  		}
>  		}
>  		mem_set_int(pOut, iB, is_res_neg);
> +		pOut->field_type = field_type_MAX;
>  	} else {
>  		bIntint = 0;
>  		if (sqlVdbeRealValue(pIn1, &rA) != 0) {
> @@ -1681,6 +1693,7 @@ case OP_Remainder: {           /* same as TK_REM, in1, in2, out3 */
>  		}
>  		pOut->u.r = rB;
>  		MemSetTypeFlag(pOut, MEM_Real);
> +		pOut->field_type = field_type_MAX;
>  		if (((type1|type2)&MEM_Real)==0 && !bIntint) {
>  			mem_apply_integer_type(pOut);
>  		}
> @@ -1689,6 +1702,8 @@ case OP_Remainder: {           /* same as TK_REM, in1, in2, out3 */
>  
>  arithmetic_result_is_null:
>  	sqlVdbeMemSetNull(pOut);
> +	/* Force NULL be of type NUMBER. */
> +	pOut->field_type = FIELD_TYPE_NUMBER;
>  	break;

7. All these manual assignments of field_type are broken by design.
When you have to update field_type after each operation manually,
you will miss some places for sure. For example:

tarantool> box.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)')
---
- row_count: 1
...

tarantool> box.execute('INSERT INTO test VALUES (1, NULL, NULL)')
---
- row_count: 1
...

tarantool> box.execute('SELECT typeof(a & b) FROM test')
---
- metadata:
  - name: typeof(a & b)
    type: string
  rows:
  - ['null']
...

tarantool> box.execute('SELECT typeof(a), typeof(b), typeof(a & b) FROM test')
---
- metadata:
  - name: typeof(a)
    type: string
  - name: typeof(b)
    type: string
  - name: typeof(a & b)
    type: string
  rows:
  - ['integer', 'integer', 'integer']
...

Here in the last two queries 'typeof(a & b)' result *depends* on
other columns in result set. But it should be integer always, shouldn't
it? The same about all other places, where you call mem_set_* and don't
update field_type.

I propose you to redesign how to update field_type, so as it would
be hard to forget to update it.

For instance, add a field_type argument to sqlVdbeMemSetNull() to never
miss to set a real type for a NULL value. Make mem_set_int update field_type
depending on the positivity flag. Make mem_set_u64 always set field_type
UNSIGNED. Make sqlVdbeMemCast always set field_type to the target type.

>  
>  division_by_zero:
> @@ -1909,6 +1924,7 @@ case OP_ShiftRight: {           /* same as TK_RSHIFT, in1, in2, out3 */
>  		}
>  	}
>  	mem_set_i64(pOut, iA);
> +	pOut->field_type = field_type_MAX;
>  	break;
>  }
>  
> @@ -1988,6 +2004,13 @@ case OP_Cast: {                  /* in1 */
>  	if (ExpandBlob(pIn1) != 0)
>  		goto abort_due_to_error;
>  	rc = sqlVdbeMemCast(pIn1, pOp->p2);
> +	/*
> +	 * SCALAR is not type itself, but rather an aggregation
> +	 * of types. Hence, cast to this type shouldn't change
> +	 * original type of argument.
> +	 */
> +	if (pOp->p2 != FIELD_TYPE_SCALAR)
> +		pIn1->field_type = pOp->p2;

8. Why sqlVdbeMemCast does not update the type inside? Now
this function works on half - it updates flags with a new type,
sets a new value, but doesn't touch field_type.

>  	UPDATE_MAX_BLOBSIZE(pIn1);
>  	if (rc == 0)
>  		break;
> @@ -2094,6 +2117,8 @@ case OP_Ge: {             /* same as TK_GE, jump, in1, in3 */
>  	pIn3 = &aMem[pOp->p3];
>  	flags1 = pIn1->flags;
>  	flags3 = pIn3->flags;
> +	enum field_type ft_p1 = pIn1->field_type;
> +	enum field_type ft_p3 = pIn3->field_type;
>  	if ((flags1 | flags3)&MEM_Null) {
>  		/* One or both operands are NULL */
>  		if (pOp->p5 & SQL_NULLEQ) {
> @@ -2232,8 +2257,10 @@ case OP_Ge: {             /* same as TK_GE, jump, in1, in3 */
>  	/* Undo any changes made by mem_apply_type() to the input registers. */
>  	assert((pIn1->flags & MEM_Dyn) == (flags1 & MEM_Dyn));
>  	pIn1->flags = flags1;
> +	pIn1->field_type = ft_p1;
>  	assert((pIn3->flags & MEM_Dyn) == (flags3 & MEM_Dyn));
>  	pIn3->flags = flags3;
> +	pIn3->field_type = ft_p3;
>  
>  	if (pOp->p5 & SQL_STOREP2) {
>  		pOut = &aMem[pOp->p2];
> @@ -2451,6 +2478,7 @@ case OP_Or: {             /* same as TK_OR, in1, in2, out3 */
>  	} else {
>  		mem_set_bool(pOut, v1);
>  	}
> +	pOut->field_type = field_type_MAX;
>  	break;
>  }
>  
> @@ -2472,6 +2500,7 @@ case OP_Not: {                /* same as TK_NOT, in1, out2 */
>  			goto abort_due_to_error;
>  		}
>  		mem_set_bool(pOut, ! pIn1->u.b);
> +		pOut->field_type = field_type_MAX;

9. So after mem_set_bool, if the cell was used for
something else before, it is in an invalid state, right?
I don't think it should work so. Please, develop a
better way how to keep field_type up to date.

Ideally, if code doesn't touch type flags, it should not touch
field_type. Otherwise that place is broken, if you need to
do follow-up updates after each function, doing something
with types.