[tarantool-patches] Re: [PATCH 1/2] sql: extend struct Mem with field_type member
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Wed Jul 31 19:59:52 MSK 2019
Hi! Thanks for the fixes! See 5 comments below.
On 31/07/2019 11:14, n.pettik wrote:
>
>
>>> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
>>> index fcf147c96..f50df105d 100644
>>> --- a/src/box/sql/func.c
>>> +++ b/src/box/sql/func.c
>>> @@ -107,6 +107,17 @@ typeofFunc(sql_context * context, int NotUsed, sql_value ** argv)
>>> {
>>> const char *z = 0;
>>> UNUSED_PARAMETER(NotUsed);
>>> + enum field_type f_t = argv[0]->field_type;
>>
>> 1. Perhaps, field_type is not the best name, because Mem is not always
>> is a field. It can be just a value, not from a space. It is rather
>> just 'type’.
>
> When memory represents smth different from value fetched from
> tuple, it is set to field_type_MAX. So I’d rather not rename this member.
1. Ok, now I got it. It works only for tuple fields, and is totally disabled
for other values. Or not? See other comments.
>> 7. All these manual assignments of field_type are broken by design.
>> When you have to update field_type after each operation manually,
>> you will miss some places for sure. For example:
>>
>> tarantool> box.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)')
>> ---
>> - row_count: 1
>> ...
>>
>> tarantool> box.execute('INSERT INTO test VALUES (1, NULL, NULL)')
>> ---
>> - row_count: 1
>> ...
>>
>> tarantool> box.execute('SELECT typeof(a & b) FROM test')
>> ---
>> - metadata:
>> - name: typeof(a & b)
>> type: string
>> rows:
>> - ['null']
>> ...
>>
>> tarantool> box.execute('SELECT typeof(a), typeof(b), typeof(a & b) FROM test')
>> ---
>> - metadata:
>> - name: typeof(a)
>> type: string
>> - name: typeof(b)
>> type: string
>> - name: typeof(a & b)
>> type: string
>> rows:
>> - ['integer', 'integer', 'integer']
>> ...
>>
>>
>> Here in the last two queries 'typeof(a & b)' result *depends* on
>> other columns in result set. But it should be integer always, shouldn't
>> it?
>
> Yep, it should, I’ve fixed that:
>
> diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
> index b3b230ad0..f6eab8a0e 100644
> --- a/src/box/sql/vdbe.c
> +++ b/src/box/sql/vdbe.c
> @@ -1842,6 +1842,8 @@ case OP_ShiftRight: { /* same as TK_RSHIFT, in1, in2, out3 */
> pOut = &aMem[pOp->p3];
> if ((pIn1->flags | pIn2->flags) & MEM_Null) {
> sqlVdbeMemSetNull(pOut);
> + /* Force NULL be of type INTEGER. */
> + pOut->field_type = FIELD_TYPE_INTEGER;
> break;
> }
> bool unused;
> @@ -2477,6 +2479,8 @@ case OP_BitNot: { /* same as TK_BITNOT, in1, out2 */
> pIn1 = &aMem[pOp->p1];
> pOut = &aMem[pOp->p2];
> sqlVdbeMemSetNull(pOut);
> + /* Force NULL be of type INTEGER. */
> + pOut->field_type = FIELD_TYPE_INTEGER;
2.
tarantool> box.execute('SELECT typeof(~NULL)')
---
- metadata:
- name: typeof(~NULL)
type: string
rows:
- ['integer']
...
But it is not integer, as I think. I didn't select NULL from
an integer column. Here NULL was of type 'boolean'. And the
result should be 'boolean' too, because it is just NULL, not
a column NULL. The same about other bit operations.
And the same about strings:
tarantool> box.execute("SELECT typeof(NULL || NULL)")
---
- metadata:
- name: typeof(NULL || NULL)
type: string
rows:
- ['string']
...
Here NULLs were of type 'boolean', but output shows 'string'.
> if ((pIn1->flags & MEM_Null)==0) {
> int64_t i;
> bool is_neg;
> diff --git a/test/sql/types.test.lua b/test/sql/types.test.lua
> index 489a5a91c..8a509da96 100644
> --- a/test/sql/types.test.lua
> +++ b/test/sql/types.test.lua
> @@ -407,6 +407,12 @@ box.execute("CREATE TABLE t (id INT PRIMARY KEY, a INT, s SCALAR);")
> box.execute("INSERT INTO t VALUES (1, 1, 1), (2, NULL, NULL);")
> box.execute("SELECT typeof(a), typeof(s) FROM t;")
>
> +box.execute('CREATE TABLE t1 (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)')
> +box.execute('INSERT INTO t1 VALUES (1, NULL, NULL);')
> +box.execute('SELECT typeof(a & b) FROM t1;')
> +box.execute('SELECT typeof(a), typeof(b), typeof(a & b) FROM t1')
> +
> box.execute("SELECT typeof(CAST(0 AS UNSIGNED));")
>
> box.space.T:drop()
> +box.space.T1:drop()
>
>> The same about all other places, where you call mem_set_* and don't
>> update field_type.
>
> It’s okay, since updating current value in memory cell doesn’t
> imply that field type should be updated as well. In most scenarios
> before setting new value to memory cell, we provide clean-up
> via memAboutToChange(), which in turn invalidates field type.
3. memAboutToChange is no-op in Release build, so it does not invalidate
anything in that mode. It means, that field_type now can be a garbage
from a previous value, if the memory cell was reused to assign something
new.
>> I propose you to redesign how to update field_type, so as it would
>> be hard to forget to update it.
>>
>> For instance, add a field_type argument to sqlVdbeMemSetNull() to never
>> miss to set a real type for a NULL value. Make mem_set_int update field_type
>> depending on the positivity flag. Make mem_set_u64 always set field_type
>> UNSIGNED. Make sqlVdbeMemCast always set field_type to the target type.
>
> I’ve tried to refactor patch this way, but in most cases sqlVdbeMemSetNull()
> would set default (field_type_MAX). As for mem_set_* family functions -
> I can move assignment of field_type_MAX to the body, but does it make
> much sense?
>
4. Currently, as I said above, I don't see where you invalidate type of a
memory cell before it takes a new value. Yes, field_type is initialized
with field_type_MAX when the register array is just created, but where
the cell type is reset, when it takes a new value?
And one another possible bug:
box.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, a INTEGER)')
box.execute('INSERT INTO test VALUES (1, NULL)')
tarantool> box.execute('SELECT typeof(NOT a) FROM test')
---
- metadata:
- name: typeof(NOT a)
type: string
rows:
- ['boolean']
...
As you can see, 'NOT' lost the original type - it was integer NULL,
not boolean NULL.
Another place is vdbe.c:2627 - here you jump to op_column_out
bypassing field_type assignment. Why?
>>> @@ -1909,6 +1924,7 @@ case OP_ShiftRight: { /* same as TK_RSHIFT, in1, in2, out3 */
>>> }
>>> }
>>> mem_set_i64(pOut, iA);
>>> + pOut->field_type = field_type_MAX;
>>> break;
>>> }
>>>
>>> @@ -1988,6 +2004,13 @@ case OP_Cast: { /* in1 */
>>> if (ExpandBlob(pIn1) != 0)
>>> goto abort_due_to_error;
>>> rc = sqlVdbeMemCast(pIn1, pOp->p2);
>>> + /*
>>> + * SCALAR is not type itself, but rather an aggregation
>>> + * of types. Hence, cast to this type shouldn't change
>>> + * original type of argument.
>>> + */
>>> + if (pOp->p2 != FIELD_TYPE_SCALAR)
>>> + pIn1->field_type = pOp->p2;
>>
>> 8. Why sqlVdbeMemCast does not update the type inside? Now
>> this function works on half - it updates flags with a new type,
>> sets a new value, but doesn't touch field_type.
>
> Because it has several exit points and otherwise we have
> to add to each branch:
>
> diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c
> index e5941de8e..7892af116 100644
> --- a/src/box/sql/vdbemem.c
> +++ b/src/box/sql/vdbemem.c
> @@ -673,10 +673,12 @@ sqlVdbeMemCast(Mem * pMem, enum field_type type)
> case FIELD_TYPE_BOOLEAN:
> if ((pMem->flags & MEM_Int) != 0) {
> mem_set_bool(pMem, pMem->u.i);
> + pMem->field_type = type;
> return 0;
> }
> if ((pMem->flags & MEM_UInt) != 0) {
> mem_set_bool(pMem, pMem->u.u);
> + pMem->field_type = type;
> return 0;
> }
> if ((pMem->flags & MEM_Real) != 0) {
>
> Etc.
>
> What is more, this is the only path (OP_Cast) which calls
> sqlVdbeMemCast() function. So IMHO it is pretty OK.
>
> If you want, I can attempt at refactoring sqlVdbeMemCast()
> so that make it feature only one exit point and move filed_type
> assignment to it.
>
5. Never mind. Too many changes.
More information about the Tarantool-patches
mailing list