[tarantool-patches] Re: [PATCH 1/2] sql: extend struct Mem with field_type member
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Sun Jul 28 18:22:00 MSK 2019
Hi! Thanks for the patch! See 9 comments below.
On 27/07/2019 20:45, Nikita Pettik wrote:
> There's several reasons to do so. First one is to improve operability of
> built-in function 'typeof()' which returns string containing argument's
> type. Using only format (msgpack) type we can't determine real field
> type of null values. Moreover, result of CAST should feature the same
> type as it was specified in operation. For instance:
> typeof(CAST(0 AS INTEGER)) should return "integer", not "unsigned".
>
> The second reason is different rules for comparison involving values of
> SCALAR type. In Tarantool NoSQL it is allowed to compare values of
> "incompatible" types like booleans and strings (using heuristics which
> says that values of one type always are greater/less that values of
> another type), in case they are stored in SCALAR field type. Without
> storing actual field type in struct Mem it is obvious impossible to
> achieve.
>
> Thus, now field_type member in struct Mem represents supposed field type
> in case value is fetched from tuple or is a result of CAST operation.
>
> Closes #4148
> ---
> src/box/sql/func.c | 11 +++++++++++
> src/box/sql/vdbe.c | 36 +++++++++++++++++++++++++++++++++++-
> src/box/sql/vdbeInt.h | 11 +++++++++++
> src/box/sql/vdbeapi.c | 1 +
> src/box/sql/vdbeaux.c | 1 +
> src/box/sql/vdbemem.c | 3 +++
> src/box/sql/vdbesort.c | 6 ++++++
> test/sql-tap/cast.test.lua | 8 ++++----
> test/sql-tap/table.test.lua | 4 ++--
> test/sql/types.result | 35 ++++++++++++++++++++++++++++++++++-
> test/sql/types.test.lua | 11 +++++++++++
> 11 files changed, 119 insertions(+), 8 deletions(-)
>
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index fcf147c96..f50df105d 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -107,6 +107,17 @@ typeofFunc(sql_context * context, int NotUsed, sql_value ** argv)
> {
> const char *z = 0;
> UNUSED_PARAMETER(NotUsed);
> + enum field_type f_t = argv[0]->field_type;
1. Perhaps, field_type is not the best name, because Mem is not always
is a field. It can be just a value, not from a space. It is rather
just 'type'.
> + /*
> + * SCALAR is not a basic type, but rather an aggregation of
> + * types. Thus, ignore SCALAR field type and return msgpack
> + * format type.
> + */
> + if (f_t != field_type_MAX && f_t != FIELD_TYPE_SCALAR) {
2. Not sure if I understand, why do you need both _MAX and _SCALAR for
values with no strict type. Why not to set SCALAR instead of MAX always?
MAX is not a type, so SCALAR IMO would be more appropriate when you
don't have a strict type. Or even ANY, because maps and arrays will be
supported.
> + sql_result_text(context, field_type_strs[argv[0]->field_type],
> + -1, SQL_STATIC);
> + return;
> + }
> switch (sql_value_type(argv[0])) {
> case MP_INT:
> case MP_UINT:
> diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
> index 9142932e9..2c7a1c2da 100644
> --- a/src/box/sql/vdbe.c
> +++ b/src/box/sql/vdbe.c
> @@ -1238,6 +1238,7 @@ case OP_SoftNull: {
> assert(pOp->p1>0 && pOp->p1<=(p->nMem+1 - p->nCursor));
> pOut = &aMem[pOp->p1];
> pOut->flags = (pOut->flags|MEM_Null)&~MEM_Undefined;
> + pOut->field_type = field_type_MAX;
> break;
3. This opcode is unused and can be deleted.
> }
>
> @@ -1278,6 +1279,11 @@ case OP_Variable: { /* out2 */
> }
> pOut = out2Prerelease(p, pOp);
> sqlVdbeMemShallowCopy(pOut, pVar, MEM_Static);
> + /*
> + * ...ShallowCopy() may overwrite field_type with irrelevant
> + * value. So make sure that in the end field_type_MAX is set.
> + */
4. The only place where ShallowCopy can rewrite field_type is
memcpy(pTo, pFrom, MEMCELLSIZE). If after that field_type is irrelevant,
then it was incorrect in pFrom, i.e. in pVar. On the summary:
why pVar may have incorrect field_type? Should not bind set a correct
type?
> + pOut->field_type = field_type_MAX;
> UPDATE_MAX_BLOBSIZE(pOut);
> break;
> }
> @@ -1338,6 +1344,7 @@ case OP_Copy: {
> pIn1 = &aMem[pOp->p1];
> pOut = &aMem[pOp->p2];
> assert(pOut!=pIn1);
> +
> while( 1) {
5. Unnecessary diff.
> sqlVdbeMemShallowCopy(pOut, pIn1, MEM_Ephem);
> Deephemeralize(pOut);
> @@ -1389,6 +1396,7 @@ case OP_IntCopy: { /* out2 */
> assert((pIn1->flags & (MEM_Int | MEM_UInt)) != 0);
> pOut = &aMem[pOp->p2];
> mem_set_int(pOut, pIn1->u.i, (pIn1->flags & MEM_Int) != 0);
> + pOut->field_type = field_type_MAX;
6. Why mem_set_int does not set a correct field_type? Or why
pOut->field_type can't be assigned to pIn->field_type? You actually
do it for OP_FCopy. The same about OP_Remainder.
When you assign field_type_MAX, and pIn1 was FIELD_TYPE_INTEGER
with an unsigned value, you will have typeof(pOut) unsigned - type
loss.
Additionally, as I see, OP_IntCopy is used only once, when a space
is being created, to copy sequence id. In such a case I think it
can be deleted and replaced with OP_SCopy.
> break;
> }
>
> @@ -1491,6 +1499,8 @@ case OP_Concat: { /* same as TK_CONCAT, in1, in2, out3 */
> assert(pIn1!=pOut);
> if ((pIn1->flags | pIn2->flags) & MEM_Null) {
> sqlVdbeMemSetNull(pOut);
> + /* Force NULL be of type STRING. */
> + pOut->field_type = FIELD_TYPE_STRING;
> break;
> }
> /*
> @@ -1535,6 +1545,7 @@ case OP_Concat: { /* same as TK_CONCAT, in1, in2, out3 */
> pOut->z[nByte+1] = 0;
> pOut->flags |= MEM_Term;
> pOut->n = (int)nByte;
> + pOut->field_type = field_type_MAX;
> UPDATE_MAX_BLOBSIZE(pOut);
> break;
> }
> @@ -1644,6 +1655,7 @@ case OP_Remainder: { /* same as TK_REM, in1, in2, out3 */
> }
> }
> mem_set_int(pOut, iB, is_res_neg);
> + pOut->field_type = field_type_MAX;
> } else {
> bIntint = 0;
> if (sqlVdbeRealValue(pIn1, &rA) != 0) {
> @@ -1681,6 +1693,7 @@ case OP_Remainder: { /* same as TK_REM, in1, in2, out3 */
> }
> pOut->u.r = rB;
> MemSetTypeFlag(pOut, MEM_Real);
> + pOut->field_type = field_type_MAX;
> if (((type1|type2)&MEM_Real)==0 && !bIntint) {
> mem_apply_integer_type(pOut);
> }
> @@ -1689,6 +1702,8 @@ case OP_Remainder: { /* same as TK_REM, in1, in2, out3 */
>
> arithmetic_result_is_null:
> sqlVdbeMemSetNull(pOut);
> + /* Force NULL be of type NUMBER. */
> + pOut->field_type = FIELD_TYPE_NUMBER;
> break;
7. All these manual assignments of field_type are broken by design.
When you have to update field_type after each operation manually,
you will miss some places for sure. For example:
tarantool> box.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)')
---
- row_count: 1
...
tarantool> box.execute('INSERT INTO test VALUES (1, NULL, NULL)')
---
- row_count: 1
...
tarantool> box.execute('SELECT typeof(a & b) FROM test')
---
- metadata:
- name: typeof(a & b)
type: string
rows:
- ['null']
...
tarantool> box.execute('SELECT typeof(a), typeof(b), typeof(a & b) FROM test')
---
- metadata:
- name: typeof(a)
type: string
- name: typeof(b)
type: string
- name: typeof(a & b)
type: string
rows:
- ['integer', 'integer', 'integer']
...
Here in the last two queries 'typeof(a & b)' result *depends* on
other columns in result set. But it should be integer always, shouldn't
it? The same about all other places, where you call mem_set_* and don't
update field_type.
I propose you to redesign how to update field_type, so as it would
be hard to forget to update it.
For instance, add a field_type argument to sqlVdbeMemSetNull() to never
miss to set a real type for a NULL value. Make mem_set_int update field_type
depending on the positivity flag. Make mem_set_u64 always set field_type
UNSIGNED. Make sqlVdbeMemCast always set field_type to the target type.
>
> division_by_zero:
> @@ -1909,6 +1924,7 @@ case OP_ShiftRight: { /* same as TK_RSHIFT, in1, in2, out3 */
> }
> }
> mem_set_i64(pOut, iA);
> + pOut->field_type = field_type_MAX;
> break;
> }
>
> @@ -1988,6 +2004,13 @@ case OP_Cast: { /* in1 */
> if (ExpandBlob(pIn1) != 0)
> goto abort_due_to_error;
> rc = sqlVdbeMemCast(pIn1, pOp->p2);
> + /*
> + * SCALAR is not type itself, but rather an aggregation
> + * of types. Hence, cast to this type shouldn't change
> + * original type of argument.
> + */
> + if (pOp->p2 != FIELD_TYPE_SCALAR)
> + pIn1->field_type = pOp->p2;
8. Why sqlVdbeMemCast does not update the type inside? Now
this function works on half - it updates flags with a new type,
sets a new value, but doesn't touch field_type.
> UPDATE_MAX_BLOBSIZE(pIn1);
> if (rc == 0)
> break;
> @@ -2094,6 +2117,8 @@ case OP_Ge: { /* same as TK_GE, jump, in1, in3 */
> pIn3 = &aMem[pOp->p3];
> flags1 = pIn1->flags;
> flags3 = pIn3->flags;
> + enum field_type ft_p1 = pIn1->field_type;
> + enum field_type ft_p3 = pIn3->field_type;
> if ((flags1 | flags3)&MEM_Null) {
> /* One or both operands are NULL */
> if (pOp->p5 & SQL_NULLEQ) {
> @@ -2232,8 +2257,10 @@ case OP_Ge: { /* same as TK_GE, jump, in1, in3 */
> /* Undo any changes made by mem_apply_type() to the input registers. */
> assert((pIn1->flags & MEM_Dyn) == (flags1 & MEM_Dyn));
> pIn1->flags = flags1;
> + pIn1->field_type = ft_p1;
> assert((pIn3->flags & MEM_Dyn) == (flags3 & MEM_Dyn));
> pIn3->flags = flags3;
> + pIn3->field_type = ft_p3;
>
> if (pOp->p5 & SQL_STOREP2) {
> pOut = &aMem[pOp->p2];
> @@ -2451,6 +2478,7 @@ case OP_Or: { /* same as TK_OR, in1, in2, out3 */
> } else {
> mem_set_bool(pOut, v1);
> }
> + pOut->field_type = field_type_MAX;
> break;
> }
>
> @@ -2472,6 +2500,7 @@ case OP_Not: { /* same as TK_NOT, in1, out2 */
> goto abort_due_to_error;
> }
> mem_set_bool(pOut, ! pIn1->u.b);
> + pOut->field_type = field_type_MAX;
9. So after mem_set_bool, if the cell was used for
something else before, it is in an invalid state, right?
I don't think it should work so. Please, develop a
better way how to keep field_type up to date.
Ideally, if code doesn't touch type flags, it should not touch
field_type. Otherwise that place is broken, if you need to
do follow-up updates after each function, doing something
with types.
More information about the Tarantool-patches
mailing list