[tarantool-patches] Re: [PATCH 3/8] sql: remove numeric affinity

n.pettik korablev at tarantool.org
Wed Jan 16 17:26:11 MSK 2019


>> Numeric affinity in SQLite means the same as real, except that it
>> forces integer values into floating point representation in case
>> it can be converted without loss (e.g. 2.0 -> 2).
>> Since in Tarantool core there is no difference between numeric and real
>> values (both are stored as values of type NUMBER), lets remove numeric
>> affinity and use instead real.
>> The only real pitfall is implicit conversion mentioned above.
>> What is more, vinyl engine complicates problem since it relies
>> on data encoding (i.e. whether it is encoded as MP_INT or MP_FLOAT).
>> For instance, if we encode 1.0 as MP_FLOAT during insertion, we won't
>> be able to use iterators from Lua, since they implicitly change type of
>> 1.0 and pass it to the iterator as MP_INT.  Solution to this problem is
>> simple: lets always attempt at encoding floats as ints if conversion
>> takes place without loss. This is a straightforward approach, but to
>> implement it we need to care about reversed (decoding) situation.
> 
> The bug is confirmed: https://github.com/tarantool/tarantool/issues/3907
> 
> I agree with Kostja - it is just a bug, that Vinyl treats differently
> integers and their double casts. It should not affect design decisions
> of this patchset.

Ok, we may consider part of this patch as a workaround.
Now affinity removal would significantly help me to fix
other issues connected with strict typing (e.g. implicit casts).
afterwards code introduced in this commit may be simplified/removed.
I don’t know what priority of #3907 issue (milestone is 2.1.1 but
we know that sometimes it may take a while).

> But there is another reason why we can't pass *.0 as an iterator value -
> our fast comparators (TupleCompare, TupleCompareWithKey) are designed to
> work with only values of same MP_ type. They do not use slow
> tuple_compare_field() which is able to compare double and integer.

Yep, it is sad. I can’t say anything more now,I need to "think about it”.

The only workaround I can come up right now with is to add to
every OP_Found/OP_Seek etc opcode OP_PromoteType.
It would attempt at fetching space from cursor and gently
applying types from format to record to be passed to iterator.
But this doesn’t seem to be acceptable solution for many reasons.

Otherwise, I have no idea how to determine required conversions
*.0 -> * without execution context.

> Moreover, I think, we should forbid implicit *.0 -> * This is why we
> designed strict typing, isn't it?

It is too strict rule, ANSI allows this kind of conversion. 
The only restriction is that there shouldn’t be precision loss
during conversion. It has been already discussed with
Konstantin and Peter Gulutzan, I inline part of that thread
(Topic “[dev] Re: Casts")

> The standard rules for casting look like this
> (where SD = source data type, TD = target data type, SV = source value):
> "
> 8) If TD is exact numeric, then
> Case:
> a)If SD is exact numeric or approximate numeric, then
> Case:
>   i) If there is a representation of SV in the data type TD that does
>      not lose any leading significant digits after rounding or truncating
>      if necessary, then TV is that representation.
>      The choice of whether to round or truncate is implementation-defined.
>   ii)Otherwise, an exception condition is raised:
>      data exception — numeric value out of range.
> "
> So truncating 1.123 to 1 is okay, but truncating 11.123 to 1.123 is not okay
> (that loses a leading significant digit).
> 

Even more simple situation:

CREATE TABLE t1 (id INT PRIMARY KEY);
INSERT INTO t1 VALUES (1.0);

This example is valid and all DBs support it.

> If a table has column int, we should not be able to select it by double
> values without explicit cast, even if implicit is possible without loss.
> For explicit cast users have CAST operator.

It would look at least strange:

SELECT * FROM t1 WHERE x = (1.0 AS INT);

Again, all DBs I know accept this query without cast.

> 
>> OP_Column fetches from msgpack field with given number and stores it as
>> a native VDBE memory object. Type of that memory is based on type of
>> msgpack value. So, if space field is of type NUMBER and holds value 1,
>> type of VDBE memory will be INT (after decoding), not float 1.0.  As a
>> result, further calculations may be wrong: for instance, instead of
>> floating point division, we could get integer division.  To cope with
>> this problem, lets add auxiliary conversion to decoding routine which
>> uses space format of tuple to be decoded. It is worth mentioning that
>> ephemeral spaces don't feature space format, so we are going to rely on
>> type of key parts. Finally, internal VDBE merge sorter also operates on
>> entries encoded into msgpack. To fix this case, we check type of
>> ORDER BY/GROUP BY arguments: if they are of type float, we are emitting
>> additional opcode OP_AffinityReal to force float type after encoding.
>> Part of #3698
>> ---
>>  src/box/field_def.h                  |  1 -
>>  src/box/lua/lua_sql.c                |  2 +-
>>  src/box/sql.c                        | 10 +++++++---
>>  src/box/sql/build.c                  |  1 -
>>  src/box/sql/expr.c                   | 20 +++++++++++---------
>>  src/box/sql/select.c                 | 10 +++++++---
>>  src/box/sql/sqliteInt.h              |  2 +-
>>  src/box/sql/vdbe.c                   | 26 ++++++++++++++++++--------
>>  src/box/sql/vdbeaux.c                | 19 ++++++++++++++++---
>>  src/box/sql/vdbemem.c                |  7 ++-----
>>  test/sql-tap/tkt-80e031a00f.test.lua | 12 ++++++------
>>  11 files changed, 69 insertions(+), 41 deletions(-)
>> diff --git a/src/box/sql.c b/src/box/sql.c
>> index a06c50dca..a498cd8fe 100644
>> --- a/src/box/sql.c
>> +++ b/src/box/sql.c
>> @@ -376,14 +376,18 @@ sql_ephemeral_space_create(uint32_t field_count, struct sql_key_info *key_info)
>>  	for (uint32_t i = 0; i < field_count; ++i) {
>>  		struct key_part_def *part = &ephemer_key_parts[i];
>>  		part->fieldno = i;
>> -		part->type = FIELD_TYPE_SCALAR;
>>  		part->nullable_action = ON_CONFLICT_ACTION_NONE;
>>  		part->is_nullable = true;
>>  		part->sort_order = SORT_ORDER_ASC;
>> -		if (def != NULL && i < def->part_count)
>> +		if (def != NULL && i < def->part_count) {
>> +			assert(def->parts[i].type < field_type_MAX);
>> +			part->type = def->parts[i].type != FIELD_TYPE_ANY ?
>> +				     def->parts[i].type : FIELD_TYPE_SCALAR;
>>  			part->coll_id = def->parts[i].coll_id;
> 
> 1. How can key_part have FIELD_TYPE_ANY? We have no comparators for ANY
> type, it is impossible, isn't it?

We don’t, and that is why I replace ANY with SCALAR.

> 
>> -		else
>> +		} else {
>>  			part->coll_id = COLL_NONE;
>> +			part->type = FIELD_TYPE_SCALAR;
>> +		}
>>  	}
>>  	struct key_def *ephemer_key_def = key_def_new(ephemer_key_parts,
>>  						      field_count);
>> diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
>> index b67b22c23..7a0b929a7 100644
>> --- a/src/box/sql/expr.c
>> +++ b/src/box/sql/expr.c
>> @@ -3700,6 +3698,10 @@ sqlite3ExprCodeTarget(Parse * pParse, Expr * pExpr, int target)
>>  				sqlite3VdbeAddOp3(v, OP_Column,
>>  						  pAggInfo->sortingIdxPTab,
>>  						  pCol->iSorterColumn, target);
>> +				if (pCol->space_def->fields[pExpr->iAgg].type ==
>> +				    FIELD_TYPE_NUMBER)
>> +					sqlite3VdbeAddOp1(v, OP_RealAffinity,
>> +							  target);
> 
> 2. Please, use {} for multi-line 'if' body.

Ok (I always confuse multi-line with multi-statement body):

diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
index 7a0b929a7..14b727039 100644
--- a/src/box/sql/expr.c
+++ b/src/box/sql/expr.c
@@ -3699,9 +3699,10 @@ sqlite3ExprCodeTarget(Parse * pParse, Expr * pExpr, int target)
                                                  pAggInfo->sortingIdxPTab,
                                                  pCol->iSorterColumn, target);
                                if (pCol->space_def->fields[pExpr->iAgg].type ==
-                                   FIELD_TYPE_NUMBER)
+                                   FIELD_TYPE_NUMBER) {
                                        sqlite3VdbeAddOp1(v, OP_RealAffinity,
                                                          target);
+                               }
                                return target;
                        }


> 
>>  				return target;
>>  			}
>>  			/* Otherwise, fall thru into the TK_COLUMN case */
>> diff --git a/src/box/sql/select.c b/src/box/sql/select.c
>> index 02ee225f1..40336e679 100644
>> --- a/src/box/sql/select.c
>> +++ b/src/box/sql/select.c
>> @@ -1427,6 +1427,8 @@ sql_expr_list_to_key_info(struct Parse *parse, struct ExprList *list, int start)
>>  		sql_expr_coll(parse, item->pExpr, &unused, &id);
>>  		part->coll_id = id;
>>  		part->sort_order = item->sort_order;
>> +		enum affinity_type aff = sqlite3ExprAffinity(item->pExpr);
>> +		part->type =sql_affinity_to_field_type(aff);
> 
> 3. Lost white-space after '=‘.

Fixed:

diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index 40336e679..fa0fea6c8 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -1428,7 +1428,7 @@ sql_expr_list_to_key_info(struct Parse *parse, struct ExprList *list, int start)
                part->coll_id = id;
                part->sort_order = item->sort_order;
                enum affinity_type aff = sqlite3ExprAffinity(item->pExpr);
-               part->type =sql_affinity_to_field_type(aff);
+               part->type = sql_affinity_to_field_type(aff);

> 
>>  	}
>>  	return key_info;
>>  }
>> @@ -5789,8 +5790,11 @@ sqlite3Select(Parse * pParse,		/* The parser context */
>>  	/* If the output is destined for a temporary table, open that table.
>>  	 */
>>  	if (pDest->eDest == SRT_EphemTab) {
>> -		sqlite3VdbeAddOp2(v, OP_OpenTEphemeral, pDest->reg_eph,
>> -				  pEList->nExpr + 1);
>> +		struct sql_key_info *key_info =
>> +			sql_expr_list_to_key_info(pParse, pEList, 0);
>> +		sqlite3VdbeAddOp4(v, OP_OpenTEphemeral, pDest->reg_eph,
>> +				  pEList->nExpr + 1, 0, (char *)key_info,
>> +				  P4_KEYINFO);
>>  		sqlite3VdbeAddOp3(v, OP_IteratorOpen, pDest->iSDParm, 0,
>>  				  pDest->reg_eph);
> 
> 4. Why can some ephemeral spaces be created without key_info? I grepped
> by OP_OpenTEphemeral and saw that somewhere it is still Op2, not Op4.

If there is no key_info, then default one is used instead.
See sql_ephemeral_space_create and how we cope
with situation when key_info == NULL.

> 
>>  diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
>> index e6b413c70..4345af24e 100644
>> --- a/src/box/sql/vdbe.c
>> +++ b/src/box/sql/vdbe.c
>> @@ -2173,7 +2167,7 @@ case OP_Ge: {             /* same as TK_GE, jump, in1, in3 */
>>  	} else {
>>  		/* Neither operand is NULL.  Do a comparison. */
>>  		affinity = pOp->p5 & AFFINITY_MASK;
>> -		if (affinity>=AFFINITY_NUMERIC) {
>> +		if (affinity>=AFFINITY_INTEGER) {
> 
> 5. I guess, we have sqlite3IsNumericAffinity for this.

Anyway I replace it with sql_affinity_is_numeric in
sql: clean-up affinity from SQL source code patch.





More information about the Tarantool-patches mailing list