From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id B86CB2576F for ; Wed, 16 Jan 2019 09:26:13 -0500 (EST) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id sUR9kuKkOPW0 for ; Wed, 16 Jan 2019 09:26:13 -0500 (EST) Received: from smtpng2.m.smailru.net (smtpng2.m.smailru.net [94.100.179.3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 47A172569E for ; Wed, 16 Jan 2019 09:26:13 -0500 (EST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 12.0 \(3445.100.39\)) Subject: [tarantool-patches] Re: [PATCH 3/8] sql: remove numeric affinity From: "n.pettik" In-Reply-To: <28e6da44-2d1e-87b5-741d-a791750db6c4@tarantool.org> Date: Wed, 16 Jan 2019 17:26:11 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <58b4d75729413f02134c72886ecbc749e510a1d1.1545987214.git.korablev@tarantool.org> <28e6da44-2d1e-87b5-741d-a791750db6c4@tarantool.org> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: tarantool-patches@freelists.org Cc: Vladislav Shpilevoy >> Numeric affinity in SQLite means the same as real, except that it >> forces integer values into floating point representation in case >> it can be converted without loss (e.g. 2.0 -> 2). >> Since in Tarantool core there is no difference between numeric and = real >> values (both are stored as values of type NUMBER), lets remove = numeric >> affinity and use instead real. >> The only real pitfall is implicit conversion mentioned above. >> What is more, vinyl engine complicates problem since it relies >> on data encoding (i.e. whether it is encoded as MP_INT or MP_FLOAT). >> For instance, if we encode 1.0 as MP_FLOAT during insertion, we won't >> be able to use iterators from Lua, since they implicitly change type = of >> 1.0 and pass it to the iterator as MP_INT. Solution to this problem = is >> simple: lets always attempt at encoding floats as ints if conversion >> takes place without loss. This is a straightforward approach, but to >> implement it we need to care about reversed (decoding) situation. >=20 > The bug is confirmed: = https://github.com/tarantool/tarantool/issues/3907 >=20 > I agree with Kostja - it is just a bug, that Vinyl treats differently > integers and their double casts. It should not affect design decisions > of this patchset. Ok, we may consider part of this patch as a workaround. Now affinity removal would significantly help me to fix other issues connected with strict typing (e.g. implicit casts). afterwards code introduced in this commit may be simplified/removed. I don=E2=80=99t know what priority of #3907 issue (milestone is 2.1.1 = but we know that sometimes it may take a while). > But there is another reason why we can't pass *.0 as an iterator value = - > our fast comparators (TupleCompare, TupleCompareWithKey) are designed = to > work with only values of same MP_ type. They do not use slow > tuple_compare_field() which is able to compare double and integer. Yep, it is sad. I can=E2=80=99t say anything more now,I need to "think = about it=E2=80=9D. The only workaround I can come up right now with is to add to every OP_Found/OP_Seek etc opcode OP_PromoteType. It would attempt at fetching space from cursor and gently applying types from format to record to be passed to iterator. But this doesn=E2=80=99t seem to be acceptable solution for many = reasons. Otherwise, I have no idea how to determine required conversions *.0 -> * without execution context. > Moreover, I think, we should forbid implicit *.0 -> * This is why we > designed strict typing, isn't it? It is too strict rule, ANSI allows this kind of conversion.=20 The only restriction is that there shouldn=E2=80=99t be precision loss during conversion. It has been already discussed with Konstantin and Peter Gulutzan, I inline part of that thread (Topic =E2=80=9C[dev] Re: Casts") > The standard rules for casting look like this > (where SD =3D source data type, TD =3D target data type, SV =3D source = value): > " > 8) If TD is exact numeric, then > Case: > a)If SD is exact numeric or approximate numeric, then > Case: > i) If there is a representation of SV in the data type TD that does > not lose any leading significant digits after rounding or = truncating > if necessary, then TV is that representation. > The choice of whether to round or truncate is = implementation-defined. > ii)Otherwise, an exception condition is raised: > data exception =E2=80=94 numeric value out of range. > " > So truncating 1.123 to 1 is okay, but truncating 11.123 to 1.123 is = not okay > (that loses a leading significant digit). >=20 Even more simple situation: CREATE TABLE t1 (id INT PRIMARY KEY); INSERT INTO t1 VALUES (1.0); This example is valid and all DBs support it. > If a table has column int, we should not be able to select it by = double > values without explicit cast, even if implicit is possible without = loss. > For explicit cast users have CAST operator. It would look at least strange: SELECT * FROM t1 WHERE x =3D (1.0 AS INT); Again, all DBs I know accept this query without cast. >=20 >> OP_Column fetches from msgpack field with given number and stores it = as >> a native VDBE memory object. Type of that memory is based on type of >> msgpack value. So, if space field is of type NUMBER and holds value = 1, >> type of VDBE memory will be INT (after decoding), not float 1.0. As = a >> result, further calculations may be wrong: for instance, instead of >> floating point division, we could get integer division. To cope with >> this problem, lets add auxiliary conversion to decoding routine which >> uses space format of tuple to be decoded. It is worth mentioning that >> ephemeral spaces don't feature space format, so we are going to rely = on >> type of key parts. Finally, internal VDBE merge sorter also operates = on >> entries encoded into msgpack. To fix this case, we check type of >> ORDER BY/GROUP BY arguments: if they are of type float, we are = emitting >> additional opcode OP_AffinityReal to force float type after encoding. >> Part of #3698 >> --- >> src/box/field_def.h | 1 - >> src/box/lua/lua_sql.c | 2 +- >> src/box/sql.c | 10 +++++++--- >> src/box/sql/build.c | 1 - >> src/box/sql/expr.c | 20 +++++++++++--------- >> src/box/sql/select.c | 10 +++++++--- >> src/box/sql/sqliteInt.h | 2 +- >> src/box/sql/vdbe.c | 26 ++++++++++++++++++-------- >> src/box/sql/vdbeaux.c | 19 ++++++++++++++++--- >> src/box/sql/vdbemem.c | 7 ++----- >> test/sql-tap/tkt-80e031a00f.test.lua | 12 ++++++------ >> 11 files changed, 69 insertions(+), 41 deletions(-) >> diff --git a/src/box/sql.c b/src/box/sql.c >> index a06c50dca..a498cd8fe 100644 >> --- a/src/box/sql.c >> +++ b/src/box/sql.c >> @@ -376,14 +376,18 @@ sql_ephemeral_space_create(uint32_t = field_count, struct sql_key_info *key_info) >> for (uint32_t i =3D 0; i < field_count; ++i) { >> struct key_part_def *part =3D &ephemer_key_parts[i]; >> part->fieldno =3D i; >> - part->type =3D FIELD_TYPE_SCALAR; >> part->nullable_action =3D ON_CONFLICT_ACTION_NONE; >> part->is_nullable =3D true; >> part->sort_order =3D SORT_ORDER_ASC; >> - if (def !=3D NULL && i < def->part_count) >> + if (def !=3D NULL && i < def->part_count) { >> + assert(def->parts[i].type < field_type_MAX); >> + part->type =3D def->parts[i].type !=3D = FIELD_TYPE_ANY ? >> + def->parts[i].type : = FIELD_TYPE_SCALAR; >> part->coll_id =3D def->parts[i].coll_id; >=20 > 1. How can key_part have FIELD_TYPE_ANY? We have no comparators for = ANY > type, it is impossible, isn't it? We don=E2=80=99t, and that is why I replace ANY with SCALAR. >=20 >> - else >> + } else { >> part->coll_id =3D COLL_NONE; >> + part->type =3D FIELD_TYPE_SCALAR; >> + } >> } >> struct key_def *ephemer_key_def =3D = key_def_new(ephemer_key_parts, >> field_count); >> diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c >> index b67b22c23..7a0b929a7 100644 >> --- a/src/box/sql/expr.c >> +++ b/src/box/sql/expr.c >> @@ -3700,6 +3698,10 @@ sqlite3ExprCodeTarget(Parse * pParse, Expr * = pExpr, int target) >> sqlite3VdbeAddOp3(v, OP_Column, >> = pAggInfo->sortingIdxPTab, >> pCol->iSorterColumn, = target); >> + if = (pCol->space_def->fields[pExpr->iAgg].type =3D=3D >> + FIELD_TYPE_NUMBER) >> + sqlite3VdbeAddOp1(v, = OP_RealAffinity, >> + target); >=20 > 2. Please, use {} for multi-line 'if' body. Ok (I always confuse multi-line with multi-statement body): diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c index 7a0b929a7..14b727039 100644 --- a/src/box/sql/expr.c +++ b/src/box/sql/expr.c @@ -3699,9 +3699,10 @@ sqlite3ExprCodeTarget(Parse * pParse, Expr * = pExpr, int target) = pAggInfo->sortingIdxPTab, pCol->iSorterColumn, = target); if = (pCol->space_def->fields[pExpr->iAgg].type =3D=3D - FIELD_TYPE_NUMBER) + FIELD_TYPE_NUMBER) { sqlite3VdbeAddOp1(v, = OP_RealAffinity, target); + } return target; } >=20 >> return target; >> } >> /* Otherwise, fall thru into the TK_COLUMN case = */ >> diff --git a/src/box/sql/select.c b/src/box/sql/select.c >> index 02ee225f1..40336e679 100644 >> --- a/src/box/sql/select.c >> +++ b/src/box/sql/select.c >> @@ -1427,6 +1427,8 @@ sql_expr_list_to_key_info(struct Parse *parse, = struct ExprList *list, int start) >> sql_expr_coll(parse, item->pExpr, &unused, &id); >> part->coll_id =3D id; >> part->sort_order =3D item->sort_order; >> + enum affinity_type aff =3D = sqlite3ExprAffinity(item->pExpr); >> + part->type =3Dsql_affinity_to_field_type(aff); >=20 > 3. Lost white-space after '=3D=E2=80=98. Fixed: diff --git a/src/box/sql/select.c b/src/box/sql/select.c index 40336e679..fa0fea6c8 100644 --- a/src/box/sql/select.c +++ b/src/box/sql/select.c @@ -1428,7 +1428,7 @@ sql_expr_list_to_key_info(struct Parse *parse, = struct ExprList *list, int start) part->coll_id =3D id; part->sort_order =3D item->sort_order; enum affinity_type aff =3D = sqlite3ExprAffinity(item->pExpr); - part->type =3Dsql_affinity_to_field_type(aff); + part->type =3D sql_affinity_to_field_type(aff); >=20 >> } >> return key_info; >> } >> @@ -5789,8 +5790,11 @@ sqlite3Select(Parse * pParse, /* The = parser context */ >> /* If the output is destined for a temporary table, open that = table. >> */ >> if (pDest->eDest =3D=3D SRT_EphemTab) { >> - sqlite3VdbeAddOp2(v, OP_OpenTEphemeral, pDest->reg_eph, >> - pEList->nExpr + 1); >> + struct sql_key_info *key_info =3D >> + sql_expr_list_to_key_info(pParse, pEList, 0); >> + sqlite3VdbeAddOp4(v, OP_OpenTEphemeral, pDest->reg_eph, >> + pEList->nExpr + 1, 0, (char = *)key_info, >> + P4_KEYINFO); >> sqlite3VdbeAddOp3(v, OP_IteratorOpen, pDest->iSDParm, 0, >> pDest->reg_eph); >=20 > 4. Why can some ephemeral spaces be created without key_info? I = grepped > by OP_OpenTEphemeral and saw that somewhere it is still Op2, not Op4. If there is no key_info, then default one is used instead. See sql_ephemeral_space_create and how we cope with situation when key_info =3D=3D NULL. >=20 >> diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c >> index e6b413c70..4345af24e 100644 >> --- a/src/box/sql/vdbe.c >> +++ b/src/box/sql/vdbe.c >> @@ -2173,7 +2167,7 @@ case OP_Ge: { /* same as TK_GE, = jump, in1, in3 */ >> } else { >> /* Neither operand is NULL. Do a comparison. */ >> affinity =3D pOp->p5 & AFFINITY_MASK; >> - if (affinity>=3DAFFINITY_NUMERIC) { >> + if (affinity>=3DAFFINITY_INTEGER) { >=20 > 5. I guess, we have sqlite3IsNumericAffinity for this. Anyway I replace it with sql_affinity_is_numeric in sql: clean-up affinity from SQL source code patch.