From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 0EB4E2A16F for ; Tue, 23 Apr 2019 11:38:52 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Or76vkt2RooY for ; Tue, 23 Apr 2019 11:38:51 -0400 (EDT) Received: from smtp38.i.mail.ru (smtp38.i.mail.ru [94.100.177.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 272102A15A for ; Tue, 23 Apr 2019 11:38:50 -0400 (EDT) From: i.koptelov Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.8\)) Subject: [tarantool-patches] Re: [PATCH 2/2] sql: make aggregate functions types more strict Date: Tue, 23 Apr 2019 18:38:47 +0300 References: <49e4ae0bc187dc02f908427692c0ddb2cc2d36a8.1554475881.git.ivan.koptelov@tarantool.org> In-Reply-To: Message-Id: Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org, "n.pettik" Firstly I=E2=80=99d like to answer to Konstantin=E2=80=99s comments. > * Ivan Koptelov [19/04/05 18:02]: >> +/* >> + * This structure is for keeping context during work of >> + * aggregate function. >> + */ >> +struct aggregate_context { >> + /** Value being aggregated. (e.g. current MAX or current counter = value). */ >> + Mem value; >> + /** Reference value to keep track of previous argument's type. = */ >> + Mem reference_value; >> +}; >=20 > Why not call this struct agg_value? I think Nikita=E2=80=99s idea is better - =E2=80=98minmax_context=E2=80=99= - because it=E2=80=99s only used in minmax implementation. >=20 > Besides, keeping a reference to the previous argument is an > overkill. Why not keep a type instead, and assign it to > FIELD_TYPE_SCALAR initially and change to a more specific type > after the first assignment? I agree. Instead of using a reference to Mem, now I keep only reference type. >=20 >> + } else { >> + diag_set(ClientError, ER_INCONSISTENT_TYPES, >> + "INTEGER or FLOAT", = mem_type_to_str(argv[0])); >> + context->fErrorOrAux =3D 1; >> + context->isError =3D SQL_TARANTOOL_ERROR; >=20 > This message would look confusing. Could we get rid of "or" in the > message and be more specific about what is inconsistent? Fixed. @@ -1524,8 +1528,12 @@ sumStep(sql_context * context, int argc, = sql_value ** argv) p->rSum +=3D sql_value_double(argv[0]); p->approx =3D 1; } else { - diag_set(ClientError, ER_INCONSISTENT_TYPES, - "INTEGER or FLOAT", = mem_type_to_str(argv[0])); + diag_set(ClientError, ER_ILLEGAL_PARAMS, + "SUM, TOTAL and AVG aggregate functions = can " + "be called only with INTEGER, FLOAT or = SCALAR " + "arguments. In case of SCALAR arguments, = they " + "all must have numeric values.", + mem_type_to_str(argv[0])); context->fErrorOrAux =3D 1; context->isError =3D SQL_TARANTOOL_ERROR; } >=20 >> + if (sql_type !=3D ref_sql_type) { >> + is_compatible =3D false; >> + if ((sql_type =3D=3D SQL_INTEGER || sql_type =3D=3D= SQL_FLOAT) && >> + (ref_sql_type =3D=3D SQL_INTEGER || >> + ref_sql_type =3D=3D SQL_FLOAT)) { >> + is_compatible =3D true; >=20 > This is a very hot path and doing so much work to check > compatibility is a) clumsy when reading b) slow c) hard to > maintain. >=20 > Please use a compatibility matrix statically defined as a 8x8 > bitmap. Fixed: @@ -1630,30 +1636,23 @@ minmaxStep(sql_context *context, int not_used, = sql_value **argv) * any other type). In the later case an error * is raised. */ - sql_value *reference_value =3D = &aggr_context->reference_value; - if (reference_value->flags =3D=3D 0) - sqlVdbeMemCopy(reference_value, arg); - enum sql_type ref_sql_type =3D = sql_value_type(reference_value); + if (minmax_context->reference_type =3D=3D 0) + minmax_context->reference_type =3D sql_type; + int ref_sql_type =3D minmax_context->reference_type; =20 - bool is_compatible =3D true; if (sql_type !=3D ref_sql_type) { - is_compatible =3D false; - if ((sql_type =3D=3D SQL_INTEGER || sql_type =3D=3D= SQL_FLOAT) && - (ref_sql_type =3D=3D SQL_INTEGER || - ref_sql_type =3D=3D SQL_FLOAT)) { - is_compatible =3D true; + bool types_are_compatible =3D (sql_type & = MEM_NumMask) && + (ref_sql_type & = MEM_NumMask); + if (!types_are_compatible) { + diag_set(ClientError, = ER_INCONSISTENT_TYPES, + type_to_str(ref_sql_type), + type_to_str(sql_type)); + context->fErrorOrAux =3D 1; + context->isError =3D = SQL_TARANTOOL_ERROR; + sqlVdbeMemRelease(best); + return; } } - if (!is_compatible) { - diag_set(ClientError, ER_INCONSISTENT_TYPES, - mem_type_to_str(reference_value), - mem_type_to_str(arg)); - context->fErrorOrAux =3D 1; - context->isError =3D SQL_TARANTOOL_ERROR; - sqlVdbeMemRelease(best); - sqlVdbeMemRelease(reference_value); - return; - } diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c index ed7bf8870..960030b52 100644 --- a/src/box/sql/vdbe.c +++ b/src/box/sql/vdbe.c @@ -618,10 +618,9 @@ vdbe_add_new_autoinc_id(struct Vdbe *vdbe, int64_t = id) } =20 char * -mem_type_to_str(const struct Mem *p) +type_to_str(int type) { - assert(p !=3D NULL); - switch (p->flags & MEM_PURE_TYPE_MASK) { + switch (type) { case MEM_Null: return "NULL"; case MEM_Str: @@ -639,6 +638,19 @@ mem_type_to_str(const struct Mem *p) } } =20 +int +mem_type(const struct Mem *p) +{ + assert(p !=3D NULL); + return p->flags & MEM_PURE_TYPE_MASK; +} + +char * +mem_type_to_str(const struct Mem *p) +{ + return type_to_str(mem_type(p)); +} + /* * Execute as much of a VDBE program as we can. * This is the core of sql_step(). diff --git a/src/box/sql/vdbeInt.h b/src/box/sql/vdbeInt.h index c84f22caf..553c4f225 100644 --- a/src/box/sql/vdbeInt.h +++ b/src/box/sql/vdbeInt.h @@ -228,6 +228,7 @@ struct Mem { #define MEM_Str 0x0002 /* Value is a string */ #define MEM_Int 0x0004 /* Value is an integer */ #define MEM_Real 0x0008 /* Value is a real number */ +#define MEM_NumMask 0x000c /* Value is integer or real (mask) */ #define MEM_Blob 0x0010 /* Value is a BLOB */ #define MEM_Bool 0x0020 /* Value is a bool */ #define MEM_Ptr 0x0040 /* Value is a generic pointer */ @@ -262,11 +263,25 @@ enum { MEM_PURE_TYPE_MASK =3D 0x1f }; =20 +/** + * Returns one of MEM_Null, MEM_Str, MEM_Int, MEM_Real, MEM_Blob + * or MEM_Bool. Used for error detection and reporting. + */ +int +mem_type(const struct Mem *p); + /** * Simple type to str convertor. It is used to simplify * error reporting. */ char * +type_to_str(int type); + +/** + * Returns string representing type of the given Mem. It is used + * to simplify error reporting. + */ +char * mem_type_to_str(const struct Mem *p); > On 9 Apr 2019, at 17:52, n.pettik wrote: >=20 > Then, I=E2=80=99ve found assertion fault: >=20 > tarantool> create table t(id int primary key, b scalar) > tarantool> insert into t values=09 > tarantool> select min(b) from t > Assertion failed: (((p->szMalloc > 0 && p->z =3D=3D p->zMalloc) ? 1 : = 0) + ((p->flags & MEM_Dyn) !=3D 0 ? 1 : 0) + ((p->flags & MEM_Ephem) !=3D = 0 ? 1 : 0) + ((p->flags & MEM_Static) !=3D 0 ? 1 : 0) =3D=3D 1), = function sqlVdbeCheckMemInvariants, file = /Users/n.pettik/tarantool/src/box/sql/vdbemem.c, line 87. > Abort trap: 6 It was fixed after removing reference_value from minmax_context. > tarantool> create table t(id int primary key, a int, b text) > tarantool> insert into t values(1,1,'1=E2=80=99) > tarantool> select sum(b) from t > --- > - metadata: > - name: sum(b) > type: number > rows: > - [1] > =E2=80=A6 >=20 > So, your statement below that SUM accepts only numeric arguments > is false. If string can be converted to number, it is OK: >=20 > tarantool> select a=3Db from t; > --- > - metadata: > - name: a=3Db > type: integer > rows: > - [1] > =E2=80=A6 >=20 > Next: >=20 > tarantool> select min(b) from t > --- > - metadata: > - name: min(b) > type: scalar > rows: > - ['1'] > =E2=80=A6 >=20 > Could you please fix this behaviour (scalar->int) in a separate patch? > If it is not that easy, file a ticket. It is not easy and we already have a ticket. Fixed commit message. >=20 >> diff --git a/src/box/sql/func.c b/src/box/sql/func.c >> index b1bfc886e..379f28252 100644 >> --- a/src/box/sql/func.c >> +++ b/src/box/sql/func.c >> @@ -45,6 +45,17 @@ >> #include >> #include >>=20 >> +/* >=20 > /** >=20 >> + * This structure is for keeping context during work of >> + * aggregate function. >> + */ >> +struct aggregate_context { >=20 > Let=E2=80=99s change name to minmax_context. Fixed. diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 379f28252..258df0d7e 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -45,15 +45,19 @@ #include #include =20 -/* +/** * This structure is for keeping context during work of - * aggregate function. + * min/max aggregate functions. */ -struct aggregate_context { - /** Value being aggregated. (e.g. current MAX or current counter = value). */ - Mem value; - /** Reference value to keep track of previous argument's type. */ - Mem reference_value; +struct minmax_context { + /** Value being aggregated i.e. current MAX or MIN. */ + Mem best; + /** + * Reference type to keep track of previous argument's types. + * One of MEM_Null, MEM_Str, MEM_Int, MEM_Real, MEM_Blob + * or MEM_Bool. + */ + int reference_type; }; >=20 >> + /** Value being aggregated. (e.g. current MAX or current counter = value). */ >> + Mem value; >> + /** Reference value to keep track of previous argument's type. = */ >> + Mem reference_value; >> +}; >=20 > I agree that keeping struct Men is too much (especially as values not = as pointers). > We need only type of that value. What is more, you need only one = member: > =E2=80=9Cvalue being aggregated=E2=80=9D comes from argv[0]. >=20 Ok, now we keep reference type (instead of reference value) in = minmax_context. But we still need "value being aggregated=E2=80=9D because argv[0] = actually does not contain it. >> @@ -1588,16 +1604,57 @@ static void >> minmaxStep(sql_context *context, int not_used, sql_value **argv) >> { >> UNUSED_PARAMETER(not_used); >> - sql_value *arg =3D argv[0]; >> - sql_value *best =3D sql_aggregate_context(context, = sizeof(*best)); >> + struct aggregate_context *aggr_context =3D >> + (struct aggregate_context *) = sql_aggregate_context(context, >> + = sizeof(*aggr_context)); >=20 > This way looks better: >=20 > @@ -1604,9 +1604,8 @@ static void > minmaxStep(sql_context *context, int not_used, sql_value **argv) > { > UNUSED_PARAMETER(not_used); > - struct aggregate_context *aggr_context =3D > - (struct aggregate_context *) = sql_aggregate_context(context, > - = sizeof(*aggr_context)); > + struct aggregate_context *aggr_context =3D (struct = aggregate_context *) > + sql_aggregate_context(context, sizeof(*aggr_context)); > if (aggr_context =3D=3D NULL) > retur Ok, fixed. @@ -1604,20 +1612,18 @@ static void minmaxStep(sql_context *context, int not_used, sql_value **argv) { UNUSED_PARAMETER(not_used); - struct aggregate_context *aggr_context =3D - (struct aggregate_context *) = sql_aggregate_context(context, - = sizeof(*aggr_context)); - if (aggr_context =3D=3D NULL) + struct minmax_context *minmax_context =3D (struct minmax_context = *) + sql_aggregate_context(context, sizeof(*minmax_context)); + if (minmax_context =3D=3D NULL) return; >=20 >> + if (aggr_context =3D=3D NULL) >> + return; >>=20 >> + sql_value *arg =3D argv[0]; >> + sql_value *best =3D &(aggr_context->value); >> if (best =3D=3D NULL) >> return; >>=20 >> - if (sql_value_type(argv[0]) =3D=3D SQL_NULL) { >> + enum sql_type sql_type =3D sql_value_type(arg); >> + >> + if (sql_type =3D=3D SQL_NULL) { >> if (best->flags !=3D 0) >> sqlSkipAccumulatorLoad(context); >> } else if (best->flags !=3D 0) { >> + /* >> + * During proceeding of the function, arguments >> + * of different types may be encountered (if >> + * SCALAR type column is proceeded). Some types >> + * are compatible (INTEGER and FLOAT) and others >> + * are not (TEXT and BLOB are not compatible with >> + * any other type). In the later case an error >> + * is raised. >> + */ >> + sql_value *reference_value =3D = &aggr_context->reference_value; >> + if (reference_value->flags =3D=3D 0) >> + sqlVdbeMemCopy(reference_value, arg); >=20 > AFAIR flags =3D=3D 0 is a situation when we are processing first > entry (i.e. aggregate_context is initialised right now). Also fixed: @@ -1630,30 +1636,23 @@ minmaxStep(sql_context *context, int not_used, = sql_value **argv) * any other type). In the later case an error * is raised. */ - sql_value *reference_value =3D = &aggr_context->reference_value; - if (reference_value->flags =3D=3D 0) - sqlVdbeMemCopy(reference_value, arg); - enum sql_type ref_sql_type =3D = sql_value_type(reference_value); + if (minmax_context->reference_type =3D=3D 0) + minmax_context->reference_type =3D sql_type; + int ref_sql_type =3D minmax_context->reference_type; >=20 >> + enum sql_type ref_sql_type =3D = sql_value_type(reference_value); >> + >> + bool is_compatible =3D true; >=20 > -> types_are_compatible Ok, fixed: @@ -1630,30 +1636,23 @@ minmaxStep(sql_context *context, int not_used, = sql_value **argv) - bool is_compatible =3D true; if (sql_type !=3D ref_sql_type) { - is_compatible =3D false; - if ((sql_type =3D=3D SQL_INTEGER || sql_type =3D=3D= SQL_FLOAT) && - (ref_sql_type =3D=3D SQL_INTEGER || - ref_sql_type =3D=3D SQL_FLOAT)) { - is_compatible =3D true; + bool types_are_compatible =3D (sql_type & = MEM_NumMask) && + (ref_sql_type & = MEM_NumMask); + if (!types_are_compatible) { + diag_set(ClientError, = ER_INCONSISTENT_TYPES, + type_to_str(ref_sql_type), + type_to_str(sql_type)); + context->fErrorOrAux =3D 1; + context->isError =3D = SQL_TARANTOOL_ERROR; + sqlVdbeMemRelease(best); + return; } } - if (!is_compatible) { - diag_set(ClientError, ER_INCONSISTENT_TYPES, - mem_type_to_str(reference_value), - mem_type_to_str(arg)); - context->fErrorOrAux =3D 1; - context->isError =3D SQL_TARANTOOL_ERROR; - sqlVdbeMemRelease(best); - sqlVdbeMemRelease(reference_value); - return; - } >=20 >> + if (sql_type !=3D ref_sql_type) { >> + is_compatible =3D false; >> + if ((sql_type =3D=3D SQL_INTEGER || sql_type =3D=3D= SQL_FLOAT) && >> + (ref_sql_type =3D=3D SQL_INTEGER || >> + ref_sql_type =3D=3D SQL_FLOAT)) { >> + is_compatible =3D true; >> + } >> + } >> + if (!is_compatible) { >> + diag_set(ClientError, ER_INCONSISTENT_TYPES, >> + mem_type_to_str(reference_value), >> + mem_type_to_str(arg)); >> + context->fErrorOrAux =3D 1; >> + context->isError =3D SQL_TARANTOOL_ERROR; >> + sqlVdbeMemRelease(best); >> + sqlVdbeMemRelease(reference_value); >=20 > Both of these values are allocated in scope of aggregate context. > I guess there=E2=80=99s no need to provide any cleanup on these = variables. I check this in debug and you are right. Because of inner =E2=80=98if=E2=80= =99 this call actually does not do nothing, except checking inner invariants of Mem. Removed.=20= diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 258df0d7e..0738f44e6 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -1649,7 +1649,6 @@ minmaxStep(sql_context *context, int not_used, = sql_value **argv) type_to_str(sql_type)); context->fErrorOrAux =3D 1; context->isError =3D = SQL_TARANTOOL_ERROR; - sqlVdbeMemRelease(best); return; } } @@ -1689,7 +1688,6 @@ minMaxFinalize(sql_context * context) if (res->flags !=3D 0) { sql_result_value(context, res); } - sqlVdbeMemRelease(res); } } >=20 >> + return; >> + } >> + >> int max; >> int cmp; >> struct coll *coll =3D sqlGetFuncCollSeq(context); >> @@ -1625,13 +1682,16 @@ minmaxStep(sql_context *context, int = not_used, sql_value **argv) >> static void >> minMaxFinalize(sql_context * context) >> { >> - sql_value *pRes; >> - pRes =3D (sql_value *) sql_aggregate_context(context, 0); >> + struct aggregate_context *func_context =3D >> + (struct aggregate_context *) = sql_aggregate_context(context, sizeof(*func_context)); >> + sql_value *pRes =3D &(func_context->value); >> + >> if (pRes !=3D NULL) { >> if (pRes->flags !=3D 0) { >> sql_result_value(context, pRes); >> } >> sqlVdbeMemRelease(pRes); >> + sqlVdbeMemRelease(&func_context->reference_value); >> } >> } >>=20 >> diff --git a/src/box/sql/select.c b/src/box/sql/select.c >> index b1ec8c758..624083b22 100644 >> --- a/src/box/sql/select.c >> +++ b/src/box/sql/select.c >> @@ -4340,13 +4340,22 @@ pushDownWhereTerms(Parse * pParse, /* Parse = context (for malloc() and error repo >> * argument, this function checks if the following are true: >> * >> * * the query contains just a single aggregate function, >> - * * the aggregate function is either min() or max(), and >> - * * the argument to the aggregate function is a column value. >> + * * the aggregate function is either min() or max(), >> + * * the argument to the aggregate function is a column value, >> + * * the type of column is not SCALAR. >> * >> * If all of the above are true, then WHERE_ORDERBY_MIN or = WHERE_ORDERBY_MAX >> * is returned as appropriate. Also, *ppMinMax is set to point to the >> * list of arguments passed to the aggregate before returning. >> * >> + * The requirement of column type not being SCALAR follows from >> + * the purpose of the function. The purpose of the function is >> + * to answer the question: "Should MIN/MAX call be optimised by >> + * using ORDER ON clause code?" If the type of column is SCALAR >=20 > There=E2=80=99s no ORDER ON clause. Sorry, fixed the comment. diff --git a/src/box/sql/select.c b/src/box/sql/select.c index 624083b22..dd87c5b70 100644 --- a/src/box/sql/select.c +++ b/src/box/sql/select.c @@ -4351,7 +4351,7 @@ pushDownWhereTerms(Parse * pParse, /* Parse = context (for malloc() and error repo * The requirement of column type not being SCALAR follows from * the purpose of the function. The purpose of the function is * to answer the question: "Should MIN/MAX call be optimised by - * using ORDER ON clause code?" If the type of column is SCALAR + * using ORDER BY clause code?" If the type of column is SCALAR * then we have to iterate over all rows to check if their types * are compatible. Hence, the optimisation should not be used. >=20 >> + * then we have to iterate over all rows to check if their types >> + * are compatible. Hence, the optimisation should not be used. >> + * For details please see: = https://github.com/tarantool/tarantool/issues/4032 >> + * >> * Or, if the conditions above are not met, *ppMinMax is set to 0 and >> * WHERE_ORDERBY_NORMAL is returned. >> */ >> @@ -4364,6 +4373,8 @@ minMaxQuery(AggInfo * pAggInfo, ExprList ** = ppMinMax) >> if (pEList && pEList->nExpr =3D=3D 1 >> && pEList->a[0].pExpr->op =3D=3D TK_AGG_COLUMN) { >> const char *zFunc =3D pExpr->u.zToken; >> + if (sql_expr_type(pEList->a[0].pExpr) =3D=3D = FIELD_TYPE_SCALAR) >> + return eRet; >=20 > I see no difference between byte code generated for SCALAR and other > columns (SELECT MIN(a) FROM t;). Investigate this case please. >=20 I am not quite understand what is the problem with this. If the following conditions are true, then the result of MIN/MAX is = retrieved without usage of corresponding function in func: =20 * the query contains just a single aggregate function, * the aggregate function is either min() or max(), * the argument to the aggregate function is a column value, * the type of column is not SCALAR. And the changes above disables this optimisation if column type is = SCALAR. I also added test for this case. >> if (sqlStrICmp(zFunc, "min") =3D=3D 0) { >> eRet =3D WHERE_ORDERBY_MIN; >> *ppMinMax =3D pEList; >>=20 >> diff --git a/test/sql-tap/select1.test.lua = b/test/sql-tap/select1.test.lua >> index be63e815d..a4ce19d73 100755 >> --- a/test/sql-tap/select1.test.lua >> +++ b/test/sql-tap/select1.test.lua >> @@ -1,6 +1,6 @@ >> #!/usr/bin/env tarantool >> test =3D require("sqltester") >> -test:plan(173)=09 >> +test:plan(172) >>=20 >> --!./tcltestrunner.lua >> -- ["set","testdir",[["file","dirname",["argv0"]]]] >> @@ -503,16 +503,6 @@ test:do_catchsql_test( >> -- >> }) >>=20 >> -test:do_execsql_test( >> - "select1-2.17.1", >> - [[ >> - SELECT sum(a) FROM t3 >> - ]], { >> - -- >> - 44.0 >> - -- >> - }) >=20 > Don=E2=80=99t delete test, just change it to catch an error. Ok: diff --git a/test/sql-tap/select1.test.lua = b/test/sql-tap/select1.test.lua index a4ce19d73..b743d900c 100755 --- a/test/sql-tap/select1.test.lua +++ b/test/sql-tap/select1.test.lua @@ -1,6 +1,6 @@ #!/usr/bin/env tarantool test =3D require("sqltester") -test:plan(172) +test:plan(173) =20 --!./tcltestrunner.lua -- ["set","testdir",[["file","dirname",["argv0"]]]] @@ -503,6 +503,18 @@ test:do_catchsql_test( -- }) =20 +test:do_catchsql_test( + "select1-2.17.1", + [[ + SELECT sum(a) FROM t3 + ]], { + -- + 1, "Illegal parameters, SUM, TOTAL and AVG aggregate functions = can " .. + "be called only with INTEGER, FLOAT or SCALAR arguments. In = " .. + "case of SCALAR arguments, they all must have numeric = values." + -- + }) + >=20 >> - >> test:do_catchsql_test( >> "select1-2.18=E2=80=9D >=20 >=20 >=20