Tarantool development patches archive
* [tarantool-patches] [PATCH 0/2] sql: make aggregate functions types more strict
@ 2019-04-05 14:57 Ivan Koptelov
  2019-04-05 14:57 ` [tarantool-patches] [PATCH 1/2] " Ivan Koptelov
  2019-04-05 14:57 ` [tarantool-patches] [PATCH 2/2] " Ivan Koptelov
  0 siblings, 2 replies; 11+ messages in thread
From: Ivan Koptelov @ 2019-04-05 14:57 UTC (permalink / raw)
  To: tarantool-patches, korablev; +Cc: Ivan Koptelov

In this patch set we make aggregate functions more type strict.
The first patch contains only some preparatory refactoring and the
second patch contains the actual changes.

Ivan Koptelov (2):
  sql: make aggregate functions types more strict
  sql: make aggregate functions types more strict

Branch https://github.com/tarantool/tarantool/tree/sudobobo/gh-4032-strict-aggr-funcs-types
Issue https://github.com/tarantool/tarantool/issues/4032

 src/box/sql/func.c            |  99 ++++++++++++++++++++++++++-------
 src/box/sql/select.c          |  15 ++++-
 test/sql-tap/func5.test.lua   |  50 ++++++++++++++++-
 test/sql-tap/minmax4.test.lua | 101 +++++++++++++++++++++++++++++++++-
 test/sql-tap/select1.test.lua |  12 +---
 test/sql-tap/select5.test.lua |   4 +-
 6 files changed, 244 insertions(+), 37 deletions(-)

-- 
2.20.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tarantool-patches] [PATCH 1/2] sql: make aggregate functions types more strict
  2019-04-05 14:57 [tarantool-patches] [PATCH 0/2] sql: make aggregate functions types more strict Ivan Koptelov
@ 2019-04-05 14:57 ` Ivan Koptelov
  2019-04-09 14:52   ` [tarantool-patches] " n.pettik
  2019-04-05 14:57 ` [tarantool-patches] [PATCH 2/2] " Ivan Koptelov
  1 sibling, 1 reply; 11+ messages in thread
From: Ivan Koptelov @ 2019-04-05 14:57 UTC (permalink / raw)
  To: tarantool-patches, korablev; +Cc: Ivan Koptelov

Let's firstly fix code-style.
---
 src/box/sql/func.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index b86a95d9a..b1bfc886e 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -1585,23 +1585,22 @@ countFinalize(sql_context * context)
  * Routines to implement min() and max() aggregate functions.
  */
 static void
-minmaxStep(sql_context * context, int NotUsed, sql_value ** argv)
+minmaxStep(sql_context *context, int not_used, sql_value **argv)
 {
-	Mem *pArg = (Mem *) argv[0];
-	Mem *pBest;
-	UNUSED_PARAMETER(NotUsed);
+	UNUSED_PARAMETER(not_used);
+	sql_value *arg = argv[0];
+	sql_value *best = sql_aggregate_context(context, sizeof(*best));
 
-	pBest = (Mem *) sql_aggregate_context(context, sizeof(*pBest));
-	if (!pBest)
+	if (best == NULL)
 		return;
 
 	if (sql_value_type(argv[0]) == SQL_NULL) {
-		if (pBest->flags)
+		if (best->flags != 0)
 			sqlSkipAccumulatorLoad(context);
-	} else if (pBest->flags) {
+	} else if (best->flags != 0) {
 		int max;
 		int cmp;
-		struct coll *pColl = sqlGetFuncCollSeq(context);
+		struct coll *coll = sqlGetFuncCollSeq(context);
 		/* This step function is used for both the min() and max() aggregates,
 		 * the only difference between the two being that the sense of the
 		 * comparison is inverted. For the max() aggregate, the
@@ -1611,15 +1610,15 @@ minmaxStep(sql_context * context, int NotUsed, sql_value ** argv)
 		 * aggregate, or 0 for min().
 		 */
 		max = sql_user_data(context) != 0;
-		cmp = sqlMemCompare(pBest, pArg, pColl);
-		if ((max && cmp < 0) || (!max && cmp > 0)) {
-			sqlVdbeMemCopy(pBest, pArg);
+		cmp = sqlMemCompare(best, arg, coll);
+		if ((max != 0 && cmp < 0) || (max == 0 && cmp > 0)) {
+			sqlVdbeMemCopy(best, arg);
 		} else {
 			sqlSkipAccumulatorLoad(context);
 		}
 	} else {
-		pBest->db = sql_context_db_handle(context);
-		sqlVdbeMemCopy(pBest, pArg);
+		best->db = sql_context_db_handle(context);
+		sqlVdbeMemCopy(best, arg);
 	}
 }
 
@@ -1628,8 +1627,8 @@ minMaxFinalize(sql_context * context)
 {
 	sql_value *pRes;
 	pRes = (sql_value *) sql_aggregate_context(context, 0);
-	if (pRes) {
-		if (pRes->flags) {
+	if (pRes != NULL) {
+		if (pRes->flags != 0) {
 			sql_result_value(context, pRes);
 		}
 		sqlVdbeMemRelease(pRes);
-- 
2.20.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tarantool-patches] [PATCH 2/2] sql: make aggregate functions types more strict
  2019-04-05 14:57 [tarantool-patches] [PATCH 0/2] sql: make aggregate functions types more strict Ivan Koptelov
  2019-04-05 14:57 ` [tarantool-patches] [PATCH 1/2] " Ivan Koptelov
@ 2019-04-05 14:57 ` Ivan Koptelov
  2019-04-05 19:48   ` [tarantool-patches] " Konstantin Osipov
  2019-04-09 14:52   ` n.pettik
  1 sibling, 2 replies; 11+ messages in thread
From: Ivan Koptelov @ 2019-04-05 14:57 UTC (permalink / raw)
  To: tarantool-patches, korablev; +Cc: Ivan Koptelov

TOTAL, SUM and AVG can now be called only on
expressions of INTEGER, FLOAT or SCALAR type.
Also, if TOTAL, SUM, AVG, MIN or MAX is called on an
expression of SCALAR type and values of incompatible
types (e.g. INTEGER and TEXT) are encountered while the
function runs, an error is raised.

Closes #4032

@TarantoolBot document
Title: sql aggregate functions became more type strict
The patch changes two things:
1) It prohibits calling TOTAL, SUM and AVG on columns/
expressions whose type differs from INTEGER,
FLOAT or SCALAR.
2) If one calls TOTAL, SUM, AVG, MIN or MAX on a
column/expression of SCALAR type which actually holds
values of incompatible types, an error is
raised. INTEGER and FLOAT are compatible; all other
types are incompatible. Consider the example:

CREATE TABLE t1 (a INTEGER PRIMARY KEY, b SCALAR);
INSERT INTO t1 VALUES (1, 1);
INSERT INTO t1 VALUES (2, 3.4);
SELECT MAX(b) FROM t1; -- is OK
INSERT INTO t1 VALUES (3, 'abc');
SELECT MAX(b) FROM t1; -- now it fails with an error
---
 src/box/sql/func.c            |  72 ++++++++++++++++++++++--
 src/box/sql/select.c          |  15 ++++-
 test/sql-tap/func5.test.lua   |  50 ++++++++++++++++-
 test/sql-tap/minmax4.test.lua | 101 +++++++++++++++++++++++++++++++++-
 test/sql-tap/select1.test.lua |  12 +---
 test/sql-tap/select5.test.lua |   4 +-
 6 files changed, 231 insertions(+), 23 deletions(-)

diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index b1bfc886e..379f28252 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -45,6 +45,17 @@
 #include <unicode/uchar.h>
 #include <unicode/ucol.h>
 
+/*
+ * This structure is for keeping context during work of
+ * aggregate function.
+ */
+struct aggregate_context {
+    /** Value being aggregated. (e.g. current MAX or current counter value). */
+    Mem value;
+    /** Reference value to keep track of previous argument's type. */
+    Mem reference_value;
+};
+
 static UConverter* pUtf8conv;
 
 /*
@@ -1509,9 +1520,14 @@ sumStep(sql_context * context, int argc, sql_value ** argv)
 			    && sqlAddInt64(&p->iSum, v)) {
 				p->overflow = 1;
 			}
-		} else {
+		} else if (type == SQL_FLOAT) {
 			p->rSum += sql_value_double(argv[0]);
 			p->approx = 1;
+		} else {
+			diag_set(ClientError, ER_INCONSISTENT_TYPES,
+				 "INTEGER or FLOAT", mem_type_to_str(argv[0]));
+			context->fErrorOrAux = 1;
+			context->isError = SQL_TARANTOOL_ERROR;
 		}
 	}
 }
@@ -1588,16 +1604,57 @@ static void
 minmaxStep(sql_context *context, int not_used, sql_value **argv)
 {
 	UNUSED_PARAMETER(not_used);
-	sql_value *arg = argv[0];
-	sql_value *best = sql_aggregate_context(context, sizeof(*best));
+	struct aggregate_context *aggr_context =
+		(struct aggregate_context *) sql_aggregate_context(context,
+								   sizeof(*aggr_context));
+	if (aggr_context == NULL)
+		return;
 
+	sql_value *arg = argv[0];
+	sql_value *best = &(aggr_context->value);
 	if (best == NULL)
 		return;
 
-	if (sql_value_type(argv[0]) == SQL_NULL) {
+	enum sql_type sql_type = sql_value_type(arg);
+
+	if (sql_type == SQL_NULL) {
 		if (best->flags != 0)
 			sqlSkipAccumulatorLoad(context);
 	} else if (best->flags != 0) {
+		/*
+		 * During proceeding of the function, arguments
+		 * of different types may be encountered (if
+		 * SCALAR type column is proceeded). Some types
+		 * are compatible (INTEGER and FLOAT) and others
+		 * are not (TEXT and BLOB are not compatible with
+		 * any other type). In the later case an error
+		 * is raised.
+		 */
+		sql_value *reference_value = &aggr_context->reference_value;
+		if (reference_value->flags == 0)
+			sqlVdbeMemCopy(reference_value, arg);
+		enum sql_type ref_sql_type = sql_value_type(reference_value);
+
+		bool is_compatible = true;
+		if (sql_type != ref_sql_type) {
+			is_compatible = false;
+			if ((sql_type == SQL_INTEGER || sql_type == SQL_FLOAT) &&
+			    (ref_sql_type == SQL_INTEGER ||
+			     ref_sql_type == SQL_FLOAT)) {
+				is_compatible = true;
+			}
+		}
+		if (!is_compatible) {
+			diag_set(ClientError, ER_INCONSISTENT_TYPES,
+				 mem_type_to_str(reference_value),
+				 mem_type_to_str(arg));
+			context->fErrorOrAux = 1;
+			context->isError = SQL_TARANTOOL_ERROR;
+			sqlVdbeMemRelease(best);
+			sqlVdbeMemRelease(reference_value);
+			return;
+		}
+
 		int max;
 		int cmp;
 		struct coll *coll = sqlGetFuncCollSeq(context);
@@ -1625,13 +1682,16 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
 static void
 minMaxFinalize(sql_context * context)
 {
-	sql_value *pRes;
-	pRes = (sql_value *) sql_aggregate_context(context, 0);
+	struct aggregate_context *func_context =
+		(struct aggregate_context *) sql_aggregate_context(context, sizeof(*func_context));
+	sql_value *pRes = &(func_context->value);
+
 	if (pRes != NULL) {
 		if (pRes->flags != 0) {
 			sql_result_value(context, pRes);
 		}
 		sqlVdbeMemRelease(pRes);
+		sqlVdbeMemRelease(&func_context->reference_value);
 	}
 }
 
diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index b1ec8c758..624083b22 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -4340,13 +4340,22 @@ pushDownWhereTerms(Parse * pParse,	/* Parse context (for malloc() and error repo
  * argument, this function checks if the following are true:
  *
  *    * the query contains just a single aggregate function,
- *    * the aggregate function is either min() or max(), and
- *    * the argument to the aggregate function is a column value.
+ *    * the aggregate function is either min() or max(),
+ *    * the argument to the aggregate function is a column value,
+ *    * the type of column is not SCALAR.
  *
  * If all of the above are true, then WHERE_ORDERBY_MIN or WHERE_ORDERBY_MAX
  * is returned as appropriate. Also, *ppMinMax is set to point to the
  * list of arguments passed to the aggregate before returning.
  *
+ * The requirement of column type not being SCALAR follows from
+ * the purpose of the function. The purpose of the function is
+ * to answer the question: "Should MIN/MAX call be optimised by
+ * using ORDER ON clause code?" If the type of column is SCALAR
+ * then we have to iterate over all rows to check if their types
+ * are compatible. Hence, the optimisation should not be used.
+ * For details please see: https://github.com/tarantool/tarantool/issues/4032
+ *
  * Or, if the conditions above are not met, *ppMinMax is set to 0 and
  * WHERE_ORDERBY_NORMAL is returned.
  */
@@ -4364,6 +4373,8 @@ minMaxQuery(AggInfo * pAggInfo, ExprList ** ppMinMax)
 		if (pEList && pEList->nExpr == 1
 		    && pEList->a[0].pExpr->op == TK_AGG_COLUMN) {
 			const char *zFunc = pExpr->u.zToken;
+			if (sql_expr_type(pEList->a[0].pExpr) == FIELD_TYPE_SCALAR)
+				return eRet;
 			if (sqlStrICmp(zFunc, "min") == 0) {
 				eRet = WHERE_ORDERBY_MIN;
 				*ppMinMax = pEList;
diff --git a/test/sql-tap/func5.test.lua b/test/sql-tap/func5.test.lua
index f3706e631..b504bb8bf 100755
--- a/test/sql-tap/func5.test.lua
+++ b/test/sql-tap/func5.test.lua
@@ -1,6 +1,6 @@
 #!/usr/bin/env tarantool
 test = require("sqltester")
-test:plan(15)
+test:plan(19)
 
 --!./tcltestrunner.lua
 -- 2010 August 27
@@ -229,4 +229,52 @@ test:do_catchsql_test(
     }
 )
 
+-- The following tests ensure that AVG and SUM correctly process
+-- the SCALAR type. The expected behavior:
+-- 1) If the column actually contains INTEGER and FLOAT values,
+--    then INTEGER values are cast to FLOAT and the result
+--    is computed.
+-- 2) If the column actually contains values of any other
+--    types, then an error is raised.
+
+test:do_execsql_test(
+    "func-5.4.1",
+    [[
+        CREATE TABLE test6(a SCALAR PRIMARY KEY);
+        INSERT INTO test6 VALUES(1);
+        INSERT INTO test6 VALUES(2.5);
+        SELECT SUM(a) FROM test6;
+    ]], {
+        3.5
+    }
+)
+
+test:do_execsql_test(
+    "func-5.4.2",
+    [[
+        SELECT AVG(a) FROM test6;
+    ]], {
+        1.75
+    }
+)
+
+test:do_catchsql_test(
+    "func-5.4.3",
+    [[
+        INSERT INTO test6 VALUES('abc');
+        SELECT SUM(a) FROM test6;
+    ]], {
+        1,"Inconsistent types: expected INTEGER or FLOAT got TEXT"
+    }
+)
+
+test:do_catchsql_test(
+    "func-5.4.4",
+    [[
+        SELECT AVG(a) FROM test6;
+    ]], {
+        1,"Inconsistent types: expected INTEGER or FLOAT got TEXT"
+    }
+)
+
 test:finish_test()
diff --git a/test/sql-tap/minmax4.test.lua b/test/sql-tap/minmax4.test.lua
index b600c9bfe..7476051f1 100755
--- a/test/sql-tap/minmax4.test.lua
+++ b/test/sql-tap/minmax4.test.lua
@@ -1,6 +1,7 @@
 #!/usr/bin/env tarantool
 test = require("sqltester")
-test:plan(21)
+
+test:plan(29)
 
 --!./tcltestrunner.lua
 -- 2012 February 02
@@ -303,6 +304,104 @@ test:do_test(
         -- </minmax4-2.7>
     })
 
+-- The following tests ensure that MIN and MAX correctly process
+-- the SCALAR type. The expected behavior:
+-- 1) If the column actually contains INTEGER and FLOAT values,
+--    then INTEGER values are cast to FLOAT and the result
+--    is computed.
+-- 2) If the column actually contains TEXT and values of any other
+--    types, then an error is raised.
+-- 3) All other combinations also result in an error.
+test:do_test(
+    "minmax4-3.1",
+    function()
+        return test:execsql [[
+            CREATE TABLE t4(a INT PRIMARY KEY, b SCALAR);
+            INSERT INTO t4 VALUES (1, 2);
+            INSERT INTO t4 VALUES (2, 1.5);
+            SELECT MAX(b) FROM t4;
+        ]]
+        end, {
+            2.0,
+    })
+
+test:do_test(
+    "minmax4-3.2",
+    function()
+        return test:execsql [[
+            SELECT MIN(b) FROM t4;
+        ]]
+    end, {
+        1.5,
+    })
+
+test:do_test(
+    "minmax4-3.3",
+    function()
+        return test:catchsql [[
+            INSERT INTO t4 VALUES (3, 'abc');
+            SELECT MIN(b) FROM t4;
+        ]]
+    end, {
+        1, "Inconsistent types: expected REAL got TEXT"
+    })
+
+test:do_test(
+    "minmax4-3.4",
+    function()
+        return test:catchsql [[
+            SELECT MAX(b) FROM t4;
+        ]]
+    end, {
+        1, "Inconsistent types: expected REAL got TEXT"
+    })
+
+-- Cases when we call aggregate MIN/MAX functions on a column
+-- with an index (e.g. PRIMARY KEY index) deserve their own tests,
+-- because in this case MIN/MAX is implemented not with a
+-- dedicated function, but by using the corresponding index.
+test:do_test(
+    "minmax4-3.5",
+    function()
+        return test:execsql [[
+            CREATE TABLE t5(a SCALAR PRIMARY KEY);
+            INSERT INTO t5 VALUES (2);
+            INSERT INTO t5 VALUES (1.5);
+            SELECT MAX(a) FROM t5;
+        ]]
+    end, {
+        2.0,
+    })
 
+test:do_test(
+    "minmax4-3.6",
+    function()
+        return test:execsql [[
+            SELECT MIN(a) FROM t5;
+        ]]
+    end, {
+        1.5,
+    })
+
+test:do_test(
+    "minmax4-3.7",
+    function()
+        return test:catchsql [[
+            INSERT INTO t5 VALUES ('abc');
+            SELECT MIN(a) FROM t5;
+        ]]
+    end, {
+        1, "Inconsistent types: expected INTEGER got TEXT"
+    })
+
+test:do_test(
+    "minmax4-3.8",
+    function()
+        return test:catchsql [[
+            SELECT MAX(a) FROM t5;
+        ]]
+    end, {
+        1, "Inconsistent types: expected INTEGER got TEXT"
+    })
 
 test:finish_test()
diff --git a/test/sql-tap/select1.test.lua b/test/sql-tap/select1.test.lua
index be63e815d..a4ce19d73 100755
--- a/test/sql-tap/select1.test.lua
+++ b/test/sql-tap/select1.test.lua
@@ -1,6 +1,6 @@
 #!/usr/bin/env tarantool
 test = require("sqltester")
-test:plan(173)
+test:plan(172)
 
 --!./tcltestrunner.lua
 -- ["set","testdir",[["file","dirname",["argv0"]]]]
@@ -503,16 +503,6 @@ test:do_catchsql_test(
         -- </select1-2.17>
     })
 
-test:do_execsql_test(
-    "select1-2.17.1",
-    [[
-        SELECT sum(a) FROM t3
-    ]], {
-        -- <select1-2.17.1>
-        44.0
-        -- </select1-2.17.1>
-    })
-
 test:do_catchsql_test(
     "select1-2.18",
     [[
diff --git a/test/sql-tap/select5.test.lua b/test/sql-tap/select5.test.lua
index 507448d00..5674edfea 100755
--- a/test/sql-tap/select5.test.lua
+++ b/test/sql-tap/select5.test.lua
@@ -553,8 +553,8 @@ test:do_execsql_test(
 test:do_execsql_test(
     "select5-9.13.2",
     [[
-            CREATE TABLE jj (s1 INT, s2 VARCHAR(1), PRIMARY KEY(s1));
-            INSERT INTO jj VALUES(1, 'A'), (2, 'a');
+            CREATE TABLE jj (s1 INT, s2 INTEGER, PRIMARY KEY(s1));
+            INSERT INTO jj VALUES(1, 0), (2, 0);
             SELECT 1 FROM jj HAVING avg(s2) = 1 AND avg(s2) = 0;
     ]], {
     -- <select5-9.13.2>
-- 
2.20.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tarantool-patches] Re: [PATCH 2/2] sql: make aggregate functions types more strict
  2019-04-05 14:57 ` [tarantool-patches] [PATCH 2/2] " Ivan Koptelov
@ 2019-04-05 19:48   ` Konstantin Osipov
  2019-04-17 12:50     ` i.koptelov
  2019-04-09 14:52   ` n.pettik
  1 sibling, 1 reply; 11+ messages in thread
From: Konstantin Osipov @ 2019-04-05 19:48 UTC (permalink / raw)
  To: tarantool-patches; +Cc: korablev, Ivan Koptelov

* Ivan Koptelov <ivan.koptelov@tarantool.org> [19/04/05 18:02]:
> +/*
> + * This structure is for keeping context during work of
> + * aggregate function.
> + */
> +struct aggregate_context {
> +    /** Value being aggregated. (e.g. current MAX or current counter value). */
> +    Mem value;
> +    /** Reference value to keep track of previous argument's type. */
> +    Mem reference_value;
> +};

Why not call this struct agg_value?

Besides, keeping a reference to the previous argument is
overkill. Why not keep a type instead, and assign it to
FIELD_TYPE_SCALAR initially and change to a more specific type
after the first assignment?
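
Roughly something like this (only a sketch, reusing the struct and member
names from the patch and assuming the usual enum field_type):

struct aggregate_context {
	/** Value being aggregated (e.g. current MAX). */
	Mem value;
	/**
	 * Type of the values seen so far: FIELD_TYPE_SCALAR until
	 * the first non-NULL argument is processed, then the type
	 * of that argument.
	 */
	enum field_type reference_type;
};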

> +		} else {
> +			diag_set(ClientError, ER_INCONSISTENT_TYPES,
> +				 "INTEGER or FLOAT", mem_type_to_str(argv[0]));
> +			context->fErrorOrAux = 1;
> +			context->isError = SQL_TARANTOOL_ERROR;

This message would look confusing. Could we get rid of "or" in the
message and be more specific about what is inconsistent?

> +		if (sql_type != ref_sql_type) {
> +			is_compatible = false;
> +			if ((sql_type == SQL_INTEGER || sql_type == SQL_FLOAT) &&
> +			    (ref_sql_type == SQL_INTEGER ||
> +			     ref_sql_type == SQL_FLOAT)) {
> +				is_compatible = true;

This is a very hot path and doing so much work to check
compatibility is a) clumsy to read, b) slow, c) hard to
maintain.

Please use a compatibility matrix statically defined as a 8x8
bitmap.
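
For illustration, such a statically defined matrix could look roughly like
this (the indices and the set of types are made up for the sketch, they are
not the real MEM_* flag values):

#include <stdbool.h>
#include <stdint.h>

enum type_idx { T_NULL, T_STR, T_INT, T_REAL, T_BLOB, T_BOOL };

/* One byte per type, one bit per compatible type. */
static const uint8_t type_compat[8] = {
	[T_NULL] = 0,
	[T_STR]  = 1 << T_STR,
	[T_INT]  = (1 << T_INT) | (1 << T_REAL),
	[T_REAL] = (1 << T_INT) | (1 << T_REAL),
	[T_BLOB] = 1 << T_BLOB,
	[T_BOOL] = 1 << T_BOOL,
};

static inline bool
types_are_compatible(enum type_idx a, enum type_idx b)
{
	return (type_compat[a] & (1 << b)) != 0;
}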

Besides, I guess you can get rid of this check for most common
cases - averaging a column of the same type - so this is perhaps
better to make a separate opcode, not part of the main opcode, and
emit only when we're not sure the type is going to be the same
across all values. I don't know how hard this is to do, however -
perhaps should be moved into a separate patch, but I'd guess
detecting that the aggregate function argument has a non-mutable
type is not hard. 
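
Detecting it could be as simple as a parse-time check along these lines
(sketch only; sql_expr_type() is the helper already used in the patch, the
wrapper name is invented):

static bool
agg_arg_type_is_immutable(struct Expr *arg)
{
	/* Only a SCALAR column may hold values of different types. */
	return sql_expr_type(arg) != FIELD_TYPE_SCALAR;
}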

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tarantool-patches] Re: [PATCH 1/2] sql: make aggregate functions types more strict
  2019-04-05 14:57 ` [tarantool-patches] [PATCH 1/2] " Ivan Koptelov
@ 2019-04-09 14:52   ` n.pettik
  0 siblings, 0 replies; 11+ messages in thread
From: n.pettik @ 2019-04-09 14:52 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Ivan Koptelov

Please don’t use the same commit message within a patch set.
In this particular case, I’d say:

sql: fix code style of functions implementing min/max aggregates

> On 5 Apr 2019, at 17:57, Ivan Koptelov <ivan.koptelov@tarantool.org> wrote:
> 
> Let's firstly fix code-style.
> ---
> src/box/sql/func.c | 31 +++++++++++++++----------------
> 1 file changed, 15 insertions(+), 16 deletions(-)
> 
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index b86a95d9a..b1bfc886e 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -1585,23 +1585,22 @@ countFinalize(sql_context * context)
>  * Routines to implement min() and max() aggregate functions.
>  */

diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index 379f28252..ad8e373ae 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -1597,7 +1597,7 @@ countFinalize(sql_context * context)
        sql_result_int64(context, p ? p->n : 0);
 }
 
-/*
+/**
  * Routines to implement min() and max() aggregate functions.
  */
 static void

> static void
> -minmaxStep(sql_context * context, int NotUsed, sql_value ** argv)
> +minmaxStep(sql_context *context, int not_used, sql_value **argv)

Fix the name of the function -> minmax_agg_step()

> {
> -	Mem *pArg = (Mem *) argv[0];
> -	Mem *pBest;
> -	UNUSED_PARAMETER(NotUsed);
> +	UNUSED_PARAMETER(not_used);
> +	sql_value *arg = argv[0];
> +	sql_value *best = sql_aggregate_context(context, sizeof(*best));

What is 'best'? Use a more specific name or add a comment.
I got it only after 30 minutes...

> 
> -	pBest = (Mem *) sql_aggregate_context(context, sizeof(*pBest));
> -	if (!pBest)
> +	if (best == NULL)
> 		return;
> 
> 	if (sql_value_type(argv[0]) == SQL_NULL) {
> -		if (pBest->flags)
> +		if (best->flags != 0)
> 			sqlSkipAccumulatorLoad(context);
> -	} else if (pBest->flags) {
> +	} else if (best->flags != 0) {
> 		int max;
> 		int cmp;
> -		struct coll *pColl = sqlGetFuncCollSeq(context);
> +		struct coll *coll = sqlGetFuncCollSeq(context);
> 		/* This step function is used for both the min() and max() aggregates,
> 		 * the only difference between the two being that the sense of the
> 		 * comparison is inverted. For the max() aggregate, the

Fix comment formatting:

@@ -1658,12 +1658,15 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
                int max;
                int cmp;
                struct coll *coll = sqlGetFuncCollSeq(context);
-               /* This step function is used for both the min() and max() aggregates,
-                * the only difference between the two being that the sense of the
-                * comparison is inverted. For the max() aggregate, the
-                * sql_user_data() function returns (void *)-1. For min() it
-                * returns (void *)db, where db is the sql* database pointer.
-                * Therefore the next statement sets variable 'max' to 1 for the max()
+               /*
+                * This step function is used for both the min()
+                * and max() aggregates, the only difference
+                * between the two being that the sense of the
+                * comparison is inverted. For the max() aggregate,
+                * the sql_user_data() function returns (void *)-1.
+                * For min() it returns (void *)db, where db is the
+                * sql* database pointer. Therefore the next
+                * statement sets variable 'max' to 1 for the max()
                 * aggregate, or 0 for min().
                 */


> @@ -1611,15 +1610,15 @@ minmaxStep(sql_context * context, int NotUsed, sql_value ** argv)
> 		 * aggregate, or 0 for min().
> 		 */
> 		max = sql_user_data(context) != 0;
> -		cmp = sqlMemCompare(pBest, pArg, pColl);
> -		if ((max && cmp < 0) || (!max && cmp > 0)) {
> -			sqlVdbeMemCopy(pBest, pArg);
> +		cmp = sqlMemCompare(best, arg, coll);
> +		if ((max != 0 && cmp < 0) || (max == 0 && cmp > 0)) {
> +			sqlVdbeMemCopy(best, arg);
> 		} else {
> 			sqlSkipAccumulatorLoad(context);
> 		}
> 	} else {
> -		pBest->db = sql_context_db_handle(context);
> -		sqlVdbeMemCopy(pBest, pArg);
> +		best->db = sql_context_db_handle(context);
> +		sqlVdbeMemCopy(best, arg);
> 	}
> }
> 
> @@ -1628,8 +1627,8 @@ minMaxFinalize(sql_context * context)

Finish refactoring of this function.

> {
> 	sql_value *pRes;
> 	pRes = (sql_value *) sql_aggregate_context(context, 0);
> -	if (pRes) {
> -		if (pRes->flags) {
> +	if (pRes != NULL) {
> +		if (pRes->flags != 0) {
> 			sql_result_value(context, pRes);
> 		}
> 		sqlVdbeMemRelease(pRes);
> -- 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tarantool-patches] Re: [PATCH 2/2] sql: make aggregate functions types more strict
  2019-04-05 14:57 ` [tarantool-patches] [PATCH 2/2] " Ivan Koptelov
  2019-04-05 19:48   ` [tarantool-patches] " Konstantin Osipov
@ 2019-04-09 14:52   ` n.pettik
  2019-04-23 15:38     ` i.koptelov
  1 sibling, 1 reply; 11+ messages in thread
From: n.pettik @ 2019-04-09 14:52 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Ivan Koptelov


Firstly, please consider Konstantin’s comments.

Then, I’ve found an assertion failure:

tarantool> create table t(id int primary key, b scalar)
tarantool> insert into t values(1, 1), (2, '1'), (3, 4.123)
tarantool> select min(b) from t
Assertion failed: (((p->szMalloc > 0 && p->z == p->zMalloc) ? 1 : 0) + ((p->flags & MEM_Dyn) != 0 ? 1 : 0) + ((p->flags & MEM_Ephem) != 0 ? 1 : 0) + ((p->flags & MEM_Static) != 0 ? 1 : 0) == 1), function sqlVdbeCheckMemInvariants, file /Users/n.pettik/tarantool/src/box/sql/vdbemem.c, line 87.
Abort trap: 6

tarantool> create table t(id int primary key, a int, b text)
tarantool> insert into t values(1,1,'1’)
tarantool> select sum(b) from t
---
- metadata:
  - name: sum(b)
    type: number
  rows:
  - [1]
…

So, your statement below that SUM accepts only numeric arguments
is false. If string can be converted to number, it is OK:

tarantool> select a=b from t;
---
- metadata:
  - name: a=b
    type: integer
  rows:
  - [1]
…

Next:

tarantool> select min(b) from t
---
- metadata:
  - name: min(b)
    type: scalar
  rows:
  - ['1']
…

Could you please fix this behaviour (scalar->int) in a separate patch?
If it is not that easy, file a ticket.

> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index b1bfc886e..379f28252 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -45,6 +45,17 @@
> #include <unicode/uchar.h>
> #include <unicode/ucol.h>
> 
> +/*

/**

> + * This structure is for keeping context during work of
> + * aggregate function.
> + */
> +struct aggregate_context {

Let’s change name to minmax_context.

> +    /** Value being aggregated. (e.g. current MAX or current counter value). */
> +    Mem value;
> +    /** Reference value to keep track of previous argument's type. */
> +    Mem reference_value;
> +};

I agree that keeping a struct Mem is too much (especially as values, not as pointers).
We need only the type of that value. What is more, you need only one member:
“value being aggregated” comes from argv[0].

> @@ -1588,16 +1604,57 @@ static void
> minmaxStep(sql_context *context, int not_used, sql_value **argv)
> {
> 	UNUSED_PARAMETER(not_used);
> -	sql_value *arg = argv[0];
> -	sql_value *best = sql_aggregate_context(context, sizeof(*best));
> +	struct aggregate_context *aggr_context =
> +		(struct aggregate_context *) sql_aggregate_context(context,
> +								   sizeof(*aggr_context));

This way looks better:

@@ -1604,9 +1604,8 @@ static void
 minmaxStep(sql_context *context, int not_used, sql_value **argv)
 {
        UNUSED_PARAMETER(not_used);
-       struct aggregate_context *aggr_context =
-               (struct aggregate_context *) sql_aggregate_context(context,
-                                                                  sizeof(*aggr_context));
+       struct aggregate_context *aggr_context = (struct aggregate_context *)
+               sql_aggregate_context(context, sizeof(*aggr_context));
        if (aggr_context == NULL)
                return;

> +	if (aggr_context == NULL)
> +		return;
> 
> +	sql_value *arg = argv[0];
> +	sql_value *best = &(aggr_context->value);
> 	if (best == NULL)
> 		return;
> 
> -	if (sql_value_type(argv[0]) == SQL_NULL) {
> +	enum sql_type sql_type = sql_value_type(arg);
> +
> +	if (sql_type == SQL_NULL) {
> 		if (best->flags != 0)
> 			sqlSkipAccumulatorLoad(context);
> 	} else if (best->flags != 0) {
> +		/*
> +		 * During proceeding of the function, arguments
> +		 * of different types may be encountered (if
> +		 * SCALAR type column is proceeded). Some types
> +		 * are compatible (INTEGER and FLOAT) and others
> +		 * are not (TEXT and BLOB are not compatible with
> +		 * any other type). In the later case an error
> +		 * is raised.
> +		 */
> +		sql_value *reference_value = &aggr_context->reference_value;
> +		if (reference_value->flags == 0)
> +			sqlVdbeMemCopy(reference_value, arg);

AFAIR flags == 0 is a situation when we are processing first
entry (i.e. aggregate_context is initialised right now).

> +		enum sql_type ref_sql_type = sql_value_type(reference_value);
> +
> +		bool is_compatible = true;

-> types_are_compatible

> +		if (sql_type != ref_sql_type) {
> +			is_compatible = false;
> +			if ((sql_type == SQL_INTEGER || sql_type == SQL_FLOAT) &&
> +			    (ref_sql_type == SQL_INTEGER ||
> +			     ref_sql_type == SQL_FLOAT)) {
> +				is_compatible = true;
> +			}
> +		}
> +		if (!is_compatible) {
> +			diag_set(ClientError, ER_INCONSISTENT_TYPES,
> +				 mem_type_to_str(reference_value),
> +				 mem_type_to_str(arg));
> +			context->fErrorOrAux = 1;
> +			context->isError = SQL_TARANTOOL_ERROR;
> +			sqlVdbeMemRelease(best);
> +			sqlVdbeMemRelease(reference_value);

Both of these values are allocated in scope of aggregate context.
I guess there’s no need to provide any cleanup on these variables.

> +			return;
> +		}
> +
> 		int max;
> 		int cmp;
> 		struct coll *coll = sqlGetFuncCollSeq(context);
> @@ -1625,13 +1682,16 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
> static void
> minMaxFinalize(sql_context * context)
> {
> -	sql_value *pRes;
> -	pRes = (sql_value *) sql_aggregate_context(context, 0);
> +	struct aggregate_context *func_context =
> +		(struct aggregate_context *) sql_aggregate_context(context, sizeof(*func_context));
> +	sql_value *pRes = &(func_context->value);
> +
> 	if (pRes != NULL) {
> 		if (pRes->flags != 0) {
> 			sql_result_value(context, pRes);
> 		}
> 		sqlVdbeMemRelease(pRes);
> +		sqlVdbeMemRelease(&func_context->reference_value);
> 	}
> }
> 
> diff --git a/src/box/sql/select.c b/src/box/sql/select.c
> index b1ec8c758..624083b22 100644
> --- a/src/box/sql/select.c
> +++ b/src/box/sql/select.c
> @@ -4340,13 +4340,22 @@ pushDownWhereTerms(Parse * pParse,	/* Parse context (for malloc() and error repo
>  * argument, this function checks if the following are true:
>  *
>  *    * the query contains just a single aggregate function,
> - *    * the aggregate function is either min() or max(), and
> - *    * the argument to the aggregate function is a column value.
> + *    * the aggregate function is either min() or max(),
> + *    * the argument to the aggregate function is a column value,
> + *    * the type of column is not SCALAR.
>  *
>  * If all of the above are true, then WHERE_ORDERBY_MIN or WHERE_ORDERBY_MAX
>  * is returned as appropriate. Also, *ppMinMax is set to point to the
>  * list of arguments passed to the aggregate before returning.
>  *
> + * The requirement of column type not being SCALAR follows from
> + * the purpose of the function. The purpose of the function is
> + * to answer the question: "Should MIN/MAX call be optimised by
> + * using ORDER ON clause code?" If the type of column is SCALAR

There’s no ORDER ON clause.

> + * then we have to iterate over all rows to check if their types
> + * are compatible. Hence, the optimisation should not be used.
> + * For details please see: https://github.com/tarantool/tarantool/issues/4032
> + *
>  * Or, if the conditions above are not met, *ppMinMax is set to 0 and
>  * WHERE_ORDERBY_NORMAL is returned.
>  */
> @@ -4364,6 +4373,8 @@ minMaxQuery(AggInfo * pAggInfo, ExprList ** ppMinMax)
> 		if (pEList && pEList->nExpr == 1
> 		    && pEList->a[0].pExpr->op == TK_AGG_COLUMN) {
> 			const char *zFunc = pExpr->u.zToken;
> +			if (sql_expr_type(pEList->a[0].pExpr) == FIELD_TYPE_SCALAR)
> +				return eRet;

I see no difference between byte code generated for SCALAR and other
columns (SELECT MIN(a) FROM t;). Investigate this case please.

> 			if (sqlStrICmp(zFunc, "min") == 0) {
> 				eRet = WHERE_ORDERBY_MIN;
> 				*ppMinMax = pEList;
> 
> diff --git a/test/sql-tap/select1.test.lua b/test/sql-tap/select1.test.lua
> index be63e815d..a4ce19d73 100755
> --- a/test/sql-tap/select1.test.lua
> +++ b/test/sql-tap/select1.test.lua
> @@ -1,6 +1,6 @@
> #!/usr/bin/env tarantool
> test = require("sqltester")
> -test:plan(173)
> +test:plan(172)
> 
> --!./tcltestrunner.lua
> -- ["set","testdir",[["file","dirname",["argv0"]]]]
> @@ -503,16 +503,6 @@ test:do_catchsql_test(
>         -- </select1-2.17>
>     })
> 
> -test:do_execsql_test(
> -    "select1-2.17.1",
> -    [[
> -        SELECT sum(a) FROM t3
> -    ]], {
> -        -- <select1-2.17.1>
> -        44.0
> -        -- </select1-2.17.1>
> -    })

Don’t delete test, just change it to catch an error.

> -
> test:do_catchsql_test(
>     "select1-2.18”

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tarantool-patches] Re: [PATCH 2/2] sql: make aggregate functions types more strict
  2019-04-05 19:48   ` [tarantool-patches] " Konstantin Osipov
@ 2019-04-17 12:50     ` i.koptelov
  2019-04-17 13:19       ` n.pettik
  0 siblings, 1 reply; 11+ messages in thread
From: i.koptelov @ 2019-04-17 12:50 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Konstantin Osipov, n.pettik

Thank you for the comments! I agree with all your suggestions and would
send fixes a little bit later. Now I have one thing to discuss. 

> On 5 Apr 2019, at 22:48, Konstantin Osipov <kostja@tarantool.org> wrote:
> 
> * Ivan Koptelov <ivan.koptelov@tarantool.org> [19/04/05 18:02]:
> 
> Besides, I guess you can get rid of this check for most common
> cases - averaging a column of the same type - so this is perhaps
> better to make a separate opcode, not part of the main opcode, and
> emit only when we're not sure the type is going to be the same
> across all values. I don't know how hard this is to do, however -
> perhaps should be moved into a separate patch, but I'd guess
> detecting that the aggregate function argument has a non-mutable
> type is not hard. 
> 
> -- 
> Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
> http://tarantool.io - www.twitter.com/kostja_osipov

I do not quite understand why you use the word ‘opcode’.
Functions are implemented as C code.

Considering your suggestion (one ‘opcode’ for simple cases, another one
- for complex) I want to do the following:
1) Add a bunch of INTERNAL functions, for example max_number, max_text and max_scalar.
max_number and max_text would not have excess type checks, while max_scalar would
have all necessary type checks. So a bunch of INTERNAL functions would implement one EXTERNAL function
(just max() in this example).
2) At runtime, determine the proper INTERNAL function (max_number, max_text or max_scalar) to implement
the given function. It would be done only once (not on every step of the aggregate function) using
information about the column type.

For example:
SELECT MAX(b) FROM test_table;

If test_table.b has TEXT type, we would use max_text. If test_table.b has SCALAR type,
we would use max_scalar.
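
A rough sketch of how that one-time dispatch could look (the step functions
are the ones named in the plan above; the dispatch helper, its signature and
the use of the usual field_type constants are assumptions for illustration):

typedef void (*minmax_step_f)(sql_context *, int, sql_value **);

static minmax_step_f
choose_max_impl(enum field_type column_type)
{
	switch (column_type) {
	case FIELD_TYPE_INTEGER:
	case FIELD_TYPE_NUMBER:
		return max_number;
	case FIELD_TYPE_STRING:
		return max_text;
	default:
		/* SCALAR and any other type gets the fully checked version. */
		return max_scalar;
	}
}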

If this question seems too ‘low-level’ to you, I can just send the code
for the next review round.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tarantool-patches] Re: [PATCH 2/2] sql: make aggregate functions types more strict
  2019-04-17 12:50     ` i.koptelov
@ 2019-04-17 13:19       ` n.pettik
  0 siblings, 0 replies; 11+ messages in thread
From: n.pettik @ 2019-04-17 13:19 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Ivan Koptelov, Konstantin Osipov



> On 17 Apr 2019, at 15:50, i.koptelov <ivan.koptelov@tarantool.org> wrote:
> 
> Thank you for the comments! I agree with all your suggestions and would
> send fixes a little bit later. Now I have one thing to discuss. 
> 
>> On 5 Apr 2019, at 22:48, Konstantin Osipov <kostja@tarantool.org> wrote:
>> 
>> * Ivan Koptelov <ivan.koptelov@tarantool.org> [19/04/05 18:02]:
>> 
>> Besides, I guess you can get rid of this check for most common
>> cases - averaging a column of the same type - so this is perhaps
>> better to make a separate opcode, not part of the main opcode, and
>> emit only when we're not sure the type is going to be the same
>> across all values. I don't know how hard this is to do, however -
>> perhaps should be moved into a separate patch, but I'd guess
>> detecting that the aggregate function argument has a non-mutable
>> type is not hard. 
>> 
>> -- 
>> Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
>> http://tarantool.io - www.twitter.com/kostja_osipov
> 
> I do not quite understand why you use the word ‘opcode’.
> Functions are implemented as C code.

I guess Konstantin meant adding an opcode that implements
the logic of a primitive aggregate function. We are operating in
terms of opcodes due to our abstraction model: it is divided
into a compilation stage and runtime. To invoke a ‘C function’, we
need a corresponding opcode implementing the initial preparation,
passing arguments to the function, saving results to the given registers, etc.

The original SQLite routines implementing aggregates handle the
most general case (any function, any types, any number of arguments, etc.).
We can significantly simplify these procedures for several quite common cases.
For instance, SELECT sum(a) FROM t;

Within one opcode, we can open an iterator, fetch values and quickly evaluate
the sum (without involving OP_Next, OP_Column, etc.).

> Considering your suggestion (one ‘opcode’ for simple cases, another one
> - for complex) I want to do the following:
> 1) Add a bunch of INTERNAL functions, for example max_number, max_text and max_scalar.
> max_number and max_text would not have excess type checks, while max_scalar would
> have all necessary type checks. So a bunch of INTERNAL functions would implement one EXTERNAL function
> (just max() in this example).

Anyway, at some point you have to dispatch the function call depending
on the runtime value:

switch (mem->type) {
	case MEM_Str:
		return max_str(mem);
etc.

Otherwise, we would have to add a separate opcode for each
(aggregate, type) pair, which in turn would decrease performance.

Nevertheless, I like this approach. For instance, it is used in PostgreSQL:
for each operation (like ‘+’) they have a function with corresponding arguments.
When you try to add incompatible values (like a string and a double), there’s
no function implementing addition with char * and double arguments, so an
error is raised: 'operator does not exist: text * integer’. What is more, such
an approach is quite suitable for JITing. On the other hand, we should carefully
benchmark changes like this. I’m not sure whether the priority of this task is
high or low.

> 2) At runtime, determine the proper INTERNAL function (max_number, max_text or max_scalar) to implement
> the given function. It would be done only once (not on every step of the aggregate function) using
> information about the column type.
> 
> For example:
> SELECT MAX(b) FROM test_table;
> 
> If test_table.b has TEXT type, we would use max_text. If test_table.b has SCALAR type,
> we would use max_scalar.
> 
> If this question seems too ‘low-level’ to you, I can just send the code
> for the next review round.
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tarantool-patches] Re: [PATCH 2/2] sql: make aggregate functions types more strict
  2019-04-09 14:52   ` n.pettik
@ 2019-04-23 15:38     ` i.koptelov
  2019-04-24 17:37       ` n.pettik
  0 siblings, 1 reply; 11+ messages in thread
From: i.koptelov @ 2019-04-23 15:38 UTC (permalink / raw)
  To: tarantool-patches, n.pettik

First, I’d like to answer Konstantin’s comments.

> * Ivan Koptelov <ivan.koptelov@tarantool.org> [19/04/05 18:02]:
>> +/*
>> + * This structure is for keeping context during work of
>> + * aggregate function.
>> + */
>> +struct aggregate_context {
>> +    /** Value being aggregated. (e.g. current MAX or current counter value). */
>> +    Mem value;
>> +    /** Reference value to keep track of previous argument's type. */
>> +    Mem reference_value;
>> +};
> 
> Why not call this struct agg_value?
I think Nikita’s idea is better - ‘minmax_context’ - because it’s only used in the minmax implementation.

> 
> Besides, keeping a reference to the previous argument is an
> overkill. Why not keep a type instead, and assign it to
> FIELD_TYPE_SCALAR initially and change to a more specific type
> after the first assignment?
I agree. Instead of keeping a reference Mem, I now keep only
the reference type.

> 
>> +		} else {
>> +			diag_set(ClientError, ER_INCONSISTENT_TYPES,
>> +				 "INTEGER or FLOAT", mem_type_to_str(argv[0]));
>> +			context->fErrorOrAux = 1;
>> +			context->isError = SQL_TARANTOOL_ERROR;
> 
> This message would look confusing. Could we get rid of "or" in the
> message and be more specific about what is inconsistent?
Fixed.

@@ -1524,8 +1528,12 @@ sumStep(sql_context * context, int argc, sql_value ** argv)
 			p->rSum += sql_value_double(argv[0]);
 			p->approx = 1;
 		} else {
-			diag_set(ClientError, ER_INCONSISTENT_TYPES,
-				 "INTEGER or FLOAT", mem_type_to_str(argv[0]));
+			diag_set(ClientError, ER_ILLEGAL_PARAMS,
+				"SUM, TOTAL and AVG aggregate functions can "
+				"be called only with INTEGER, FLOAT or SCALAR "
+				"arguments. In case of SCALAR arguments, they "
+				"all must have numeric values.",
+				 mem_type_to_str(argv[0]));
 			context->fErrorOrAux = 1;
 			context->isError = SQL_TARANTOOL_ERROR;
 		}

> 
>> +		if (sql_type != ref_sql_type) {
>> +			is_compatible = false;
>> +			if ((sql_type == SQL_INTEGER || sql_type == SQL_FLOAT) &&
>> +			    (ref_sql_type == SQL_INTEGER ||
>> +			     ref_sql_type == SQL_FLOAT)) {
>> +				is_compatible = true;
> 
> This is a very hot path and doing so much work to check
> compatibility is a) clumsy when reading b) slow c) hard to
> maintain.
> 
> Please use a compatibility matrix statically defined as a 8x8
> bitmap.
Fixed:

@@ -1630,30 +1636,23 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
 		 * any other type). In the later case an error
 		 * is raised.
 		 */
-		sql_value *reference_value = &aggr_context->reference_value;
-		if (reference_value->flags == 0)
-			sqlVdbeMemCopy(reference_value, arg);
-		enum sql_type ref_sql_type = sql_value_type(reference_value);
+		if (minmax_context->reference_type == 0)
+			minmax_context->reference_type = sql_type;
+		int ref_sql_type = minmax_context->reference_type;
 
-		bool is_compatible = true;
 		if (sql_type != ref_sql_type) {
-			is_compatible = false;
-			if ((sql_type == SQL_INTEGER || sql_type == SQL_FLOAT) &&
-			    (ref_sql_type == SQL_INTEGER ||
-			     ref_sql_type == SQL_FLOAT)) {
-				is_compatible = true;
+			bool types_are_compatible = (sql_type & MEM_NumMask) &&
+						    (ref_sql_type & MEM_NumMask);
+			if (!types_are_compatible) {
+				diag_set(ClientError, ER_INCONSISTENT_TYPES,
+					 type_to_str(ref_sql_type),
+					 type_to_str(sql_type));
+				context->fErrorOrAux = 1;
+				context->isError = SQL_TARANTOOL_ERROR;
+				sqlVdbeMemRelease(best);
+				return;
 			}
 		}
-		if (!is_compatible) {
-			diag_set(ClientError, ER_INCONSISTENT_TYPES,
-				 mem_type_to_str(reference_value),
-				 mem_type_to_str(arg));
-			context->fErrorOrAux = 1;
-			context->isError = SQL_TARANTOOL_ERROR;
-			sqlVdbeMemRelease(best);
-			sqlVdbeMemRelease(reference_value);
-			return;
-		}

diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
index ed7bf8870..960030b52 100644
--- a/src/box/sql/vdbe.c
+++ b/src/box/sql/vdbe.c
@@ -618,10 +618,9 @@ vdbe_add_new_autoinc_id(struct Vdbe *vdbe, int64_t id)
 }
 
 char *
-mem_type_to_str(const struct Mem *p)
+type_to_str(int type)
 {
-	assert(p != NULL);
-	switch (p->flags & MEM_PURE_TYPE_MASK) {
+	switch (type) {
 	case MEM_Null:
 		return "NULL";
 	case MEM_Str:
@@ -639,6 +638,19 @@ mem_type_to_str(const struct Mem *p)
 	}
 }
 
+int
+mem_type(const struct Mem *p)
+{
+	assert(p != NULL);
+	return p->flags & MEM_PURE_TYPE_MASK;
+}
+
+char *
+mem_type_to_str(const struct Mem *p)
+{
+	return type_to_str(mem_type(p));
+}
+
 /*
  * Execute as much of a VDBE program as we can.
  * This is the core of sql_step().
diff --git a/src/box/sql/vdbeInt.h b/src/box/sql/vdbeInt.h
index c84f22caf..553c4f225 100644
--- a/src/box/sql/vdbeInt.h
+++ b/src/box/sql/vdbeInt.h
@@ -228,6 +228,7 @@ struct Mem {
 #define MEM_Str       0x0002	/* Value is a string */
 #define MEM_Int       0x0004	/* Value is an integer */
 #define MEM_Real      0x0008	/* Value is a real number */
+#define MEM_NumMask   0x000c	/* Value is integer or real (mask) */
 #define MEM_Blob      0x0010	/* Value is a BLOB */
 #define MEM_Bool      0x0020    /* Value is a bool */
 #define MEM_Ptr       0x0040	/* Value is a generic pointer */
@@ -262,11 +263,25 @@ enum {
 	MEM_PURE_TYPE_MASK = 0x1f
 };
 
+/**
+ * Returns one of MEM_Null, MEM_Str, MEM_Int, MEM_Real, MEM_Blob
+ * or MEM_Bool. Used for error detection and reporting.
+ */
+int
+mem_type(const struct Mem *p);
+
 /**
  * Simple type to str convertor. It is used to simplify
  * error reporting.
  */
 char *
+type_to_str(int type);
+
+/**
+ * Returns string representing type of the given Mem. It is used
+ * to simplify error reporting.
+ */
+char *
 mem_type_to_str(const struct Mem *p);

> On 9 Apr 2019, at 17:52, n.pettik <korablev@tarantool.org> wrote:


> 
> Then, I’ve found an assertion failure:
> 
> tarantool> create table t(id int primary key, b scalar)
> tarantool> insert into t values(1, 1), (2, '1'), (3, 4.123)
> tarantool> select min(b) from t
> Assertion failed: (((p->szMalloc > 0 && p->z == p->zMalloc) ? 1 : 0) + ((p->flags & MEM_Dyn) != 0 ? 1 : 0) + ((p->flags & MEM_Ephem) != 0 ? 1 : 0) + ((p->flags & MEM_Static) != 0 ? 1 : 0) == 1), function sqlVdbeCheckMemInvariants, file /Users/n.pettik/tarantool/src/box/sql/vdbemem.c, line 87.
> Abort trap: 6
It was fixed after removing reference_value from minmax_context.
> tarantool> create table t(id int primary key, a int, b text)
> tarantool> insert into t values(1,1,'1’)
> tarantool> select sum(b) from t
> ---
> - metadata:
>  - name: sum(b)
>    type: number
>  rows:
>  - [1]
> …
> 
> So, your statement below that SUM accepts only numeric arguments
> is false. If string can be converted to number, it is OK:
> 
> tarantool> select a=b from t;
> ---
> - metadata:
>  - name: a=b
>    type: integer
>  rows:
>  - [1]
> …
> 
> Next:
> 
> tarantool> select min(b) from t
> ---
> - metadata:
>  - name: min(b)
>    type: scalar
>  rows:
>  - ['1']
> …
> 
> Could you please fix this behaviour (scalar->int) in a separate patch?
> If it is not that easy, file a ticket.
It is not easy and we already have a ticket. Fixed commit message.
> 
>> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
>> index b1bfc886e..379f28252 100644
>> --- a/src/box/sql/func.c
>> +++ b/src/box/sql/func.c
>> @@ -45,6 +45,17 @@
>> #include <unicode/uchar.h>
>> #include <unicode/ucol.h>
>> 
>> +/*
> 
> /**
> 
>> + * This structure is for keeping context during work of
>> + * aggregate function.
>> + */
>> +struct aggregate_context {
> 
> Let’s change name to minmax_context.
Fixed.

diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index 379f28252..258df0d7e 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -45,15 +45,19 @@
 #include <unicode/uchar.h>
 #include <unicode/ucol.h>
 
-/*
+/**
  * This structure is for keeping context during work of
- * aggregate function.
+ * min/max aggregate functions.
  */
-struct aggregate_context {
-    /** Value being aggregated. (e.g. current MAX or current counter value). */
-    Mem value;
-    /** Reference value to keep track of previous argument's type. */
-    Mem reference_value;
+struct minmax_context {
+	/** Value being aggregated i.e. current MAX or MIN. */
+	Mem best;
+	/**
+	 * Reference type to keep track of previous argument's types.
+	 * One of MEM_Null, MEM_Str, MEM_Int, MEM_Real, MEM_Blob
+	 * or MEM_Bool.
+	 */
+	int reference_type;
 };

> 
>> +    /** Value being aggregated. (e.g. current MAX or current counter value). */
>> +    Mem value;
>> +    /** Reference value to keep track of previous argument's type. */
>> +    Mem reference_value;
>> +};
> 
> I agree that keeping a struct Mem is too much (especially as values, not as pointers).
> We need only the type of that value. What is more, you need only one member:
> “value being aggregated” comes from argv[0].
> 
Ok, now we keep the reference type (instead of the reference value) in minmax_context.
But we still need the "value being aggregated” because argv[0] actually does not
contain it.
>> @@ -1588,16 +1604,57 @@ static void
>> minmaxStep(sql_context *context, int not_used, sql_value **argv)
>> {
>> 	UNUSED_PARAMETER(not_used);
>> -	sql_value *arg = argv[0];
>> -	sql_value *best = sql_aggregate_context(context, sizeof(*best));
>> +	struct aggregate_context *aggr_context =
>> +		(struct aggregate_context *) sql_aggregate_context(context,
>> +								   sizeof(*aggr_context));
> 
> This way looks better:
> 
> @@ -1604,9 +1604,8 @@ static void
> minmaxStep(sql_context *context, int not_used, sql_value **argv)
> {
>        UNUSED_PARAMETER(not_used);
> -       struct aggregate_context *aggr_context =
> -               (struct aggregate_context *) sql_aggregate_context(context,
> -                                                                  sizeof(*aggr_context));
> +       struct aggregate_context *aggr_context = (struct aggregate_context *)
> +               sql_aggregate_context(context, sizeof(*aggr_context));
>        if (aggr_context == NULL)
>                return;
Ok, fixed.

@@ -1604,20 +1612,18 @@ static void
 minmaxStep(sql_context *context, int not_used, sql_value **argv)
 {
 	UNUSED_PARAMETER(not_used);
-	struct aggregate_context *aggr_context =
-		(struct aggregate_context *) sql_aggregate_context(context,
-								   sizeof(*aggr_context));
-	if (aggr_context == NULL)
+	struct minmax_context *minmax_context = (struct minmax_context *)
+		sql_aggregate_context(context, sizeof(*minmax_context));
+	if (minmax_context == NULL)
 		return;

> 
>> +	if (aggr_context == NULL)
>> +		return;
>> 
>> +	sql_value *arg = argv[0];
>> +	sql_value *best = &(aggr_context->value);
>> 	if (best == NULL)
>> 		return;
>> 
>> -	if (sql_value_type(argv[0]) == SQL_NULL) {
>> +	enum sql_type sql_type = sql_value_type(arg);
>> +
>> +	if (sql_type == SQL_NULL) {
>> 		if (best->flags != 0)
>> 			sqlSkipAccumulatorLoad(context);
>> 	} else if (best->flags != 0) {
>> +		/*
>> +		 * During proceeding of the function, arguments
>> +		 * of different types may be encountered (if
>> +		 * SCALAR type column is proceeded). Some types
>> +		 * are compatible (INTEGER and FLOAT) and others
>> +		 * are not (TEXT and BLOB are not compatible with
>> +		 * any other type). In the later case an error
>> +		 * is raised.
>> +		 */
>> +		sql_value *reference_value = &aggr_context->reference_value;
>> +		if (reference_value->flags == 0)
>> +			sqlVdbeMemCopy(reference_value, arg);
> 
> AFAIR flags == 0 is a situation when we are processing first
> entry (i.e. aggregate_context is initialised right now).
Also fixed:

@@ -1630,30 +1636,23 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
 		 * any other type). In the later case an error
 		 * is raised.
 		 */
-		sql_value *reference_value = &aggr_context->reference_value;
-		if (reference_value->flags == 0)
-			sqlVdbeMemCopy(reference_value, arg);
-		enum sql_type ref_sql_type = sql_value_type(reference_value);
+		if (minmax_context->reference_type == 0)
+			minmax_context->reference_type = sql_type;
+		int ref_sql_type = minmax_context->reference_type;

> 
>> +		enum sql_type ref_sql_type = sql_value_type(reference_value);
>> +
>> +		bool is_compatible = true;
> 
> -> types_are_compatible
Ok, fixed:

@@ -1630,30 +1636,23 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
-		bool is_compatible = true;
 		if (sql_type != ref_sql_type) {
-			is_compatible = false;
-			if ((sql_type == SQL_INTEGER || sql_type == SQL_FLOAT) &&
-			    (ref_sql_type == SQL_INTEGER ||
-			     ref_sql_type == SQL_FLOAT)) {
-				is_compatible = true;
+			bool types_are_compatible = (sql_type & MEM_NumMask) &&
+						    (ref_sql_type & MEM_NumMask);
+			if (!types_are_compatible) {
+				diag_set(ClientError, ER_INCONSISTENT_TYPES,
+					 type_to_str(ref_sql_type),
+					 type_to_str(sql_type));
+				context->fErrorOrAux = 1;
+				context->isError = SQL_TARANTOOL_ERROR;
+				sqlVdbeMemRelease(best);
+				return;
 			}
 		}
-		if (!is_compatible) {
-			diag_set(ClientError, ER_INCONSISTENT_TYPES,
-				 mem_type_to_str(reference_value),
-				 mem_type_to_str(arg));
-			context->fErrorOrAux = 1;
-			context->isError = SQL_TARANTOOL_ERROR;
-			sqlVdbeMemRelease(best);
-			sqlVdbeMemRelease(reference_value);
-			return;
-		}
> 
>> +		if (sql_type != ref_sql_type) {
>> +			is_compatible = false;
>> +			if ((sql_type == SQL_INTEGER || sql_type == SQL_FLOAT) &&
>> +			    (ref_sql_type == SQL_INTEGER ||
>> +			     ref_sql_type == SQL_FLOAT)) {
>> +				is_compatible = true;
>> +			}
>> +		}
>> +		if (!is_compatible) {
>> +			diag_set(ClientError, ER_INCONSISTENT_TYPES,
>> +				 mem_type_to_str(reference_value),
>> +				 mem_type_to_str(arg));
>> +			context->fErrorOrAux = 1;
>> +			context->isError = SQL_TARANTOOL_ERROR;
>> +			sqlVdbeMemRelease(best);
>> +			sqlVdbeMemRelease(reference_value);
> 
> Both of these values are allocated in scope of aggregate context.
> I guess there’s no need to provide any cleanup on these variables.
I checked this in debug and you are right. Because of the inner ‘if’ this call actually
does nothing, except checking the inner invariants of Mem. Removed.

diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index 258df0d7e..0738f44e6 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -1649,7 +1649,6 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
                                         type_to_str(sql_type));
                                context->fErrorOrAux = 1;
                                context->isError = SQL_TARANTOOL_ERROR;
-                               sqlVdbeMemRelease(best);
                                return;
                        }
                }
@@ -1689,7 +1688,6 @@ minMaxFinalize(sql_context * context)
                if (res->flags != 0) {
                        sql_result_value(context, res);
                }
-               sqlVdbeMemRelease(res);
        }
 }

> 

>> +			return;
>> +		}
>> +
>> 		int max;
>> 		int cmp;
>> 		struct coll *coll = sqlGetFuncCollSeq(context);
>> @@ -1625,13 +1682,16 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
>> static void
>> minMaxFinalize(sql_context * context)
>> {
>> -	sql_value *pRes;
>> -	pRes = (sql_value *) sql_aggregate_context(context, 0);
>> +	struct aggregate_context *func_context =
>> +		(struct aggregate_context *) sql_aggregate_context(context, sizeof(*func_context));
>> +	sql_value *pRes = &(func_context->value);
>> +
>> 	if (pRes != NULL) {
>> 		if (pRes->flags != 0) {
>> 			sql_result_value(context, pRes);
>> 		}
>> 		sqlVdbeMemRelease(pRes);
>> +		sqlVdbeMemRelease(&func_context->reference_value);
>> 	}
>> }
>> 
>> diff --git a/src/box/sql/select.c b/src/box/sql/select.c
>> index b1ec8c758..624083b22 100644
>> --- a/src/box/sql/select.c
>> +++ b/src/box/sql/select.c
>> @@ -4340,13 +4340,22 @@ pushDownWhereTerms(Parse * pParse,	/* Parse context (for malloc() and error repo
>> * argument, this function checks if the following are true:
>> *
>> *    * the query contains just a single aggregate function,
>> - *    * the aggregate function is either min() or max(), and
>> - *    * the argument to the aggregate function is a column value.
>> + *    * the aggregate function is either min() or max(),
>> + *    * the argument to the aggregate function is a column value,
>> + *    * the type of column is not SCALAR.
>> *
>> * If all of the above are true, then WHERE_ORDERBY_MIN or WHERE_ORDERBY_MAX
>> * is returned as appropriate. Also, *ppMinMax is set to point to the
>> * list of arguments passed to the aggregate before returning.
>> *
>> + * The requirement of column type not being SCALAR follows from
>> + * the purpose of the function. The purpose of the function is
>> + * to answer the question: "Should MIN/MAX call be optimised by
>> + * using ORDER ON clause code?" If the type of column is SCALAR
> 
> There’s no ORDER ON clause.
Sorry, fixed the comment.

diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index 624083b22..dd87c5b70 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -4351,7 +4351,7 @@ pushDownWhereTerms(Parse * pParse,        /* Parse context (for malloc() and error repo
  * The requirement of column type not being SCALAR follows from
  * the purpose of the function. The purpose of the function is
  * to answer the question: "Should MIN/MAX call be optimised by
- * using ORDER ON clause code?" If the type of column is SCALAR
+ * using ORDER BY clause code?" If the type of column is SCALAR
  * then we have to iterate over all rows to check if their types
  * are compatible. Hence, the optimisation should not be used.

> 
>> + * then we have to iterate over all rows to check if their types
>> + * are compatible. Hence, the optimisation should not be used.
>> + * For details please see: https://github.com/tarantool/tarantool/issues/4032
>> + *
>> * Or, if the conditions above are not met, *ppMinMax is set to 0 and
>> * WHERE_ORDERBY_NORMAL is returned.
>> */
>> @@ -4364,6 +4373,8 @@ minMaxQuery(AggInfo * pAggInfo, ExprList ** ppMinMax)
>> 		if (pEList && pEList->nExpr == 1
>> 		    && pEList->a[0].pExpr->op == TK_AGG_COLUMN) {
>> 			const char *zFunc = pExpr->u.zToken;
>> +			if (sql_expr_type(pEList->a[0].pExpr) == FIELD_TYPE_SCALAR)
>> +				return eRet;
> 
> I see no difference between byte code generated for SCALAR and other
> columns (SELECT MIN(a) FROM t;). Investigate this case please.
> 
I do not quite understand what the problem is here.
If the following conditions are true, then the result of MIN/MAX is retrieved without
using the corresponding function in func.c:
 
 * the query contains just a single aggregate function,
 * the aggregate function is either min() or max(),
 * the argument to the aggregate function is a column value,
 * the type of column is not SCALAR.

The changes above disable this optimisation if the column type is SCALAR.
I also added a test for this case.
>> 			if (sqlStrICmp(zFunc, "min") == 0) {
>> 				eRet = WHERE_ORDERBY_MIN;
>> 				*ppMinMax = pEList;
>> 
>> diff --git a/test/sql-tap/select1.test.lua b/test/sql-tap/select1.test.lua
>> index be63e815d..a4ce19d73 100755
>> --- a/test/sql-tap/select1.test.lua
>> +++ b/test/sql-tap/select1.test.lua
>> @@ -1,6 +1,6 @@
>> #!/usr/bin/env tarantool
>> test = require("sqltester")
>> -test:plan(173)	
>> +test:plan(172)
>> 
>> --!./tcltestrunner.lua
>> -- ["set","testdir",[["file","dirname",["argv0"]]]]
>> @@ -503,16 +503,6 @@ test:do_catchsql_test(
>>        -- </select1-2.17>
>>    })
>> 
>> -test:do_execsql_test(
>> -    "select1-2.17.1",
>> -    [[
>> -        SELECT sum(a) FROM t3
>> -    ]], {
>> -        -- <select1-2.17.1>
>> -        44.0
>> -        -- </select1-2.17.1>
>> -    })
> 
> Don’t delete test, just change it to catch an error.
Ok:

diff --git a/test/sql-tap/select1.test.lua b/test/sql-tap/select1.test.lua
index a4ce19d73..b743d900c 100755
--- a/test/sql-tap/select1.test.lua
+++ b/test/sql-tap/select1.test.lua
@@ -1,6 +1,6 @@
 #!/usr/bin/env tarantool
 test = require("sqltester")
-test:plan(172)
+test:plan(173)
 
 --!./tcltestrunner.lua
 -- ["set","testdir",[["file","dirname",["argv0"]]]]
@@ -503,6 +503,18 @@ test:do_catchsql_test(
         -- </select1-2.17>
     })
 
+test:do_catchsql_test(
+    "select1-2.17.1",
+    [[
+        SELECT sum(a) FROM t3
+    ]], {
+        -- <select1-2.17.1>
+        1, "Illegal parameters, SUM, TOTAL and AVG aggregate functions can " ..
+            "be called only with INTEGER, FLOAT or SCALAR arguments. In " ..
+            "case of SCALAR arguments, they all must have numeric values."
+        -- </select1-2.17.1>
+    })
+

> 
>> -
>> test:do_catchsql_test(
>>    "select1-2.18"
> 
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tarantool-patches] Re: [PATCH 2/2] sql: make aggregate functions types more strict
  2019-04-23 15:38     ` i.koptelov
@ 2019-04-24 17:37       ` n.pettik
  2019-05-06 13:24         ` i.koptelov
  0 siblings, 1 reply; 11+ messages in thread
From: n.pettik @ 2019-04-24 17:37 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Ivan Koptelov


>>> +		if (sql_type != ref_sql_type) {
>>> +			is_compatible = false;
>>> +			if ((sql_type == SQL_INTEGER || sql_type == SQL_FLOAT) &&
>>> +			    (ref_sql_type == SQL_INTEGER ||
>>> +			     ref_sql_type == SQL_FLOAT)) {
>>> +				is_compatible = true;
>> 
>> This is a very hot path and doing so much work to check
>> compatibility is a) clumsy when reading b) slow c) hard to
>> maintain.
>> 
>> Please use a compatibility matrix statically defined as a 8x8
>> bitmap.
> Fixed:

It is not what was suggested. Look at field_type1_contains_type2() and
field_type_compatibility as examples.
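
Roughly, the shape suggested is one static table plus a single lookup on the hot
path, something like the sketch below. This is not the actual
field_type1_contains_type2() code, and the 1-based type codes mirror the patch's
enum sql_type only as an assumption:

#include <stdbool.h>
#include <stdint.h>

/* Assumed type codes: SQL_INTEGER = 1 ... SQL_NULL = 5. */
enum { T_INTEGER = 1, T_FLOAT, T_TEXT, T_BLOB, T_NULL };

/*
 * One byte per type: bit (t - 1) is set when the row's type is
 * comparable with type t. INTEGER and FLOAT accept each other,
 * every other type accepts only itself.
 */
static const uint8_t compat_mask[] = {
	[T_INTEGER] = (1 << (T_INTEGER - 1)) | (1 << (T_FLOAT - 1)),
	[T_FLOAT]   = (1 << (T_INTEGER - 1)) | (1 << (T_FLOAT - 1)),
	[T_TEXT]    = 1 << (T_TEXT - 1),
	[T_BLOB]    = 1 << (T_BLOB - 1),
	[T_NULL]    = 1 << (T_NULL - 1),
};

static inline bool
types_compatible(int type1, int type2)
{
	/* Single table lookup and bit test per check. */
	return (compat_mask[type1] & (1 << (type2 - 1))) != 0;
}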

> diff --git a/src/box/sql/vdbeInt.h b/src/box/sql/vdbeInt.h
> index c84f22caf..553c4f225 100644
> --- a/src/box/sql/vdbeInt.h
> +++ b/src/box/sql/vdbeInt.h
> @@ -228,6 +228,7 @@ struct Mem {
> #define MEM_Str       0x0002	/* Value is a string */
> #define MEM_Int       0x0004	/* Value is an integer */
> #define MEM_Real      0x0008	/* Value is a real number */
> +#define MEM_NumMask   0x000c	/* Value is integer or real (mask) */

This mask covers neither Blob nor Bool values.

> #define MEM_Blob      0x0010	/* Value is a BLOB */
> #define MEM_Bool      0x0020    /* Value is a bool */
> #define MEM_Ptr       0x0040	/* Value is a generic pointer */
> @@ -262,11 +263,25 @@ enum {
> 	MEM_PURE_TYPE_MASK = 0x1f
> };
> 
> diff --git a/src/box/sql/select.c b/src/box/sql/select.c
> index 624083b22..dd87c5b70 100644
> --- a/src/box/sql/select.c
> +++ b/src/box/sql/select.c
> @@ -4351,7 +4351,7 @@ pushDownWhereTerms(Parse * pParse,        /* Parse context (for malloc() and error repo
>  * The requirement of column type not being SCALAR follows from
>  * the purpose of the function. The purpose of the function is
>  * to answer the question: "Should MIN/MAX call be optimised by
> - * using ORDER ON clause code?" If the type of column is SCALAR
> + * using ORDER BY clause code?" If the type of column is SCALAR
>  * then we have to iterate over all rows to check if their types
>  * are compatible. Hence, the optimisation should not be used.
> 
>> 
>>> + * then we have to iterate over all rows to check if their types
>>> + * are compatible. Hence, the optimisation should not be used.
>>> + * For details please see: https://github.com/tarantool/tarantool/issues/4032
>>> + *
>>> * Or, if the conditions above are not met, *ppMinMax is set to 0 and
>>> * WHERE_ORDERBY_NORMAL is returned.
>>> */
>>> @@ -4364,6 +4373,8 @@ minMaxQuery(AggInfo * pAggInfo, ExprList ** ppMinMax)
>>> 		if (pEList && pEList->nExpr == 1
>>> 		    && pEList->a[0].pExpr->op == TK_AGG_COLUMN) {
>>> 			const char *zFunc = pExpr->u.zToken;
>>> +			if (sql_expr_type(pEList->a[0].pExpr) == FIELD_TYPE_SCALAR)
>>> +				return eRet;
>> 
>> I see no difference between byte code generated for SCALAR and other
>> columns (SELECT MIN(a) FROM t;). Investigate this case please.
>> 
> I do not quite understand what the problem is here.
> If the following conditions are true, then the result of MIN/MAX is retrieved without
> using the corresponding function in func.c:
> 
> * the query contains just a single aggregate function,
> * the aggregate function is either min() or max(),
> * the argument to the aggregate function is a column value,
> * the type of column is not SCALAR.
> 
> The changes above disable this optimisation if the column type is SCALAR.
> I also added a test for this case.

This disables the index-search optimisation for min/max queries over SCALAR columns.
I’m not sure that is what we want. Even though it makes usage of min/max
uniform (an error is raised in both cases), it dramatically slows down execution.
I’ve asked the server team, and we decided to allow all comparisons with the SCALAR
field type. Right now we can’t tell scalar from non-scalar fields. To achieve this we should
add a field_type member to struct Mem and check field_type before comparisons.
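
Just to make the idea concrete, a sketch of how that could look; the enum and
struct below are simplified stand-ins, not the real field_type or Mem definitions:

#include <stdbool.h>

enum field_type_sketch {
	FT_INTEGER,
	FT_NUMBER,
	FT_STRING,
	FT_SCALAR,
};

struct mem_sketch {
	unsigned flags;				/* runtime type bits, as today */
	enum field_type_sketch field_type;	/* declared column type */
};

/*
 * Strict type checking would apply only when neither side comes from
 * a SCALAR column; SCALAR keeps the permissive comparison.
 */
static bool
needs_strict_check(const struct mem_sketch *a, const struct mem_sketch *b)
{
	return a->field_type != FT_SCALAR && b->field_type != FT_SCALAR;
}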

Theoretically speaking, the checks introduced in minmaxStep may still prove
useful: a user could register a function which returns data of different types and
pass it as the argument of min()/max().

Whether the current patch should be pushed before handling of scalars is introduced - IDK.

Overall, I suggest the following diff. Note that you don’t even need
‘reference_type’ in the minmax context.

diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index 0738f44e6..cc63f8b5f 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -51,13 +51,7 @@
  */
 struct minmax_context {
        /** Value being aggregated i.e. current MAX or MIN. */
-       Mem best;
-       /**
-        * Reference type to keep track of previous argument's types.
-        * One of MEM_Null, MEM_Str, MEM_Int, MEM_Real, MEM_Blob
-        * or MEM_Bool.
-        */
-       int reference_type;
+       struct Mem best;
 };
 
 static UConverter* pUtf8conv;
@@ -1618,12 +1612,10 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
                return;
 
        sql_value *arg = argv[0];
-       sql_value *best = &(minmax_context->best);
-       if (best == NULL)
-               return;
+       sql_value *best = &minmax_context->best;
 
-       int sql_type = mem_type(arg);
-       if (sql_type == MEM_Null) {
+       int mem_type_current = arg->flags & MEM_PURE_TYPE_MASK;
+       if (mem_type_current == MEM_Null) {
                if (best->flags != 0)
                        sqlSkipAccumulatorLoad(context);
        } else if (best->flags != 0) {
@@ -1636,25 +1628,20 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
                 * any other type). In the later case an error
                 * is raised.
                 */
-               if (minmax_context->reference_type == 0)
-                       minmax_context->reference_type = sql_type;
-               int ref_sql_type = minmax_context->reference_type;
-
-               if (sql_type != ref_sql_type) {
-                       bool types_are_compatible = (sql_type & MEM_NumMask) &&
-                                                   (ref_sql_type & MEM_NumMask);
+               int mem_type_best = best->flags & MEM_PURE_TYPE_MASK;
+               if (mem_type_best != mem_type_current) {
+                       bool types_are_compatible = (mem_type_best & MEM_NumMask) &&
+                                                   (mem_type_current & MEM_NumMask);
                        if (!types_are_compatible) {
                                diag_set(ClientError, ER_INCONSISTENT_TYPES,
-                                        type_to_str(ref_sql_type),
-                                        type_to_str(sql_type));
+                                        mem_type_to_str(best),
+                                        mem_type_to_str(arg));
                                context->fErrorOrAux = 1;
                                context->isError = SQL_TARANTOOL_ERROR;
                                return;
                        }
                }
 
-               int max;
-               int cmp;
                struct coll *coll = sqlGetFuncCollSeq(context);
                /* This step function is used for both the min() and max() aggregates,
                 * the only difference between the two being that the sense of the
@@ -1664,9 +1651,9 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
                 * Therefore the next statement sets variable 'max' to 1 for the max()
                 * aggregate, or 0 for min().
                 */
-               max = sql_user_data(context) != 0;
-               cmp = sqlMemCompare(best, arg, coll);
-               if ((max != 0 && cmp < 0) || (max == 0 && cmp > 0)) {
+               bool max = sql_user_data(context) != 0;
+               int cmp_res = sqlMemCompare(best, arg, coll);
+               if ((max && cmp_res < 0) || (! max && cmp_res > 0)) {
                        sqlVdbeMemCopy(best, arg);
                } else {
                        sqlSkipAccumulatorLoad(context);
@@ -1682,12 +1669,13 @@ minMaxFinalize(sql_context * context)
 {
        struct minmax_context *minmax_context = (struct minmax_context *)
                sql_aggregate_context(context, sizeof(*minmax_context));
-       sql_value *res = &(minmax_context->best);
+       sql_value *res = &minmax_context->best;
 
        if (res != NULL) {
                if (res->flags != 0) {
                        sql_result_value(context, res);
                }
+               sqlVdbeMemRelease(res);
        }
 }
 
diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index dd87c5b70..b1ec8c758 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -4340,22 +4340,13 @@ pushDownWhereTerms(Parse * pParse,      /* Parse context (for malloc() and error repo
  * argument, this function checks if the following are true:
  *
  *    * the query contains just a single aggregate function,
- *    * the aggregate function is either min() or max(),
- *    * the argument to the aggregate function is a column value,
- *    * the type of column is not SCALAR.
+ *    * the aggregate function is either min() or max(), and
+ *    * the argument to the aggregate function is a column value.
  *
  * If all of the above are true, then WHERE_ORDERBY_MIN or WHERE_ORDERBY_MAX
  * is returned as appropriate. Also, *ppMinMax is set to point to the
  * list of arguments passed to the aggregate before returning.
  *
- * The requirement of column type not being SCALAR follows from
- * the purpose of the function. The purpose of the function is
- * to answer the question: "Should MIN/MAX call be optimised by
- * using ORDER BY clause code?" If the type of column is SCALAR
- * then we have to iterate over all rows to check if their types
- * are compatible. Hence, the optimisation should not be used.
- * For details please see: https://github.com/tarantool/tarantool/issues/4032
- *
  * Or, if the conditions above are not met, *ppMinMax is set to 0 and
  * WHERE_ORDERBY_NORMAL is returned.
  */
@@ -4373,8 +4364,6 @@ minMaxQuery(AggInfo * pAggInfo, ExprList ** ppMinMax)
                if (pEList && pEList->nExpr == 1
                    && pEList->a[0].pExpr->op == TK_AGG_COLUMN) {
                        const char *zFunc = pExpr->u.zToken;
-                       if (sql_expr_type(pEList->a[0].pExpr) == FIELD_TYPE_SCALAR)
-                               return eRet;
                        if (sqlStrICmp(zFunc, "min") == 0) {
                                eRet = WHERE_ORDERBY_MIN;
                                *ppMinMax = pEList;
diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
index 960030b52..ed7bf8870 100644
--- a/src/box/sql/vdbe.c
+++ b/src/box/sql/vdbe.c
@@ -618,9 +618,10 @@ vdbe_add_new_autoinc_id(struct Vdbe *vdbe, int64_t id)
 }
 
 char *
-type_to_str(int type)
+mem_type_to_str(const struct Mem *p)
 {
-       switch (type) {
+       assert(p != NULL);
+       switch (p->flags & MEM_PURE_TYPE_MASK) {
        case MEM_Null:
                return "NULL";
        case MEM_Str:
@@ -638,19 +639,6 @@ type_to_str(int type)
        }
 }
 
-int
-mem_type(const struct Mem *p)
-{
-       assert(p != NULL);
-       return p->flags & MEM_PURE_TYPE_MASK;
-}
-
-char *
-mem_type_to_str(const struct Mem *p)
-{
-       return type_to_str(mem_type(p));
-}
-
 /*
  * Execute as much of a VDBE program as we can.
  * This is the core of sql_step().
diff --git a/src/box/sql/vdbeInt.h b/src/box/sql/vdbeInt.h
index 553c4f225..2b60bd6b5 100644
--- a/src/box/sql/vdbeInt.h
+++ b/src/box/sql/vdbeInt.h
@@ -263,25 +263,11 @@ enum {
        MEM_PURE_TYPE_MASK = 0x1f
 };
 
-/**
- * Returns one of MEM_Null, MEM_Str, MEM_Int, MEM_Real, MEM_Blob
- * or MEM_Bool. Used for error detection and reporting.
- */
-int
-mem_type(const struct Mem *p);
-
 /**
  * Simple type to str convertor. It is used to simplify
  * error reporting.
  */
 char *
-type_to_str(int type);
-
-/**
- * Returns string representing type of the given Mem. It is used
- * to simplify error reporting.
- */
-char *
 mem_type_to_str(const struct Mem *p);
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tarantool-patches] Re: [PATCH 2/2] sql: make aggregate functions types more strict
  2019-04-24 17:37       ` n.pettik
@ 2019-05-06 13:24         ` i.koptelov
  0 siblings, 0 replies; 11+ messages in thread
From: i.koptelov @ 2019-05-06 13:24 UTC (permalink / raw)
  To: tarantool-patches; +Cc: n.pettik



> On 24 Apr 2019, at 20:37, n.pettik <korablev@tarantool.org> wrote:
> 
> 
>>>> +		if (sql_type != ref_sql_type) {
>>>> +			is_compatible = false;
>>>> +			if ((sql_type == SQL_INTEGER || sql_type == SQL_FLOAT) &&
>>>> +			    (ref_sql_type == SQL_INTEGER ||
>>>> +			     ref_sql_type == SQL_FLOAT)) {
>>>> +				is_compatible = true;
>>> 
>>> This is a very hot path and doing so much work to check
>>> compatibility is a) clumsy when reading b) slow c) hard to
>>> maintain.
>>> 
>>> Please use a compatibility matrix statically defined as a 8x8
>>> bitmap.
>> Fixed:
> 
> It is not what was suggested. Look at field_type1_contains_type2() and
> field_type_compatibility as examples.
Thank you for noticing.

diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index 0738f44e6..d98c60914 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -1605,6 +1599,25 @@ countFinalize(sql_context * context)
 	sql_result_int64(context, p ? p->n : 0);
 }
 
+/**
+ * Table of aggregate functions args type compatibility.
+ */
+static const bool scalar_type_compatibility[] = {
+	      /*  SQL_INTEGER  SQL_FLOAT  SQL_TEXT  SQL_BLOB  SQL_NULL */
+/* SQL_INTEGER */ true,        true,      false,    false,    false,
+/* SQL_FLOAT   */ true,        true,      false,    false,    false,
+/* SQL_TEXT    */ false,       false,     true,     false,    false,
+/* SQL_BLOB    */ false,       false,     false,    true,     false,
+/* SQL_NULL    */ false,       false,     false,    false,    true,
+};
+
+static bool
+are_scalar_types_compatible(enum sql_type type1, enum sql_type type2)
+{
+	int idx = (type2 - 1) * (SQL_TYPE_MAX - 1) + (type1 - 1);
+	return scalar_type_compatibility[idx];
+}
+

> 
>> diff --git a/src/box/sql/vdbeInt.h b/src/box/sql/vdbeInt.h
>> index c84f22caf..553c4f225 100644
>> --- a/src/box/sql/vdbeInt.h
>> +++ b/src/box/sql/vdbeInt.h
>> @@ -228,6 +228,7 @@ struct Mem {
>> #define MEM_Str       0x0002	/* Value is a string */
>> #define MEM_Int       0x0004	/* Value is an integer */
>> #define MEM_Real      0x0008	/* Value is a real number */
>> +#define MEM_NumMask   0x000c	/* Value is integer or real (mask) */
> 
> This mask covers neither Blob nor Bool values.
Can you please explain why it should? Blob and Bool are not numeric as far as I understand.
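
For reference, a tiny standalone check (the constants are copied from the
vdbeInt.h hunks in this thread) showing that the mask matches numeric values only:

#include <assert.h>

#define MEM_Int     0x0004
#define MEM_Real    0x0008
#define MEM_Blob    0x0010
#define MEM_Bool    0x0020
#define MEM_NumMask (MEM_Int | MEM_Real)	/* 0x000c */

int
main(void)
{
	assert((MEM_Int  & MEM_NumMask) != 0);
	assert((MEM_Real & MEM_NumMask) != 0);
	assert((MEM_Blob & MEM_NumMask) == 0);
	assert((MEM_Bool & MEM_NumMask) == 0);
	return 0;
}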

> 
>> #define MEM_Blob      0x0010	/* Value is a BLOB */
>> #define MEM_Bool      0x0020    /* Value is a bool */
>> #define MEM_Ptr       0x0040	/* Value is a generic pointer */
>> @@ -262,11 +263,25 @@ enum {
>> 	MEM_PURE_TYPE_MASK = 0x1f
>> };
>> 
>> diff --git a/src/box/sql/select.c b/src/box/sql/select.c
>> index 624083b22..dd87c5b70 100644
>> --- a/src/box/sql/select.c
>> +++ b/src/box/sql/select.c
>> @@ -4351,7 +4351,7 @@ pushDownWhereTerms(Parse * pParse,        /* Parse context (for malloc() and error repo
>> * The requirement of column type not being SCALAR follows from
>> * the purpose of the function. The purpose of the function is
>> * to answer the question: "Should MIN/MAX call be optimised by
>> - * using ORDER ON clause code?" If the type of column is SCALAR
>> + * using ORDER BY clause code?" If the type of column is SCALAR
>> * then we have to iterate over all rows to check if their types
>> * are compatible. Hence, the optimisation should not be used.
>> 
>>> 
>>>> + * then we have to iterate over all rows to check if their types
>>>> + * are compatible. Hence, the optimisation should not be used.
>>>> + * For details please see: https://github.com/tarantool/tarantool/issues/4032
>>>> + *
>>>> * Or, if the conditions above are not met, *ppMinMax is set to 0 and
>>>> * WHERE_ORDERBY_NORMAL is returned.
>>>> */
>>>> @@ -4364,6 +4373,8 @@ minMaxQuery(AggInfo * pAggInfo, ExprList ** ppMinMax)
>>>> 		if (pEList && pEList->nExpr == 1
>>>> 		    && pEList->a[0].pExpr->op == TK_AGG_COLUMN) {
>>>> 			const char *zFunc = pExpr->u.zToken;
>>>> +			if (sql_expr_type(pEList->a[0].pExpr) == FIELD_TYPE_SCALAR)
>>>> +				return eRet;
>>> 
>>> I see no difference between byte code generated for SCALAR and other
>>> columns (SELECT MIN(a) FROM t;). Investigate this case please.
>>> 
>> I do not quite understand what the problem is here.
>> If the following conditions are true, then the result of MIN/MAX is retrieved without
>> using the corresponding function in func.c:
>> 
>> * the query contains just a single aggregate function,
>> * the aggregate function is either min() or max(),
>> * the argument to the aggregate function is a column value,
>> * the type of column is not SCALAR.
>> 
>> The changes above disable this optimisation if the column type is SCALAR.
>> I also added a test for this case.
> 
> This disables the index-search optimisation for min/max queries over SCALAR columns.
> I’m not sure that is what we want. Even though it makes usage of min/max
> uniform (an error is raised in both cases), it dramatically slows down execution.
> I’ve asked the server team, and we decided to allow all comparisons with the SCALAR
> field type. Right now we can’t tell scalar from non-scalar fields. To achieve this we should
> add a field_type member to struct Mem and check field_type before comparisons.
> 
> Theoretically speaking, the checks introduced in minmaxStep may still prove
> useful: a user could register a function which returns data of different types and
> pass it as the argument of min()/max().
> 
> Whether the current patch should be pushed before handling of scalars is introduced - IDK.
> 
> Overall, I suggest the following diff. Note that you don’t even need
> ‘reference_type’ in the minmax context.
> 
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index 0738f44e6..cc63f8b5f 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -51,13 +51,7 @@
>  */
> struct minmax_context {
>        /** Value being aggregated i.e. current MAX or MIN. */
> -       Mem best;
> -       /**
> -        * Reference type to keep track of previous argument's types.
> -        * One of MEM_Null, MEM_Str, MEM_Int, MEM_Real, MEM_Blob
> -        * or MEM_Bool.
> -        */
> -       int reference_type;
> +       struct Mem best;
> };
> 
> static UConverter* pUtf8conv;
> @@ -1618,12 +1612,10 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
>                return;
> 
>        sql_value *arg = argv[0];
> -       sql_value *best = &(minmax_context->best);
> -       if (best == NULL)
> -               return;
> +       sql_value *best = &minmax_context->best;
> 
> -       int sql_type = mem_type(arg);
> -       if (sql_type == MEM_Null) {
> +       int mem_type_current = arg->flags & MEM_PURE_TYPE_MASK;
> +       if (mem_type_current == MEM_Null) {
>                if (best->flags != 0)
>                        sqlSkipAccumulatorLoad(context);
>        } else if (best->flags != 0) {
> @@ -1636,25 +1628,20 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
>                 * any other type). In the later case an error
>                 * is raised.
>                 */
> -               if (minmax_context->reference_type == 0)
> -                       minmax_context->reference_type = sql_type;
> -               int ref_sql_type = minmax_context->reference_type;
> -
> -               if (sql_type != ref_sql_type) {
> -                       bool types_are_compatible = (sql_type & MEM_NumMask) &&
> -                                                   (ref_sql_type & MEM_NumMask);
> +               int mem_type_best = best->flags & MEM_PURE_TYPE_MASK;
> +               if (mem_type_best != mem_type_current) {
> +                       bool types_are_compatible = (mem_type_best & MEM_NumMask) &&
> +                                                   (mem_type_current & MEM_NumMask);
>                        if (!types_are_compatible) {
>                                diag_set(ClientError, ER_INCONSISTENT_TYPES,
> -                                        type_to_str(ref_sql_type),
> -                                        type_to_str(sql_type));
> +                                        mem_type_to_str(best),
> +                                        mem_type_to_str(arg));
>                                context->fErrorOrAux = 1;
>                                context->isError = SQL_TARANTOOL_ERROR;
>                                return;
>                        }
>                }
> 
> -               int max;
> -               int cmp;
>                struct coll *coll = sqlGetFuncCollSeq(context);
>                /* This step function is used for both the min() and max() aggregates,
>                 * the only difference between the two being that the sense of the
> @@ -1664,9 +1651,9 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
>                 * Therefore the next statement sets variable 'max' to 1 for the max()
>                 * aggregate, or 0 for min().
>                 */
> -               max = sql_user_data(context) != 0;
> -               cmp = sqlMemCompare(best, arg, coll);
> -               if ((max != 0 && cmp < 0) || (max == 0 && cmp > 0)) {
> +               bool max = sql_user_data(context) != 0;
> +               int cmp_res = sqlMemCompare(best, arg, coll);
> +               if ((max && cmp_res < 0) || (! max && cmp_res > 0)) {
>                        sqlVdbeMemCopy(best, arg);
>                } else {
>                        sqlSkipAccumulatorLoad(context);
> @@ -1682,12 +1669,13 @@ minMaxFinalize(sql_context * context)
> {
>        struct minmax_context *minmax_context = (struct minmax_context *)
>                sql_aggregate_context(context, sizeof(*minmax_context));
> -       sql_value *res = &(minmax_context->best);
> +       sql_value *res = &minmax_context->best;
> 
>        if (res != NULL) {
>                if (res->flags != 0) {
>                        sql_result_value(context, res);
>                }
> +               sqlVdbeMemRelease(res);
>        }
> }
> 
> diff --git a/src/box/sql/select.c b/src/box/sql/select.c
> index dd87c5b70..b1ec8c758 100644
> --- a/src/box/sql/select.c
> +++ b/src/box/sql/select.c
> @@ -4340,22 +4340,13 @@ pushDownWhereTerms(Parse * pParse,      /* Parse context (for malloc() and error repo
>  * argument, this function checks if the following are true:
>  *
>  *    * the query contains just a single aggregate function,
> - *    * the aggregate function is either min() or max(),
> - *    * the argument to the aggregate function is a column value,
> - *    * the type of column is not SCALAR.
> + *    * the aggregate function is either min() or max(), and
> + *    * the argument to the aggregate function is a column value.
>  *
>  * If all of the above are true, then WHERE_ORDERBY_MIN or WHERE_ORDERBY_MAX
>  * is returned as appropriate. Also, *ppMinMax is set to point to the
>  * list of arguments passed to the aggregate before returning.
>  *
> - * The requirement of column type not being SCALAR follows from
> - * the purpose of the function. The purpose of the function is
> - * to answer the question: "Should MIN/MAX call be optimised by
> - * using ORDER BY clause code?" If the type of column is SCALAR
> - * then we have to iterate over all rows to check if their types
> - * are compatible. Hence, the optimisation should not be used.
> - * For details please see: https://github.com/tarantool/tarantool/issues/4032
> - *
>  * Or, if the conditions above are not met, *ppMinMax is set to 0 and
>  * WHERE_ORDERBY_NORMAL is returned.
>  */
> @@ -4373,8 +4364,6 @@ minMaxQuery(AggInfo * pAggInfo, ExprList ** ppMinMax)
>                if (pEList && pEList->nExpr == 1
>                    && pEList->a[0].pExpr->op == TK_AGG_COLUMN) {
>                        const char *zFunc = pExpr->u.zToken;
> -                       if (sql_expr_type(pEList->a[0].pExpr) == FIELD_TYPE_SCALAR)
> -                               return eRet;
>                        if (sqlStrICmp(zFunc, "min") == 0) {
>                                eRet = WHERE_ORDERBY_MIN;
>                                *ppMinMax = pEList;
> diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
> index 960030b52..ed7bf8870 100644
> --- a/src/box/sql/vdbe.c
> +++ b/src/box/sql/vdbe.c
> @@ -618,9 +618,10 @@ vdbe_add_new_autoinc_id(struct Vdbe *vdbe, int64_t id)
> }
> 
> char *
> -type_to_str(int type)
> +mem_type_to_str(const struct Mem *p)
> {
> -       switch (type) {
> +       assert(p != NULL);
> +       switch (p->flags & MEM_PURE_TYPE_MASK) {
>        case MEM_Null:
>                return "NULL";
>        case MEM_Str:
> @@ -638,19 +639,6 @@ type_to_str(int type)
>        }
> }
> 
> -int
> -mem_type(const struct Mem *p)
> -{
> -       assert(p != NULL);
> -       return p->flags & MEM_PURE_TYPE_MASK;
> -}
> -
> -char *
> -mem_type_to_str(const struct Mem *p)
> -{
> -       return type_to_str(mem_type(p));
> -}
> -
> /*
>  * Execute as much of a VDBE program as we can.
>  * This is the core of sql_step().
> diff --git a/src/box/sql/vdbeInt.h b/src/box/sql/vdbeInt.h
> index 553c4f225..2b60bd6b5 100644
> --- a/src/box/sql/vdbeInt.h
> +++ b/src/box/sql/vdbeInt.h
> @@ -263,25 +263,11 @@ enum {
>        MEM_PURE_TYPE_MASK = 0x1f
> };
> 
> -/**
> - * Returns one of MEM_Null, MEM_Str, MEM_Int, MEM_Real, MEM_Blob
> - * or MEM_Bool. Used for error detection and reporting.
> - */
> -int
> -mem_type(const struct Mem *p);
> -
> /**
>  * Simple type to str convertor. It is used to simplify
>  * error reporting.
>  */
> char *
> -type_to_str(int type);
> -
> -/**
> - * Returns string representing type of the given Mem. It is used
> - * to simplify error reporting.
> - */
> -char *
> mem_type_to_str(const struct Mem *p);
Thank you very much, I have checked the diff and applied it with minor changes.
Full diff:

---
 src/box/sql/func.c            | 62 ++++++++++++++++++++---------------
 src/box/sql/select.c          | 15 ++-------
 src/box/sql/sqlInt.h          |  1 +
 src/box/sql/vdbe.c            | 18 ++--------
 src/box/sql/vdbeInt.h         | 15 ---------
 test/sql-tap/minmax4.test.lua | 13 +++++---
 6 files changed, 49 insertions(+), 75 deletions(-)

diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index 0738f44e6..d98c60914 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -51,13 +51,7 @@
  */
 struct minmax_context {
 	/** Value being aggregated i.e. current MAX or MIN. */
-	Mem best;
-	/**
-	 * Reference type to keep track of previous argument's types.
-	 * One of MEM_Null, MEM_Str, MEM_Int, MEM_Real, MEM_Blob
-	 * or MEM_Bool.
-	 */
-	int reference_type;
+	struct Mem best;
 };
 
 static UConverter* pUtf8conv;
@@ -1605,6 +1599,25 @@ countFinalize(sql_context * context)
 	sql_result_int64(context, p ? p->n : 0);
 }
 
+/**
+ * Table of aggregate functions args type compatibility.
+ */
+static const bool scalar_type_compatibility[] = {
+	      /*  SQL_INTEGER  SQL_FLOAT  SQL_TEXT  SQL_BLOB  SQL_NULL */
+/* SQL_INTEGER */ true,        true,      false,    false,    false,
+/* SQL_FLOAT   */ true,        true,      false,    false,    false,
+/* SQL_TEXT    */ false,       false,     true,     false,    false,
+/* SQL_BLOB    */ false,       false,     false,    true,     false,
+/* SQL_NULL    */ false,       false,     false,    false,    true,
+};
+
+static bool
+are_scalar_types_compatible(enum sql_type type1, enum sql_type type2)
+{
+	int idx = (type2 - 1) * (SQL_TYPE_MAX - 1) + (type1 - 1);
+	return scalar_type_compatibility[idx];
+}
+
 /*
  * Routines to implement min() and max() aggregate functions.
  */
@@ -1618,12 +1631,10 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
 		return;
 
 	sql_value *arg = argv[0];
-	sql_value *best = &(minmax_context->best);
-	if (best == NULL)
-		return;
+	sql_value *best = &minmax_context->best;
 
-	int sql_type = mem_type(arg);
-	if (sql_type == MEM_Null) {
+	enum sql_type sql_type_current = sql_value_type(arg);
+	if (sql_type_current == SQL_NULL) {
 		if (best->flags != 0)
 			sqlSkipAccumulatorLoad(context);
 	} else if (best->flags != 0) {
@@ -1636,25 +1647,21 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
 		 * any other type). In the later case an error
 		 * is raised.
 		 */
-		if (minmax_context->reference_type == 0)
-			minmax_context->reference_type = sql_type;
-		int ref_sql_type = minmax_context->reference_type;
-
-		if (sql_type != ref_sql_type) {
-			bool types_are_compatible = (sql_type & MEM_NumMask) &&
-						    (ref_sql_type & MEM_NumMask);
+		enum sql_type sql_type_best = sql_value_type(best);
+		if (sql_type_best != sql_type_current) {
+			bool types_are_compatible =
+				are_scalar_types_compatible(sql_type_best,
+							    sql_type_current);
 			if (!types_are_compatible) {
 				diag_set(ClientError, ER_INCONSISTENT_TYPES,
-					 type_to_str(ref_sql_type),
-					 type_to_str(sql_type));
+					 mem_type_to_str(best),
+					 mem_type_to_str(arg));
 				context->fErrorOrAux = 1;
 				context->isError = SQL_TARANTOOL_ERROR;
 				return;
 			}
 		}
 
-		int max;
-		int cmp;
 		struct coll *coll = sqlGetFuncCollSeq(context);
 		/* This step function is used for both the min() and max() aggregates,
 		 * the only difference between the two being that the sense of the
@@ -1664,9 +1671,9 @@ minmaxStep(sql_context *context, int not_used, sql_value **argv)
 		 * Therefore the next statement sets variable 'max' to 1 for the max()
 		 * aggregate, or 0 for min().
 		 */
-		max = sql_user_data(context) != 0;
-		cmp = sqlMemCompare(best, arg, coll);
-		if ((max != 0 && cmp < 0) || (max == 0 && cmp > 0)) {
+		bool max = sql_user_data(context) != 0;
+		int cmp_res = sqlMemCompare(best, arg, coll);
+		if ((max && cmp_res < 0) || (! max && cmp_res > 0)) {
 			sqlVdbeMemCopy(best, arg);
 		} else {
 			sqlSkipAccumulatorLoad(context);
@@ -1682,12 +1689,13 @@ minMaxFinalize(sql_context * context)
 {
 	struct minmax_context *minmax_context = (struct minmax_context *)
 		sql_aggregate_context(context, sizeof(*minmax_context));
-	sql_value *res = &(minmax_context->best);
+	sql_value *res = &minmax_context->best;
 
 	if (res != NULL) {
 		if (res->flags != 0) {
 			sql_result_value(context, res);
 		}
+		sqlVdbeMemRelease(res);
 	}
 }
 
diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index dd87c5b70..b1ec8c758 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -4340,22 +4340,13 @@ pushDownWhereTerms(Parse * pParse,	/* Parse context (for malloc() and error repo
  * argument, this function checks if the following are true:
  *
  *    * the query contains just a single aggregate function,
- *    * the aggregate function is either min() or max(),
- *    * the argument to the aggregate function is a column value,
- *    * the type of column is not SCALAR.
+ *    * the aggregate function is either min() or max(), and
+ *    * the argument to the aggregate function is a column value.
  *
  * If all of the above are true, then WHERE_ORDERBY_MIN or WHERE_ORDERBY_MAX
  * is returned as appropriate. Also, *ppMinMax is set to point to the
  * list of arguments passed to the aggregate before returning.
  *
- * The requirement of column type not being SCALAR follows from
- * the purpose of the function. The purpose of the function is
- * to answer the question: "Should MIN/MAX call be optimised by
- * using ORDER BY clause code?" If the type of column is SCALAR
- * then we have to iterate over all rows to check if their types
- * are compatible. Hence, the optimisation should not be used.
- * For details please see: https://github.com/tarantool/tarantool/issues/4032
- *
  * Or, if the conditions above are not met, *ppMinMax is set to 0 and
  * WHERE_ORDERBY_NORMAL is returned.
  */
@@ -4373,8 +4364,6 @@ minMaxQuery(AggInfo * pAggInfo, ExprList ** ppMinMax)
 		if (pEList && pEList->nExpr == 1
 		    && pEList->a[0].pExpr->op == TK_AGG_COLUMN) {
 			const char *zFunc = pExpr->u.zToken;
-			if (sql_expr_type(pEList->a[0].pExpr) == FIELD_TYPE_SCALAR)
-				return eRet;
 			if (sqlStrICmp(zFunc, "min") == 0) {
 				eRet = WHERE_ORDERBY_MIN;
 				*ppMinMax = pEList;
diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h
index b322602dc..a96033db5 100644
--- a/src/box/sql/sqlInt.h
+++ b/src/box/sql/sqlInt.h
@@ -641,6 +641,7 @@ enum sql_type {
 	SQL_TEXT = 3,
 	SQL_BLOB = 4,
 	SQL_NULL = 5,
+	SQL_TYPE_MAX = 6,
 };
 
 /**
diff --git a/src/box/sql/vdbe.c b/src/box/sql/vdbe.c
index 960030b52..ed7bf8870 100644
--- a/src/box/sql/vdbe.c
+++ b/src/box/sql/vdbe.c
@@ -618,9 +618,10 @@ vdbe_add_new_autoinc_id(struct Vdbe *vdbe, int64_t id)
 }
 
 char *
-type_to_str(int type)
+mem_type_to_str(const struct Mem *p)
 {
-	switch (type) {
+	assert(p != NULL);
+	switch (p->flags & MEM_PURE_TYPE_MASK) {
 	case MEM_Null:
 		return "NULL";
 	case MEM_Str:
@@ -638,19 +639,6 @@ type_to_str(int type)
 	}
 }
 
-int
-mem_type(const struct Mem *p)
-{
-	assert(p != NULL);
-	return p->flags & MEM_PURE_TYPE_MASK;
-}
-
-char *
-mem_type_to_str(const struct Mem *p)
-{
-	return type_to_str(mem_type(p));
-}
-
 /*
  * Execute as much of a VDBE program as we can.
  * This is the core of sql_step().
diff --git a/src/box/sql/vdbeInt.h b/src/box/sql/vdbeInt.h
index 553c4f225..c84f22caf 100644
--- a/src/box/sql/vdbeInt.h
+++ b/src/box/sql/vdbeInt.h
@@ -228,7 +228,6 @@ struct Mem {
 #define MEM_Str       0x0002	/* Value is a string */
 #define MEM_Int       0x0004	/* Value is an integer */
 #define MEM_Real      0x0008	/* Value is a real number */
-#define MEM_NumMask   0x000c	/* Value is integer or real (mask) */
 #define MEM_Blob      0x0010	/* Value is a BLOB */
 #define MEM_Bool      0x0020    /* Value is a bool */
 #define MEM_Ptr       0x0040	/* Value is a generic pointer */
@@ -263,25 +262,11 @@ enum {
 	MEM_PURE_TYPE_MASK = 0x1f
 };
 
-/**
- * Returns one of MEM_Null, MEM_Str, MEM_Int, MEM_Real, MEM_Blob
- * or MEM_Bool. Used for error detection and reporting.
- */
-int
-mem_type(const struct Mem *p);
-
 /**
  * Simple type to str convertor. It is used to simplify
  * error reporting.
  */
 char *
-type_to_str(int type);
-
-/**
- * Returns string representing type of the given Mem. It is used
- * to simplify error reporting.
- */
-char *
 mem_type_to_str(const struct Mem *p);
 
 /* Return TRUE if Mem X contains dynamically allocated content - anything
diff --git a/test/sql-tap/minmax4.test.lua b/test/sql-tap/minmax4.test.lua
index 7476051f1..7e6e309ad 100755
--- a/test/sql-tap/minmax4.test.lua
+++ b/test/sql-tap/minmax4.test.lua
@@ -353,13 +353,16 @@ test:do_test(
             SELECT MAX(b) FROM t4;
         ]]
     end, {
-        1, "Inconsistent types: expected REAL got TEXT"
+        1, "Inconsistent types: expected INTEGER got TEXT"
     })
 
 -- Cases when we call aggregate MIN/MAX functions on column with
 -- index (e.g. PRIMARY KEY index) deserves it's own test
 -- because in this case MIN/MAX is implemented not with
 -- dedicated function, but with usage of corresponding index.
+-- The behavior is different: in such cases MIN/MAX are less
+-- type-strict, for example it's possible to compare numeric
+-- values with text values.
 test:do_test(
     "minmax4-3.5",
     function()
@@ -386,22 +389,22 @@ test:do_test(
 test:do_test(
     "minmax4-3.7",
     function()
-        return test:catchsql [[
+        return test:execsql [[
             INSERT INTO t5 VALUES ('abc');
             SELECT MIN(a) FROM t5;
         ]]
     end, {
-        1, "Inconsistent types: expected INTEGER got TEXT"
+        1.5
     })
 
 test:do_test(
     "minmax4-3.8",
     function()
-        return test:catchsql [[
+        return test:execsql [[
             SELECT MAX(a) FROM t5;
         ]]
     end, {
-        1, "Inconsistent types: expected INTEGER got TEXT"
+        'abc'
     })
 
 test:finish_test()
-- 
2.20.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread

Thread overview: 11+ messages
2019-04-05 14:57 [tarantool-patches] [PATCH 0/2] sql: make aggregate functions types more strict Ivan Koptelov
2019-04-05 14:57 ` [tarantool-patches] [PATCH 1/2] " Ivan Koptelov
2019-04-09 14:52   ` [tarantool-patches] " n.pettik
2019-04-05 14:57 ` [tarantool-patches] [PATCH 2/2] " Ivan Koptelov
2019-04-05 19:48   ` [tarantool-patches] " Konstantin Osipov
2019-04-17 12:50     ` i.koptelov
2019-04-17 13:19       ` n.pettik
2019-04-09 14:52   ` n.pettik
2019-04-23 15:38     ` i.koptelov
2019-04-24 17:37       ` n.pettik
2019-05-06 13:24         ` i.koptelov
