[tarantool-patches] Re: [PATCH v2 1/1] sql: support HAVING without GROUP BY clause

Kirill Shcherbatov kshcherbatov at tarantool.org
Tue Dec 25 15:59:29 MSK 2018


Thank you for review!

> See 6 comments below.
> 1. I do not think that you need to calculate sqlite3ExprIsConstantOrFunction if
> (sNC.ncFlags & NC_HasAgg) != 0. Result expression of || of this statements is
> true already.
> Also, you do not need to calculate anything if is_all_select_agg became false
> on any iteration.
Fixed on branch, tnx.

> 2. You do not need to copy an error message to a static buffer if it
> is already null terminated and you do not have formatting.
Fixed on branch, tnx.
> 3. sqlite3ResolveExprNames(&sNC, p->pHaving) is called right after your
> new code block. So if it passes, sqlite3ResolveExprNames(&sNC, p->pHaving)
> is called twice. It should not. Please, find a way how to do not duplicate
> work.> 4. I see that your having_nc.ncFlags always has NC_AllowAgg. But original
> sNC can have it unset. Right after p->pEList->a resolution if pGroupBy == NULL
> and (sNC.ncFlags & NC_HasAgg) == 0 NC_HasAgg is forbidden.
+		/*
+		 * Add the output column list to the name-context
+		 * before parsing the other expressions in the
+		 * SELECT statement. This is so that expressions
+		 * in the WHERE clause (etc.) can refer to
+		 * expressions by aliases in the result set.
+		 *
+		 * Minor point: If this is the case, then the
+		 * expression will be re-evaluated for each
+		 * reference to it.
+		 */
+		sNC.pEList = p->pEList;
 		/*
 		 * If a HAVING clause is present, then there must
 		 * be a GROUP BY clause or aggregate function
 		 * should be specified.
 		 */
 		if (p->pHaving != NULL && pGroupBy == NULL) {
-			struct NameContext having_nc;
-			memset(&having_nc, 0, sizeof(having_nc));
-			having_nc.pParse = pParse;
-			having_nc.ncFlags = NC_AllowAgg;
-			having_nc.pSrcList = p->pSrc;
+			sNC.ncFlags |= NC_AllowAgg;
 			if (is_all_select_agg &&
-			    sqlite3ResolveExprNames(&having_nc,
-						    p->pHaving) != 0)
+			    sqlite3ResolveExprNames(&sNC, p->pHaving) != 0)
 				return WRC_Abort;
-			if ((having_nc.ncFlags & NC_HasAgg) == 0 ||
-			    (having_nc.ncFlags & NC_HasUnaggregatedId) != 0) {
-				diag_set(ClientError, ER_SQL, "HAVING "\
+			if ((sNC.ncFlags & NC_HasAgg) == 0 ||
+			    (sNC.ncFlags & NC_HasUnaggregatedId) != 0) {
+				diag_set(ClientError, ER_SQL, "HAVING "
 					 "argument must appear in the GROUP BY "
-					 "clause or be used in an aggregate "\
+					 "clause or be used in an aggregate "
 					 "function");
 				pParse->nErr++;
 				pParse->rc = SQL_TARANTOOL_ERROR;
@@ -1365,19 +1372,10 @@ resolveSelectStep(Walker * pWalker, Select * p)
 			p->pLimit =
 			    sqlite3ExprAlloc(db, TK_INTEGER,
 					     &sqlite3IntTokens[1], 0);
+		} else {
+			if (sqlite3ResolveExprNames(&sNC, p->pHaving))
+				return WRC_Abort;
 		}
-
-		/* Add the output column list to the name-context before parsing the
-		 * other expressions in the SELECT statement. This is so that
-		 * expressions in the WHERE clause (etc.) can refer to expressions by
-		 * aliases in the result set.
-		 *
-		 * Minor point: If this is the case, then the expression will be
-		 * re-evaluated for each reference to it.
-		 */
-		sNC.pEList = p->pEList;
-		if (sqlite3ResolveExprNames(&sNC, p->pHaving))
-			return WRC_Abort;
 		if (sqlite3ResolveExprNames(&sNC, p->pWhere))
 			return WRC_Abort;

We shouldn't care about sNC NC_AllowAgg when condition is passed as all
aggregate functions are verified so NC_AllowAgg is ok there. 
As for "sNC.pEList = p->pEList;" it makes no effort when
(p->pHaving != NULL && pGroupBy == NULL) and I like this assignment
before checks.

> 
> 5. Please, write a comment why do you recreate pLimit. And what if I set
> explicit Limit 2? Why is not it an error?
There is no reason to raise error here AGG function may return
only one value, so user LIMIT has no sense. Some DBs dissallow such
construction as parser error, but I don't think that is required.

> 6. Verb v3 is used with passive voice, present perfect and past perfect.
> If you wanted to make passive voice, you should write 'are seen'.
> But personally I think it should be just present simple.
> 
> "One or more identifiers are out of aggregate function."
Tnx, fixed.

=============================================

Allowed to make SELECT requests that have HAVING clause without
GROUP BY. It is possible when both - left and right parts of
request have aggregate function or constant value.

Closes #2364.

@TarantoolBot document
Title: HAVING without GROUP BY clause
A query with a having clause should also have a group by clause.
If you omit group by, all the rows not excluded by the where
clause return as a single group.
Because no grouping is performed between the where and having
clauses, they cannot act independently of each other. Having
acts like where because it affects the rows in a single group
rather than groups, except the having clause can still use
aggregates.
Having without group by is not supported for select from
multiple tables.

2011 SQL standard "Part 2: Foundation" 7.10 <having clause> p.381

Example:
SELECT MIN(s1) FROM te40 HAVING SUM(s1) > 0; -- is valid
SELECT 1 FROM te40 HAVING SUM(s1) > 0;       -- is valid
SELECT NULL FROM te40 HAVING SUM(s1) > 0;    -- is valid
SELECT date() FROM te40 HAVING SUM(s1) > 0;  -- is valid
---
 src/box/sql/resolve.c         |  89 ++++++++++++++++++-----
 src/box/sql/sqliteInt.h       |   3 +-
 test/sql-tap/count.test.lua   |   8 +--
 test/sql-tap/select3.test.lua |   2 +-
 test/sql-tap/select5.test.lua | 128 +++++++++++++++++++++++++++++++++-
 5 files changed, 204 insertions(+), 26 deletions(-)

diff --git a/src/box/sql/resolve.c b/src/box/sql/resolve.c
index 9a2d6ff4e..6462467bc 100644
--- a/src/box/sql/resolve.c
+++ b/src/box/sql/resolve.c
@@ -602,6 +602,8 @@ resolveExprStep(Walker * pWalker, Expr * pExpr)
 		/* A lone identifier is the name of a column.
 		 */
 	case TK_ID:{
+			if ((pNC->ncFlags & NC_AllowAgg) != 0)
+				pNC->ncFlags |= NC_HasUnaggregatedId;
 			return lookupName(pParse, 0, pExpr->u.zToken, pNC,
 					  pExpr);
 		}
@@ -1285,13 +1287,34 @@ resolveSelectStep(Walker * pWalker, Select * p)
 		/* Set up the local name-context to pass to sqlite3ResolveExprNames() to
 		 * resolve the result-set expression list.
 		 */
+		bool is_all_select_agg = true;
 		sNC.ncFlags = NC_AllowAgg;
 		sNC.pSrcList = p->pSrc;
 		sNC.pNext = pOuterNC;
-
+		struct ExprList_item *item = p->pEList->a;
 		/* Resolve names in the result set. */
-		if (sqlite3ResolveExprListNames(&sNC, p->pEList))
-			return WRC_Abort;
+		for (i = 0; i < p->pEList->nExpr; ++i, ++item) {
+			u16 has_agg_flag = sNC.ncFlags & NC_HasAgg;
+			sNC.ncFlags &= ~NC_HasAgg;
+			if (sqlite3ResolveExprNames(&sNC, item->pExpr) != 0)
+				return WRC_Abort;
+			if ((sNC.ncFlags & NC_HasAgg) == 0 &&
+			    !sqlite3ExprIsConstantOrFunction(item->pExpr, 0)) {
+				is_all_select_agg = false;
+				sNC.ncFlags |= has_agg_flag;
+				break;
+			}
+			sNC.ncFlags |= has_agg_flag;
+		}
+		/*
+		 * Finish iteration for is_all_select_agg == false
+		 * and do not care about flags anymore.
+		 */
+		for (; i < p->pEList->nExpr; ++i, ++item) {
+			assert(! is_all_select_agg);
+			if (sqlite3ResolveExprNames(&sNC, item->pExpr) != 0)
+				return WRC_Abort;
+		}
 
 		/* If there are no aggregate functions in the result-set, and no GROUP BY
 		 * expression, do not allow aggregates in any of the other expressions.
@@ -1306,25 +1329,53 @@ resolveSelectStep(Walker * pWalker, Select * p)
 			sNC.ncFlags &= ~NC_AllowAgg;
 		}
 
-		/* If a HAVING clause is present, then there must be a GROUP BY clause.
-		 */
-		if (p->pHaving && !pGroupBy) {
-			sqlite3ErrorMsg(pParse,
-					"a GROUP BY clause is required before HAVING");
-			return WRC_Abort;
-		}
-
-		/* Add the output column list to the name-context before parsing the
-		 * other expressions in the SELECT statement. This is so that
-		 * expressions in the WHERE clause (etc.) can refer to expressions by
-		 * aliases in the result set.
+		/*
+		 * Add the output column list to the name-context
+		 * before parsing the other expressions in the
+		 * SELECT statement. This is so that expressions
+		 * in the WHERE clause (etc.) can refer to
+		 * expressions by aliases in the result set.
 		 *
-		 * Minor point: If this is the case, then the expression will be
-		 * re-evaluated for each reference to it.
+		 * Minor point: If this is the case, then the
+		 * expression will be re-evaluated for each
+		 * reference to it.
 		 */
 		sNC.pEList = p->pEList;
-		if (sqlite3ResolveExprNames(&sNC, p->pHaving))
-			return WRC_Abort;
+		/*
+		 * If a HAVING clause is present, then there must
+		 * be a GROUP BY clause or aggregate function
+		 * should be specified.
+		 */
+		if (p->pHaving != NULL && pGroupBy == NULL) {
+			sNC.ncFlags |= NC_AllowAgg;
+			if (is_all_select_agg &&
+			    sqlite3ResolveExprNames(&sNC, p->pHaving) != 0)
+				return WRC_Abort;
+			if ((sNC.ncFlags & NC_HasAgg) == 0 ||
+			    (sNC.ncFlags & NC_HasUnaggregatedId) != 0) {
+				diag_set(ClientError, ER_SQL, "HAVING "
+					 "argument must appear in the GROUP BY "
+					 "clause or be used in an aggregate "
+					 "function");
+				pParse->nErr++;
+				pParse->rc = SQL_TARANTOOL_ERROR;
+				return WRC_Abort;
+			}
+			/*
+			 * Aggregate functions may return only
+			 * one tuple, so user-defined LIMITs have
+			 * no sense (most DBs don't support such
+			 * LIMIT but there is no reason to
+			 * restrict it directly).
+			 */
+			sql_expr_delete(db, p->pLimit, false);
+			p->pLimit =
+			    sqlite3ExprAlloc(db, TK_INTEGER,
+					     &sqlite3IntTokens[1], 0);
+		} else {
+			if (sqlite3ResolveExprNames(&sNC, p->pHaving))
+				return WRC_Abort;
+		}
 		if (sqlite3ResolveExprNames(&sNC, p->pWhere))
 			return WRC_Abort;
 
diff --git a/src/box/sql/sqliteInt.h b/src/box/sql/sqliteInt.h
index 1ec52b875..9faf2459b 100644
--- a/src/box/sql/sqliteInt.h
+++ b/src/box/sql/sqliteInt.h
@@ -2454,7 +2454,8 @@ struct NameContext {
 #define NC_IdxExpr   0x0020	/* True if resolving columns of CREATE INDEX */
 #define NC_VarSelect 0x0040	/* A correlated subquery has been seen */
 #define NC_MinMaxAgg 0x1000	/* min/max aggregates seen.  See note above */
-
+/** One or more identifiers are out of aggregate function. */
+#define NC_HasUnaggregatedId     0x2000
 /*
  * An instance of the following structure contains all information
  * needed to generate code for a single SELECT statement.
diff --git a/test/sql-tap/count.test.lua b/test/sql-tap/count.test.lua
index b05e3a28e..6f58210f4 100755
--- a/test/sql-tap/count.test.lua
+++ b/test/sql-tap/count.test.lua
@@ -172,15 +172,15 @@ test:do_test(
         return uses_op_count("SELECT count(*) FROM t2 WHERE a IS NOT NULL")
     end, 0)
 
-test:do_catchsql_test(
+test:do_execsql_test(
     "count-2.9",
     [[
         SELECT count(*) FROM t2 HAVING count(*)>1
-    ]], {
+    ]],
         -- <count-2.9>
-        1, "a GROUP BY clause is required before HAVING"
+        {}
         -- </count-2.9>
-    })
+    )
 
 test:do_test(
     "count-2.10",
diff --git a/test/sql-tap/select3.test.lua b/test/sql-tap/select3.test.lua
index dbc95f0d8..2704c267e 100755
--- a/test/sql-tap/select3.test.lua
+++ b/test/sql-tap/select3.test.lua
@@ -200,7 +200,7 @@ test:do_catchsql_test("select3-3.1", [[
   SELECT log, count(*) FROM t1 HAVING log>=4
 ]], {
   -- <select3-3.1>
-  1, "a GROUP BY clause is required before HAVING"
+  1, "SQL error: HAVING argument must appear in the GROUP BY clause or be used in an aggregate function"
   -- </select3-3.1>
 })
 
diff --git a/test/sql-tap/select5.test.lua b/test/sql-tap/select5.test.lua
index 9c3cd2759..b889132aa 100755
--- a/test/sql-tap/select5.test.lua
+++ b/test/sql-tap/select5.test.lua
@@ -1,6 +1,6 @@
 #!/usr/bin/env tarantool
 test = require("sqltester")
-test:plan(32)
+test:plan(44)
 
 --!./tcltestrunner.lua
 -- 2001 September 15
@@ -412,5 +412,131 @@ test:do_execsql_test(
         -- </select5-8.8>
     })
 
+--
+-- gh-2364: Support HAVING without GROUP BY clause.
+--
+test:do_catchsql_test(
+    "select5-9.1",
+    [[
+        CREATE TABLE te40 (s1 INT, s2 INT, PRIMARY KEY (s1,s2));
+        INSERT INTO te40 VALUES (1,1);
+        INSERT INTO te40 VALUES (2,2);
+        SELECT s1 FROM te40 HAVING s1 = 1;
+    ]], {
+    -- <select5-9.1>
+    1, "SQL error: HAVING argument must appear in the GROUP BY clause or be used in an aggregate function"
+    -- </select5-9.1>
+})
+
+test:do_catchsql_test(
+    "select5-9.2",
+    [[
+        SELECT SUM(s1) FROM te40 HAVING s1 = 2;
+    ]], {
+    -- <select5-9.2>
+    1, "SQL error: HAVING argument must appear in the GROUP BY clause or be used in an aggregate function"
+    -- </select5-9.2>
+})
+
+test:do_catchsql_test(
+    "select5-9.3",
+    [[
+        SELECT s1 FROM te40 HAVING SUM(s1) = 2;
+    ]], {
+    -- <select5-9.3>
+    1, "SQL error: HAVING argument must appear in the GROUP BY clause or be used in an aggregate function"
+    -- </select5-9.3>
+})
+
+test:do_execsql_test(
+    "select5-9.4",
+    [[
+        SELECT SUM(s1) FROM te40 HAVING SUM(s1) > 0;
+    ]], {
+    -- <select5-9.4>
+    3
+    -- </select5-9.4>
+})
+
+test:do_execsql_test(
+    "select5-9.5",
+    [[
+        SELECT MIN(s1) FROM te40 HAVING SUM(s1) > 0;
+    ]], {
+    -- <select5-9.5>
+    1
+    -- </select5-9.5>
+})
+
+test:do_execsql_test(
+    "select5-9.6",
+    [[
+        SELECT SUM(s1) FROM te40 HAVING SUM(s1) < 0;
+    ]],
+    -- <select5-9.6>
+    {}
+    -- </select5-9.6>
+)
+
+test:do_catchsql_test(
+    "select5-9.7",
+    [[
+        SELECT SUM(s1),s2 FROM te40 HAVING SUM(s1) > 0;
+    ]], {
+    -- <select5-9.7>
+    1, "SQL error: HAVING argument must appear in the GROUP BY clause or be used in an aggregate function"
+    -- </select5-9.7>
+})
+
+test:do_catchsql_test(
+    "select5-9.8",
+    [[
+        SELECT SUM(s1) FROM te40 HAVING SUM(s1) > 0 and s2 > 0;
+    ]], {
+    -- <select5-9.8>
+    1, "SQL error: HAVING argument must appear in the GROUP BY clause or be used in an aggregate function"
+    -- </select5-9.8>
+})
+
+test:do_execsql_test(
+    "select5-9.9",
+    [[
+        SELECT SUM(s1) FROM te40 HAVING SUM(s1) > 0 and SUM(s2) > 0;
+    ]], {
+    -- <select5-9.9>
+    3
+    -- </select5-9.9>
+})
+
+test:do_execsql_test(
+    "select5-9.10",
+    [[
+        SELECT 1 FROM te40 HAVING SUM(s1) > 0;
+    ]], {
+    -- <select5-9.10>
+    1
+    -- </select5-9.10>
+})
+
+test:do_execsql_test(
+    "select5-9.11",
+    [[
+        SELECT -1 FROM te40 HAVING SUM(s1) > 0;
+    ]], {
+    -- <select5-9.11>
+    -1
+    -- </select5-9.11>
+})
+
+test:do_execsql_test(
+    "select5-9.12",
+    [[
+        SELECT NULL FROM te40 HAVING SUM(s1) > 0;
+    ]], {
+    -- <select5-9.12>
+    ""
+    -- </select5-9.12>
+})
+
 test:finish_test()
 
-- 
2.19.2






More information about the Tarantool-patches mailing list