From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng1.m.smailru.net (smtpng1.m.smailru.net [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id EE15F42EF5C for ; Thu, 25 Jun 2020 17:35:40 +0300 (MSK) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3594.4.19\)) From: Roman Khabibov In-Reply-To: Date: Thu, 25 Jun 2020 17:35:39 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <81127D96-BB22-4F99-A788-047F9327A4AE@tarantool.org> References: <20200611151853.24398-1-roman.habibov@tarantool.org> <20200611151853.24398-2-roman.habibov@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH v3 1/2] sql: use unify pattern for column names List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-patches@dev.tarantool.org Hi! Thanks for the review. > On Jun 24, 2020, at 02:12, Vladislav Shpilevoy = wrote: >=20 > Hi! Thanks for the patch! >=20 >>> 3. What if I specify first column by myself, and the second >>> is an expression? Will it be _1 or _2? >>=20 >> tarantool> CREATE TABLE j (column_1 SCALAR PRIMARY KEY); >> --- >> - row_count: 1 >> ... >>=20 >> tarantool> INSERT INTO j VALUES(1); >> --- >> - row_count: 1 >> ... >>=20 >> tarantool> SELECT column_1, column_1 COLLATE "unicode_ci" FROM j = ORDER BY column_1; >> --- >> - metadata: >> - name: COLUMN_1 >> type: scalar >> - name: COLUMN_1 >> type: scalar >> rows: >> - [1, 1] >> ... >>=20 >> If we want that the second expression to be named "COLUMN_2". should = we use a >> hash table in struct Parse. We will push the names of all columns in = the request >> to it and generate names without collisions. I don't now anymore = ways. >> I would leave it as it is now. >=20 > I didn't say anything about behaviour change. I just asked a simple > question, and I don't see an answer here. I will repeat it: what if > my query contains 2 columns - one normal column name and the second > is an expression. Will the expression be visible as COLUMN_1 or as > COLUMN_2 in the result set? >=20 > SELECT mycol, mycol + 1 FROM test; >=20 > Will 'mycol + 1' be stored as COLUMN_1 or COLUMN_2? It is a simple > question to clarify in the documentation request. Because it will be > asked if you don't explicitly say what will happen. Sorry for misunderstanding. Added to the doc request. Here, the expression "mycol + 1" generates a new column, so that it is the first auto generated resulting column will be named as "COLUMN_1". ``` tarantool> CREATE TABLE test (mycol INT PRIMARY KEY); --- - row_count: 1 ... tarantool> SELECT mycol, mycol + 1 FROM test; --- - metadata: - name: MYCOL type: integer - name: COLUMN_1 type: integer rows: [] ... ``` Note that you can use generated names already within the query, e.g. in clause. ``` tarantool> SELECT mycol, mycol + 1 FROM test ORDER BY column_1; --- - metadata: - name: MYCOL type: integer - name: COLUMN_1 type: integer rows: [] ... ``` >>>> @@ -4951,6 +4951,18 @@ selectExpander(Walker * pWalker, Select * p) >>>> || (pE->pLeft !=3D 0 && pE->pLeft->op =3D=3D = TK_ID)); >>>> if (pE->op =3D=3D TK_DOT && pE->pRight->op =3D=3D = TK_ASTERISK) >>>> break; >>>> + /* >>>> + * It is necessary to generate auto names for >>>> + * expressions before clauses such as ORDER BY, >>>> + * GROUP BY will be resolved. These names can be >>>> + * found inside these clauses during resolving. >>>> + */ >>>> + if (pEList->a[k].zName =3D=3D NULL && pE->op !=3D TK_DOT = && >>>> + ((pE->flags & EP_Leaf) =3D=3D 0 || pE->op !=3D = TK_ID)) { >>>> + uint32_t idx =3D ++pParse->autoname_i; >>>> + pEList->a[k].zName =3D >>>> + sqlDbStrDup(db, = sql_generate_column_name(idx)); >>> 4. I don't think I understand what is happening here. Looks too >>> complicated. Why do you need so many checks, what do these checks >>> mean, how are they related to ORDER BY/GROUP BY? And how is it >>> related to select expansion (the function is called = selectExpander())? >>> Why does it work fine when I use normal column names in ORDER = BY/GROUP BY, >>> and does not work here without this crutch? >> Here I name all items (generate names) in expression list if it is = needed. >> I think this operation can be attributed to "expanding". Expanding is = before >> name resolving. At the resolving all items must be named. >>=20 >> You can use normal column names in ORDER BY, GROUP BY etc, I checked = it in Postgre >> (Postgre names expression the same way). I think it is OK. >=20 > Can you use auto names in PostgreSQL too? Inside the query. Are they a > full featured column name, or just sugar for result set metadata? I was mistaken, Postgre just puts the same name =E2=80=9C?column?=E2=80=9D= . I used =E2=80=9C?column?=E2=80=9D in the query, so that full featured. CREATE TABLE test_12 (mycol INT PRIMARY KEY)\\ INSERT INTO test_12 VALUES (1)\\ SELECT mycol, mycol + 1 FROM test_12 ORDER BY "?column?"\\ Result: mycol ?column? 1 2 > See 5 comments below. >=20 >> diff --git a/src/box/sql/select.c b/src/box/sql/select.c >> index 4b069ad..7a56136 100644 >> --- a/src/box/sql/select.c >> +++ b/src/box/sql/select.c >> @@ -1989,6 +1987,8 @@ sqlColumnsFromExprList(Parse * parse, ExprList = * expr_list, >> memcpy(space_def->fields[i].name, zName, name_len); >> space_def->fields[i].name[name_len] =3D '\0'; >> } >> + sqlDbFree(db, expr_list->a[i].zName); >> + expr_list->a[i].zName =3D zName; >=20 > 1. I deleted this hunk and nothing changed in the tests. Why do you = need it? Dropped. >> } >> cleanup: >> sqlHashClear(&ht); >> @@ -4792,6 +4792,25 @@ selectPopWith(Walker * pWalker, Select * p) >> } >> } >> =20 >> +/** >> + * Determine whether to generate a name for @a expr or not. >> + * >> + * Auto generated names is needed for every item in a + * expression list except asterisks, dots and column names (also + * if this item hasn't alias). + * + * @param expr Expression from expression list to analyze. + * + * @retval true/false If item is/isn't needed needed to name. + */ +static bool +is_needed_to_name(struct Expr *expr) +{ + return (expr->op !=3D TK_ASTERISK && expr->op !=3D TK_DOT && = expr->op !=3D + TK_ID); +} + >> +} >> + >> /* >> * This routine is a Walker callback for "expanding" a SELECT = statement. >> * "Expanding" means to do the following: >> @@ -4941,19 +4960,27 @@ selectExpander(Walker * pWalker, Select * p) >> * all tables. >> * >> * The first loop just checks to see if there are any "*" = operators >> - * that need expanding. >> + * that need expanding and name items of the expression >> + * list if needed. >> */ >> + i =3D pEList->nExpr; >> for (k =3D 0; k < pEList->nExpr; k++) { >=20 > 4. Why do you have 2 iterators now? Fixed. @@ -4941,19 +4957,26 @@ selectExpander(Walker * pWalker, Select * p) * all tables. * * The first loop just checks to see if there are any "*" = operators - * that need expanding. + * that need expanding and names items of the expression + * list if needed. */ + bool has_asterisk =3D false; for (k =3D 0; k < pEList->nExpr; k++) { pE =3D pEList->a[k].pExpr; if (pE->op =3D=3D TK_ASTERISK) - break; + has_asterisk =3D true; assert(pE->op !=3D TK_DOT || pE->pRight !=3D 0); assert(pE->op !=3D TK_DOT || (pE->pLeft !=3D 0 && pE->pLeft->op =3D=3D = TK_ID)); if (pE->op =3D=3D TK_DOT && pE->pRight->op =3D=3D = TK_ASTERISK) - break; + has_asterisk =3D true; + if (pEList->a[k].zName =3D=3D NULL && = is_needed_to_name(pE)) { + uint32_t idx =3D ++pParse->autoname_i; + pEList->a[k].zName =3D + sqlDbStrDup(db, = sql_generate_column_name(idx)); + } } - if (k < pEList->nExpr) { + if (has_asterisk) { /* * If we get here it means the result set contains one = or more "*" * operators that need to be expanded. Loop through = each expression >> pE =3D pEList->a[k].pExpr; >> - if (pE->op =3D=3D TK_ASTERISK) >> - break; >> + if (pE->op =3D=3D TK_ASTERISK && i =3D=3D pEList->nExpr) >> + i =3D k; >> assert(pE->op !=3D TK_DOT || pE->pRight !=3D 0); >> assert(pE->op !=3D TK_DOT >> || (pE->pLeft !=3D 0 && pE->pLeft->op =3D=3D TK_ID)); >> - if (pE->op =3D=3D TK_DOT && pE->pRight->op =3D=3D = TK_ASTERISK) >> - break; >> + if (pE->op =3D=3D TK_DOT && pE->pRight->op =3D=3D = TK_ASTERISK && >> + i =3D=3D pEList->nExpr) >> + i =3D k; >> + if (pEList->a[k].zName =3D=3D NULL && is_needed_to_name(pE)) = { >> + uint32_t idx =3D ++pParse->autoname_i; >> + pEList->a[k].zName =3D >> + sqlDbStrDup(db, sql_generate_column_name(idx)); >> + } >> } >> - if (k < pEList->nExpr) { >> + if (i < pEList->nExpr) { >> /* >> * If we get here it means the result set contains one or = more "*" >> * operators that need to be expanded. Loop through each = expression >=20 > 5. I don't see any test for the case when a table contains COLUMN_1, = COLUMN_2, etc > columns. As real columns. What will happen with the expression names = then? This > should be investigated how is it done in other DBs, and documented in = the docbot > request for us. MariaDB, Oracle, MySQL use span for naming. I know only one and it is = Postgre. > Also probably you may need to ask Peter about it, if the other DBs' > cases don't offer anything good enough to blindly copy. I asked Peter in the ticket. commit 3fea865c78fcfa2e9b2ff14d38697abd97cd1695 Author: Roman Khabibov Date: Thu Mar 5 12:48:58 2020 +0300 sql: unify pattern for column names =20 Name resulting columns generated by an expression or construction by the "COLUMN_N" pattern. =20 Closes #3962 =20 @TarantoolBot document Title: Column naming in SQL =20 Now, every auto generated column is named by the "COLUMN_N" pattern, where N is the number of generated column in a query (starting from 1). Auto generated column is a column in a query result generated by an expression or a column from construction. =20 Examples: ``` box.execute("VALUES(1, 2, 3);") --- - metadata: - name: COLUMN_1 type: integer - name: COLUMN_2 type: integer - name: COLUMN_3 type: integer rows: - [1, 2, 3] ... box.execute("SELECT * FROM (VALUES (1+1, 1+1));") --- - metadata: - name: COLUMN_1 type: integer - name: COLUMN_2 type: integer rows: - [2, 2] ... box.execute("SELECT 1+1, 1+1;") --- - metadata: - name: COLUMN_1 type: integer - name: COLUMN_2 type: integer rows: - [2, 2] ... ``` =20 Here, the expression "mycol + 1" generates a new column, so that it is the first auto generated resulting column will be named as "COLUMN_1". ``` tarantool> CREATE TABLE test (mycol INT PRIMARY KEY); --- - row_count: 1 ... =20 tarantool> SELECT mycol, mycol + 1 FROM test; --- - metadata: - name: MYCOL type: integer - name: COLUMN_1 type: integer rows: [] ... ``` Note that you can use generated names already within the query, e.g. in clause. ``` tarantool> SELECT mycol, mycol + 1 FROM test ORDER BY column_1; --- - metadata: - name: MYCOL type: integer - name: COLUMN_1 type: integer rows: [] ... ``` =20 It should also be noted that if you use column names similar to the "COLUMN_N" pattern, you can get the same names as a result: =20 ``` tarantool> CREATE TABLE test (column_1 SCALAR PRIMARY KEY); --- - row_count: 1 ... =20 tarantool> INSERT INTO test VALUES(1); --- - row_count: 1 ... =20 tarantool> SELECT column_1, column_1 COLLATE "unicode_ci" FROM test; --- - metadata: - name: COLUMN_1 type: scalar - name: COLUMN_1 type: scalar rows: - [1, 1] ... ``` diff --git a/src/box/sql/select.c b/src/box/sql/select.c index 4b069addb..26c735ed7 100644 --- a/src/box/sql/select.c +++ b/src/box/sql/select.c @@ -1854,14 +1854,14 @@ generate_column_metadata(struct Parse *pParse, = struct SrcList *pTabList, } } else { const char *z =3D NULL; - if (colname !=3D NULL) + if (colname !=3D NULL) { z =3D colname; - else if (span !=3D NULL) - z =3D span; - else - z =3D tt_sprintf("column%d", i + 1); + } else { + uint32_t idx =3D ++pParse->autoname_i; + z =3D sql_generate_column_name(idx); + } vdbe_metadata_set_col_name(v, i, z); - if (is_full_meta && colname !=3D NULL) + if (is_full_meta) vdbe_metadata_set_col_span(v, i, span); } } @@ -1897,7 +1897,6 @@ sqlColumnsFromExprList(Parse * parse, ExprList * = expr_list, /* Database connection */ sql *db =3D parse->db; u32 cnt; /* Index added to make the name unique = */ - Expr *p; /* Expression for a single result column = */ char *zName; /* Column name */ int nName; /* Size of name in zName[] */ Hash ht; /* Hash table of column names */ @@ -1929,13 +1928,12 @@ sqlColumnsFromExprList(Parse * parse, ExprList * = expr_list, space_def->field_count =3D column_count; =20 for (uint32_t i =3D 0; i < column_count; i++) { - /* Get an appropriate name for the column + /* + * Check if the column contains an "AS " + * phrase. */ - p =3D sqlExprSkipCollate(expr_list->a[i].pExpr); - if ((zName =3D expr_list->a[i].zName) !=3D 0) { - /* If the column contains an "AS " phrase, = use as the name */ - } else { - Expr *pColExpr =3D p; /* The expression that = is the result column name */ + if ((zName =3D expr_list->a[i].zName) =3D=3D 0) { + struct Expr *pColExpr =3D expr_list->a[i].pExpr; struct space_def *space_def =3D NULL; while (pColExpr->op =3D=3D TK_DOT) { pColExpr =3D pColExpr->pRight; @@ -1951,14 +1949,14 @@ sqlColumnsFromExprList(Parse * parse, ExprList * = expr_list, } else if (pColExpr->op =3D=3D TK_ID) { assert(!ExprHasProperty(pColExpr, = EP_IntValue)); zName =3D pColExpr->u.zToken; - } else { - /* Use the original text of the column = expression as its name */ - zName =3D expr_list->a[i].zSpan; } } - if (zName =3D=3D NULL) - zName =3D "_auto_field_"; - zName =3D sqlMPrintf(db, "%s", zName); + if (zName =3D=3D NULL) { + uint32_t idx =3D ++parse->autoname_i; + zName =3D sqlDbStrDup(db, = sql_generate_column_name(idx)); + } else { + zName =3D sqlDbStrDup(db, zName); + } =20 /* Make sure the column name is unique. If the name is = not unique, * append an integer to the name so that it becomes = unique. @@ -4792,6 +4790,24 @@ selectPopWith(Walker * pWalker, Select * p) } } =20 +/** + * Determine whether to generate a name for @a expr or not. + * + * Auto generated names is needed for every item in a