[Tarantool-patches] [PATCH v3 1/2] sql: use unify pattern for column names

Wed Jun 24 02:12:25 MSK 2020

Hi! Thanks for the patch!

>> 3. What if I specify first column by myself, and the second
>> is an expression? Will it be _1 or _2?
> 
> tarantool>         CREATE TABLE j (column_1 SCALAR PRIMARY KEY);
> ---
> - row_count: 1
> ...
> 
> tarantool>         INSERT INTO j VALUES(1);
> ---
> - row_count: 1
> ...
> 
> tarantool> SELECT column_1, column_1 COLLATE "unicode_ci" FROM j ORDER BY column_1;
> ---
> - metadata:
>   - name: COLUMN_1
>     type: scalar
>   - name: COLUMN_1
>     type: scalar
>   rows:
>   - [1, 1]
> ...
> 
> If we want that the second expression to be named "COLUMN_2". should we use a
> hash table in struct Parse. We will push the names of all columns in the request
> to it and generate names without collisions. I don't now anymore ways.
> I would leave it as it is now.

I didn't say anything about behaviour change. I just asked a simple
question, and I don't see an answer here. I will repeat it: what if
my query contains 2 columns - one normal column name and the second
is an expression. Will the expression be visible as COLUMN_1 or as
COLUMN_2 in the result set?

    SELECT mycol, mycol + 1 FROM test;

Will 'mycol + 1' be stored as COLUMN_1 or COLUMN_2? It is a simple
question to clarify in the documentation request. Because it will be
asked if you don't explicitly say what will happen.

>>> @@ -4951,6 +4951,18 @@ selectExpander(Walker * pWalker, Select * p)
>>>  		       || (pE->pLeft != 0 && pE->pLeft->op == TK_ID));
>>>  		if (pE->op == TK_DOT && pE->pRight->op == TK_ASTERISK)
>>>  			break;
>>> +		/*
>>> +		 * It is necessary to generate auto names for
>>> +		 * expressions before clauses such as ORDER BY,
>>> +		 * GROUP BY will be resolved. These names can be
>>> +		 * found inside these clauses during resolving.
>>> +		 */
>>> +		if (pEList->a[k].zName == NULL && pE->op != TK_DOT &&
>>> +		    ((pE->flags & EP_Leaf) == 0 || pE->op != TK_ID)) {
>>> +			uint32_t idx = ++pParse->autoname_i;
>>> +			pEList->a[k].zName =
>>> +				sqlDbStrDup(db, sql_generate_column_name(idx));
>> 4. I don't think I understand what is happening here. Looks too
>> complicated. Why do you need so many checks, what do these checks
>> mean, how are they related to ORDER BY/GROUP BY? And how is it
>> related to select expansion (the function is called selectExpander())?
>> Why does it work fine when I use normal column names in ORDER BY/GROUP BY,
>> and does not work here without this crutch?
> Here I name all items (generate names) in expression list if it is needed.
> I think this operation can be attributed to "expanding". Expanding is before
> name resolving. At the resolving all items must be named.
> 
> You can use normal column names in ORDER BY, GROUP BY etc, I checked it in Postgre
> (Postgre names expression the same way). I think it is OK.

Can you use auto names in PostgreSQL too? Inside the query. Are they a
full featured column name, or just sugar for result set metadata?

See 5 comments below.

> diff --git a/src/box/sql/select.c b/src/box/sql/select.c
> index 4b069ad..7a56136 100644
> --- a/src/box/sql/select.c
> +++ b/src/box/sql/select.c
> @@ -1989,6 +1987,8 @@ sqlColumnsFromExprList(Parse * parse, ExprList * expr_list,
>              memcpy(space_def->fields[i].name, zName, name_len);
>              space_def->fields[i].name[name_len] = '\0';
>          }
> +        sqlDbFree(db, expr_list->a[i].zName);
> +        expr_list->a[i].zName = zName;

1. I deleted this hunk and nothing changed in the tests. Why do you need it?

>      }
>  cleanup:
>      sqlHashClear(&ht);
> @@ -4792,6 +4792,25 @@ selectPopWith(Walker * pWalker, Select * p)
>      }
>  }
>  
> +/**
> + * Determine whether to generate a name for @a expr or not.
> + *
> + * Auto generated names is needed for every item in  a <SELECT>
> + * expression list except asterisks, dots and column names (also
> + * if this item hasn't alias).
> + *
> + * @param expr Expression from expression list to analyze.
> + *
> + * @retval true/false If item is/isn't needed needed to name.
> + */
> +static bool
> +is_needed_to_name(struct Expr *expr) {

2. We put { on separate line.

> +    if (expr->op != TK_ASTERISK && expr->op != TK_DOT &&
> +        ((expr->flags & EP_Leaf) == 0 || expr->op != TK_ID))
> +        return true;
> +    return false;

3. You could simply return the condition itself. It is a boolean
anyway.

What I don't understand is what is 'leaf'. For example, there is an
expression '1 + 2'. Are 1 and 2 leafs? Is '1 + 2' a leaf? What is a
definition of the leaf here?

> +}
> +
>  /*
>   * This routine is a Walker callback for "expanding" a SELECT statement.
>   * "Expanding" means to do the following:
> @@ -4941,19 +4960,27 @@ selectExpander(Walker * pWalker, Select * p)
>       * all tables.
>       *
>       * The first loop just checks to see if there are any "*" operators
> -     * that need expanding.
> +     * that need expanding and name items of the expression
> +     * list if needed.
>       */
> +    i = pEList->nExpr;
>      for (k = 0; k < pEList->nExpr; k++) {

4. Why do you have 2 iterators now?

>          pE = pEList->a[k].pExpr;
> -        if (pE->op == TK_ASTERISK)
> -            break;
> +        if (pE->op == TK_ASTERISK && i == pEList->nExpr)
> +            i = k;
>          assert(pE->op != TK_DOT || pE->pRight != 0);
>          assert(pE->op != TK_DOT
>                 || (pE->pLeft != 0 && pE->pLeft->op == TK_ID));
> -        if (pE->op == TK_DOT && pE->pRight->op == TK_ASTERISK)
> -            break;
> +        if (pE->op == TK_DOT && pE->pRight->op == TK_ASTERISK  &&
> +            i == pEList->nExpr)
> +            i = k;
> +        if (pEList->a[k].zName == NULL && is_needed_to_name(pE)) {
> +            uint32_t idx = ++pParse->autoname_i;
> +            pEList->a[k].zName =
> +                sqlDbStrDup(db, sql_generate_column_name(idx));
> +        }
>      }
> -    if (k < pEList->nExpr) {
> +    if (i < pEList->nExpr) {
>          /*
>           * If we get here it means the result set contains one or more "*"
>           * operators that need to be expanded.  Loop through each expression

5. I don't see any test for the case when a table contains COLUMN_1, COLUMN_2, etc
columns. As real columns. What will happen with the expression names then? This
should be investigated how is it done in other DBs, and documented in the docbot
request for us. Also probably you may need to ask Peter about it, if the other DBs'
cases don't offer anything good enough to blindly copy.