Re: [Tarantool-patches] [PATCH v3 1/2] sql: use unify pattern for column names

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: roman <roman.habibov@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v3 1/2] sql: use unify pattern for column names
Date: Wed, 24 Jun 2020 01:12:25 +0200	[thread overview]
Message-ID: <e1e9043a-e48f-97f8-2ec5-71791537b50d@tarantool.org> (raw)
In-Reply-To: <b1f19857-21bf-7a21-f870-aab9086a2f22@tarantool.org>

Hi! Thanks for the patch!

>> 3. What if I specify first column by myself, and the second
>> is an expression? Will it be _1 or _2?
> 
> tarantool>         CREATE TABLE j (column_1 SCALAR PRIMARY KEY);
> ---
> - row_count: 1
> ...
> 
> tarantool>         INSERT INTO j VALUES(1);
> ---
> - row_count: 1
> ...
> 
> tarantool> SELECT column_1, column_1 COLLATE "unicode_ci" FROM j ORDER BY column_1;
> ---
> - metadata:
>   - name: COLUMN_1
>     type: scalar
>   - name: COLUMN_1
>     type: scalar
>   rows:
>   - [1, 1]
> ...
> 
> If we want that the second expression to be named "COLUMN_2". should we use a
> hash table in struct Parse. We will push the names of all columns in the request
> to it and generate names without collisions. I don't now anymore ways.
> I would leave it as it is now.

I didn't say anything about behaviour change. I just asked a simple
question, and I don't see an answer here. I will repeat it: what if
my query contains 2 columns - one normal column name and the second
is an expression. Will the expression be visible as COLUMN_1 or as
COLUMN_2 in the result set?

    SELECT mycol, mycol + 1 FROM test;

Will 'mycol + 1' be stored as COLUMN_1 or COLUMN_2? It is a simple
question to clarify in the documentation request. Because it will be
asked if you don't explicitly say what will happen.

>>> @@ -4951,6 +4951,18 @@ selectExpander(Walker * pWalker, Select * p)
>>>  		       || (pE->pLeft != 0 && pE->pLeft->op == TK_ID));
>>>  		if (pE->op == TK_DOT && pE->pRight->op == TK_ASTERISK)
>>>  			break;
>>> +		/*
>>> +		 * It is necessary to generate auto names for
>>> +		 * expressions before clauses such as ORDER BY,
>>> +		 * GROUP BY will be resolved. These names can be
>>> +		 * found inside these clauses during resolving.
>>> +		 */
>>> +		if (pEList->a[k].zName == NULL && pE->op != TK_DOT &&
>>> +		    ((pE->flags & EP_Leaf) == 0 || pE->op != TK_ID)) {
>>> +			uint32_t idx = ++pParse->autoname_i;
>>> +			pEList->a[k].zName =
>>> +				sqlDbStrDup(db, sql_generate_column_name(idx));
>> 4. I don't think I understand what is happening here. Looks too
>> complicated. Why do you need so many checks, what do these checks
>> mean, how are they related to ORDER BY/GROUP BY? And how is it
>> related to select expansion (the function is called selectExpander())?
>> Why does it work fine when I use normal column names in ORDER BY/GROUP BY,
>> and does not work here without this crutch?
> Here I name all items (generate names) in expression list if it is needed.
> I think this operation can be attributed to "expanding". Expanding is before
> name resolving. At the resolving all items must be named.
> 
> You can use normal column names in ORDER BY, GROUP BY etc, I checked it in Postgre
> (Postgre names expression the same way). I think it is OK.

Can you use auto names in PostgreSQL too? Inside the query. Are they a
full featured column name, or just sugar for result set metadata?

See 5 comments below.

> diff --git a/src/box/sql/select.c b/src/box/sql/select.c
> index 4b069ad..7a56136 100644
> --- a/src/box/sql/select.c
> +++ b/src/box/sql/select.c
> @@ -1989,6 +1987,8 @@ sqlColumnsFromExprList(Parse * parse, ExprList * expr_list,
>              memcpy(space_def->fields[i].name, zName, name_len);
>              space_def->fields[i].name[name_len] = '\0';
>          }
> +        sqlDbFree(db, expr_list->a[i].zName);
> +        expr_list->a[i].zName = zName;

1. I deleted this hunk and nothing changed in the tests. Why do you need it?

>      }
>  cleanup:
>      sqlHashClear(&ht);
> @@ -4792,6 +4792,25 @@ selectPopWith(Walker * pWalker, Select * p)
>      }
>  }
>  
> +/**
> + * Determine whether to generate a name for @a expr or not.
> + *
> + * Auto generated names is needed for every item in  a <SELECT>
> + * expression list except asterisks, dots and column names (also
> + * if this item hasn't alias).
> + *
> + * @param expr Expression from expression list to analyze.
> + *
> + * @retval true/false If item is/isn't needed needed to name.
> + */
> +static bool
> +is_needed_to_name(struct Expr *expr) {

2. We put { on separate line.

> +    if (expr->op != TK_ASTERISK && expr->op != TK_DOT &&
> +        ((expr->flags & EP_Leaf) == 0 || expr->op != TK_ID))
> +        return true;
> +    return false;

3. You could simply return the condition itself. It is a boolean
anyway.

What I don't understand is what is 'leaf'. For example, there is an
expression '1 + 2'. Are 1 and 2 leafs? Is '1 + 2' a leaf? What is a
definition of the leaf here?

> +}
> +
>  /*
>   * This routine is a Walker callback for "expanding" a SELECT statement.
>   * "Expanding" means to do the following:
> @@ -4941,19 +4960,27 @@ selectExpander(Walker * pWalker, Select * p)
>       * all tables.
>       *
>       * The first loop just checks to see if there are any "*" operators
> -     * that need expanding.
> +     * that need expanding and name items of the expression
> +     * list if needed.
>       */
> +    i = pEList->nExpr;
>      for (k = 0; k < pEList->nExpr; k++) {

4. Why do you have 2 iterators now?

>          pE = pEList->a[k].pExpr;
> -        if (pE->op == TK_ASTERISK)
> -            break;
> +        if (pE->op == TK_ASTERISK && i == pEList->nExpr)
> +            i = k;
>          assert(pE->op != TK_DOT || pE->pRight != 0);
>          assert(pE->op != TK_DOT
>                 || (pE->pLeft != 0 && pE->pLeft->op == TK_ID));
> -        if (pE->op == TK_DOT && pE->pRight->op == TK_ASTERISK)
> -            break;
> +        if (pE->op == TK_DOT && pE->pRight->op == TK_ASTERISK  &&
> +            i == pEList->nExpr)
> +            i = k;
> +        if (pEList->a[k].zName == NULL && is_needed_to_name(pE)) {
> +            uint32_t idx = ++pParse->autoname_i;
> +            pEList->a[k].zName =
> +                sqlDbStrDup(db, sql_generate_column_name(idx));
> +        }
>      }
> -    if (k < pEList->nExpr) {
> +    if (i < pEList->nExpr) {
>          /*
>           * If we get here it means the result set contains one or more "*"
>           * operators that need to be expanded.  Loop through each expression

5. I don't see any test for the case when a table contains COLUMN_1, COLUMN_2, etc
columns. As real columns. What will happen with the expression names then? This
should be investigated how is it done in other DBs, and documented in the docbot
request for us. Also probably you may need to ask Peter about it, if the other DBs'
cases don't offer anything good enough to blindly copy.

next prev parent reply	other threads:[~2020-06-23 23:12 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-11 15:18 [Tarantool-patches] [PATCH v3 0/2] Use-unify-pattern-for-column-names Roman Khabibov
2020-06-11 15:18 ` [Tarantool-patches] [PATCH v3 1/2] sql: use unify pattern for column names Roman Khabibov
2020-06-15 21:59   ` Vladislav Shpilevoy
2020-06-22 21:14     ` roman
2020-06-23 23:12       ` Vladislav Shpilevoy [this message]
2020-06-25 14:35         ` Roman Khabibov
2020-06-25 21:25           ` Vladislav Shpilevoy
2020-06-27 11:50             ` Roman Khabibov
2020-06-29 20:08               ` Vladislav Shpilevoy
2020-06-29 23:46                 ` Roman Khabibov
2020-06-30 21:23                   ` Vladislav Shpilevoy
2020-07-01 12:45                     ` Roman Khabibov
2020-07-01 21:25                       ` Vladislav Shpilevoy
2020-07-02 15:55                         ` Roman Khabibov
2020-06-11 15:18 ` [Tarantool-patches] [PATCH v3 2/2] sql: print span more properly Roman Khabibov
2020-06-15 22:18   ` Vladislav Shpilevoy
2020-06-22 21:14     ` roman
2020-06-23 23:12       ` Vladislav Shpilevoy
2020-07-02 15:55         ` Roman Khabibov
2020-07-02 19:06           ` Nikita Pettik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e1e9043a-e48f-97f8-2ec5-71791537b50d@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=roman.habibov@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v3 1/2] sql: use unify pattern for column names' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox