[tarantool-patches] Re: [PATCH 2/2] sql: compute resulting collation for concatenation

n.pettik korablev at tarantool.org
Fri Feb 15 02:26:27 MSK 2019



> On 24 Jan 2019, at 21:29, Vladislav Shpilevoy <v.shpilevoy at tarantool.org> wrote:
> 
> Thanks for the patch! See 7 comments below.

Sorry for delayed answer.

> On 16/01/2019 16:13, Nikita Pettik wrote:
>> According to ANSI, result of concatenation operation should derive
>> collation sequence from its operands. Now it is not true: result is
>> always comes with no ("none") collation.
>> In a nutshell*, rules are quite simple:
> 
> 1. If you want to put a reference, then use [<number>], like in
> literature and like Vova does in his commits. Firstly, I thought
> that '*' was a typo.

Ok, fixed.

>> a) If some data type  has an explicit collation EC1, then every data
> 
> 2. Double space after 'type'. In some other places below too.

Ok, fixed. As a rule I use vim auto-formatting utility, which aligns
borders to 72 chars automatically putting extra spaces between
sentences. I think it is kind of normal.

>> type that has an explicit collation shall have declared type collation
>> that is EC1.  The collation derivation is explicit and the collation is
>> EC1.
>> b) If every data type has an implicit collation, then:
>>  - If every data type has the same declared type collation IC1, then
>>    the collation derivation is implicit and the collation is IC1.
>>  - Otherwise, the collation derivation is none.
>> c) Otherwise, the collation derivation is none.
>> *Read complete statement at 9.5 Result of data type combinations
> 
> 3. Please, say a bit more words: that 9.5 is a chapter in an SQL
> standard, and the standard of which year it is.

Ok, done.

Fixed commit message:

Author: Nikita Pettik <korablev at tarantool.org>
Date:   Tue Jan 15 15:18:37 2019 +0300

    sql: compute resulting collation for concatenation
    
    According to ANSI, result of concatenation operation should derive
    collation sequence from its operands. Now it is not true: result is
    always comes with no ("none") collation.
    
    In a nutshell[1], rules are quite simple:
    
    a) If some data type has an explicit collation EC1, then every data type
    that has an explicit collation shall have declared type collation that
    is EC1.  The collation derivation is explicit and the collation is EC1.
    
    b) If every data type has an implicit collation, then:
    
     - If every data type has the same declared type collation IC1, then
       the collation derivation is implicit and the collation is IC1.
    
     - Otherwise, the collation derivation is none.
    
    c) Otherwise, the collation derivation is none.
    
    [1] Read complete statement at chapter 9.5 Result of data type
    combinations, ANSI 2013, Part 2: Foundations.
    
    Closes #3937

>> Closes #3937
>> ---
>>  src/box/sql/expr.c          |  47 +++++++++++++++++++-
>>  test/sql/collation.result   | 102 ++++++++++++++++++++++++++++++++++++++++++++
>>  test/sql/collation.test.lua |  46 ++++++++++++++++++++
>>  3 files changed, 193 insertions(+), 2 deletions(-)
>> diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
>> index f8819f779..e6f536757 100644
>> --- a/src/box/sql/expr.c
>> +++ b/src/box/sql/expr.c
>> @@ -221,6 +221,45 @@ sql_expr_coll(Parse *parse, Expr *p, bool *is_explicit_coll, uint32_t *coll_id)
>>  			}
>>  			break;
>>  		}
>> +		if (op == TK_CONCAT) {
>> +			/*
>> +			 * Despite the fact that procedure below
>> +			 * is very similar to collation_check_compatability(),
>> +			 * it is slightly different: when both
>> +			 * operands have different implicit collations,
>> +			 * derived collation should be "none",
>> +			 * i.e. no collation is used at all
>> +			 * (instead of raising error).
> 
> 4. Typo: collation_check_compatability -> collation*S*_check_compat*I*bility.
> 
> Also, I think that it is not worth mentioning the difference here
> with that function, especially in such a big comment, looks like an excuse.
> It is better to put a link to the standard.

As you wish:

diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
index e6f536757..031b1f16f 100644
--- a/src/box/sql/expr.c
+++ b/src/box/sql/expr.c
@@ -223,13 +223,10 @@ sql_expr_coll(Parse *parse, Expr *p, bool *is_explicit_coll, uint32_t *coll_id)
                }
                if (op == TK_CONCAT) {
                        /*
-                        * Despite the fact that procedure below
-                        * is very similar to collation_check_compatability(),
-                        * it is slightly different: when both
-                        * operands have different implicit collations,
-                        * derived collation should be "none",
-                        * i.e. no collation is used at all
-                        * (instead of raising error).
+                        * Procedure below provides compatibility
+                        * checks declared in ANSI SQL 2013:
+                        * chapter 9.5 Result of data type
+                        * combinations.
                         */
>> +			bool is_rhs_forced;
>> +			uint32_t rhs_coll_id;
>> +			if (sql_expr_coll(parse, p->pRight, &is_rhs_forced,
>> +					  &rhs_coll_id) != 0)
>> +				return -1;
>> +			if (is_lhs_forced && is_rhs_forced) {
>> +				if (lhs_coll_id != rhs_coll_id)
>> +					return -1;
> 
> 5. Did you miss diag_set?

No, I did it on purpose. Firstly, this function is recursive,
so in case error occurred on bottom levels of recursion,
diag would be reseted on each level above (and parser’s
counter of errors would be incremented several times as
well). No terrible consequences would take place in this case,
but it looks like a bad pattern. Anyway, fixed it according to
your suggestion.

Secondly, many functions which call sql_expr_coll
don’t expect that it can return real error and they are not
adapted to handle them (e.g. wherePathSatisfiesOrderBy).
In original SQLite this function don’t fail under any circumstances.
Sure, if error is set as Parse->err, sooner or later it will arise
(I guess this is likely to be global problem of SQLite).

Diff (cut from context, sorry):

+                       if (is_lhs_forced && is_rhs_forced) {
+                               if (lhs_coll_id != rhs_coll_id) {
+                                       diag_set(ClientError,
+                                                ER_ILLEGAL_COLLATION_MIX);
+                                       parse->nErr++;
+                                       parse->rc = SQL_TARANTOOL_ERROR;
+                                       return -1;
+                               }
+                       }

>> +			}
>> +			if (is_lhs_forced) {
>> +				*coll_id = lhs_coll_id;
>> +				*is_explicit_coll = true;
>> +				return 0;
> 
> 6. In this function (sql_expr_coll) to break the cycle 'break'
> keyword is used, so lets be consistent and use 'break' as well.

Ok:

@@ -248,17 +245,17 @@ sql_expr_coll(Parse *parse, Expr *p, bool *is_explicit_coll, uint32_t *coll_id)
                        if (is_lhs_forced) {
                                *coll_id = lhs_coll_id;
                                *is_explicit_coll = true;
-                               return 0;
+                               break;
                        }
                        if (is_rhs_forced) {
                                *coll_id = rhs_coll_id;
                                *is_explicit_coll = true;
-                               return 0;
+                               break;
                        }
                        if (rhs_coll_id != lhs_coll_id)
-                               return 0;
+                               break;
                        *coll_id = lhs_coll_id;
-                       return 0;
+                       break;
                }
                if (p->flags & EP_Collate) {
                        if (p->pLeft && (p->pLeft->flags & EP_Collate) != 0) {

>> 
>> @@ -384,10 +423,14 @@ sql_binary_compare_coll_seq(Parse *parser, Expr *left, Expr *right)
>>  	bool is_rhs_forced;
>>  	uint32_t lhs_coll_id;
>>  	uint32_t rhs_coll_id;
>> -	if (sql_expr_coll(parser, left, &is_lhs_forced, &lhs_coll_id) != 0)
>> +	if (sql_expr_coll(parser, left, &is_lhs_forced, &lhs_coll_id) != 0) {
>> +		diag_set(ClientError, ER_ILLEGAL_COLLATION_MIX);
>>  		goto err;
>> -	if (sql_expr_coll(parser, right, &is_rhs_forced, &rhs_coll_id) != 0)
>> +	}
>> +	if (sql_expr_coll(parser, right, &is_rhs_forced, &rhs_coll_id) != 0) {
>> +		diag_set(ClientError, ER_ILLEGAL_COLLATION_MIX);
> 
> 7. Why do you set it here and not on the point 5 above? sql_expr_coll
> can return an error not only because of illegal collation mix, but also
> if a collation does not exist, for example.

Hm, I missed this fact. Ok, I’ve done it, see above.






More information about the Tarantool-patches mailing list