[tarantool-patches] Re: [PATCH 4/9] sql: introduce type boolean

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Mon Apr 22 21:02:20 MSK 2019


Hi! Thanks for the fixes! See 3 comments below.

> diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
> index 6b38e8e66..d70b64c45 100644
> --- a/src/box/sql/expr.c
> +++ b/src/box/sql/expr.c
> @@ -3775,7 +3766,7 @@ sqlExprCodeTarget(Parse * pParse, Expr * pExpr, int target)
>                 }
>         case TK_TRUE:
>         case TK_FALSE: {
> -                       vdbe_emit_bool(pParse->pVdbe, pExpr, target);
> +                       sqlVdbeAddOp2(v, OP_Bool, op == TK_TRUE, target);
>                         return target;
>                 }
> 
>> 6. I see, that all other 'case's set pExpr->type. Why don't you
>> do it here?
> 
> It is required only for non-constant values. For literals it is assigned
> in parse.y spanExpr().

1. Then why case TK_INTEGER sets 'pExpr->type = FIELD_TYPE_INTEGER',
TK_STRING sets 'pExpr->type = FIELD_TYPE_STRING;', TK_BLOB sets
'pExpr->type = FIELD_TYPE_SCALAR;' - I see all of them being set in
parse.y.

On the contrary, there are places which create a new Expr, and
does not set type even in parse.y. For example:

    1240: A.pExpr = sql_expr_new_dequoted(pParse->db, TK_INTEGER, &sqlIntTokens[N]);

Here the expr is created and spanExpr() is not used. How does it
work?

>> 8. Looking at all these type comparison prohibitions I am getting
>> afraid of how will we select from SCALAR columns?
>>
>> A schema:
>>
>>    box.execute('CREATE TABLE t (id INTEGER PRIMARY KEY, a SCALAR UNIQUE);')
>>    box.execute("INSERT INTO t VALUES (1, 1), (2, true), (3, false), (4, 'str')")
>>
>> SQL select:
>>
>>> box.execute('SELECT * FROM t WHERE a > 1')
>>    ---
>>    - error: 'Type mismatch: can not convert INTEGER to boolean'
>>    ...
>>
>> The same Lua select:
>>
>>> box.space.T.index[1]:select({1}, {iterator = 'GT'})
>>    ---
>>    - - [4, 'str']
>>    ...
>>
>> In fact, now we can not use SCALAR in SQL for any comparisons
>> because it will raise type mismatch on literally everything.
>>
>> What are we going to do with it? IMO, we could add a flag like
>> 'is_scalar' to struct Mem in its flags section, which would allow
>> to compare this value with anything.
> 
> Konstantin suggested to extend struct Mem with field_type,
> which is going to substitute this flag.
> 
>> Looks crutchy though. And is
>> not a part of this issue of course, but should be filed into the
>> issue list.
>>
>> I see a similar issue https://github.com/tarantool/tarantool/issues/4124.
>> Probably, it is worth extending it instead of opening new.
> 
> I consider this as a trade-off of using scalar: users shouldn’t use this
> type until they are really have to. Nevertheless, they still can use CAST
> operator and switch case, or write their own lua-function converting values
> to required type.

2. Users may have good designed schema and work in each request with only
one type in such a column, but still they will get errors just because
the iterator can not walk to the needed tree node. Lets modify my example:

    SELECT * FROM t WHERE a = 1;

This request implicitly says that it will work with numbers only, but
somehow its success depends on the content of the space, not even
related to '1' value. In a nutshell it means that insertion of something
into a space can break existing requests not related to the inserted
data. It is weird. Such scalar type would be useless.

>>> + *
>>> + * @param str String to be converted to boolean.
>>> + *            Assumed to be null terminated.
>>> + * @param result Resulting value of cast.
>>> + * @retval 0 If string satisfies conditions above.
>>> + * @retval -1 Otherwise.
>>> + */
>>> +static int
>>> +str_cast_to_boolean(const char *str, bool *result)
>>> +{
>>> +	assert(str != NULL);
>>> +	for (; *str == ' '; str++);
>>> +	size_t rest_str_len = strlen(str);
>>> +	if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) {
>>> +		*result = true;
>>> +		return 0;
>>> +	}
>>> +	if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) {
>>> +		*result = false;
>>> +		return 0;
>>> +	}
>>
>> 16. I guess it is far from the most efficient implementation. You
>> calculate length of the whole string before comparison, scanning
>> its twice in the worst and most common case. Please, consider my
>> fixes below. I did not test them though, so probably there are some
>> typos.
>> Note, that in my fixes I've already accounted my comment about
>> trimming trailing whitespaces.
>>
>> ============================================================================
>> @@ -625,16 +625,20 @@ str_cast_to_boolean(const char *str, bool *result)
>> {
>> 	assert(str != NULL);
>> 	for (; *str == ' '; str++);
>> -	size_t rest_str_len = strlen(str);
>> -	if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) {
>> +	if (strncasecmp(str, "true", 4) == 0) {
> 
> We can’t use strncasecmp() without verification the fact that
> rest_str_len (i.e. string length after skipping leading spaces) is >= 4.

3.1. Why? According to the standard, strncasecmp() compares not more than
'n' characters taking into account terminating zero. I tried these
examples to simulate our case (when 'str' points to the beginning of a
possible 'true/false' token).

	printf("%d\n", strncasecmp("", "true", 4));
	printf("%d\n", strncasecmp("t", "true", 4));
	printf("%d\n", strncasecmp("tr", "true", 4));
	printf("%d\n", strncasecmp("tru", "true", 4));
	printf("%d\n", strncasecmp("true   ", "true", 4));

I got this output:

    -116
    -114
    -117
    -101
    0

It means, that you do not need to call 'strlen' before
'strncasecmp'. I've dropped 'strlen' again and the tests
passed. If you think that explicit length check is mandatory,
then please, provide a test. Probably I'm wrong somewhere in
my speculations.

> 
>> +		str += 4;
>> 		*result = true;
>> -		return 0;
>> -	}
>> -	if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) {
>> +	} else if (strncasecmp(str, "false", 5) == 0) {
>> +		str += 5;
>> 		*result = false;
>> -		return 0;
>> +	} else {
>> +		return -1;
>> 	}
>> -	return -1;
>> +	for (; *str != 0; ++str) {
>> +		if (*str != ' ')
>> +			return -1;
>> +	}
>> +	return 0;
>> }
>> ============================================================================
> 
> Taking into account my comment diff is next:
> 
> diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c
> index 92406f938..19b42b992 100644
> --- a/src/box/sql/vdbemem.c
> +++ b/src/box/sql/vdbemem.c
> @@ -626,15 +626,20 @@ str_cast_to_boolean(const char *str, bool *result)
>         assert(str != NULL);
>         for (; *str == ' '; str++);
>         size_t rest_str_len = strlen(str);

3.2. My main point was that we should not call
strlen() - it leads to double scan of the string. If you
still need strlen(), then the fixes are useless. But consider
my objectives above, about the POSIX standard.

> -       if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) {
> +       if (rest_str_len >= 4 && strncasecmp(str, "true", 4) == 0) {
>                 *result = true;
> -               return 0;
> -       }
> -       if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) {
> +               str += 4;
> +       } else if (rest_str_len >= 5 && strncasecmp(str, "false", 5) == 0) {
>                 *result = false;
> -               return 0;
> +               str += 5;
> +       } else {
> +               return -1;
>         }
> -       return -1;
> +       for (; *str != '\0'; ++str) {
> +               if (*str != ' ')
> +                       return -1;
> +       }
> +       return 0;
>  }

My fixes, on the branch:

==================================================================
diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index 08bea9d75..dbbb7fe23 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -1102,8 +1102,8 @@ quoteFunc(sql_context * context, int argc, sql_value ** argv)
 			break;
 		}
 	case MP_BOOL: {
-		sql_result_text(context, argv[0]->u.b ? "true" : "false", -1,
-				SQL_TRANSIENT);
+		sql_result_text(context, sql_value_boolean(argv[0]) ?
+				"true" : "false", -1, SQL_TRANSIENT);
 		break;
 	}
 	default:{
diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h
index 468c89521..1f77d34cd 100644
--- a/src/box/sql/sqlInt.h
+++ b/src/box/sql/sqlInt.h
@@ -583,7 +583,7 @@ int
 sql_column_int(sql_stmt *, int iCol);
 
 bool
-sql_column_boolean(sql_stmt *stmt, int column);
+sql_column_boolean(struct sql_stmt *stmt, int column);
 
 sql_int64
 sql_column_int64(sql_stmt *, int iCol);
diff --git a/src/box/sql/vdbeInt.h b/src/box/sql/vdbeInt.h
index 73c77efed..db8458a6a 100644
--- a/src/box/sql/vdbeInt.h
+++ b/src/box/sql/vdbeInt.h
@@ -510,7 +510,7 @@ int sqlVdbeMemIntegerify(Mem *, bool is_forced);
 int sqlVdbeRealValue(Mem *, double *);
 
 int
-mem_value_bool(Mem *mem, bool *b);
+mem_value_bool(const struct Mem *mem, bool *b);
 
 int mem_apply_integer_type(Mem *);
 int sqlVdbeMemRealify(Mem *);
diff --git a/src/box/sql/vdbeapi.c b/src/box/sql/vdbeapi.c
index cfd65075e..9bbe6d2ad 100644
--- a/src/box/sql/vdbeapi.c
+++ b/src/box/sql/vdbeapi.c
@@ -991,7 +991,7 @@ sql_column_int(sql_stmt * pStmt, int i)
 }
 
 bool
-sql_column_boolean(sql_stmt *stmt, int i)
+sql_column_boolean(struct sql_stmt *stmt, int i)
 {
 	bool val = sql_value_boolean(columnMem(stmt, i));
 	columnMallocFailure(stmt);
diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c
index 19b42b992..0bc1c9466 100644
--- a/src/box/sql/vdbemem.c
+++ b/src/box/sql/vdbemem.c
@@ -502,9 +502,9 @@ sqlVdbeRealValue(Mem * pMem, double *v)
 }
 
 int
-mem_value_bool(Mem *mem, bool *b)
+mem_value_bool(const struct Mem *mem, bool *b)
 {
-	if (mem->flags  & MEM_Bool) {
+	if ((mem->flags  & MEM_Bool) != 0) {
 		*b = mem->u.b;
 		return 0;
 	}
@@ -614,9 +614,9 @@ sqlVdbeMemNumerify(Mem * pMem)
  * For instance, "   tRuE  " can be successfully converted to
  * boolean value true.
  *
- * @param str String to be converted to boolean.
- *            Assumed to be null terminated.
- * @param result Resulting value of cast.
+ * @param str String to be converted to boolean. Assumed to be
+ *        null terminated.
+ * @param[out] result Resulting value of cast.
  * @retval 0 If string satisfies conditions above.
  * @retval -1 Otherwise.
  */
@@ -625,11 +625,10 @@ str_cast_to_boolean(const char *str, bool *result)
 {
 	assert(str != NULL);
 	for (; *str == ' '; str++);
-	size_t rest_str_len = strlen(str);
-	if (rest_str_len >= 4 && strncasecmp(str, "true", 4) == 0) {
+	if (strncasecmp(str, "true", 4) == 0) {
 		*result = true;
 		str += 4;
-	} else if (rest_str_len >= 5 && strncasecmp(str, "false", 5) == 0) {
+	} else if (strncasecmp(str, "false", 5) == 0) {
 		*result = false;
 		str += 5;
 	} else {
@@ -680,7 +679,7 @@ sqlVdbeMemCast(Mem * pMem, enum field_type type)
 			pMem->u.b = value;
 			return 0;
 		}
-		if ((pMem->flags & MEM_Bool)  != 0)
+		if ((pMem->flags & MEM_Bool) != 0)
 			return 0;
 		return -1;
 	case FIELD_TYPE_INTEGER:





More information about the Tarantool-patches mailing list