Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: "n.pettik" <korablev@tarantool.org>, tarantool-patches@freelists.org
Subject: [tarantool-patches] Re: [PATCH 4/9] sql: introduce type boolean
Date: Mon, 22 Apr 2019 21:02:20 +0300	[thread overview]
Message-ID: <afbeb182-e5d0-4e5a-e010-57913e36d345@tarantool.org> (raw)
In-Reply-To: <B0F64CC7-8817-4251-9849-5F89C5E398D0@tarantool.org>

Hi! Thanks for the fixes! See 3 comments below.

> diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
> index 6b38e8e66..d70b64c45 100644
> --- a/src/box/sql/expr.c
> +++ b/src/box/sql/expr.c
> @@ -3775,7 +3766,7 @@ sqlExprCodeTarget(Parse * pParse, Expr * pExpr, int target)
>                 }
>         case TK_TRUE:
>         case TK_FALSE: {
> -                       vdbe_emit_bool(pParse->pVdbe, pExpr, target);
> +                       sqlVdbeAddOp2(v, OP_Bool, op == TK_TRUE, target);
>                         return target;
>                 }
> 
>> 6. I see, that all other 'case's set pExpr->type. Why don't you
>> do it here?
> 
> It is required only for non-constant values. For literals it is assigned
> in parse.y spanExpr().

1. Then why case TK_INTEGER sets 'pExpr->type = FIELD_TYPE_INTEGER',
TK_STRING sets 'pExpr->type = FIELD_TYPE_STRING;', TK_BLOB sets
'pExpr->type = FIELD_TYPE_SCALAR;' - I see all of them being set in
parse.y.

On the contrary, there are places which create a new Expr, and
does not set type even in parse.y. For example:

    1240: A.pExpr = sql_expr_new_dequoted(pParse->db, TK_INTEGER, &sqlIntTokens[N]);

Here the expr is created and spanExpr() is not used. How does it
work?

>> 8. Looking at all these type comparison prohibitions I am getting
>> afraid of how will we select from SCALAR columns?
>>
>> A schema:
>>
>>    box.execute('CREATE TABLE t (id INTEGER PRIMARY KEY, a SCALAR UNIQUE);')
>>    box.execute("INSERT INTO t VALUES (1, 1), (2, true), (3, false), (4, 'str')")
>>
>> SQL select:
>>
>>> box.execute('SELECT * FROM t WHERE a > 1')
>>    ---
>>    - error: 'Type mismatch: can not convert INTEGER to boolean'
>>    ...
>>
>> The same Lua select:
>>
>>> box.space.T.index[1]:select({1}, {iterator = 'GT'})
>>    ---
>>    - - [4, 'str']
>>    ...
>>
>> In fact, now we can not use SCALAR in SQL for any comparisons
>> because it will raise type mismatch on literally everything.
>>
>> What are we going to do with it? IMO, we could add a flag like
>> 'is_scalar' to struct Mem in its flags section, which would allow
>> to compare this value with anything.
> 
> Konstantin suggested to extend struct Mem with field_type,
> which is going to substitute this flag.
> 
>> Looks crutchy though. And is
>> not a part of this issue of course, but should be filed into the
>> issue list.
>>
>> I see a similar issue https://github.com/tarantool/tarantool/issues/4124.
>> Probably, it is worth extending it instead of opening new.
> 
> I consider this as a trade-off of using scalar: users shouldn’t use this
> type until they are really have to. Nevertheless, they still can use CAST
> operator and switch case, or write their own lua-function converting values
> to required type.

2. Users may have good designed schema and work in each request with only
one type in such a column, but still they will get errors just because
the iterator can not walk to the needed tree node. Lets modify my example:

    SELECT * FROM t WHERE a = 1;

This request implicitly says that it will work with numbers only, but
somehow its success depends on the content of the space, not even
related to '1' value. In a nutshell it means that insertion of something
into a space can break existing requests not related to the inserted
data. It is weird. Such scalar type would be useless.

>>> + *
>>> + * @param str String to be converted to boolean.
>>> + *            Assumed to be null terminated.
>>> + * @param result Resulting value of cast.
>>> + * @retval 0 If string satisfies conditions above.
>>> + * @retval -1 Otherwise.
>>> + */
>>> +static int
>>> +str_cast_to_boolean(const char *str, bool *result)
>>> +{
>>> +	assert(str != NULL);
>>> +	for (; *str == ' '; str++);
>>> +	size_t rest_str_len = strlen(str);
>>> +	if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) {
>>> +		*result = true;
>>> +		return 0;
>>> +	}
>>> +	if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) {
>>> +		*result = false;
>>> +		return 0;
>>> +	}
>>
>> 16. I guess it is far from the most efficient implementation. You
>> calculate length of the whole string before comparison, scanning
>> its twice in the worst and most common case. Please, consider my
>> fixes below. I did not test them though, so probably there are some
>> typos.
>> Note, that in my fixes I've already accounted my comment about
>> trimming trailing whitespaces.
>>
>> ============================================================================
>> @@ -625,16 +625,20 @@ str_cast_to_boolean(const char *str, bool *result)
>> {
>> 	assert(str != NULL);
>> 	for (; *str == ' '; str++);
>> -	size_t rest_str_len = strlen(str);
>> -	if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) {
>> +	if (strncasecmp(str, "true", 4) == 0) {
> 
> We can’t use strncasecmp() without verification the fact that
> rest_str_len (i.e. string length after skipping leading spaces) is >= 4.

3.1. Why? According to the standard, strncasecmp() compares not more than
'n' characters taking into account terminating zero. I tried these
examples to simulate our case (when 'str' points to the beginning of a
possible 'true/false' token).

	printf("%d\n", strncasecmp("", "true", 4));
	printf("%d\n", strncasecmp("t", "true", 4));
	printf("%d\n", strncasecmp("tr", "true", 4));
	printf("%d\n", strncasecmp("tru", "true", 4));
	printf("%d\n", strncasecmp("true   ", "true", 4));

I got this output:

    -116
    -114
    -117
    -101
    0

It means, that you do not need to call 'strlen' before
'strncasecmp'. I've dropped 'strlen' again and the tests
passed. If you think that explicit length check is mandatory,
then please, provide a test. Probably I'm wrong somewhere in
my speculations.

> 
>> +		str += 4;
>> 		*result = true;
>> -		return 0;
>> -	}
>> -	if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) {
>> +	} else if (strncasecmp(str, "false", 5) == 0) {
>> +		str += 5;
>> 		*result = false;
>> -		return 0;
>> +	} else {
>> +		return -1;
>> 	}
>> -	return -1;
>> +	for (; *str != 0; ++str) {
>> +		if (*str != ' ')
>> +			return -1;
>> +	}
>> +	return 0;
>> }
>> ============================================================================
> 
> Taking into account my comment diff is next:
> 
> diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c
> index 92406f938..19b42b992 100644
> --- a/src/box/sql/vdbemem.c
> +++ b/src/box/sql/vdbemem.c
> @@ -626,15 +626,20 @@ str_cast_to_boolean(const char *str, bool *result)
>         assert(str != NULL);
>         for (; *str == ' '; str++);
>         size_t rest_str_len = strlen(str);

3.2. My main point was that we should not call
strlen() - it leads to double scan of the string. If you
still need strlen(), then the fixes are useless. But consider
my objectives above, about the POSIX standard.

> -       if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) {
> +       if (rest_str_len >= 4 && strncasecmp(str, "true", 4) == 0) {
>                 *result = true;
> -               return 0;
> -       }
> -       if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) {
> +               str += 4;
> +       } else if (rest_str_len >= 5 && strncasecmp(str, "false", 5) == 0) {
>                 *result = false;
> -               return 0;
> +               str += 5;
> +       } else {
> +               return -1;
>         }
> -       return -1;
> +       for (; *str != '\0'; ++str) {
> +               if (*str != ' ')
> +                       return -1;
> +       }
> +       return 0;
>  }

My fixes, on the branch:

==================================================================
diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index 08bea9d75..dbbb7fe23 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -1102,8 +1102,8 @@ quoteFunc(sql_context * context, int argc, sql_value ** argv)
 			break;
 		}
 	case MP_BOOL: {
-		sql_result_text(context, argv[0]->u.b ? "true" : "false", -1,
-				SQL_TRANSIENT);
+		sql_result_text(context, sql_value_boolean(argv[0]) ?
+				"true" : "false", -1, SQL_TRANSIENT);
 		break;
 	}
 	default:{
diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h
index 468c89521..1f77d34cd 100644
--- a/src/box/sql/sqlInt.h
+++ b/src/box/sql/sqlInt.h
@@ -583,7 +583,7 @@ int
 sql_column_int(sql_stmt *, int iCol);
 
 bool
-sql_column_boolean(sql_stmt *stmt, int column);
+sql_column_boolean(struct sql_stmt *stmt, int column);
 
 sql_int64
 sql_column_int64(sql_stmt *, int iCol);
diff --git a/src/box/sql/vdbeInt.h b/src/box/sql/vdbeInt.h
index 73c77efed..db8458a6a 100644
--- a/src/box/sql/vdbeInt.h
+++ b/src/box/sql/vdbeInt.h
@@ -510,7 +510,7 @@ int sqlVdbeMemIntegerify(Mem *, bool is_forced);
 int sqlVdbeRealValue(Mem *, double *);
 
 int
-mem_value_bool(Mem *mem, bool *b);
+mem_value_bool(const struct Mem *mem, bool *b);
 
 int mem_apply_integer_type(Mem *);
 int sqlVdbeMemRealify(Mem *);
diff --git a/src/box/sql/vdbeapi.c b/src/box/sql/vdbeapi.c
index cfd65075e..9bbe6d2ad 100644
--- a/src/box/sql/vdbeapi.c
+++ b/src/box/sql/vdbeapi.c
@@ -991,7 +991,7 @@ sql_column_int(sql_stmt * pStmt, int i)
 }
 
 bool
-sql_column_boolean(sql_stmt *stmt, int i)
+sql_column_boolean(struct sql_stmt *stmt, int i)
 {
 	bool val = sql_value_boolean(columnMem(stmt, i));
 	columnMallocFailure(stmt);
diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c
index 19b42b992..0bc1c9466 100644
--- a/src/box/sql/vdbemem.c
+++ b/src/box/sql/vdbemem.c
@@ -502,9 +502,9 @@ sqlVdbeRealValue(Mem * pMem, double *v)
 }
 
 int
-mem_value_bool(Mem *mem, bool *b)
+mem_value_bool(const struct Mem *mem, bool *b)
 {
-	if (mem->flags  & MEM_Bool) {
+	if ((mem->flags  & MEM_Bool) != 0) {
 		*b = mem->u.b;
 		return 0;
 	}
@@ -614,9 +614,9 @@ sqlVdbeMemNumerify(Mem * pMem)
  * For instance, "   tRuE  " can be successfully converted to
  * boolean value true.
  *
- * @param str String to be converted to boolean.
- *            Assumed to be null terminated.
- * @param result Resulting value of cast.
+ * @param str String to be converted to boolean. Assumed to be
+ *        null terminated.
+ * @param[out] result Resulting value of cast.
  * @retval 0 If string satisfies conditions above.
  * @retval -1 Otherwise.
  */
@@ -625,11 +625,10 @@ str_cast_to_boolean(const char *str, bool *result)
 {
 	assert(str != NULL);
 	for (; *str == ' '; str++);
-	size_t rest_str_len = strlen(str);
-	if (rest_str_len >= 4 && strncasecmp(str, "true", 4) == 0) {
+	if (strncasecmp(str, "true", 4) == 0) {
 		*result = true;
 		str += 4;
-	} else if (rest_str_len >= 5 && strncasecmp(str, "false", 5) == 0) {
+	} else if (strncasecmp(str, "false", 5) == 0) {
 		*result = false;
 		str += 5;
 	} else {
@@ -680,7 +679,7 @@ sqlVdbeMemCast(Mem * pMem, enum field_type type)
 			pMem->u.b = value;
 			return 0;
 		}
-		if ((pMem->flags & MEM_Bool)  != 0)
+		if ((pMem->flags & MEM_Bool) != 0)
 			return 0;
 		return -1;
 	case FIELD_TYPE_INTEGER:

  reply	other threads:[~2019-04-22 18:02 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-14 15:03 [tarantool-patches] [PATCH 0/9] Introduce type BOOLEAN in SQL Nikita Pettik
2019-04-14 15:03 ` [tarantool-patches] [PATCH 1/9] sql: refactor mem_apply_numeric_type() Nikita Pettik
2019-04-14 15:04 ` [tarantool-patches] [PATCH 2/9] sql: disallow text values participate in sum() aggregate Nikita Pettik
2019-04-16 14:12   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-18 17:54     ` n.pettik
2019-04-22 18:02       ` Vladislav Shpilevoy
2019-04-23 19:58         ` n.pettik
2019-04-14 15:04 ` [tarantool-patches] [PATCH 3/9] sql: use msgpack types instead of custom ones Nikita Pettik
2019-04-16 14:12   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-18 17:54     ` n.pettik
2019-04-22 18:02       ` Vladislav Shpilevoy
2019-04-23 19:58         ` n.pettik
2019-04-14 15:04 ` [tarantool-patches] [PATCH 4/9] sql: introduce type boolean Nikita Pettik
2019-04-16 14:12   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-18 17:54     ` n.pettik
2019-04-22 18:02       ` Vladislav Shpilevoy [this message]
2019-04-23 19:58         ` n.pettik
2019-04-23 21:06           ` Vladislav Shpilevoy
2019-04-14 15:04 ` [tarantool-patches] [PATCH 5/9] sql: improve type determination for column meta Nikita Pettik
2019-04-16 14:12   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-18 17:54     ` n.pettik
2019-04-22 18:02       ` Vladislav Shpilevoy
2019-04-23 19:58         ` n.pettik
2019-04-14 15:04 ` [tarantool-patches] [PATCH 6/9] sql: make comparison predicate return boolean Nikita Pettik
2019-04-16 14:12   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-18 17:54     ` n.pettik
2019-04-14 15:04 ` [tarantool-patches] [PATCH 7/9] sql: make predicates accept and " Nikita Pettik
2019-04-16 14:12   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-18 17:55     ` n.pettik
2019-04-14 15:04 ` [tarantool-patches] [PATCH 9/9] sql: make <search condition> accept only boolean Nikita Pettik
2019-04-16 14:12   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-18 17:55     ` n.pettik
2019-04-22 18:02       ` Vladislav Shpilevoy
2019-04-23 19:59         ` n.pettik
2019-04-23 21:06           ` Vladislav Shpilevoy
2019-04-23 22:01             ` n.pettik
     [not found] ` <b2a84f129c2343d3da3311469cbb7b20488a21c2.1555252410.git.korablev@tarantool.org>
2019-04-16 14:12   ` [tarantool-patches] Re: [PATCH 8/9] sql: make LIKE predicate return boolean result Vladislav Shpilevoy
2019-04-18 17:55     ` n.pettik
2019-04-22 18:02       ` Vladislav Shpilevoy
2019-04-23 19:58         ` n.pettik
2019-04-24 10:28 ` [tarantool-patches] Re: [PATCH 0/9] Introduce type BOOLEAN in SQL Vladislav Shpilevoy
2019-04-25  8:46 ` Kirill Yukhin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=afbeb182-e5d0-4e5a-e010-57913e36d345@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=korablev@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='[tarantool-patches] Re: [PATCH 4/9] sql: introduce type boolean' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox