[tarantool-patches] Re: [PATCH 4/9] sql: introduce type boolean
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Mon Apr 22 21:02:20 MSK 2019
Hi! Thanks for the fixes! See 3 comments below.
> diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
> index 6b38e8e66..d70b64c45 100644
> --- a/src/box/sql/expr.c
> +++ b/src/box/sql/expr.c
> @@ -3775,7 +3766,7 @@ sqlExprCodeTarget(Parse * pParse, Expr * pExpr, int target)
> }
> case TK_TRUE:
> case TK_FALSE: {
> - vdbe_emit_bool(pParse->pVdbe, pExpr, target);
> + sqlVdbeAddOp2(v, OP_Bool, op == TK_TRUE, target);
> return target;
> }
>
>> 6. I see, that all other 'case's set pExpr->type. Why don't you
>> do it here?
>
> It is required only for non-constant values. For literals it is assigned
> in parse.y spanExpr().
1. Then why case TK_INTEGER sets 'pExpr->type = FIELD_TYPE_INTEGER',
TK_STRING sets 'pExpr->type = FIELD_TYPE_STRING;', TK_BLOB sets
'pExpr->type = FIELD_TYPE_SCALAR;' - I see all of them being set in
parse.y.
On the contrary, there are places which create a new Expr, and
does not set type even in parse.y. For example:
1240: A.pExpr = sql_expr_new_dequoted(pParse->db, TK_INTEGER, &sqlIntTokens[N]);
Here the expr is created and spanExpr() is not used. How does it
work?
>> 8. Looking at all these type comparison prohibitions I am getting
>> afraid of how will we select from SCALAR columns?
>>
>> A schema:
>>
>> box.execute('CREATE TABLE t (id INTEGER PRIMARY KEY, a SCALAR UNIQUE);')
>> box.execute("INSERT INTO t VALUES (1, 1), (2, true), (3, false), (4, 'str')")
>>
>> SQL select:
>>
>>> box.execute('SELECT * FROM t WHERE a > 1')
>> ---
>> - error: 'Type mismatch: can not convert INTEGER to boolean'
>> ...
>>
>> The same Lua select:
>>
>>> box.space.T.index[1]:select({1}, {iterator = 'GT'})
>> ---
>> - - [4, 'str']
>> ...
>>
>> In fact, now we can not use SCALAR in SQL for any comparisons
>> because it will raise type mismatch on literally everything.
>>
>> What are we going to do with it? IMO, we could add a flag like
>> 'is_scalar' to struct Mem in its flags section, which would allow
>> to compare this value with anything.
>
> Konstantin suggested to extend struct Mem with field_type,
> which is going to substitute this flag.
>
>> Looks crutchy though. And is
>> not a part of this issue of course, but should be filed into the
>> issue list.
>>
>> I see a similar issue https://github.com/tarantool/tarantool/issues/4124.
>> Probably, it is worth extending it instead of opening new.
>
> I consider this as a trade-off of using scalar: users shouldn’t use this
> type until they are really have to. Nevertheless, they still can use CAST
> operator and switch case, or write their own lua-function converting values
> to required type.
2. Users may have good designed schema and work in each request with only
one type in such a column, but still they will get errors just because
the iterator can not walk to the needed tree node. Lets modify my example:
SELECT * FROM t WHERE a = 1;
This request implicitly says that it will work with numbers only, but
somehow its success depends on the content of the space, not even
related to '1' value. In a nutshell it means that insertion of something
into a space can break existing requests not related to the inserted
data. It is weird. Such scalar type would be useless.
>>> + *
>>> + * @param str String to be converted to boolean.
>>> + * Assumed to be null terminated.
>>> + * @param result Resulting value of cast.
>>> + * @retval 0 If string satisfies conditions above.
>>> + * @retval -1 Otherwise.
>>> + */
>>> +static int
>>> +str_cast_to_boolean(const char *str, bool *result)
>>> +{
>>> + assert(str != NULL);
>>> + for (; *str == ' '; str++);
>>> + size_t rest_str_len = strlen(str);
>>> + if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) {
>>> + *result = true;
>>> + return 0;
>>> + }
>>> + if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) {
>>> + *result = false;
>>> + return 0;
>>> + }
>>
>> 16. I guess it is far from the most efficient implementation. You
>> calculate length of the whole string before comparison, scanning
>> its twice in the worst and most common case. Please, consider my
>> fixes below. I did not test them though, so probably there are some
>> typos.
>> Note, that in my fixes I've already accounted my comment about
>> trimming trailing whitespaces.
>>
>> ============================================================================
>> @@ -625,16 +625,20 @@ str_cast_to_boolean(const char *str, bool *result)
>> {
>> assert(str != NULL);
>> for (; *str == ' '; str++);
>> - size_t rest_str_len = strlen(str);
>> - if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) {
>> + if (strncasecmp(str, "true", 4) == 0) {
>
> We can’t use strncasecmp() without verification the fact that
> rest_str_len (i.e. string length after skipping leading spaces) is >= 4.
3.1. Why? According to the standard, strncasecmp() compares not more than
'n' characters taking into account terminating zero. I tried these
examples to simulate our case (when 'str' points to the beginning of a
possible 'true/false' token).
printf("%d\n", strncasecmp("", "true", 4));
printf("%d\n", strncasecmp("t", "true", 4));
printf("%d\n", strncasecmp("tr", "true", 4));
printf("%d\n", strncasecmp("tru", "true", 4));
printf("%d\n", strncasecmp("true ", "true", 4));
I got this output:
-116
-114
-117
-101
0
It means, that you do not need to call 'strlen' before
'strncasecmp'. I've dropped 'strlen' again and the tests
passed. If you think that explicit length check is mandatory,
then please, provide a test. Probably I'm wrong somewhere in
my speculations.
>
>> + str += 4;
>> *result = true;
>> - return 0;
>> - }
>> - if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) {
>> + } else if (strncasecmp(str, "false", 5) == 0) {
>> + str += 5;
>> *result = false;
>> - return 0;
>> + } else {
>> + return -1;
>> }
>> - return -1;
>> + for (; *str != 0; ++str) {
>> + if (*str != ' ')
>> + return -1;
>> + }
>> + return 0;
>> }
>> ============================================================================
>
> Taking into account my comment diff is next:
>
> diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c
> index 92406f938..19b42b992 100644
> --- a/src/box/sql/vdbemem.c
> +++ b/src/box/sql/vdbemem.c
> @@ -626,15 +626,20 @@ str_cast_to_boolean(const char *str, bool *result)
> assert(str != NULL);
> for (; *str == ' '; str++);
> size_t rest_str_len = strlen(str);
3.2. My main point was that we should not call
strlen() - it leads to double scan of the string. If you
still need strlen(), then the fixes are useless. But consider
my objectives above, about the POSIX standard.
> - if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) {
> + if (rest_str_len >= 4 && strncasecmp(str, "true", 4) == 0) {
> *result = true;
> - return 0;
> - }
> - if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) {
> + str += 4;
> + } else if (rest_str_len >= 5 && strncasecmp(str, "false", 5) == 0) {
> *result = false;
> - return 0;
> + str += 5;
> + } else {
> + return -1;
> }
> - return -1;
> + for (; *str != '\0'; ++str) {
> + if (*str != ' ')
> + return -1;
> + }
> + return 0;
> }
My fixes, on the branch:
==================================================================
diff --git a/src/box/sql/func.c b/src/box/sql/func.c
index 08bea9d75..dbbb7fe23 100644
--- a/src/box/sql/func.c
+++ b/src/box/sql/func.c
@@ -1102,8 +1102,8 @@ quoteFunc(sql_context * context, int argc, sql_value ** argv)
break;
}
case MP_BOOL: {
- sql_result_text(context, argv[0]->u.b ? "true" : "false", -1,
- SQL_TRANSIENT);
+ sql_result_text(context, sql_value_boolean(argv[0]) ?
+ "true" : "false", -1, SQL_TRANSIENT);
break;
}
default:{
diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h
index 468c89521..1f77d34cd 100644
--- a/src/box/sql/sqlInt.h
+++ b/src/box/sql/sqlInt.h
@@ -583,7 +583,7 @@ int
sql_column_int(sql_stmt *, int iCol);
bool
-sql_column_boolean(sql_stmt *stmt, int column);
+sql_column_boolean(struct sql_stmt *stmt, int column);
sql_int64
sql_column_int64(sql_stmt *, int iCol);
diff --git a/src/box/sql/vdbeInt.h b/src/box/sql/vdbeInt.h
index 73c77efed..db8458a6a 100644
--- a/src/box/sql/vdbeInt.h
+++ b/src/box/sql/vdbeInt.h
@@ -510,7 +510,7 @@ int sqlVdbeMemIntegerify(Mem *, bool is_forced);
int sqlVdbeRealValue(Mem *, double *);
int
-mem_value_bool(Mem *mem, bool *b);
+mem_value_bool(const struct Mem *mem, bool *b);
int mem_apply_integer_type(Mem *);
int sqlVdbeMemRealify(Mem *);
diff --git a/src/box/sql/vdbeapi.c b/src/box/sql/vdbeapi.c
index cfd65075e..9bbe6d2ad 100644
--- a/src/box/sql/vdbeapi.c
+++ b/src/box/sql/vdbeapi.c
@@ -991,7 +991,7 @@ sql_column_int(sql_stmt * pStmt, int i)
}
bool
-sql_column_boolean(sql_stmt *stmt, int i)
+sql_column_boolean(struct sql_stmt *stmt, int i)
{
bool val = sql_value_boolean(columnMem(stmt, i));
columnMallocFailure(stmt);
diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c
index 19b42b992..0bc1c9466 100644
--- a/src/box/sql/vdbemem.c
+++ b/src/box/sql/vdbemem.c
@@ -502,9 +502,9 @@ sqlVdbeRealValue(Mem * pMem, double *v)
}
int
-mem_value_bool(Mem *mem, bool *b)
+mem_value_bool(const struct Mem *mem, bool *b)
{
- if (mem->flags & MEM_Bool) {
+ if ((mem->flags & MEM_Bool) != 0) {
*b = mem->u.b;
return 0;
}
@@ -614,9 +614,9 @@ sqlVdbeMemNumerify(Mem * pMem)
* For instance, " tRuE " can be successfully converted to
* boolean value true.
*
- * @param str String to be converted to boolean.
- * Assumed to be null terminated.
- * @param result Resulting value of cast.
+ * @param str String to be converted to boolean. Assumed to be
+ * null terminated.
+ * @param[out] result Resulting value of cast.
* @retval 0 If string satisfies conditions above.
* @retval -1 Otherwise.
*/
@@ -625,11 +625,10 @@ str_cast_to_boolean(const char *str, bool *result)
{
assert(str != NULL);
for (; *str == ' '; str++);
- size_t rest_str_len = strlen(str);
- if (rest_str_len >= 4 && strncasecmp(str, "true", 4) == 0) {
+ if (strncasecmp(str, "true", 4) == 0) {
*result = true;
str += 4;
- } else if (rest_str_len >= 5 && strncasecmp(str, "false", 5) == 0) {
+ } else if (strncasecmp(str, "false", 5) == 0) {
*result = false;
str += 5;
} else {
@@ -680,7 +679,7 @@ sqlVdbeMemCast(Mem * pMem, enum field_type type)
pMem->u.b = value;
return 0;
}
- if ((pMem->flags & MEM_Bool) != 0)
+ if ((pMem->flags & MEM_Bool) != 0)
return 0;
return -1;
case FIELD_TYPE_INTEGER:
More information about the Tarantool-patches
mailing list