From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id BB546293FA for ; Mon, 22 Apr 2019 14:02:23 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 6TDkw2CFQm4N for ; Mon, 22 Apr 2019 14:02:23 -0400 (EDT) Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id CC45D294A7 for ; Mon, 22 Apr 2019 14:02:22 -0400 (EDT) Subject: [tarantool-patches] Re: [PATCH 4/9] sql: introduce type boolean References: <45933291-f592-a292-59b8-2e21db3deed4@tarantool.org> From: Vladislav Shpilevoy Message-ID: Date: Mon, 22 Apr 2019 21:02:20 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: "n.pettik" , tarantool-patches@freelists.org Hi! Thanks for the fixes! See 3 comments below. > diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c > index 6b38e8e66..d70b64c45 100644 > --- a/src/box/sql/expr.c > +++ b/src/box/sql/expr.c > @@ -3775,7 +3766,7 @@ sqlExprCodeTarget(Parse * pParse, Expr * pExpr, int target) > } > case TK_TRUE: > case TK_FALSE: { > - vdbe_emit_bool(pParse->pVdbe, pExpr, target); > + sqlVdbeAddOp2(v, OP_Bool, op == TK_TRUE, target); > return target; > } > >> 6. I see, that all other 'case's set pExpr->type. Why don't you >> do it here? > > It is required only for non-constant values. For literals it is assigned > in parse.y spanExpr(). 1. Then why case TK_INTEGER sets 'pExpr->type = FIELD_TYPE_INTEGER', TK_STRING sets 'pExpr->type = FIELD_TYPE_STRING;', TK_BLOB sets 'pExpr->type = FIELD_TYPE_SCALAR;' - I see all of them being set in parse.y. On the contrary, there are places which create a new Expr, and does not set type even in parse.y. For example: 1240: A.pExpr = sql_expr_new_dequoted(pParse->db, TK_INTEGER, &sqlIntTokens[N]); Here the expr is created and spanExpr() is not used. How does it work? >> 8. Looking at all these type comparison prohibitions I am getting >> afraid of how will we select from SCALAR columns? >> >> A schema: >> >> box.execute('CREATE TABLE t (id INTEGER PRIMARY KEY, a SCALAR UNIQUE);') >> box.execute("INSERT INTO t VALUES (1, 1), (2, true), (3, false), (4, 'str')") >> >> SQL select: >> >>> box.execute('SELECT * FROM t WHERE a > 1') >> --- >> - error: 'Type mismatch: can not convert INTEGER to boolean' >> ... >> >> The same Lua select: >> >>> box.space.T.index[1]:select({1}, {iterator = 'GT'}) >> --- >> - - [4, 'str'] >> ... >> >> In fact, now we can not use SCALAR in SQL for any comparisons >> because it will raise type mismatch on literally everything. >> >> What are we going to do with it? IMO, we could add a flag like >> 'is_scalar' to struct Mem in its flags section, which would allow >> to compare this value with anything. > > Konstantin suggested to extend struct Mem with field_type, > which is going to substitute this flag. > >> Looks crutchy though. And is >> not a part of this issue of course, but should be filed into the >> issue list. >> >> I see a similar issue https://github.com/tarantool/tarantool/issues/4124. >> Probably, it is worth extending it instead of opening new. > > I consider this as a trade-off of using scalar: users shouldn’t use this > type until they are really have to. Nevertheless, they still can use CAST > operator and switch case, or write their own lua-function converting values > to required type. 2. Users may have good designed schema and work in each request with only one type in such a column, but still they will get errors just because the iterator can not walk to the needed tree node. Lets modify my example: SELECT * FROM t WHERE a = 1; This request implicitly says that it will work with numbers only, but somehow its success depends on the content of the space, not even related to '1' value. In a nutshell it means that insertion of something into a space can break existing requests not related to the inserted data. It is weird. Such scalar type would be useless. >>> + * >>> + * @param str String to be converted to boolean. >>> + * Assumed to be null terminated. >>> + * @param result Resulting value of cast. >>> + * @retval 0 If string satisfies conditions above. >>> + * @retval -1 Otherwise. >>> + */ >>> +static int >>> +str_cast_to_boolean(const char *str, bool *result) >>> +{ >>> + assert(str != NULL); >>> + for (; *str == ' '; str++); >>> + size_t rest_str_len = strlen(str); >>> + if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) { >>> + *result = true; >>> + return 0; >>> + } >>> + if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) { >>> + *result = false; >>> + return 0; >>> + } >> >> 16. I guess it is far from the most efficient implementation. You >> calculate length of the whole string before comparison, scanning >> its twice in the worst and most common case. Please, consider my >> fixes below. I did not test them though, so probably there are some >> typos. >> Note, that in my fixes I've already accounted my comment about >> trimming trailing whitespaces. >> >> ============================================================================ >> @@ -625,16 +625,20 @@ str_cast_to_boolean(const char *str, bool *result) >> { >> assert(str != NULL); >> for (; *str == ' '; str++); >> - size_t rest_str_len = strlen(str); >> - if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) { >> + if (strncasecmp(str, "true", 4) == 0) { > > We can’t use strncasecmp() without verification the fact that > rest_str_len (i.e. string length after skipping leading spaces) is >= 4. 3.1. Why? According to the standard, strncasecmp() compares not more than 'n' characters taking into account terminating zero. I tried these examples to simulate our case (when 'str' points to the beginning of a possible 'true/false' token). printf("%d\n", strncasecmp("", "true", 4)); printf("%d\n", strncasecmp("t", "true", 4)); printf("%d\n", strncasecmp("tr", "true", 4)); printf("%d\n", strncasecmp("tru", "true", 4)); printf("%d\n", strncasecmp("true ", "true", 4)); I got this output: -116 -114 -117 -101 0 It means, that you do not need to call 'strlen' before 'strncasecmp'. I've dropped 'strlen' again and the tests passed. If you think that explicit length check is mandatory, then please, provide a test. Probably I'm wrong somewhere in my speculations. > >> + str += 4; >> *result = true; >> - return 0; >> - } >> - if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) { >> + } else if (strncasecmp(str, "false", 5) == 0) { >> + str += 5; >> *result = false; >> - return 0; >> + } else { >> + return -1; >> } >> - return -1; >> + for (; *str != 0; ++str) { >> + if (*str != ' ') >> + return -1; >> + } >> + return 0; >> } >> ============================================================================ > > Taking into account my comment diff is next: > > diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c > index 92406f938..19b42b992 100644 > --- a/src/box/sql/vdbemem.c > +++ b/src/box/sql/vdbemem.c > @@ -626,15 +626,20 @@ str_cast_to_boolean(const char *str, bool *result) > assert(str != NULL); > for (; *str == ' '; str++); > size_t rest_str_len = strlen(str); 3.2. My main point was that we should not call strlen() - it leads to double scan of the string. If you still need strlen(), then the fixes are useless. But consider my objectives above, about the POSIX standard. > - if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) { > + if (rest_str_len >= 4 && strncasecmp(str, "true", 4) == 0) { > *result = true; > - return 0; > - } > - if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) { > + str += 4; > + } else if (rest_str_len >= 5 && strncasecmp(str, "false", 5) == 0) { > *result = false; > - return 0; > + str += 5; > + } else { > + return -1; > } > - return -1; > + for (; *str != '\0'; ++str) { > + if (*str != ' ') > + return -1; > + } > + return 0; > } My fixes, on the branch: ================================================================== diff --git a/src/box/sql/func.c b/src/box/sql/func.c index 08bea9d75..dbbb7fe23 100644 --- a/src/box/sql/func.c +++ b/src/box/sql/func.c @@ -1102,8 +1102,8 @@ quoteFunc(sql_context * context, int argc, sql_value ** argv) break; } case MP_BOOL: { - sql_result_text(context, argv[0]->u.b ? "true" : "false", -1, - SQL_TRANSIENT); + sql_result_text(context, sql_value_boolean(argv[0]) ? + "true" : "false", -1, SQL_TRANSIENT); break; } default:{ diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h index 468c89521..1f77d34cd 100644 --- a/src/box/sql/sqlInt.h +++ b/src/box/sql/sqlInt.h @@ -583,7 +583,7 @@ int sql_column_int(sql_stmt *, int iCol); bool -sql_column_boolean(sql_stmt *stmt, int column); +sql_column_boolean(struct sql_stmt *stmt, int column); sql_int64 sql_column_int64(sql_stmt *, int iCol); diff --git a/src/box/sql/vdbeInt.h b/src/box/sql/vdbeInt.h index 73c77efed..db8458a6a 100644 --- a/src/box/sql/vdbeInt.h +++ b/src/box/sql/vdbeInt.h @@ -510,7 +510,7 @@ int sqlVdbeMemIntegerify(Mem *, bool is_forced); int sqlVdbeRealValue(Mem *, double *); int -mem_value_bool(Mem *mem, bool *b); +mem_value_bool(const struct Mem *mem, bool *b); int mem_apply_integer_type(Mem *); int sqlVdbeMemRealify(Mem *); diff --git a/src/box/sql/vdbeapi.c b/src/box/sql/vdbeapi.c index cfd65075e..9bbe6d2ad 100644 --- a/src/box/sql/vdbeapi.c +++ b/src/box/sql/vdbeapi.c @@ -991,7 +991,7 @@ sql_column_int(sql_stmt * pStmt, int i) } bool -sql_column_boolean(sql_stmt *stmt, int i) +sql_column_boolean(struct sql_stmt *stmt, int i) { bool val = sql_value_boolean(columnMem(stmt, i)); columnMallocFailure(stmt); diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c index 19b42b992..0bc1c9466 100644 --- a/src/box/sql/vdbemem.c +++ b/src/box/sql/vdbemem.c @@ -502,9 +502,9 @@ sqlVdbeRealValue(Mem * pMem, double *v) } int -mem_value_bool(Mem *mem, bool *b) +mem_value_bool(const struct Mem *mem, bool *b) { - if (mem->flags & MEM_Bool) { + if ((mem->flags & MEM_Bool) != 0) { *b = mem->u.b; return 0; } @@ -614,9 +614,9 @@ sqlVdbeMemNumerify(Mem * pMem) * For instance, " tRuE " can be successfully converted to * boolean value true. * - * @param str String to be converted to boolean. - * Assumed to be null terminated. - * @param result Resulting value of cast. + * @param str String to be converted to boolean. Assumed to be + * null terminated. + * @param[out] result Resulting value of cast. * @retval 0 If string satisfies conditions above. * @retval -1 Otherwise. */ @@ -625,11 +625,10 @@ str_cast_to_boolean(const char *str, bool *result) { assert(str != NULL); for (; *str == ' '; str++); - size_t rest_str_len = strlen(str); - if (rest_str_len >= 4 && strncasecmp(str, "true", 4) == 0) { + if (strncasecmp(str, "true", 4) == 0) { *result = true; str += 4; - } else if (rest_str_len >= 5 && strncasecmp(str, "false", 5) == 0) { + } else if (strncasecmp(str, "false", 5) == 0) { *result = false; str += 5; } else { @@ -680,7 +679,7 @@ sqlVdbeMemCast(Mem * pMem, enum field_type type) pMem->u.b = value; return 0; } - if ((pMem->flags & MEM_Bool) != 0) + if ((pMem->flags & MEM_Bool) != 0) return 0; return -1; case FIELD_TYPE_INTEGER: