From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 786CA29F82 for ; Tue, 23 Apr 2019 15:58:28 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2j8ipbdzaaD4 for ; Tue, 23 Apr 2019 15:58:28 -0400 (EDT) Received: from smtpng2.m.smailru.net (smtpng2.m.smailru.net [94.100.179.3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 12F9429284 for ; Tue, 23 Apr 2019 15:58:28 -0400 (EDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.8\)) Subject: [tarantool-patches] Re: [PATCH 4/9] sql: introduce type boolean From: "n.pettik" In-Reply-To: Date: Tue, 23 Apr 2019 22:58:26 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <45E647F0-79F4-41DE-BD69-996A2D3EB7F1@tarantool.org> References: <45933291-f592-a292-59b8-2e21db3deed4@tarantool.org> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org Cc: Vladislav Shpilevoy >> diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c >> index 6b38e8e66..d70b64c45 100644 >> --- a/src/box/sql/expr.c >> +++ b/src/box/sql/expr.c >> @@ -3775,7 +3766,7 @@ sqlExprCodeTarget(Parse * pParse, Expr * pExpr, = int target) >> } >> case TK_TRUE: >> case TK_FALSE: { >> - vdbe_emit_bool(pParse->pVdbe, pExpr, target); >> + sqlVdbeAddOp2(v, OP_Bool, op =3D=3D TK_TRUE, = target); >> return target; >> } >>=20 >>> 6. I see, that all other 'case's set pExpr->type. Why don't you >>> do it here? >>=20 >> It is required only for non-constant values. For literals it is = assigned >> in parse.y spanExpr(). >=20 > 1. Then why case TK_INTEGER sets 'pExpr->type =3D FIELD_TYPE_INTEGER', > TK_STRING sets 'pExpr->type =3D FIELD_TYPE_STRING;', TK_BLOB sets > 'pExpr->type =3D FIELD_TYPE_SCALAR;' - I see all of them being set in > parse.y. Those are literals, so as I said above, their types are assigned right during parsing. If you are asking =E2=80=9Cwhy do we need to do = so?=E2=80=9D, then answer is =E2=80=9Cthey are leaf nodes and it is most appropriate moment to do so=E2=80=9D. > On the contrary, there are places which create a new Expr, and > does not set type even in parse.y. For example: >=20 > 1240: A.pExpr =3D sql_expr_new_dequoted(pParse->db, TK_INTEGER, = &sqlIntTokens[N]); Speaking of this particular example, yes, it was buggy place. I=E2=80=99ve fixed it in "sql: make predicates accept and return = boolean" patch > Here the expr is created and spanExpr() is not used. How does it > work? >=20 >>> 8. Looking at all these type comparison prohibitions I am getting >>> afraid of how will we select from SCALAR columns? >>>=20 >>> A schema: >>>=20 >>> box.execute('CREATE TABLE t (id INTEGER PRIMARY KEY, a SCALAR = UNIQUE);') >>> box.execute("INSERT INTO t VALUES (1, 1), (2, true), (3, false), = (4, 'str')") >>>=20 >>> SQL select: >>>=20 >>>> box.execute('SELECT * FROM t WHERE a > 1') >>> --- >>> - error: 'Type mismatch: can not convert INTEGER to boolean' >>> ... >>>=20 >>> The same Lua select: >>>=20 >>>> box.space.T.index[1]:select({1}, {iterator =3D 'GT'}) >>> --- >>> - - [4, 'str'] >>> ... >>>=20 >>> In fact, now we can not use SCALAR in SQL for any comparisons >>> because it will raise type mismatch on literally everything. >>>=20 >>> What are we going to do with it? IMO, we could add a flag like >>> 'is_scalar' to struct Mem in its flags section, which would allow >>> to compare this value with anything. >>=20 >> Konstantin suggested to extend struct Mem with field_type, >> which is going to substitute this flag. >>=20 >>> Looks crutchy though. And is >>> not a part of this issue of course, but should be filed into the >>> issue list. >>>=20 >>> I see a similar issue = https://github.com/tarantool/tarantool/issues/4124. >>> Probably, it is worth extending it instead of opening new. >>=20 >> I consider this as a trade-off of using scalar: users shouldn=E2=80=99t= use this >> type until they are really have to. Nevertheless, they still can use = CAST >> operator and switch case, or write their own lua-function converting = values >> to required type. >=20 > 2. Users may have good designed schema and work in each request with = only > one type in such a column, but still they will get errors just because > the iterator can not walk to the needed tree node. Lets modify my = example: >=20 > SELECT * FROM t WHERE a =3D 1; >=20 > This request implicitly says that it will work with numbers only, but > somehow its success depends on the content of the space, not even > related to '1' value. In a nutshell it means that insertion of = something > into a space can break existing requests not related to the inserted > data. It is weird. Such scalar type would be useless. Yep, we can observe the same behaviour in DB2. For instance: db2 =3D> create table t (id varchar(10)) DB20000I The SQL command completed successfully. db2 =3D> insert into t values(1) DB20000I The SQL command completed successfully. db2 =3D> select id from t where id =3D 1 ID =20 ---------- 1 =20 1 record(s) selected. db2 =3D> insert into t values('abc=E2=80=99) DB20000I The SQL command completed successfully. db2 =3D> insert into t values(1) DB20000I The SQL command completed successfully. db2 =3D> select id from t where id =3D 1 ID =20 ---------- 1 =20 SQL0420N Invalid character found in a character string argument of the=20= function "DECFLOAT". SQLSTATE=3D22018 After insertion of unconvertible to integer value =E2=80=98abc=E2=80=99, select starts to fail. Note that DB2 is considered to be most compatible with ANSI according to members of our server team. >>>> + * >>>> + * @param str String to be converted to boolean. >>>> + * Assumed to be null terminated. >>>> + * @param result Resulting value of cast. >>>> + * @retval 0 If string satisfies conditions above. >>>> + * @retval -1 Otherwise. >>>> + */ >>>> +static int >>>> +str_cast_to_boolean(const char *str, bool *result) >>>> +{ >>>> + assert(str !=3D NULL); >>>> + for (; *str =3D=3D ' '; str++); >>>> + size_t rest_str_len =3D strlen(str); >>>> + if (rest_str_len =3D=3D 4 && strncasecmp(str, "true", 4) =3D=3D = 0) { >>>> + *result =3D true; >>>> + return 0; >>>> + } >>>> + if (rest_str_len =3D=3D 5 && strncasecmp(str, "false", 5) =3D=3D = 0) { >>>> + *result =3D false; >>>> + return 0; >>>> + } >>>=20 >>> 16. I guess it is far from the most efficient implementation. You >>> calculate length of the whole string before comparison, scanning >>> its twice in the worst and most common case. Please, consider my >>> fixes below. I did not test them though, so probably there are some >>> typos. >>> Note, that in my fixes I've already accounted my comment about >>> trimming trailing whitespaces. >>>=20 >>> = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D >>> @@ -625,16 +625,20 @@ str_cast_to_boolean(const char *str, bool = *result) >>> { >>> assert(str !=3D NULL); >>> for (; *str =3D=3D ' '; str++); >>> - size_t rest_str_len =3D strlen(str); >>> - if (rest_str_len =3D=3D 4 && strncasecmp(str, "true", 4) =3D=3D = 0) { >>> + if (strncasecmp(str, "true", 4) =3D=3D 0) { >>=20 >> We can=E2=80=99t use strncasecmp() without verification the fact that >> rest_str_len (i.e. string length after skipping leading spaces) is >=3D= 4. >=20 > 3.1. Why? According to the standard, strncasecmp() compares not more = than > 'n' characters taking into account terminating zero. I tried these > examples to simulate our case (when 'str' points to the beginning of a > possible 'true/false' token). >=20 > printf("%d\n", strncasecmp("", "true", 4)); > printf("%d\n", strncasecmp("t", "true", 4)); > printf("%d\n", strncasecmp("tr", "true", 4)); > printf("%d\n", strncasecmp("tru", "true", 4)); > printf("%d\n", strncasecmp("true ", "true", 4)); >=20 > I got this output: >=20 > -116 > -114 > -117 > -101 > 0 >=20 > It means, that you do not need to call 'strlen' before > 'strncasecmp'. I've dropped 'strlen' again and the tests > passed. If you think that explicit length check is mandatory, > then please, provide a test. Probably I'm wrong somewhere in > my speculations. Well, you are right: I missed the fact that it compares UP to n symbols. I=E2=80=99m sorry to have confused you. >> - if (rest_str_len =3D=3D 4 && strncasecmp(str, "true", 4) =3D=3D= 0) { >> + if (rest_str_len >=3D 4 && strncasecmp(str, "true", 4) =3D=3D = 0) { >> *result =3D true; >> - return 0; >> - } >> - if (rest_str_len =3D=3D 5 && strncasecmp(str, "false", 5) =3D=3D= 0) { >> + str +=3D 4; >> + } else if (rest_str_len >=3D 5 && strncasecmp(str, "false", = 5) =3D=3D 0) { >> *result =3D false; >> - return 0; >> + str +=3D 5; >> + } else { >> + return -1; >> } >> - return -1; >> + for (; *str !=3D '\0'; ++str) { >> + if (*str !=3D ' ') >> + return -1; >> + } >> + return 0; >> } >=20 > My fixes, on the branch: Ok, applied.