From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 18ADB2A3C6 for ; Tue, 16 Apr 2019 10:12:49 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ghlCRyaGNjrS for ; Tue, 16 Apr 2019 10:12:48 -0400 (EDT) Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 9AD6C29F39 for ; Tue, 16 Apr 2019 10:12:48 -0400 (EDT) Subject: [tarantool-patches] Re: [PATCH 4/9] sql: introduce type boolean References: From: Vladislav Shpilevoy Message-ID: <45933291-f592-a292-59b8-2e21db3deed4@tarantool.org> Date: Tue, 16 Apr 2019 17:12:46 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org, Nikita Pettik Cc: kostja@tarantool.org Thanks for the patch! See 17 comments below. On 14/04/2019 18:04, Nikita Pettik wrote: > This patch introduces basic facilities to operate on boolean type: > boolean literals "true" and "false" where true > false; alias to null - > unknown; column type "BOOLEAN" and shortcut "BOOL"; opportunity to > insert and select boolean values from table; OR and AND predicates > accept boolean arguments; CAST operation involving boolean type; > comparison between boolean values (including VDBE sorter routines). > > Part of #3648 > --- > extra/mkkeywordhash.c | 5 + > src/box/execute.c | 9 + > src/box/lua/lua_sql.c | 3 + > src/box/sql/build.c | 11 +- > src/box/sql/expr.c | 14 + > src/box/sql/func.c | 15 ++ > src/box/sql/parse.y | 15 ++ > src/box/sql/sqlInt.h | 6 + > src/box/sql/vdbe.c | 34 ++- > src/box/sql/vdbeInt.h | 6 +- > src/box/sql/vdbeapi.c | 16 ++ > src/box/sql/vdbeaux.c | 31 ++- > src/box/sql/vdbemem.c | 73 +++++- > test/sql-tap/whereG.test.lua | 6 +- > test/sql/types.result | 601 ++++++++++++++++++++++++++++++++++++++++++- > test/sql/types.test.lua | 132 ++++++++++ > 16 files changed, 952 insertions(+), 25 deletions(-) > > diff --git a/src/box/sql/build.c b/src/box/sql/build.c > index 7724e9415..c6fbb1af6 100644 > --- a/src/box/sql/build.c > +++ b/src/box/sql/build.c > @@ -958,8 +955,7 @@ emitNewSysSpaceSequenceRecord(Parse *pParse, int space_id, const char reg_seq_id > sqlVdbeAddOp2(v, OP_IntCopy, reg_seq_id, first_col + 2); > > /* 3. True, which is 1 in SQL */ 1. I guess, it is not '1' anymore. > - sqlVdbeAddOp2(v, OP_Bool, 0, first_col + 3); > - sqlVdbeChangeP4(v, -1, (char*)&const_true, P4_BOOL); > + sqlVdbeAddOp2(v, OP_Bool, true, first_col + 3); > > sqlVdbeAddOp3(v, OP_MakeRecord, first_col + 1, 3, first_col); > > diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c > index 4b98bd175..6b38e8e66 100644 > --- a/src/box/sql/expr.c > +++ b/src/box/sql/expr.c > @@ -3360,6 +3360,15 @@ expr_code_int(struct Parse *parse, struct Expr *expr, bool is_neg, > } > } > 2. It is worth adding a comment. At a first glance I did not understand what is 'mem', and what 'expr' is expected and allowed to store - str, int, float? > +static void > +vdbe_emit_bool(struct Vdbe *v, const struct Expr *expr, int mem) > +{ > + const char *z = expr->u.zToken; 3. It would be safer to add an assertion here that expr actually stores a string only. 4. The function is quite small, and is called in one place only. Mark it 'inline'. > + assert(z != NULL); > + bool val = strncasecmp(z, "TRUE", 4) == 0; > + sqlVdbeAddOp2(v, OP_Bool, val, mem); > +} > + > /* > * Erase column-cache entry number i > */ > @@ -3764,6 +3773,11 @@ sqlExprCodeTarget(Parse * pParse, Expr * pExpr, int target) > expr_code_int(pParse, pExpr, false, target); > return target; > } > + case TK_TRUE: > + case TK_FALSE: { > + vdbe_emit_bool(pParse->pVdbe, pExpr, target); 5. Correct me, if I am wrong but there is how I understand boolean value emission. - Parser scans a literal. It marks 'true' with TK_TRUE, 'false' with TK_FALSE (taking into account case-insensitivity); - Here you merge TK_FALSE and TK_TRUE into a single 'case' statement; - Inside that statement you call vdbe_emit_bool, which again parses the string for being equal 'true' of 'false'. So why? Since the first step you already have the boolean value. The value is '== TK_TRUE'. You do not need to use strcasecmp here. Just split these 'case' statements, and make vdbe_emit_bool() taking a boolean value instead of expr. 6. I see, that all other 'case's set pExpr->type. Why don't you do it here? > + return target; > + } > #ifndef SQL_OMIT_FLOATING_POINT > case TK_FLOAT:{ > pExpr->type = FIELD_TYPE_INTEGER; > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 3cdb119c8..860cd8920 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -1091,6 +1101,11 @@ quoteFunc(sql_context * context, int argc, sql_value ** argv) > } > break; > } > + case MP_BOOL: { > + sql_result_text(context, argv[0]->u.b ? "true" : "false", -1, > + SQL_TRANSIENT); > + break; > + } > default:{ > assert(sql_value_mp_type(argv[0]) == MP_NIL); > sql_result_text(context, "NULL", 4, SQL_STATIC); 7. In the same file lengthFunc() now accepts boolean value. But obviously it should not, bool is not string. It exacerbates 4159 issue. > diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h > index 9134f767d..4f62c2782 100644 > --- a/src/box/sql/sqlInt.h > +++ b/src/box/sql/sqlInt.h > @@ -2161,6 +2167,24 @@ case OP_Ge: { /* same as TK_GE, jump, in1, in3 */ > } > break; > } > + } else if ((flags1 | flags3) & MEM_Bool) { > + /* > + * If one of values is of type BOOLEAN, then the > + * second one must be BOOLEAN as well. Otherwise > + * an error is raised. > + */ > + bool is_bool_type_arg1 = flags1 & MEM_Bool; > + bool is_bool_type_arg3 = flags3 & MEM_Bool; > + if (! is_bool_type_arg1 || ! is_bool_type_arg3) { > + char *inconsistent_type = ! is_bool_type_arg1 ? > + mem_type_to_str(pIn1) : > + mem_type_to_str(pIn3); > + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, > + inconsistent_type, "boolean"); > + rc = SQL_TARANTOOL_ERROR; > + goto abort_due_to_error; > + } > + res = sqlMemCompare(pIn3, pIn1, NULL); > } else { 8. Looking at all these type comparison prohibitions I am getting afraid of how will we select from SCALAR columns? A schema: box.execute('CREATE TABLE t (id INTEGER PRIMARY KEY, a SCALAR UNIQUE);') box.execute("INSERT INTO t VALUES (1, 1), (2, true), (3, false), (4, 'str')") SQL select: > box.execute('SELECT * FROM t WHERE a > 1') --- - error: 'Type mismatch: can not convert INTEGER to boolean' ... The same Lua select: > box.space.T.index[1]:select({1}, {iterator = 'GT'}) --- - - [4, 'str'] ... In fact, now we can not use SCALAR in SQL for any comparisons because it will raise type mismatch on literally everything. What are we going to do with it? IMO, we could add a flag like 'is_scalar' to struct Mem in its flags section, which would allow to compare this value with anything. Looks crutchy though. And is not a part of this issue of course, but should be filed into the issue list. I see a similar issue https://github.com/tarantool/tarantool/issues/4124. Probably, it is worth extending it instead of opening new. > enum field_type type = pOp->p5 & FIELD_TYPE_MASK; > if (sql_type_is_numeric(type)) { > @@ -2414,6 +2438,8 @@ case OP_Or: { /* same as TK_OR, in1, in2, out3 */ > pIn1 = &aMem[pOp->p1]; > if (pIn1->flags & MEM_Null) { > v1 = 2; > + } else if ((pIn1->flags & MEM_Bool) != 0) { > + v1 = pIn1->u.b; > } else { > int64_t i; > if (sqlVdbeIntValue(pIn1, &i) != 0) { > @@ -2427,6 +2453,8 @@ case OP_Or: { /* same as TK_OR, in1, in2, out3 */ > pIn2 = &aMem[pOp->p2]; > if (pIn2->flags & MEM_Null) { > v2 = 2; > + } else if ((pIn2->flags & MEM_Bool) != 0) { > + v2 = pIn2->u.b; > } else { > int64_t i; > if (sqlVdbeIntValue(pIn2, &i) != 0) { > @@ -2540,6 +2568,8 @@ case OP_IfNot: { /* jump, in1 */ > pIn1 = &aMem[pOp->p1]; > if (pIn1->flags & MEM_Null) { > c = pOp->p3; > + } else if ((pIn1->flags & MEM_Bool) != 0) { > + c = pOp->opcode==OP_IfNot ? ! pIn1->u.b : pIn1->u.b; > } else { > double v; > if (sqlVdbeRealValue(pIn1, &v) != 0) { > diff --git a/src/box/sql/vdbeapi.c b/src/box/sql/vdbeapi.c > index 6e867ca84..e1302afc0 100644 > --- a/src/box/sql/vdbeapi.c > +++ b/src/box/sql/vdbeapi.c > @@ -982,6 +990,14 @@ sql_column_int(sql_stmt * pStmt, int i) > return val; > } > > +bool > +sql_column_boolean(sql_stmt *stmt, int i) > +{ > + bool val = sql_value_boolean(columnMem(stmt, i)); > + columnMallocFailure(stmt); > + return val; > +} > + > sql_int64 > sql_column_int64(sql_stmt * pStmt, int i) > { 9. You did not update sql_bind_value(), now it binds NULL on any boolean value. On the other hand it is not used at all and probably could be removed. > diff --git a/src/box/sql/vdbeaux.c b/src/box/sql/vdbeaux.c > index 0cc3c1487..0f56028e5 100644 > --- a/src/box/sql/vdbeaux.c > +++ b/src/box/sql/vdbeaux.c > @@ -3383,6 +3383,17 @@ sqlMemCompare(const Mem * pMem1, const Mem * pMem2, const struct coll * pColl) > return (f2 & MEM_Null) - (f1 & MEM_Null); > } > > + if (combined_flags & MEM_Bool) { 10. Sorry for nit, but we use explicit != 0 in conditions. > + if ((f1 & f2 & MEM_Bool) != 0) { > + if (pMem1->u.b == pMem2->u.b) > + return 0; > + if (pMem1->u.b) > + return 1; > + return -1; > + } > + return -1; > + } > + > /* At least one of the two values is a number > */ > if (combined_flags & (MEM_Int | MEM_Real)) { > @@ -3561,10 +3572,16 @@ sqlVdbeCompareMsgpack(const char **key1, > break; > } > case MP_BOOL:{ > - assert((unsigned char)(*aKey1) == 0xc2 > - || (unsigned char)*aKey1 == 0xc3); > - mem1.u.i = (unsigned)(size_t) aKey1++ - 0xc2; > - goto do_int; > + 11. Stray empty line. > + mem1.u.b = mp_decode_bool(&aKey1); > + if ((pKey2->flags & MEM_Bool) != 0) { > + if (mem1.u.b != pKey2->u.b) { > + rc = mem1.u.b ? 1 : -1; > + } 12. Usually we omit curly braces when body of a cycle or a condition is one line. > + } else { > + rc = (pKey2->flags & MEM_Null) != 0 ? +1 : -1; 13. '+1' ? Sorry, we do not use prefix '+'. > + } > + break; > } > case MP_UINT:{ > uint64_t v = mp_decode_uint(&aKey1); > diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c > index 15a2f55cb..4fed0eefe 100644 > --- a/src/box/sql/vdbemem.c > +++ b/src/box/sql/vdbemem.c > @@ -499,6 +501,16 @@ sqlVdbeRealValue(Mem * pMem, double *v) > return -1; > } > > +int > +vdbe_value_boolean(Mem *mem, bool *b) > +{ > + if (mem->flags & MEM_Bool) { > + *b = mem->u.b; > + return 0; > + } > + return -1; > +} 14. Probably it is better to name it by Mem, not by Vdbe, because it does not take a Vdbe pointer. And similar functions contract type names: 'boolean' -> 'bool', 'integer' -> 'int', etc. In summary, 'mem_value_bool' for example. > + > /* > * The MEM structure is already a MEM_Real. Try to also make it a > * MEM_Int if we can. > @@ -594,6 +606,37 @@ sqlVdbeMemNumerify(Mem * pMem) > return SQL_OK; > } > > +/** > + * According to ANSI SQL string value can be converted to boolean > + * type if string consists of literal "true" or "false" and > + * number of leading spaces. > + * > + * For instance, " tRuE" can be successfully converted to > + * boolean value true. 15. So ' true' is valid, but 'true ' is not? As I see the standard, it is not so. Cite: """ If TD is boolean, then Case: a) If SD is character string, then SV is replaced by TRIM ( BOTH ' ' FROM VE ) Case: i) If the rules for in Subclause 5.3, “”, can be applied to SV to determine a valid value of the data type TD, then let TV be that value. ii) Otherwise, an exception condition is raised: data exception — invalid character value for cast. b) If SD is boolean, then TV is SV. """ TD - cast type, SD - type of cast target, VE,SV - cast target, TV - cast result value. The key line: "TRIM ( BOTH ' ' FROM VE )" -> 'BOTH'. Spaces are trimmed off from both sides. > + * > + * @param str String to be converted to boolean. > + * Assumed to be null terminated. > + * @param result Resulting value of cast. > + * @retval 0 If string satisfies conditions above. > + * @retval -1 Otherwise. > + */ > +static int > +str_cast_to_boolean(const char *str, bool *result) > +{ > + assert(str != NULL); > + for (; *str == ' '; str++); > + size_t rest_str_len = strlen(str); > + if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) { > + *result = true; > + return 0; > + } > + if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) { > + *result = false; > + return 0; > + } 16. I guess it is far from the most efficient implementation. You calculate length of the whole string before comparison, scanning its twice in the worst and most common case. Please, consider my fixes below. I did not test them though, so probably there are some typos. Note, that in my fixes I've already accounted my comment about trimming trailing whitespaces. ============================================================================ @@ -625,16 +625,20 @@ str_cast_to_boolean(const char *str, bool *result) { assert(str != NULL); for (; *str == ' '; str++); - size_t rest_str_len = strlen(str); - if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) { + if (strncasecmp(str, "true", 4) == 0) { + str += 4; *result = true; - return 0; - } - if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) { + } else if (strncasecmp(str, "false", 5) == 0) { + str += 5; *result = false; - return 0; + } else { + return -1; } - return -1; + for (; *str != 0; ++str) { + if (*str != ' ') + return -1; + } + return 0; } ============================================================================ > + return -1; > +} > + > /* > * Cast the datatype of the value in pMem according to the type > * @type. Casting is different from applying type in that a cast > diff --git a/test/sql/types.result b/test/sql/types.result > index 0a5085658..3aa0169e2 100644 > --- a/test/sql/types.result > +++ b/test/sql/types.result > +box.execute("SELECT unknown;") > +--- > +- metadata: > + - name: unknown > + type: scalar 17. Should not it be boolean? > + rows: > + - [null] > +...