[tarantool-patches] Re: [PATCH 4/9] sql: introduce type boolean

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Tue Apr 16 17:12:46 MSK 2019


Thanks for the patch! See 17 comments below.

On 14/04/2019 18:04, Nikita Pettik wrote:
> This patch introduces basic facilities to operate on boolean type:
> boolean literals "true" and "false" where true > false; alias to null -
> unknown; column type "BOOLEAN" and shortcut "BOOL"; opportunity to
> insert and select boolean values from table; OR and AND predicates
> accept boolean arguments; CAST operation involving boolean type;
> comparison between boolean values (including VDBE sorter routines).
> 
> Part of #3648
> ---
>  extra/mkkeywordhash.c        |   5 +
>  src/box/execute.c            |   9 +
>  src/box/lua/lua_sql.c        |   3 +
>  src/box/sql/build.c          |  11 +-
>  src/box/sql/expr.c           |  14 +
>  src/box/sql/func.c           |  15 ++
>  src/box/sql/parse.y          |  15 ++
>  src/box/sql/sqlInt.h         |   6 +
>  src/box/sql/vdbe.c           |  34 ++-
>  src/box/sql/vdbeInt.h        |   6 +-
>  src/box/sql/vdbeapi.c        |  16 ++
>  src/box/sql/vdbeaux.c        |  31 ++-
>  src/box/sql/vdbemem.c        |  73 +++++-
>  test/sql-tap/whereG.test.lua |   6 +-
>  test/sql/types.result        | 601 ++++++++++++++++++++++++++++++++++++++++++-
>  test/sql/types.test.lua      | 132 ++++++++++
>  16 files changed, 952 insertions(+), 25 deletions(-)
> 
> diff --git a/src/box/sql/build.c b/src/box/sql/build.c
> index 7724e9415..c6fbb1af6 100644
> --- a/src/box/sql/build.c
> +++ b/src/box/sql/build.c
> @@ -958,8 +955,7 @@ emitNewSysSpaceSequenceRecord(Parse *pParse, int space_id, const char reg_seq_id
>  	sqlVdbeAddOp2(v, OP_IntCopy, reg_seq_id, first_col + 2);
>  
>  	/* 3. True, which is 1 in SQL  */

1. I guess, it is not '1' anymore.

> -	sqlVdbeAddOp2(v, OP_Bool, 0, first_col + 3);
> -	sqlVdbeChangeP4(v, -1, (char*)&const_true, P4_BOOL);
> +	sqlVdbeAddOp2(v, OP_Bool, true, first_col + 3);
>  
>  	sqlVdbeAddOp3(v, OP_MakeRecord, first_col + 1, 3, first_col);
>  
> diff --git a/src/box/sql/expr.c b/src/box/sql/expr.c
> index 4b98bd175..6b38e8e66 100644
> --- a/src/box/sql/expr.c
> +++ b/src/box/sql/expr.c
> @@ -3360,6 +3360,15 @@ expr_code_int(struct Parse *parse, struct Expr *expr, bool is_neg,
>  	}
>  }
>  

2. It is worth adding a comment. At a first glance I did not
understand what is 'mem', and what 'expr' is expected and allowed
to store - str, int, float?

> +static void
> +vdbe_emit_bool(struct Vdbe *v, const struct Expr *expr, int mem)
> +{
> +	const char *z = expr->u.zToken;

3. It would be safer to add an assertion here that expr actually
stores a string only.

4. The function is quite small, and is called in one place
only. Mark it 'inline'.

> +	assert(z != NULL);
> +	bool val = strncasecmp(z, "TRUE", 4) == 0;
> +	sqlVdbeAddOp2(v, OP_Bool, val, mem);
> +}
> +
>  /*
>   * Erase column-cache entry number i
>   */
> @@ -3764,6 +3773,11 @@ sqlExprCodeTarget(Parse * pParse, Expr * pExpr, int target)
>  			expr_code_int(pParse, pExpr, false, target);
>  			return target;
>  		}
> +	case TK_TRUE:
> +	case TK_FALSE: {
> +			vdbe_emit_bool(pParse->pVdbe, pExpr, target);

5. Correct me, if I am wrong but there is how I understand
boolean value emission.

 - Parser scans a literal. It marks 'true' with TK_TRUE, 'false'
   with TK_FALSE (taking into account case-insensitivity);

 - Here you merge TK_FALSE and TK_TRUE into a single 'case'
   statement;

 - Inside that statement you call vdbe_emit_bool, which again
   parses the string for being equal 'true' of 'false'.

So why? Since the first step you already have the boolean value.
The value is '== TK_TRUE'. You do not need to use strcasecmp here.
Just split these 'case' statements, and make vdbe_emit_bool()
taking a boolean value instead of expr.

6. I see, that all other 'case's set pExpr->type. Why don't you
do it here?

> +			return target;
> +		}
>  #ifndef SQL_OMIT_FLOATING_POINT
>  	case TK_FLOAT:{
>  			pExpr->type = FIELD_TYPE_INTEGER;
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index 3cdb119c8..860cd8920 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -1091,6 +1101,11 @@ quoteFunc(sql_context * context, int argc, sql_value ** argv)
>  			}
>  			break;
>  		}
> +	case MP_BOOL: {
> +		sql_result_text(context, argv[0]->u.b ? "true" : "false", -1,
> +				SQL_TRANSIENT);
> +		break;
> +	}
>  	default:{
>  			assert(sql_value_mp_type(argv[0]) == MP_NIL);
>  			sql_result_text(context, "NULL", 4, SQL_STATIC);

7. In the same file lengthFunc() now accepts boolean value. But obviously
it should not, bool is not string. It exacerbates 4159 issue.

> diff --git a/src/box/sql/sqlInt.h b/src/box/sql/sqlInt.h
> index 9134f767d..4f62c2782 100644
> --- a/src/box/sql/sqlInt.h
> +++ b/src/box/sql/sqlInt.h
> @@ -2161,6 +2167,24 @@ case OP_Ge: {             /* same as TK_GE, jump, in1, in3 */
>  			}
>  			break;
>  		}
> +	} else if ((flags1 | flags3) & MEM_Bool) {
> +		/*
> +		 * If one of values is of type BOOLEAN, then the
> +		 * second one must be BOOLEAN as well. Otherwise
> +		 * an error is raised.
> +		 */
> +		bool is_bool_type_arg1 = flags1 & MEM_Bool;
> +		bool is_bool_type_arg3 = flags3 & MEM_Bool;
> +		if (! is_bool_type_arg1 || ! is_bool_type_arg3) {
> +			char *inconsistent_type = ! is_bool_type_arg1 ?
> +						  mem_type_to_str(pIn1) :
> +						  mem_type_to_str(pIn3);
> +			diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +				 inconsistent_type, "boolean");
> +			rc = SQL_TARANTOOL_ERROR;
> +			goto abort_due_to_error;
> +		}
> +		res = sqlMemCompare(pIn3, pIn1, NULL);
>  	} else {

8. Looking at all these type comparison prohibitions I am getting
afraid of how will we select from SCALAR columns?

A schema:

    box.execute('CREATE TABLE t (id INTEGER PRIMARY KEY, a SCALAR UNIQUE);')
    box.execute("INSERT INTO t VALUES (1, 1), (2, true), (3, false), (4, 'str')")

SQL select:

    > box.execute('SELECT * FROM t WHERE a > 1')
    ---
    - error: 'Type mismatch: can not convert INTEGER to boolean'
    ...

The same Lua select:

    > box.space.T.index[1]:select({1}, {iterator = 'GT'})
    ---
    - - [4, 'str']
    ...

In fact, now we can not use SCALAR in SQL for any comparisons
because it will raise type mismatch on literally everything.

What are we going to do with it? IMO, we could add a flag like
'is_scalar' to struct Mem in its flags section, which would allow
to compare this value with anything. Looks crutchy though. And is
not a part of this issue of course, but should be filed into the
issue list.

I see a similar issue https://github.com/tarantool/tarantool/issues/4124.
Probably, it is worth extending it instead of opening new.

>  		enum field_type type = pOp->p5 & FIELD_TYPE_MASK;
>  		if (sql_type_is_numeric(type)) {
> @@ -2414,6 +2438,8 @@ case OP_Or: {             /* same as TK_OR, in1, in2, out3 */
>  	pIn1 = &aMem[pOp->p1];
>  	if (pIn1->flags & MEM_Null) {
>  		v1 = 2;
> +	} else if ((pIn1->flags & MEM_Bool) != 0) {
> +		v1 = pIn1->u.b;
>  	} else {
>  		int64_t i;
>  		if (sqlVdbeIntValue(pIn1, &i) != 0) {
> @@ -2427,6 +2453,8 @@ case OP_Or: {             /* same as TK_OR, in1, in2, out3 */
>  	pIn2 = &aMem[pOp->p2];
>  	if (pIn2->flags & MEM_Null) {
>  		v2 = 2;
> +	} else if ((pIn2->flags & MEM_Bool) != 0) {
> +		v2 = pIn2->u.b;
>  	} else {
>  		int64_t i;
>  		if (sqlVdbeIntValue(pIn2, &i) != 0) {
> @@ -2540,6 +2568,8 @@ case OP_IfNot: {            /* jump, in1 */
>  	pIn1 = &aMem[pOp->p1];
>  	if (pIn1->flags & MEM_Null) {
>  		c = pOp->p3;
> +	} else if ((pIn1->flags & MEM_Bool) != 0) {
> +		c = pOp->opcode==OP_IfNot ? ! pIn1->u.b : pIn1->u.b;
>  	} else {
>  		double v;
>  		if (sqlVdbeRealValue(pIn1, &v) != 0) {
> diff --git a/src/box/sql/vdbeapi.c b/src/box/sql/vdbeapi.c
> index 6e867ca84..e1302afc0 100644
> --- a/src/box/sql/vdbeapi.c
> +++ b/src/box/sql/vdbeapi.c
> @@ -982,6 +990,14 @@ sql_column_int(sql_stmt * pStmt, int i)
>  	return val;
>  }
>  
> +bool
> +sql_column_boolean(sql_stmt *stmt, int i)
> +{
> +	bool val = sql_value_boolean(columnMem(stmt, i));
> +	columnMallocFailure(stmt);
> +	return val;
> +}
> +
>  sql_int64
>  sql_column_int64(sql_stmt * pStmt, int i)
>  {

9. You did not update sql_bind_value(), now it binds NULL
on any boolean value. On the other hand it is not used at
all and probably could be removed.

> diff --git a/src/box/sql/vdbeaux.c b/src/box/sql/vdbeaux.c
> index 0cc3c1487..0f56028e5 100644
> --- a/src/box/sql/vdbeaux.c
> +++ b/src/box/sql/vdbeaux.c
> @@ -3383,6 +3383,17 @@ sqlMemCompare(const Mem * pMem1, const Mem * pMem2, const struct coll * pColl)
>  		return (f2 & MEM_Null) - (f1 & MEM_Null);
>  	}
>  
> +	if (combined_flags & MEM_Bool) {

10. Sorry for nit, but we use explicit != 0 in conditions.

> +		if ((f1 & f2 & MEM_Bool) != 0) {
> +			if (pMem1->u.b == pMem2->u.b)
> +				return 0;
> +			if (pMem1->u.b)
> +				return 1;
> +			return -1;
> +		}
> +		return -1;
> +	}
> +
>  	/* At least one of the two values is a number
>  	 */
>  	if (combined_flags & (MEM_Int | MEM_Real)) {
> @@ -3561,10 +3572,16 @@ sqlVdbeCompareMsgpack(const char **key1,
>  			break;
>  		}
>  	case MP_BOOL:{
> -			assert((unsigned char)(*aKey1) == 0xc2
> -			       || (unsigned char)*aKey1 == 0xc3);
> -			mem1.u.i = (unsigned)(size_t) aKey1++ - 0xc2;
> -			goto do_int;
> +

11. Stray empty line.

> +			mem1.u.b = mp_decode_bool(&aKey1);
> +			if ((pKey2->flags & MEM_Bool) != 0) {
> +				if (mem1.u.b != pKey2->u.b) {
> +					rc = mem1.u.b ? 1 : -1;
> +				}

12. Usually we omit curly braces when body of a cycle or a condition
is one line.

> +			} else {
> +				rc = (pKey2->flags & MEM_Null) != 0 ? +1 : -1;

13. '+1' ? Sorry, we do not use prefix '+'.

> +			}
> +			break;
>  		}
>  	case MP_UINT:{
>  			uint64_t v = mp_decode_uint(&aKey1);
> diff --git a/src/box/sql/vdbemem.c b/src/box/sql/vdbemem.c
> index 15a2f55cb..4fed0eefe 100644
> --- a/src/box/sql/vdbemem.c
> +++ b/src/box/sql/vdbemem.c
> @@ -499,6 +501,16 @@ sqlVdbeRealValue(Mem * pMem, double *v)
>  	return -1;
>  }
>  
> +int
> +vdbe_value_boolean(Mem *mem, bool *b)
> +{
> +	if (mem->flags  & MEM_Bool) {
> +		*b = mem->u.b;
> +		return 0;
> +	}
> +	return -1;
> +}

14. Probably it is better to name it by Mem, not by
Vdbe, because it does not take a Vdbe pointer. And
similar functions contract type names: 'boolean' -> 'bool',
'integer' -> 'int', etc. In summary, 'mem_value_bool'
for example.

> +
>  /*
>   * The MEM structure is already a MEM_Real.  Try to also make it a
>   * MEM_Int if we can.
> @@ -594,6 +606,37 @@ sqlVdbeMemNumerify(Mem * pMem)
>  	return SQL_OK;
>  }
>  
> +/**
> + * According to ANSI SQL string value can be converted to boolean
> + * type if string consists of literal "true" or "false" and
> + * number of leading spaces.
> + *
> + * For instance, "   tRuE" can be successfully converted to
> + * boolean value true.
15. So '   true' is valid, but 'true    ' is not? As I see the
standard, it is not so. Cite:

"""
    If TD is boolean, then

    Case:

        a) If SD is character string, then SV is replaced by

            TRIM ( BOTH ' ' FROM VE )

           Case:

            i)  If the rules for <literal> in Subclause 5.3, “<literal>”,
                can be applied to SV to determine a valid value of the
                data type TD, then let TV be that value.

            ii) Otherwise, an exception condition is raised: data
                exception — invalid character value for cast.

        b) If SD is boolean, then TV is SV.
"""

TD - cast type,
SD - type of cast target,
VE,SV - cast target,
TV - cast result value.

The key line: "TRIM ( BOTH ' ' FROM VE )" -> 'BOTH'.
Spaces are trimmed off from both sides.

> + *
> + * @param str String to be converted to boolean.
> + *            Assumed to be null terminated.
> + * @param result Resulting value of cast.
> + * @retval 0 If string satisfies conditions above.
> + * @retval -1 Otherwise.
> + */
> +static int
> +str_cast_to_boolean(const char *str, bool *result)
> +{
> +	assert(str != NULL);
> +	for (; *str == ' '; str++);
> +	size_t rest_str_len = strlen(str);
> +	if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) {
> +		*result = true;
> +		return 0;
> +	}
> +	if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) {
> +		*result = false;
> +		return 0;
> +	}

16. I guess it is far from the most efficient implementation. You
calculate length of the whole string before comparison, scanning
its twice in the worst and most common case. Please, consider my
fixes below. I did not test them though, so probably there are some
typos.

Note, that in my fixes I've already accounted my comment about
trimming trailing whitespaces.

============================================================================
@@ -625,16 +625,20 @@ str_cast_to_boolean(const char *str, bool *result)
 {
 	assert(str != NULL);
 	for (; *str == ' '; str++);
-	size_t rest_str_len = strlen(str);
-	if (rest_str_len == 4 && strncasecmp(str, "true", 4) == 0) {
+	if (strncasecmp(str, "true", 4) == 0) {
+		str += 4;
 		*result = true;
-		return 0;
-	}
-	if (rest_str_len == 5 && strncasecmp(str, "false", 5) == 0) {
+	} else if (strncasecmp(str, "false", 5) == 0) {
+		str += 5;
 		*result = false;
-		return 0;
+	} else {
+		return -1;
 	}
-	return -1;
+	for (; *str != 0; ++str) {
+		if (*str != ' ')
+			return -1;
+	}
+	return 0;
 }
============================================================================

> +	return -1;
> +}
> +
>  /*
>   * Cast the datatype of the value in pMem according to the type
>   * @type.  Casting is different from applying type in that a cast
> diff --git a/test/sql/types.result b/test/sql/types.result
> index 0a5085658..3aa0169e2 100644
> --- a/test/sql/types.result
> +++ b/test/sql/types.result
> +box.execute("SELECT unknown;")
> +---
> +- metadata:
> +  - name: unknown
> +    type: scalar

17. Should not it be boolean?

> +  rows:
> +  - [null]
> +...




More information about the Tarantool-patches mailing list