[Tarantool-patches] [PATCH v1 1/1] sql: check arguments types of built-in functions

Fri Oct 16 00:23:00 MSK 2020

Hi! Thanks for the fixes!

On 14.10.2020 00:44, Mergen Imeev wrote:
> Hi! Thank you for the review! My answers, diff and new patch below.
> 
> On Fri, Oct 09, 2020 at 11:08:32PM +0200, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>>
>> See 12 comments/questions below.
>>
>> 1.
>>
>> 	tarantool> box.execute("SELECT GREATEST(1, '2')")
>> 	---
>> 	- metadata:
>> 	  - name: COLUMN_1
>> 	    type: scalar
>> 	  rows:
>> 	  - ['2']
>> 	...
>>
>> Is it supposed to happen? Shouldn't the function check that all
>> arguments are of the same type? The same for LEAST().
>>
> At the moment LEAST() ang GREATEST() works using SCALAR rules from BOX for
> comparison. Not sure if this is right, though.

Isn't this patch aimed to fix such places?

>>> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
>>> index 0aedb2d3d..d348c3d09 100644
>>> --- a/src/box/sql/func.c
>>> +++ b/src/box/sql/func.c
>>> @@ -467,30 +467,21 @@ lengthFunc(sql_context * context, int argc, sql_value ** argv)
>>>  
>>>  	assert(argc == 1);
>>>  	UNUSED_PARAMETER(argc);
>>> -	switch (sql_value_type(argv[0])) {
>>> -	case MP_BIN:
>>> -	case MP_ARRAY:
>>> -	case MP_MAP:
>>> -	case MP_INT:
>>> -	case MP_UINT:
>>> -	case MP_BOOL:
>>> -	case MP_DOUBLE:{
>>> -			sql_result_uint(context, sql_value_bytes(argv[0]));
>>> -			break;
>>> -		}
>>> -	case MP_STR:{
>>> -			const unsigned char *z = sql_value_text(argv[0]);
>>> -			if (z == 0)
>>> -				return;
>>> -			len = sql_utf8_char_count(z, sql_value_bytes(argv[0]));
>>> -			sql_result_uint(context, len);
>>> -			break;
>>> -		}
>>> -	default:{
>>> -			sql_result_null(context);
>>> -			break;
>>> -		}
>>> -	}
>>> +	enum mp_type type = mem_mp_type(argv[0]);
>>> +	if (type == MP_NIL)
>>> +		return sql_result_null(context);
>>> +	if (type == MP_STR) {
>>> +		const unsigned char *z = sql_value_text(argv[0]);
>>> +		if (z == NULL)
>>> +			return sql_result_null(context);
>>> +		len = sql_utf8_char_count(z, sql_value_bytes(argv[0]));
>>> +		return sql_result_uint(context, len);
>>> +	}
>>> +	if (type == MP_BIN)
>>> +		return sql_result_uint(context, sql_value_bytes(argv[0]));
>>> +	diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
>>> +		 sql_value_to_diag_str(argv[0]), "string or varbinary");
>>> +	context->is_aborted = true;
>>
>> 6. What was wrong with leaving it a switch-case? Here and in other places.
>> For more than 2 similarly looking checks switch usually looks better, and
>> probably works better as well (but this I don't know for sure, I just
>> read some stuff about comilers being able to generate kind of smart goto
>> table for swithches).
>>
> Mostly because shich can be used only in functions that accepts only one
> argument.

Not exactly. As I see, mostly when you accept multiple arguments, they have
the same type. So you could switch-test one of them, and the compare == to
others.

> I think it is better that all of built-in functions (except typeof())
> have the same structure. Just my thoughts, though.

Whatever, leave it as you want. Personally I would chose switch where
possible, but no strict rules here.

>>> @@ -988,9 +979,9 @@ randomBlob(sql_context * context, int argc, sql_value ** argv)
>>>  	unsigned char *p;
>>>  	assert(argc == 1);
>>>  	UNUSED_PARAMETER(argc);
>>> -	if (mp_type_is_bloblike(sql_value_type(argv[0]))) {
>>> +	if (!mp_type_is_numeric(sql_value_type(argv[0]))) {
>>>  		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
>>> -			 sql_value_to_diag_str(argv[0]), "numeric");
>>> +			 sql_value_to_diag_str(argv[0]), "integer");
>>
>> 9. You check for numeric, but print integer? So if I pass 1.1 it will
>> pass, and will return 1 random byte. But should throw an error. Is there
>> a test on that?
>>
> According to our implicit cast rules, numbers of all types may be converted to
> INTEGER. However, currently these functions cannot fully work with our INTEGER
> type. They may only wor with int64. This also will be mentioned in the issue.

What issue?

See 9 comments below. Please, send next version to Nikita. We need to
hurry up, and I am sure he has a lot to add.

> From d1a97a66e58ad09aa9b1f7dea02b42f5d560c0bc Mon Sep 17 00:00:00 2001
> From: Mergen Imeev <imeevma at gmail.com>
> Date: Thu, 8 Oct 2020 00:23:18 +0300
> Subject: [PATCH] sql: check arguments types of built-in functions
> 
> After this patch, the argument types of the SQL built-in functions will
> be checked. Implicit casting rules will be applied during this check.
> 
> Closes #4159
> 
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index 0aedb2d3d..0da6c8f06 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -467,30 +467,21 @@ lengthFunc(sql_context * context, int argc, sql_value ** argv)
>  
>  	assert(argc == 1);
>  	UNUSED_PARAMETER(argc);
> -	switch (sql_value_type(argv[0])) {
> -	case MP_BIN:
> -	case MP_ARRAY:
> -	case MP_MAP:
> -	case MP_INT:
> -	case MP_UINT:
> -	case MP_BOOL:
> -	case MP_DOUBLE:{
> -			sql_result_uint(context, sql_value_bytes(argv[0]));
> -			break;
> -		}
> -	case MP_STR:{
> -			const unsigned char *z = sql_value_text(argv[0]);
> -			if (z == 0)
> -				return;
> -			len = sql_utf8_char_count(z, sql_value_bytes(argv[0]));
> -			sql_result_uint(context, len);
> -			break;
> -		}
> -	default:{
> -			sql_result_null(context);
> -			break;
> -		}
> -	}
> +	enum mp_type type = mem_mp_type(argv[0]);
> +	if (type == MP_NIL)
> +		return sql_result_null(context);
> +	if (type == MP_STR) {
> +		const unsigned char *z = sql_value_text(argv[0]);
> +		if (z == NULL)
> +			return sql_result_null(context);
> +		len = sql_utf8_char_count(z, sql_value_bytes(argv[0]));
> +		return sql_result_uint(context, len);
> +	}
> +	if (type == MP_BIN)
> +		return sql_result_uint(context, sql_value_bytes(argv[0]));
> +	diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +		 sql_value_to_diag_str(argv[0]), "string or varbinary");
> +	context->is_aborted = true;
>  }
>  
>  /*
> @@ -504,44 +495,24 @@ absFunc(sql_context * context, int argc, sql_value ** argv)
>  {
>  	assert(argc == 1);
>  	UNUSED_PARAMETER(argc);
> -	switch (sql_value_type(argv[0])) {
> -	case MP_UINT: {
> -		sql_result_uint(context, sql_value_uint64(argv[0]));
> -		break;
> -	}
> -	case MP_INT: {
> +	enum mp_type type = mem_mp_type(argv[0]);
> +	if (type == MP_NIL)
> +		return sql_result_null(context);
> +	if (type == MP_UINT)
> +		return sql_result_uint(context, sql_value_uint64(argv[0]));
> +	if (type == MP_INT) {
>  		int64_t value = sql_value_int64(argv[0]);
> -		assert(value < 0);
> -		sql_result_uint(context, -value);
> -		break;
> -	}
> -	case MP_NIL:{
> -			/* IMP: R-37434-19929 Abs(X) returns NULL if X is NULL. */
> -			sql_result_null(context);
> -			break;
> -		}
> -	case MP_BOOL:
> -	case MP_BIN:
> -	case MP_ARRAY:
> -	case MP_MAP: {
> -		diag_set(ClientError, ER_INCONSISTENT_TYPES, "number",
> -			 mem_type_to_str(argv[0]));
> -		context->is_aborted = true;
> -		return;
> +		return sql_result_uint(context, (uint64_t)(-value));
>  	}
> -	default:{
> -			/* Because sql_value_double() returns 0.0 if the argument is not
> -			 * something that can be converted into a number, we have:
> -			 * IMP: R-01992-00519 Abs(X) returns 0.0 if X is a string or blob
> -			 * that cannot be converted to a numeric value.
> -			 */
> -			double rVal = sql_value_double(argv[0]);
> -			if (rVal < 0)
> -				rVal = -rVal;
> -			sql_result_double(context, rVal);
> -			break;
> -		}
> +	if (type == MP_DOUBLE) {
> +		double value = sql_value_double(argv[0]);
> +		if (value < 0)
> +			value = -value;
> +		return sql_result_double(context, value);
>  	}
> +	diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +		 sql_value_to_diag_str(argv[0]), "number");
> +	context->is_aborted = true;
>  }
>  
>  /**
> @@ -564,34 +535,31 @@ position_func(struct sql_context *context, int argc, struct Mem **argv)
>  	enum mp_type needle_type = sql_value_type(needle);
>  	enum mp_type haystack_type = sql_value_type(haystack);
>  
> -	if (haystack_type == MP_NIL || needle_type == MP_NIL)
> -		return;
> -	/*
> -	 * Position function can be called only with string
> -	 * or blob params.
> -	 */
> -	struct Mem *inconsistent_type_arg = NULL;
> -	if (needle_type != MP_STR && needle_type != MP_BIN)
> -		inconsistent_type_arg = needle;
> -	if (haystack_type != MP_STR && haystack_type != MP_BIN)
> -		inconsistent_type_arg = haystack;
> -	if (inconsistent_type_arg != NULL) {
> -		diag_set(ClientError, ER_INCONSISTENT_TYPES,
> -			 "text or varbinary",
> -			 mem_type_to_str(inconsistent_type_arg));
> +	if (needle_type != MP_NIL && needle_type != MP_STR &&
> +	    needle_type != MP_BIN) {
> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(needle), "string or varbinary");
>  		context->is_aborted = true;
>  		return;
>  	}
> -	/*
> -	 * Both params of Position function must be of the same
> -	 * type.
> -	 */
> -	if (haystack_type != needle_type) {
> -		diag_set(ClientError, ER_INCONSISTENT_TYPES,
> -			 mem_type_to_str(needle), mem_type_to_str(haystack));
> -		context->is_aborted = true;
> -		return;
> +	if (haystack_type != MP_NIL && haystack_type != needle_type) {
> +		if (needle_type != MP_NIL) {
> +			diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +				 sql_value_to_diag_str(haystack),
> +				 mem_type_to_str(needle));
> +			context->is_aborted = true;
> +			return;
> +		}
> +		if (haystack_type != MP_STR && haystack_type != MP_BIN) {
> +			diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +				 sql_value_to_diag_str(haystack),
> +				 "string or varbinary");
> +			context->is_aborted = true;
> +			return;
> +		}
>  	}
> +	if (needle_type == MP_NIL || haystack_type == MP_NIL)
> +		return;
>  
>  	int n_needle_bytes = sql_value_bytes(needle);
>  	int n_haystack_bytes = sql_value_bytes(haystack);
> @@ -707,6 +675,17 @@ printfFunc(sql_context * context, int argc, sql_value ** argv)
>  	}
>  }
>  
> +static bool
> +is_num_to_int_possible(struct Mem *mem)

1. The function lacks a comment. Why is it allowed to convert a double
to an int anywhere? Need to mention that this is the allowed implicit
cast.

> +{
> +	enum mp_type type = mem_mp_type(mem);
> +	assert(mp_type_is_numeric(type));
> +	if (type == MP_INT || type == MP_UINT)
> +		return true;
> +	assert(type == MP_DOUBLE);
> +	return mem->u.r >= (double)INT64_MIN && mem->u.r < (double)UINT64_MAX;

2. Why not <= UINT64_MAX, why <?

Also would be better to handle such checks inside sql_value_int(). It
already can return a proper success status from sqlVdbeIntValue(), but
ignores its result. Maybe not a part of this patchset though due to
lack of time.

> @@ -735,22 +713,45 @@ substrFunc(sql_context * context, int argc, sql_value ** argv)
>  		context->is_aborted = true;
>  		return;
>  	}
> -	if (sql_value_is_null(argv[1])
> -	    || (argc == 3 && sql_value_is_null(argv[2]))
> -	    ) {
> +	enum mp_type type0 = mem_mp_type(argv[0]);
> +	if (type0 != MP_NIL && type0 != MP_STR && type0 != MP_BIN) {
> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(argv[0]), "string or varbinary");
> +		context->is_aborted = true;
>  		return;
>  	}
> -	p0type = sql_value_type(argv[0]);
> +	enum mp_type type1 = mem_mp_type(argv[1]);
> +	if (type1 != MP_NIL && (!mp_type_is_numeric(type1) ||
> +				!is_num_to_int_possible(argv[1]))) {
> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(argv[1]), "integer");
> +		context->is_aborted = true;
> +		return;
> +	}
> +	enum mp_type type2 = MP_UINT;
> +	if (argc == 3) {
> +		type2 = mem_mp_type(argv[2]);
> +		if (type2 != MP_NIL && (!mp_type_is_numeric(type2) ||
> +					!is_num_to_int_possible(argv[2]))) {

3. This construction 'is_numeric && int_possible' is repeated
6 times in this file. Maybe time to extract into a new function.
You could just move mp_type_is_numeric() into is_num_to_int_possible(),
for example.

> +			diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +				 sql_value_to_diag_str(argv[2]), "integer");
> +			context->is_aborted = true;
> +			return;
> +		}
> +	}
> +	if (type0 == MP_NIL || type1 == MP_NIL || type2 == MP_NIL)
> +		return;
> +
>  	p1 = sql_value_int(argv[1]);

4. is_num_to_int_possible() allows values > INT64_MAX, so it will
overflow here.

> -	if (p0type == MP_BIN) {
> +	if (type0 == MP_BIN) {
>  		len = sql_value_bytes(argv[0]);
>  		z = sql_value_blob(argv[0]);
> -		if (z == 0)
> +		if (z == NULL)
>  			return;
>  		assert(len == sql_value_bytes(argv[0]));
>  	} else {
>  		z = sql_value_text(argv[0]);
> -		if (z == 0)
> +		if (z == NULL)
>  			return;
>  		len = 0;
>  		if (p1 < 0)
> @@ -763,9 +764,9 @@ substrFunc(sql_context * context, int argc, sql_value ** argv)
>  			negP2 = 1;
>  		}
>  	} else {
> -		p2 = sql_context_db_handle(context)->
> -		    aLimit[SQL_LIMIT_LENGTH];
> +		p2 = sql_context_db_handle(context)->aLimit[SQL_LIMIT_LENGTH];
>  	}
> +

5. This whole diff hunk and the 2 changes above it are not necessary.

>  	if (p1 < 0) {
>  		p1 += len;
>  		if (p1 < 0) {
> @@ -836,22 +837,32 @@ roundFunc(sql_context * context, int argc, sql_value ** argv)
>  		context->is_aborted = true;
>  		return;
>  	}
> +	enum mp_type type0 = mem_mp_type(argv[0]);
> +	if (type0!= MP_NIL && !mp_type_is_numeric(type0)) {

6. Missing whitespace after type0.

> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(argv[0]), "numeric");
> +		context->is_aborted = true;
> +		return;
> +	}
> +	enum mp_type type1 = MP_UINT;
>  	if (argc == 2) {
> -		if (sql_value_is_null(argv[1]))
> +		type1 = mem_mp_type(argv[1]);
> +		if (type1 != MP_NIL && (!mp_type_is_numeric(type1) ||
> +					!is_num_to_int_possible(argv[1]))) {
> +			diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +				 sql_value_to_diag_str(argv[1]), "integer");
> +			context->is_aborted = true;
>  			return;
> +		}
> +	}
> +	if (type0 == MP_NIL || type1 == MP_NIL)
> +		return;
> +
> +	if (argc == 2) {
>  		n = sql_value_int(argv[1]);
>  		if (n < 0)
>  			n = 0;
>  	}
> -	if (sql_value_is_null(argv[0]))
> -		return;
> -	enum mp_type mp_type = sql_value_type(argv[0]);
> -	if (mp_type_is_bloblike(mp_type)) {
> -		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> -			 sql_value_to_diag_str(argv[0]), "numeric");
> -		context->is_aborted = true;
> -		return;
> -	}
>  	r = sql_value_double(argv[0]);
>  	/* If Y==0 and X will fit in a 64-bit int,
>  	 * handle the rounding directly,
> @@ -907,9 +918,11 @@ case_type##ICUFunc(sql_context *context, int argc, sql_value **argv)   \
>  	int n;                                                                 \
>  	UNUSED_PARAMETER(argc);                                                \
>  	int arg_type = sql_value_type(argv[0]);                                \
> -	if (mp_type_is_bloblike(arg_type)) {                                   \
> -		diag_set(ClientError, ER_INCONSISTENT_TYPES, "text",           \
> -			 "varbinary");                                         \
> +	if (arg_type == MP_NIL)                                                \
> +		return;                                                        \
> +	if (arg_type != MP_STR) {                                              \

7. I would better use a single 'if' for != MP_STR, and would check for
== MP_NIL inside it. Because almost always the value != MP_NIL, and
therefore we would avoid unnecessary branching on the common path. Up
to you.

> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,                    \
> +			 sql_value_to_diag_str(argv[0]), "string");            \
>  		context->is_aborted = true;                                    \
>  		return;                                                        \
>  	}                                                                      \
> @@ -1750,6 +1846,12 @@ trim_func_one_arg(struct sql_context *context, sql_value *arg)
>  	enum mp_type val_type = sql_value_type(arg);
>  	if (val_type == MP_NIL)
>  		return;
> +	if (val_type != MP_STR && !mp_type_is_bloblike(val_type)) {

8. mp_type_is_bloblike() is true for arrays and maps. Does not look like
a good check for TRIM function. The same for other usages of this func.

> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(arg), "string or varbinary");
> +		context->is_aborted = true;
> +		return;
> +	}
> diff --git a/test/sql-tap/position.test.lua b/test/sql-tap/position.test.lua
> index e0455abc9..0a2c42173 100755
> --- a/test/sql-tap/position.test.lua
> +++ b/test/sql-tap/position.test.lua
> @@ -238,7 +238,7 @@ test:do_test(
>          return test:catchsql "SELECT position(34, 123456.78);"
>      end, {
>          -- <position-1.24>
> -        1, "Inconsistent types: expected text or varbinary got real"
> +        1, "Type mismatch: can not convert 34 to string or varbinary"

9. Tbh, the old error looked better. The new one is strange, because 34
can be converted to a string. It just does not happen implicitly. So the
message is misleading.