Re: [Tarantool-patches] [PATCH v1 1/1] sql: check arguments types of built-in functions

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: imeevma@tarantool.org, tsafin@tarantool.org
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v1 1/1] sql: check arguments types of built-in functions
Date: Fri, 9 Oct 2020 23:08:32 +0200	[thread overview]
Message-ID: <d0a9c2db-8a24-f55a-af0a-b79e6945aaca@tarantool.org> (raw)
In-Reply-To: <b71985ee59624263b7c4b297abdf5dde83750e61.1602147647.git.imeevma@gmail.com>

Hi! Thanks for the patch!

See 12 comments/questions below.

1.

	tarantool> box.execute("SELECT GREATEST(1, '2')")
	---
	- metadata:
	  - name: COLUMN_1
	    type: scalar
	  rows:
	  - ['2']
	...

Is it supposed to happen? Shouldn't the function check that all
arguments are of the same type? The same for LEAST().


2. A common problem for all functions where you check for their
arguments being NULL:

	tarantool> box.execute([[SELECT POSITION(100, NULL)]])
	---
	- metadata:
	  - name: COLUMN_1
	    type: integer
	  rows:
	  - [null]
	...

Yes, the second argument is NULL, but the first one mismatches its
type. So this should be an error (I think so, but may be wrong).
The same for SUBSTR(), ROUND(), LIKE(), REPLACE().


3. Is it intended?

	tarantool> box.execute([[SELECT PRINTF(100)]])
	---
	- metadata:
	  - name: COLUMN_1
	    type: string
	  rows:
	  - ['100']
	...

AFAIR, the function is not a standard one, so I don't know if we
want to allow to omit format argument and just stringify whatever
is passed first. Nonetheless I mention it here in case you wanted
to fix it but missed.


4. HEX() in the comment says its argument is blob. But it takes
anything:

	tarantool> box.execute([[SELECT HEX(100)]])
	---
	- metadata:
	  - name: COLUMN_1
	    type: string
	  rows:
	  - ['313030']
	...

Don't know if it is intended. Likely not.


5. TRIM() takes non-string arguments. Definitely a mistake.

	tarantool> box.execute([[SELECT TRIM(100)]])
	---
	- metadata:
	  - name: COLUMN_1
	    type: string
	  rows:
	  - ['100']
	...

> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index 0aedb2d3d..d348c3d09 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -467,30 +467,21 @@ lengthFunc(sql_context * context, int argc, sql_value ** argv)
>  
>  	assert(argc == 1);
>  	UNUSED_PARAMETER(argc);
> -	switch (sql_value_type(argv[0])) {
> -	case MP_BIN:
> -	case MP_ARRAY:
> -	case MP_MAP:
> -	case MP_INT:
> -	case MP_UINT:
> -	case MP_BOOL:
> -	case MP_DOUBLE:{
> -			sql_result_uint(context, sql_value_bytes(argv[0]));
> -			break;
> -		}
> -	case MP_STR:{
> -			const unsigned char *z = sql_value_text(argv[0]);
> -			if (z == 0)
> -				return;
> -			len = sql_utf8_char_count(z, sql_value_bytes(argv[0]));
> -			sql_result_uint(context, len);
> -			break;
> -		}
> -	default:{
> -			sql_result_null(context);
> -			break;
> -		}
> -	}
> +	enum mp_type type = mem_mp_type(argv[0]);
> +	if (type == MP_NIL)
> +		return sql_result_null(context);
> +	if (type == MP_STR) {
> +		const unsigned char *z = sql_value_text(argv[0]);
> +		if (z == NULL)
> +			return sql_result_null(context);
> +		len = sql_utf8_char_count(z, sql_value_bytes(argv[0]));
> +		return sql_result_uint(context, len);
> +	}
> +	if (type == MP_BIN)
> +		return sql_result_uint(context, sql_value_bytes(argv[0]));
> +	diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +		 sql_value_to_diag_str(argv[0]), "string or varbinary");
> +	context->is_aborted = true;

6. What was wrong with leaving it a switch-case? Here and in other places.
For more than 2 similarly looking checks switch usually looks better, and
probably works better as well (but this I don't know for sure, I just
read some stuff about comilers being able to generate kind of smart goto
table for swithches).

>  }
>  
>  /*
> @@ -566,29 +537,23 @@ position_func(struct sql_context *context, int argc, struct Mem **argv)
>  
>  	if (haystack_type == MP_NIL || needle_type == MP_NIL)
>  		return;
> -	/*
> -	 * Position function can be called only with string
> -	 * or blob params.
> -	 */
> -	struct Mem *inconsistent_type_arg = NULL;
> -	if (needle_type != MP_STR && needle_type != MP_BIN)
> -		inconsistent_type_arg = needle;
> -	if (haystack_type != MP_STR && haystack_type != MP_BIN)
> -		inconsistent_type_arg = haystack;
> -	if (inconsistent_type_arg != NULL) {
> -		diag_set(ClientError, ER_INCONSISTENT_TYPES,
> -			 "text or varbinary",
> -			 mem_type_to_str(inconsistent_type_arg));
> +	if (needle_type != MP_STR && needle_type != MP_BIN) {
> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(needle), "string or varbinary");
> +		context->is_aborted = true;
> +		return;
> +	}
> +	if (haystack_type != MP_STR && haystack_type != MP_BIN) {
> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(haystack),
> +			 "string or varbinary");
>  		context->is_aborted = true;
>  		return;

7. I think you can drop this condition. Because if needle_type is
STR or BIN and bellow you check haystack_type == needle_type,
then haystack_type is also correct.

>  	}
> -	/*
> -	 * Both params of Position function must be of the same
> -	 * type.
> -	 */
>  	if (haystack_type != needle_type) {
> -		diag_set(ClientError, ER_INCONSISTENT_TYPES,
> -			 mem_type_to_str(needle), mem_type_to_str(haystack));
> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(haystack),
> +			 mem_type_to_str(needle));
>  		context->is_aborted = true;
>  		return;
>  	}
> @@ -735,12 +700,24 @@ substrFunc(sql_context * context, int argc, sql_value ** argv)
>  		context->is_aborted = true;
>  		return;
>  	}
> -	if (sql_value_is_null(argv[1])
> +	if (sql_value_is_null(argv[0]) || sql_value_is_null(argv[1])
>  	    || (argc == 3 && sql_value_is_null(argv[2]))
>  	    ) {
>  		return;

8. I realized that it looks strange, that some functions allow to pass
NULL into non-main parameters and then return NULL. But some does not.
For example, this function allows to pass NULL instead of the integer
parameters. But function randomBlob() does not. Is it intended?

Not related to your patch. But if it is a bug, then better file a ticket.
(Kirill will throw it into the wishlist, but at least our conscience will
be clean.)

> @@ -988,9 +979,9 @@ randomBlob(sql_context * context, int argc, sql_value ** argv)
>  	unsigned char *p;
>  	assert(argc == 1);
>  	UNUSED_PARAMETER(argc);
> -	if (mp_type_is_bloblike(sql_value_type(argv[0]))) {
> +	if (!mp_type_is_numeric(sql_value_type(argv[0]))) {
>  		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> -			 sql_value_to_diag_str(argv[0]), "numeric");
> +			 sql_value_to_diag_str(argv[0]), "integer");

9. You check for numeric, but print integer? So if I pass 1.1 it will
pass, and will return 1 random byte. But should throw an error. Is there
a test on that?

>  		context->is_aborted = true;
>  		return;
>  	}
> @@ -1238,14 +1229,17 @@ likeFunc(sql_context *context, int argc, sql_value **argv)
>  	int rhs_type = sql_value_type(argv[0]);
>  	int lhs_type = sql_value_type(argv[1]);
>  
> -	if (lhs_type != MP_STR || rhs_type != MP_STR) {
> -		if (lhs_type == MP_NIL || rhs_type == MP_NIL)
> -			return;
> -		char *inconsistent_type = rhs_type != MP_STR ?
> -					  mem_type_to_str(argv[0]) :
> -					  mem_type_to_str(argv[1]);
> -		diag_set(ClientError, ER_INCONSISTENT_TYPES, "text",
> -			 inconsistent_type);
> +	if (lhs_type == MP_NIL || rhs_type == MP_NIL)
> +		return;

10. So if I pass NIL and 100, it wil pass? Not sure it should. The
second type still mismatches. The same in some other updated functions.

> +	if (rhs_type != MP_STR) {
> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(argv[0]), "string");
> +		context->is_aborted = true;
> +		return;
> +	}
> +	if (lhs_type != MP_STR) {
> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(argv[1]), "string");
>  		context->is_aborted = true;
>  		return;
>  	}
> @@ -1476,10 +1479,16 @@ charFunc(sql_context * context, int argc, sql_value ** argv)
>  	for (i = 0; i < argc; i++) {
>  		uint64_t x;
>  		unsigned c;
> -		if (sql_value_type(argv[i]) == MP_INT)
> -			x = 0xfffd;
> -		else
> -			x = sql_value_uint64(argv[i]);
> +		if (sql_value_is_null(argv[i]))
> +			continue;
> +		if (!mp_type_is_numeric(mem_mp_type(argv[i]))) {
> +			diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +				 sql_value_to_diag_str(argv[i]), "integer");
> +			context->is_aborted = true;
> +			return;
> +		}
> +		int y = sql_value_int(argv[i]);
> +		x = y < 0 ? 0xfffd : y;

11. Why was it using 64bit, but now uses simple int?

>  		if (x > 0x10ffff)
>  			x = 0xfffd;
>  		c = (unsigned)(x & 0x1fffff);
> @@ -1570,6 +1585,43 @@ replaceFunc(sql_context * context, int argc, sql_value ** argv)
>  
>  	assert(argc == 3);
>  	UNUSED_PARAMETER(argc);
> +	enum mp_type type0 = mem_mp_type(argv[0]);
> +	enum mp_type type1 = mem_mp_type(argv[1]);
> +	enum mp_type type2 = mem_mp_type(argv[2]);
> +	if (type0 == MP_NIL || type1 == MP_NIL || type2 == MP_NIL)
> +		return;
> +	if (type0 != MP_STR && type0 != MP_BIN) {
> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(argv[0]), "string or varbinary");
> +		context->is_aborted = true;
> +		return;
> +	}
> +	if (type1 != MP_STR && type1 != MP_BIN) {
> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(argv[1]), "string or varbinary");
> +		context->is_aborted = true;
> +		return;
> +	}
> +	if (type2 != MP_STR && type2 != MP_BIN) {
> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(argv[2]), "string or varbinary");
> +		context->is_aborted = true;
> +		return;
> +	}

12. It is enough to check type0 and that the other types are equal
to type0.

> +	if (type1 != type0) {
> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(argv[1]),
> +			 mem_type_to_str(argv[0]));
> +		context->is_aborted = true;
> +		return;
> +	}
> +	if (type2 != type0) {
> +		diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> +			 sql_value_to_diag_str(argv[2]),
> +			 mem_type_to_str(argv[0]));
> +		context->is_aborted = true;
> +		return;
> +	}
>  	zStr = sql_value_text(argv[0]);
>  	if (zStr == 0)
>  		return;

next prev parent reply	other threads:[~2020-10-09 21:08 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-08  9:03 imeevma
2020-10-09 21:08 ` Vladislav Shpilevoy [this message]
2020-10-13 22:44   ` Mergen Imeev
2020-10-15 21:23     ` Vladislav Shpilevoy
2020-10-16  1:20       ` Mergen Imeev
2020-10-16  1:01 imeevma

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d0a9c2db-8a24-f55a-af0a-b79e6945aaca@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=imeevma@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=tsafin@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v1 1/1] sql: check arguments types of built-in functions' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox