[Tarantool-patches] [PATCH v1 1/1] sql: check arguments types of built-in functions
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Fri Oct 16 00:23:00 MSK 2020
Hi! Thanks for the fixes!
On 14.10.2020 00:44, Mergen Imeev wrote:
> Hi! Thank you for the review! My answers, diff and new patch below.
>
> On Fri, Oct 09, 2020 at 11:08:32PM +0200, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>>
>> See 12 comments/questions below.
>>
>> 1.
>>
>> tarantool> box.execute("SELECT GREATEST(1, '2')")
>> ---
>> - metadata:
>> - name: COLUMN_1
>> type: scalar
>> rows:
>> - ['2']
>> ...
>>
>> Is it supposed to happen? Shouldn't the function check that all
>> arguments are of the same type? The same for LEAST().
>>
> At the moment LEAST() ang GREATEST() works using SCALAR rules from BOX for
> comparison. Not sure if this is right, though.
Isn't this patch aimed to fix such places?
>>> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
>>> index 0aedb2d3d..d348c3d09 100644
>>> --- a/src/box/sql/func.c
>>> +++ b/src/box/sql/func.c
>>> @@ -467,30 +467,21 @@ lengthFunc(sql_context * context, int argc, sql_value ** argv)
>>>
>>> assert(argc == 1);
>>> UNUSED_PARAMETER(argc);
>>> - switch (sql_value_type(argv[0])) {
>>> - case MP_BIN:
>>> - case MP_ARRAY:
>>> - case MP_MAP:
>>> - case MP_INT:
>>> - case MP_UINT:
>>> - case MP_BOOL:
>>> - case MP_DOUBLE:{
>>> - sql_result_uint(context, sql_value_bytes(argv[0]));
>>> - break;
>>> - }
>>> - case MP_STR:{
>>> - const unsigned char *z = sql_value_text(argv[0]);
>>> - if (z == 0)
>>> - return;
>>> - len = sql_utf8_char_count(z, sql_value_bytes(argv[0]));
>>> - sql_result_uint(context, len);
>>> - break;
>>> - }
>>> - default:{
>>> - sql_result_null(context);
>>> - break;
>>> - }
>>> - }
>>> + enum mp_type type = mem_mp_type(argv[0]);
>>> + if (type == MP_NIL)
>>> + return sql_result_null(context);
>>> + if (type == MP_STR) {
>>> + const unsigned char *z = sql_value_text(argv[0]);
>>> + if (z == NULL)
>>> + return sql_result_null(context);
>>> + len = sql_utf8_char_count(z, sql_value_bytes(argv[0]));
>>> + return sql_result_uint(context, len);
>>> + }
>>> + if (type == MP_BIN)
>>> + return sql_result_uint(context, sql_value_bytes(argv[0]));
>>> + diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
>>> + sql_value_to_diag_str(argv[0]), "string or varbinary");
>>> + context->is_aborted = true;
>>
>> 6. What was wrong with leaving it a switch-case? Here and in other places.
>> For more than 2 similarly looking checks switch usually looks better, and
>> probably works better as well (but this I don't know for sure, I just
>> read some stuff about comilers being able to generate kind of smart goto
>> table for swithches).
>>
> Mostly because shich can be used only in functions that accepts only one
> argument.
Not exactly. As I see, mostly when you accept multiple arguments, they have
the same type. So you could switch-test one of them, and the compare == to
others.
> I think it is better that all of built-in functions (except typeof())
> have the same structure. Just my thoughts, though.
Whatever, leave it as you want. Personally I would chose switch where
possible, but no strict rules here.
>>> @@ -988,9 +979,9 @@ randomBlob(sql_context * context, int argc, sql_value ** argv)
>>> unsigned char *p;
>>> assert(argc == 1);
>>> UNUSED_PARAMETER(argc);
>>> - if (mp_type_is_bloblike(sql_value_type(argv[0]))) {
>>> + if (!mp_type_is_numeric(sql_value_type(argv[0]))) {
>>> diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
>>> - sql_value_to_diag_str(argv[0]), "numeric");
>>> + sql_value_to_diag_str(argv[0]), "integer");
>>
>> 9. You check for numeric, but print integer? So if I pass 1.1 it will
>> pass, and will return 1 random byte. But should throw an error. Is there
>> a test on that?
>>
> According to our implicit cast rules, numbers of all types may be converted to
> INTEGER. However, currently these functions cannot fully work with our INTEGER
> type. They may only wor with int64. This also will be mentioned in the issue.
What issue?
See 9 comments below. Please, send next version to Nikita. We need to
hurry up, and I am sure he has a lot to add.
> From d1a97a66e58ad09aa9b1f7dea02b42f5d560c0bc Mon Sep 17 00:00:00 2001
> From: Mergen Imeev <imeevma at gmail.com>
> Date: Thu, 8 Oct 2020 00:23:18 +0300
> Subject: [PATCH] sql: check arguments types of built-in functions
>
> After this patch, the argument types of the SQL built-in functions will
> be checked. Implicit casting rules will be applied during this check.
>
> Closes #4159
>
> diff --git a/src/box/sql/func.c b/src/box/sql/func.c
> index 0aedb2d3d..0da6c8f06 100644
> --- a/src/box/sql/func.c
> +++ b/src/box/sql/func.c
> @@ -467,30 +467,21 @@ lengthFunc(sql_context * context, int argc, sql_value ** argv)
>
> assert(argc == 1);
> UNUSED_PARAMETER(argc);
> - switch (sql_value_type(argv[0])) {
> - case MP_BIN:
> - case MP_ARRAY:
> - case MP_MAP:
> - case MP_INT:
> - case MP_UINT:
> - case MP_BOOL:
> - case MP_DOUBLE:{
> - sql_result_uint(context, sql_value_bytes(argv[0]));
> - break;
> - }
> - case MP_STR:{
> - const unsigned char *z = sql_value_text(argv[0]);
> - if (z == 0)
> - return;
> - len = sql_utf8_char_count(z, sql_value_bytes(argv[0]));
> - sql_result_uint(context, len);
> - break;
> - }
> - default:{
> - sql_result_null(context);
> - break;
> - }
> - }
> + enum mp_type type = mem_mp_type(argv[0]);
> + if (type == MP_NIL)
> + return sql_result_null(context);
> + if (type == MP_STR) {
> + const unsigned char *z = sql_value_text(argv[0]);
> + if (z == NULL)
> + return sql_result_null(context);
> + len = sql_utf8_char_count(z, sql_value_bytes(argv[0]));
> + return sql_result_uint(context, len);
> + }
> + if (type == MP_BIN)
> + return sql_result_uint(context, sql_value_bytes(argv[0]));
> + diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> + sql_value_to_diag_str(argv[0]), "string or varbinary");
> + context->is_aborted = true;
> }
>
> /*
> @@ -504,44 +495,24 @@ absFunc(sql_context * context, int argc, sql_value ** argv)
> {
> assert(argc == 1);
> UNUSED_PARAMETER(argc);
> - switch (sql_value_type(argv[0])) {
> - case MP_UINT: {
> - sql_result_uint(context, sql_value_uint64(argv[0]));
> - break;
> - }
> - case MP_INT: {
> + enum mp_type type = mem_mp_type(argv[0]);
> + if (type == MP_NIL)
> + return sql_result_null(context);
> + if (type == MP_UINT)
> + return sql_result_uint(context, sql_value_uint64(argv[0]));
> + if (type == MP_INT) {
> int64_t value = sql_value_int64(argv[0]);
> - assert(value < 0);
> - sql_result_uint(context, -value);
> - break;
> - }
> - case MP_NIL:{
> - /* IMP: R-37434-19929 Abs(X) returns NULL if X is NULL. */
> - sql_result_null(context);
> - break;
> - }
> - case MP_BOOL:
> - case MP_BIN:
> - case MP_ARRAY:
> - case MP_MAP: {
> - diag_set(ClientError, ER_INCONSISTENT_TYPES, "number",
> - mem_type_to_str(argv[0]));
> - context->is_aborted = true;
> - return;
> + return sql_result_uint(context, (uint64_t)(-value));
> }
> - default:{
> - /* Because sql_value_double() returns 0.0 if the argument is not
> - * something that can be converted into a number, we have:
> - * IMP: R-01992-00519 Abs(X) returns 0.0 if X is a string or blob
> - * that cannot be converted to a numeric value.
> - */
> - double rVal = sql_value_double(argv[0]);
> - if (rVal < 0)
> - rVal = -rVal;
> - sql_result_double(context, rVal);
> - break;
> - }
> + if (type == MP_DOUBLE) {
> + double value = sql_value_double(argv[0]);
> + if (value < 0)
> + value = -value;
> + return sql_result_double(context, value);
> }
> + diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> + sql_value_to_diag_str(argv[0]), "number");
> + context->is_aborted = true;
> }
>
> /**
> @@ -564,34 +535,31 @@ position_func(struct sql_context *context, int argc, struct Mem **argv)
> enum mp_type needle_type = sql_value_type(needle);
> enum mp_type haystack_type = sql_value_type(haystack);
>
> - if (haystack_type == MP_NIL || needle_type == MP_NIL)
> - return;
> - /*
> - * Position function can be called only with string
> - * or blob params.
> - */
> - struct Mem *inconsistent_type_arg = NULL;
> - if (needle_type != MP_STR && needle_type != MP_BIN)
> - inconsistent_type_arg = needle;
> - if (haystack_type != MP_STR && haystack_type != MP_BIN)
> - inconsistent_type_arg = haystack;
> - if (inconsistent_type_arg != NULL) {
> - diag_set(ClientError, ER_INCONSISTENT_TYPES,
> - "text or varbinary",
> - mem_type_to_str(inconsistent_type_arg));
> + if (needle_type != MP_NIL && needle_type != MP_STR &&
> + needle_type != MP_BIN) {
> + diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> + sql_value_to_diag_str(needle), "string or varbinary");
> context->is_aborted = true;
> return;
> }
> - /*
> - * Both params of Position function must be of the same
> - * type.
> - */
> - if (haystack_type != needle_type) {
> - diag_set(ClientError, ER_INCONSISTENT_TYPES,
> - mem_type_to_str(needle), mem_type_to_str(haystack));
> - context->is_aborted = true;
> - return;
> + if (haystack_type != MP_NIL && haystack_type != needle_type) {
> + if (needle_type != MP_NIL) {
> + diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> + sql_value_to_diag_str(haystack),
> + mem_type_to_str(needle));
> + context->is_aborted = true;
> + return;
> + }
> + if (haystack_type != MP_STR && haystack_type != MP_BIN) {
> + diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> + sql_value_to_diag_str(haystack),
> + "string or varbinary");
> + context->is_aborted = true;
> + return;
> + }
> }
> + if (needle_type == MP_NIL || haystack_type == MP_NIL)
> + return;
>
> int n_needle_bytes = sql_value_bytes(needle);
> int n_haystack_bytes = sql_value_bytes(haystack);
> @@ -707,6 +675,17 @@ printfFunc(sql_context * context, int argc, sql_value ** argv)
> }
> }
>
> +static bool
> +is_num_to_int_possible(struct Mem *mem)
1. The function lacks a comment. Why is it allowed to convert a double
to an int anywhere? Need to mention that this is the allowed implicit
cast.
> +{
> + enum mp_type type = mem_mp_type(mem);
> + assert(mp_type_is_numeric(type));
> + if (type == MP_INT || type == MP_UINT)
> + return true;
> + assert(type == MP_DOUBLE);
> + return mem->u.r >= (double)INT64_MIN && mem->u.r < (double)UINT64_MAX;
2. Why not <= UINT64_MAX, why <?
Also would be better to handle such checks inside sql_value_int(). It
already can return a proper success status from sqlVdbeIntValue(), but
ignores its result. Maybe not a part of this patchset though due to
lack of time.
> @@ -735,22 +713,45 @@ substrFunc(sql_context * context, int argc, sql_value ** argv)
> context->is_aborted = true;
> return;
> }
> - if (sql_value_is_null(argv[1])
> - || (argc == 3 && sql_value_is_null(argv[2]))
> - ) {
> + enum mp_type type0 = mem_mp_type(argv[0]);
> + if (type0 != MP_NIL && type0 != MP_STR && type0 != MP_BIN) {
> + diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> + sql_value_to_diag_str(argv[0]), "string or varbinary");
> + context->is_aborted = true;
> return;
> }
> - p0type = sql_value_type(argv[0]);
> + enum mp_type type1 = mem_mp_type(argv[1]);
> + if (type1 != MP_NIL && (!mp_type_is_numeric(type1) ||
> + !is_num_to_int_possible(argv[1]))) {
> + diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> + sql_value_to_diag_str(argv[1]), "integer");
> + context->is_aborted = true;
> + return;
> + }
> + enum mp_type type2 = MP_UINT;
> + if (argc == 3) {
> + type2 = mem_mp_type(argv[2]);
> + if (type2 != MP_NIL && (!mp_type_is_numeric(type2) ||
> + !is_num_to_int_possible(argv[2]))) {
3. This construction 'is_numeric && int_possible' is repeated
6 times in this file. Maybe time to extract into a new function.
You could just move mp_type_is_numeric() into is_num_to_int_possible(),
for example.
> + diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> + sql_value_to_diag_str(argv[2]), "integer");
> + context->is_aborted = true;
> + return;
> + }
> + }
> + if (type0 == MP_NIL || type1 == MP_NIL || type2 == MP_NIL)
> + return;
> +
> p1 = sql_value_int(argv[1]);
4. is_num_to_int_possible() allows values > INT64_MAX, so it will
overflow here.
> - if (p0type == MP_BIN) {
> + if (type0 == MP_BIN) {
> len = sql_value_bytes(argv[0]);
> z = sql_value_blob(argv[0]);
> - if (z == 0)
> + if (z == NULL)
> return;
> assert(len == sql_value_bytes(argv[0]));
> } else {
> z = sql_value_text(argv[0]);
> - if (z == 0)
> + if (z == NULL)
> return;
> len = 0;
> if (p1 < 0)
> @@ -763,9 +764,9 @@ substrFunc(sql_context * context, int argc, sql_value ** argv)
> negP2 = 1;
> }
> } else {
> - p2 = sql_context_db_handle(context)->
> - aLimit[SQL_LIMIT_LENGTH];
> + p2 = sql_context_db_handle(context)->aLimit[SQL_LIMIT_LENGTH];
> }
> +
5. This whole diff hunk and the 2 changes above it are not necessary.
> if (p1 < 0) {
> p1 += len;
> if (p1 < 0) {
> @@ -836,22 +837,32 @@ roundFunc(sql_context * context, int argc, sql_value ** argv)
> context->is_aborted = true;
> return;
> }
> + enum mp_type type0 = mem_mp_type(argv[0]);
> + if (type0!= MP_NIL && !mp_type_is_numeric(type0)) {
6. Missing whitespace after type0.
> + diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> + sql_value_to_diag_str(argv[0]), "numeric");
> + context->is_aborted = true;
> + return;
> + }
> + enum mp_type type1 = MP_UINT;
> if (argc == 2) {
> - if (sql_value_is_null(argv[1]))
> + type1 = mem_mp_type(argv[1]);
> + if (type1 != MP_NIL && (!mp_type_is_numeric(type1) ||
> + !is_num_to_int_possible(argv[1]))) {
> + diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> + sql_value_to_diag_str(argv[1]), "integer");
> + context->is_aborted = true;
> return;
> + }
> + }
> + if (type0 == MP_NIL || type1 == MP_NIL)
> + return;
> +
> + if (argc == 2) {
> n = sql_value_int(argv[1]);
> if (n < 0)
> n = 0;
> }
> - if (sql_value_is_null(argv[0]))
> - return;
> - enum mp_type mp_type = sql_value_type(argv[0]);
> - if (mp_type_is_bloblike(mp_type)) {
> - diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> - sql_value_to_diag_str(argv[0]), "numeric");
> - context->is_aborted = true;
> - return;
> - }
> r = sql_value_double(argv[0]);
> /* If Y==0 and X will fit in a 64-bit int,
> * handle the rounding directly,
> @@ -907,9 +918,11 @@ case_type##ICUFunc(sql_context *context, int argc, sql_value **argv) \
> int n; \
> UNUSED_PARAMETER(argc); \
> int arg_type = sql_value_type(argv[0]); \
> - if (mp_type_is_bloblike(arg_type)) { \
> - diag_set(ClientError, ER_INCONSISTENT_TYPES, "text", \
> - "varbinary"); \
> + if (arg_type == MP_NIL) \
> + return; \
> + if (arg_type != MP_STR) { \
7. I would better use a single 'if' for != MP_STR, and would check for
== MP_NIL inside it. Because almost always the value != MP_NIL, and
therefore we would avoid unnecessary branching on the common path. Up
to you.
> + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, \
> + sql_value_to_diag_str(argv[0]), "string"); \
> context->is_aborted = true; \
> return; \
> } \
> @@ -1750,6 +1846,12 @@ trim_func_one_arg(struct sql_context *context, sql_value *arg)
> enum mp_type val_type = sql_value_type(arg);
> if (val_type == MP_NIL)
> return;
> + if (val_type != MP_STR && !mp_type_is_bloblike(val_type)) {
8. mp_type_is_bloblike() is true for arrays and maps. Does not look like
a good check for TRIM function. The same for other usages of this func.
> + diag_set(ClientError, ER_SQL_TYPE_MISMATCH,
> + sql_value_to_diag_str(arg), "string or varbinary");
> + context->is_aborted = true;
> + return;
> + }
> diff --git a/test/sql-tap/position.test.lua b/test/sql-tap/position.test.lua
> index e0455abc9..0a2c42173 100755
> --- a/test/sql-tap/position.test.lua
> +++ b/test/sql-tap/position.test.lua
> @@ -238,7 +238,7 @@ test:do_test(
> return test:catchsql "SELECT position(34, 123456.78);"
> end, {
> -- <position-1.24>
> - 1, "Inconsistent types: expected text or varbinary got real"
> + 1, "Type mismatch: can not convert 34 to string or varbinary"
9. Tbh, the old error looked better. The new one is strange, because 34
can be converted to a string. It just does not happen implicitly. So the
message is misleading.
More information about the Tarantool-patches
mailing list