From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng1.m.smailru.net (smtpng1.m.smailru.net [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 35F00469719 for ; Fri, 16 Oct 2020 00:23:02 +0300 (MSK) References: <20201013224459.GA335783@tarantool.org> From: Vladislav Shpilevoy Message-ID: <5a2295c6-d999-ea6e-a4d6-520ddd5ad29e@tarantool.org> Date: Thu, 15 Oct 2020 23:23:00 +0200 MIME-Version: 1.0 In-Reply-To: <20201013224459.GA335783@tarantool.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Tarantool-patches] [PATCH v1 1/1] sql: check arguments types of built-in functions List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Mergen Imeev Cc: tarantool-patches@dev.tarantool.org Hi! Thanks for the fixes! On 14.10.2020 00:44, Mergen Imeev wrote: > Hi! Thank you for the review! My answers, diff and new patch below. > > On Fri, Oct 09, 2020 at 11:08:32PM +0200, Vladislav Shpilevoy wrote: >> Hi! Thanks for the patch! >> >> See 12 comments/questions below. >> >> 1. >> >> tarantool> box.execute("SELECT GREATEST(1, '2')") >> --- >> - metadata: >> - name: COLUMN_1 >> type: scalar >> rows: >> - ['2'] >> ... >> >> Is it supposed to happen? Shouldn't the function check that all >> arguments are of the same type? The same for LEAST(). >> > At the moment LEAST() ang GREATEST() works using SCALAR rules from BOX for > comparison. Not sure if this is right, though. Isn't this patch aimed to fix such places? >>> diff --git a/src/box/sql/func.c b/src/box/sql/func.c >>> index 0aedb2d3d..d348c3d09 100644 >>> --- a/src/box/sql/func.c >>> +++ b/src/box/sql/func.c >>> @@ -467,30 +467,21 @@ lengthFunc(sql_context * context, int argc, sql_value ** argv) >>> >>> assert(argc == 1); >>> UNUSED_PARAMETER(argc); >>> - switch (sql_value_type(argv[0])) { >>> - case MP_BIN: >>> - case MP_ARRAY: >>> - case MP_MAP: >>> - case MP_INT: >>> - case MP_UINT: >>> - case MP_BOOL: >>> - case MP_DOUBLE:{ >>> - sql_result_uint(context, sql_value_bytes(argv[0])); >>> - break; >>> - } >>> - case MP_STR:{ >>> - const unsigned char *z = sql_value_text(argv[0]); >>> - if (z == 0) >>> - return; >>> - len = sql_utf8_char_count(z, sql_value_bytes(argv[0])); >>> - sql_result_uint(context, len); >>> - break; >>> - } >>> - default:{ >>> - sql_result_null(context); >>> - break; >>> - } >>> - } >>> + enum mp_type type = mem_mp_type(argv[0]); >>> + if (type == MP_NIL) >>> + return sql_result_null(context); >>> + if (type == MP_STR) { >>> + const unsigned char *z = sql_value_text(argv[0]); >>> + if (z == NULL) >>> + return sql_result_null(context); >>> + len = sql_utf8_char_count(z, sql_value_bytes(argv[0])); >>> + return sql_result_uint(context, len); >>> + } >>> + if (type == MP_BIN) >>> + return sql_result_uint(context, sql_value_bytes(argv[0])); >>> + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, >>> + sql_value_to_diag_str(argv[0]), "string or varbinary"); >>> + context->is_aborted = true; >> >> 6. What was wrong with leaving it a switch-case? Here and in other places. >> For more than 2 similarly looking checks switch usually looks better, and >> probably works better as well (but this I don't know for sure, I just >> read some stuff about comilers being able to generate kind of smart goto >> table for swithches). >> > Mostly because shich can be used only in functions that accepts only one > argument. Not exactly. As I see, mostly when you accept multiple arguments, they have the same type. So you could switch-test one of them, and the compare == to others. > I think it is better that all of built-in functions (except typeof()) > have the same structure. Just my thoughts, though. Whatever, leave it as you want. Personally I would chose switch where possible, but no strict rules here. >>> @@ -988,9 +979,9 @@ randomBlob(sql_context * context, int argc, sql_value ** argv) >>> unsigned char *p; >>> assert(argc == 1); >>> UNUSED_PARAMETER(argc); >>> - if (mp_type_is_bloblike(sql_value_type(argv[0]))) { >>> + if (!mp_type_is_numeric(sql_value_type(argv[0]))) { >>> diag_set(ClientError, ER_SQL_TYPE_MISMATCH, >>> - sql_value_to_diag_str(argv[0]), "numeric"); >>> + sql_value_to_diag_str(argv[0]), "integer"); >> >> 9. You check for numeric, but print integer? So if I pass 1.1 it will >> pass, and will return 1 random byte. But should throw an error. Is there >> a test on that? >> > According to our implicit cast rules, numbers of all types may be converted to > INTEGER. However, currently these functions cannot fully work with our INTEGER > type. They may only wor with int64. This also will be mentioned in the issue. What issue? See 9 comments below. Please, send next version to Nikita. We need to hurry up, and I am sure he has a lot to add. > From d1a97a66e58ad09aa9b1f7dea02b42f5d560c0bc Mon Sep 17 00:00:00 2001 > From: Mergen Imeev > Date: Thu, 8 Oct 2020 00:23:18 +0300 > Subject: [PATCH] sql: check arguments types of built-in functions > > After this patch, the argument types of the SQL built-in functions will > be checked. Implicit casting rules will be applied during this check. > > Closes #4159 > > diff --git a/src/box/sql/func.c b/src/box/sql/func.c > index 0aedb2d3d..0da6c8f06 100644 > --- a/src/box/sql/func.c > +++ b/src/box/sql/func.c > @@ -467,30 +467,21 @@ lengthFunc(sql_context * context, int argc, sql_value ** argv) > > assert(argc == 1); > UNUSED_PARAMETER(argc); > - switch (sql_value_type(argv[0])) { > - case MP_BIN: > - case MP_ARRAY: > - case MP_MAP: > - case MP_INT: > - case MP_UINT: > - case MP_BOOL: > - case MP_DOUBLE:{ > - sql_result_uint(context, sql_value_bytes(argv[0])); > - break; > - } > - case MP_STR:{ > - const unsigned char *z = sql_value_text(argv[0]); > - if (z == 0) > - return; > - len = sql_utf8_char_count(z, sql_value_bytes(argv[0])); > - sql_result_uint(context, len); > - break; > - } > - default:{ > - sql_result_null(context); > - break; > - } > - } > + enum mp_type type = mem_mp_type(argv[0]); > + if (type == MP_NIL) > + return sql_result_null(context); > + if (type == MP_STR) { > + const unsigned char *z = sql_value_text(argv[0]); > + if (z == NULL) > + return sql_result_null(context); > + len = sql_utf8_char_count(z, sql_value_bytes(argv[0])); > + return sql_result_uint(context, len); > + } > + if (type == MP_BIN) > + return sql_result_uint(context, sql_value_bytes(argv[0])); > + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, > + sql_value_to_diag_str(argv[0]), "string or varbinary"); > + context->is_aborted = true; > } > > /* > @@ -504,44 +495,24 @@ absFunc(sql_context * context, int argc, sql_value ** argv) > { > assert(argc == 1); > UNUSED_PARAMETER(argc); > - switch (sql_value_type(argv[0])) { > - case MP_UINT: { > - sql_result_uint(context, sql_value_uint64(argv[0])); > - break; > - } > - case MP_INT: { > + enum mp_type type = mem_mp_type(argv[0]); > + if (type == MP_NIL) > + return sql_result_null(context); > + if (type == MP_UINT) > + return sql_result_uint(context, sql_value_uint64(argv[0])); > + if (type == MP_INT) { > int64_t value = sql_value_int64(argv[0]); > - assert(value < 0); > - sql_result_uint(context, -value); > - break; > - } > - case MP_NIL:{ > - /* IMP: R-37434-19929 Abs(X) returns NULL if X is NULL. */ > - sql_result_null(context); > - break; > - } > - case MP_BOOL: > - case MP_BIN: > - case MP_ARRAY: > - case MP_MAP: { > - diag_set(ClientError, ER_INCONSISTENT_TYPES, "number", > - mem_type_to_str(argv[0])); > - context->is_aborted = true; > - return; > + return sql_result_uint(context, (uint64_t)(-value)); > } > - default:{ > - /* Because sql_value_double() returns 0.0 if the argument is not > - * something that can be converted into a number, we have: > - * IMP: R-01992-00519 Abs(X) returns 0.0 if X is a string or blob > - * that cannot be converted to a numeric value. > - */ > - double rVal = sql_value_double(argv[0]); > - if (rVal < 0) > - rVal = -rVal; > - sql_result_double(context, rVal); > - break; > - } > + if (type == MP_DOUBLE) { > + double value = sql_value_double(argv[0]); > + if (value < 0) > + value = -value; > + return sql_result_double(context, value); > } > + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, > + sql_value_to_diag_str(argv[0]), "number"); > + context->is_aborted = true; > } > > /** > @@ -564,34 +535,31 @@ position_func(struct sql_context *context, int argc, struct Mem **argv) > enum mp_type needle_type = sql_value_type(needle); > enum mp_type haystack_type = sql_value_type(haystack); > > - if (haystack_type == MP_NIL || needle_type == MP_NIL) > - return; > - /* > - * Position function can be called only with string > - * or blob params. > - */ > - struct Mem *inconsistent_type_arg = NULL; > - if (needle_type != MP_STR && needle_type != MP_BIN) > - inconsistent_type_arg = needle; > - if (haystack_type != MP_STR && haystack_type != MP_BIN) > - inconsistent_type_arg = haystack; > - if (inconsistent_type_arg != NULL) { > - diag_set(ClientError, ER_INCONSISTENT_TYPES, > - "text or varbinary", > - mem_type_to_str(inconsistent_type_arg)); > + if (needle_type != MP_NIL && needle_type != MP_STR && > + needle_type != MP_BIN) { > + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, > + sql_value_to_diag_str(needle), "string or varbinary"); > context->is_aborted = true; > return; > } > - /* > - * Both params of Position function must be of the same > - * type. > - */ > - if (haystack_type != needle_type) { > - diag_set(ClientError, ER_INCONSISTENT_TYPES, > - mem_type_to_str(needle), mem_type_to_str(haystack)); > - context->is_aborted = true; > - return; > + if (haystack_type != MP_NIL && haystack_type != needle_type) { > + if (needle_type != MP_NIL) { > + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, > + sql_value_to_diag_str(haystack), > + mem_type_to_str(needle)); > + context->is_aborted = true; > + return; > + } > + if (haystack_type != MP_STR && haystack_type != MP_BIN) { > + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, > + sql_value_to_diag_str(haystack), > + "string or varbinary"); > + context->is_aborted = true; > + return; > + } > } > + if (needle_type == MP_NIL || haystack_type == MP_NIL) > + return; > > int n_needle_bytes = sql_value_bytes(needle); > int n_haystack_bytes = sql_value_bytes(haystack); > @@ -707,6 +675,17 @@ printfFunc(sql_context * context, int argc, sql_value ** argv) > } > } > > +static bool > +is_num_to_int_possible(struct Mem *mem) 1. The function lacks a comment. Why is it allowed to convert a double to an int anywhere? Need to mention that this is the allowed implicit cast. > +{ > + enum mp_type type = mem_mp_type(mem); > + assert(mp_type_is_numeric(type)); > + if (type == MP_INT || type == MP_UINT) > + return true; > + assert(type == MP_DOUBLE); > + return mem->u.r >= (double)INT64_MIN && mem->u.r < (double)UINT64_MAX; 2. Why not <= UINT64_MAX, why @@ -735,22 +713,45 @@ substrFunc(sql_context * context, int argc, sql_value ** argv) > context->is_aborted = true; > return; > } > - if (sql_value_is_null(argv[1]) > - || (argc == 3 && sql_value_is_null(argv[2])) > - ) { > + enum mp_type type0 = mem_mp_type(argv[0]); > + if (type0 != MP_NIL && type0 != MP_STR && type0 != MP_BIN) { > + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, > + sql_value_to_diag_str(argv[0]), "string or varbinary"); > + context->is_aborted = true; > return; > } > - p0type = sql_value_type(argv[0]); > + enum mp_type type1 = mem_mp_type(argv[1]); > + if (type1 != MP_NIL && (!mp_type_is_numeric(type1) || > + !is_num_to_int_possible(argv[1]))) { > + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, > + sql_value_to_diag_str(argv[1]), "integer"); > + context->is_aborted = true; > + return; > + } > + enum mp_type type2 = MP_UINT; > + if (argc == 3) { > + type2 = mem_mp_type(argv[2]); > + if (type2 != MP_NIL && (!mp_type_is_numeric(type2) || > + !is_num_to_int_possible(argv[2]))) { 3. This construction 'is_numeric && int_possible' is repeated 6 times in this file. Maybe time to extract into a new function. You could just move mp_type_is_numeric() into is_num_to_int_possible(), for example. > + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, > + sql_value_to_diag_str(argv[2]), "integer"); > + context->is_aborted = true; > + return; > + } > + } > + if (type0 == MP_NIL || type1 == MP_NIL || type2 == MP_NIL) > + return; > + > p1 = sql_value_int(argv[1]); 4. is_num_to_int_possible() allows values > INT64_MAX, so it will overflow here. > - if (p0type == MP_BIN) { > + if (type0 == MP_BIN) { > len = sql_value_bytes(argv[0]); > z = sql_value_blob(argv[0]); > - if (z == 0) > + if (z == NULL) > return; > assert(len == sql_value_bytes(argv[0])); > } else { > z = sql_value_text(argv[0]); > - if (z == 0) > + if (z == NULL) > return; > len = 0; > if (p1 < 0) > @@ -763,9 +764,9 @@ substrFunc(sql_context * context, int argc, sql_value ** argv) > negP2 = 1; > } > } else { > - p2 = sql_context_db_handle(context)-> > - aLimit[SQL_LIMIT_LENGTH]; > + p2 = sql_context_db_handle(context)->aLimit[SQL_LIMIT_LENGTH]; > } > + 5. This whole diff hunk and the 2 changes above it are not necessary. > if (p1 < 0) { > p1 += len; > if (p1 < 0) { > @@ -836,22 +837,32 @@ roundFunc(sql_context * context, int argc, sql_value ** argv) > context->is_aborted = true; > return; > } > + enum mp_type type0 = mem_mp_type(argv[0]); > + if (type0!= MP_NIL && !mp_type_is_numeric(type0)) { 6. Missing whitespace after type0. > + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, > + sql_value_to_diag_str(argv[0]), "numeric"); > + context->is_aborted = true; > + return; > + } > + enum mp_type type1 = MP_UINT; > if (argc == 2) { > - if (sql_value_is_null(argv[1])) > + type1 = mem_mp_type(argv[1]); > + if (type1 != MP_NIL && (!mp_type_is_numeric(type1) || > + !is_num_to_int_possible(argv[1]))) { > + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, > + sql_value_to_diag_str(argv[1]), "integer"); > + context->is_aborted = true; > return; > + } > + } > + if (type0 == MP_NIL || type1 == MP_NIL) > + return; > + > + if (argc == 2) { > n = sql_value_int(argv[1]); > if (n < 0) > n = 0; > } > - if (sql_value_is_null(argv[0])) > - return; > - enum mp_type mp_type = sql_value_type(argv[0]); > - if (mp_type_is_bloblike(mp_type)) { > - diag_set(ClientError, ER_SQL_TYPE_MISMATCH, > - sql_value_to_diag_str(argv[0]), "numeric"); > - context->is_aborted = true; > - return; > - } > r = sql_value_double(argv[0]); > /* If Y==0 and X will fit in a 64-bit int, > * handle the rounding directly, > @@ -907,9 +918,11 @@ case_type##ICUFunc(sql_context *context, int argc, sql_value **argv) \ > int n; \ > UNUSED_PARAMETER(argc); \ > int arg_type = sql_value_type(argv[0]); \ > - if (mp_type_is_bloblike(arg_type)) { \ > - diag_set(ClientError, ER_INCONSISTENT_TYPES, "text", \ > - "varbinary"); \ > + if (arg_type == MP_NIL) \ > + return; \ > + if (arg_type != MP_STR) { \ 7. I would better use a single 'if' for != MP_STR, and would check for == MP_NIL inside it. Because almost always the value != MP_NIL, and therefore we would avoid unnecessary branching on the common path. Up to you. > + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, \ > + sql_value_to_diag_str(argv[0]), "string"); \ > context->is_aborted = true; \ > return; \ > } \ > @@ -1750,6 +1846,12 @@ trim_func_one_arg(struct sql_context *context, sql_value *arg) > enum mp_type val_type = sql_value_type(arg); > if (val_type == MP_NIL) > return; > + if (val_type != MP_STR && !mp_type_is_bloblike(val_type)) { 8. mp_type_is_bloblike() is true for arrays and maps. Does not look like a good check for TRIM function. The same for other usages of this func. > + diag_set(ClientError, ER_SQL_TYPE_MISMATCH, > + sql_value_to_diag_str(arg), "string or varbinary"); > + context->is_aborted = true; > + return; > + } > diff --git a/test/sql-tap/position.test.lua b/test/sql-tap/position.test.lua > index e0455abc9..0a2c42173 100755 > --- a/test/sql-tap/position.test.lua > +++ b/test/sql-tap/position.test.lua > @@ -238,7 +238,7 @@ test:do_test( > return test:catchsql "SELECT position(34, 123456.78);" > end, { > -- > - 1, "Inconsistent types: expected text or varbinary got real" > + 1, "Type mismatch: can not convert 34 to string or varbinary" 9. Tbh, the old error looked better. The new one is strange, because 34 can be converted to a string. It just does not happen implicitly. So the message is misleading.