[Tarantool-discussions] The result type and argument types of the built-in SQL functions.
Peter Gulutzan
pgulutzan at ocelot.ca
Mon Jul 27 22:39:29 MSK 2020
Hi,
On 2020-07-27 6:24 a.m., Mergen Imeev wrote:
> Hi, Peter!
>
> I would like to ask you a few questions about the result type and
argument
> types of the SQL built-in functions.
>
> I suggest changing the result types of some functions. A table with
the current
> result type and the suggested result type is below.
>
> FUNCTION NAME CURRENT SUGGESTED
> abs number number
> avg number double
> char string string
> character_length integer unsigned
> char_length integer unsigned
> coalesce scalar scalar
> count integer unsigned
> greatest scalar scalar
> group_concat string string
> hex string string
> ifnull integer scalar
> least scalar scalar
> length integer unsigned
> like integer boolean
> likelihood boolean scalar
> likely boolean scalar
> lower string string
> max scalar scalar
> min scalar scalar
> nullif scalar scalar
> position integer unsigned
> printf string string
> quote string string
> random integer integer
> randomblob varbinary varbinary
> replace string string
> round integer number
> row_count integer unsigned
> soundex string string
> substr string string
> sum number number
> total number double
> trim string string
> typeof string string
> unicode string unsigned
> unlikely boolean scalar
> upper string string
> version string string
> zeroblob varbinary varbinary
>
>
> The second question is about the types of arguments to built-in
functions.
>
> I suggest this:
>
> FUNCTION NAME TYPES OF ARGUMENTS
> abs number
> avg number
> char* unsigned
> character_length string
> char_length string
> coalesce scalar
> count scalar
> greatest scalar
> group_concat scalar, scalar
> hex scalar
> ifnull scalar, scalar
> least* scalar
> length string
> like string, string, string
> likelihood scalar, double
> likely scalar
> lower string
> max scalar
> min scalar
> nullif scalar, scalar
> position string, string
> printf* scalar
> quote scalar
> randomblob unsigned
> replace string, string, string
> round double, unsigned
> soundex string
> substr string, integer, integer
> sum number
> total number
> trim* string
> typeof scalar
> unicode string
> unlikely scalar
> upper string
> zeroblob unsigned
>
> * - all arguments must be of this type.
>
>
> Also, we have to decide on BLOB instead of STRING. Last time you
wrote that we
> should allow BLOB instead of STRING, but I think it will be rather
inconvenient,
> because in this case we have to write SCALAR instead of STRING in
function
> definition and check the type of the argument inside the function.
Because of
> this, it will be a little incompatible with the definition of a
function that
> will be placed in the '_func' system space. I mean, the definition
will state
> that it accepts SCALAR, but in reality it will only accept STRING and
BLOB.
>
> So, I think we should disallow BLOB instead of STRING, or decide in which
> functions we allow BLOB instead of STRING.
>
>
> And one more question. I think we are going to add MAP and ARRAY
types in SQL in
> the near future, so it might be a good idea to write ANY instead of
SCALAR for
> some of these functions. What do you think about this?
>
Re your table of "current" result data types. I think we must define
"current".
In version 2.4, SELECT TYPEOF(LENGTH('')); returns 'integer'.
In version 2.6, SELECT TYPEOF(LENGTH('')); returns 'unsigned'.
In other words, somebody has already made changes, for tarantool-master.
However, I did not document that the return will be 'integer' so this is
a "change in behaviour" but not a "change in documented behaviour".
And I think we must define 'result type'.
You write that GREATEST etc. return 'scalar'.
But of course SELECT TYPEOF(GREATEST(1,'a')); returns 'string'.
So I assume you do not mean that the result value has the Tarantool 'scalar'
type, you only mean that the result value will be anything that
Tarantool/SQL currently supports.
Re returning UNSIGNED instead of INTEGER for ...LENGTH(), ROW_COUNT(), etc.
(a) This data type is not standard and is not built-in for most major DBMSs.
Even in MySQL
https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html
UNSIGNED is an attribute of a data type, as in BIGINT UNSIGNED, not a
data type.
(b) The maximum INTEGER value is the same as the maximum UNSIGNED value,
18446744073709551615, so the change is not necessary.
(c) Although in Tarantool it is not likely, there are generic programs
that might
be checking whether the result of a function is negative. For example,
in ODBC
https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlrowcount-function?v,
a result of an equivalent of our ROW_COUNT() function can be -1.
... Therefore I am not enthusiastic about this change.
Re returning NUMBER from ROUND():
You say that the current return is 'integer', but when I try it,
I get 'double'. I think this current behaviour is acceptable.
However, if ROUND(1) returns INTEGER, that is good too --
the general idea, not a law, would be
"if the result data type can be the same as the input data type(s),
let it be the same".
Re types of arguments:
You suggest that TRIM() etc. must have 'string' arguments,
but currently they can be 'varbinary' and I don't see why that is bad.
You suggest that CHAR() must have 'unsigned' argument,
but currently it can be some other type, and well, *maybe* that is bad.
I don't object to strictness, but worry that I might have to document
"sometimes we do an implicit cast, some other times we are strict".
Re BLOB instead of string. This is related to the fact that
TRIM() etc. currently do not need to have 'string' arguments,
they can have 'varbinary' arguments. I admit that many (maybe
most) other vendors expect character strings in such cases.
But the standard suggests that something very similar is allowed,
the DB2 manual says it is allowed,
the MariaDB manual is not explicit but here I show it is allowed:
"
mariadb>SELECT HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY)));
OK 1 rows affected (0.0 seconds)
+--------------------------------------------+
| HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY))) |
+--------------------------------------------+
| 94 |
+--------------------------------------------+
"
I believe that Tarantool should continue to allow varbinary arguments.
Re MAP and ARRAY types "in the near future":
I think we must define "near future".
Currently in SQL we do not even have the Lua DECIMAL or UUID data types.
Kirill Yukhin made the issue Implement DECIMAL data type #4415
on August 8 2019, saying
"After DECIMAL type was introduced to the core, its time to implement
this type in SQL frontend."
We are nearly at August 8 2020, so apparently it takes more than one
year to put
a data type in SQL even though it is already in the core.
(
So I think that maps and arrays, which I think are more difficult,
will not exist in SQL for two years. I am not worried.
However, it is interesting to imagine
UPPER(array of strings) -- should we return upper of all elements?
UPPER(map) -- should we return upper of both the key and the value?
and so on.
I believe Lua non-scalar values should be flattened in SQL,
so perhaps the questions can all be avoided.
Peter Gulutzan
More information about the Tarantool-discussions
mailing list