[Tarantool-discussions] The result type and argument types of the built-in SQL functions.

Peter Gulutzan pgulutzan at ocelot.ca
Mon Jul 27 22:39:29 MSK 2020


Hi,


On 2020-07-27 6:24 a.m., Mergen Imeev wrote:
 > Hi, Peter!
 >
 > I would like to ask you a few questions about the result type and 
argument
 > types of the SQL built-in functions.
 >
 > I suggest changing the result types of some functions. A table with 
the current
 > result type and the suggested result type is below.
 >
 > FUNCTION NAME        CURRENT        SUGGESTED
 > abs            number        number
 > avg            number        double
 > char            string        string
 > character_length    integer        unsigned
 > char_length        integer        unsigned
 > coalesce        scalar        scalar
 > count            integer        unsigned
 > greatest        scalar        scalar
 > group_concat        string        string
 > hex            string        string
 > ifnull            integer        scalar
 > least            scalar        scalar
 > length            integer        unsigned
 > like            integer        boolean
 > likelihood        boolean        scalar
 > likely            boolean        scalar
 > lower            string        string
 > max            scalar        scalar
 > min            scalar        scalar
 > nullif            scalar        scalar
 > position        integer        unsigned
 > printf            string        string
 > quote            string        string
 > random            integer        integer
 > randomblob        varbinary    varbinary
 > replace            string        string
 > round            integer        number
 > row_count        integer        unsigned
 > soundex            string        string
 > substr            string        string
 > sum            number        number
 > total            number        double
 > trim            string        string
 > typeof            string        string
 > unicode            string        unsigned
 > unlikely        boolean        scalar
 > upper            string        string
 > version            string        string
 > zeroblob        varbinary    varbinary
 >
 >
 > The second question is about the types of arguments to built-in 
functions.
 >
 > I suggest this:
 >
 > FUNCTION NAME        TYPES OF ARGUMENTS
 > abs            number
 > avg            number
 > char*            unsigned
 > character_length    string
 > char_length        string
 > coalesce        scalar
 > count            scalar
 > greatest        scalar
 > group_concat        scalar, scalar
 > hex            scalar
 > ifnull            scalar, scalar
 > least*            scalar
 > length            string
 > like            string, string, string
 > likelihood        scalar, double
 > likely            scalar
 > lower            string
 > max            scalar
 > min            scalar
 > nullif            scalar, scalar
 > position        string, string
 > printf*            scalar
 > quote            scalar
 > randomblob        unsigned
 > replace            string, string, string
 > round            double, unsigned
 > soundex            string
 > substr            string, integer, integer
 > sum            number
 > total            number
 > trim*            string
 > typeof            scalar
 > unicode            string
 > unlikely        scalar
 > upper            string
 > zeroblob        unsigned
 >
 > * - all arguments must be of this type.
 >
 >
 > Also, we have to decide on BLOB instead of STRING. Last time you 
wrote that we
 > should allow BLOB instead of STRING, but I think it will be rather 
inconvenient,
 > because in this case we have to write SCALAR instead of STRING in 
function
 > definition and check the type of the argument inside the function. 
Because of
 > this, it will be a little incompatible with the definition of a 
function that
 > will be placed in the '_func' system space. I mean, the definition 
will state
 > that it accepts SCALAR, but in reality it will only accept STRING and 
BLOB.
 >
 > So, I think we should disallow BLOB instead of STRING, or decide in which
 > functions we allow BLOB instead of STRING.
 >
 >
 > And one more question. I think we are going to add MAP and ARRAY 
types in SQL in
 > the near future, so it might be a good idea to write ANY instead of 
SCALAR for
 > some of these functions. What do you think about this?
 >

Re your table of "current" result data types. I think we must define 
"current".
In version 2.4, SELECT TYPEOF(LENGTH('')); returns 'integer'.
In version 2.6, SELECT TYPEOF(LENGTH('')); returns 'unsigned'.
In other words, somebody has already made changes, for tarantool-master.
However, I did not document that the return will be 'integer' so this is
a "change in behaviour" but not a "change in documented behaviour".

And I think we must define 'result type'.
You write that GREATEST etc. return 'scalar'.
But of course SELECT TYPEOF(GREATEST(1,'a')); returns 'string'.
So I assume you do not mean that the result value has the Tarantool 'scalar'
type, you only mean that the result value will be anything that
Tarantool/SQL currently supports.

Re returning UNSIGNED instead of INTEGER for ...LENGTH(), ROW_COUNT(), etc.
(a) This data type is not standard and is not built-in for most major DBMSs.
Even in MySQL 
https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html
UNSIGNED is an attribute of a data type, as in BIGINT UNSIGNED, not a 
data type.
(b) The maximum INTEGER value is the same as the maximum UNSIGNED value,
18446744073709551615, so the change is not necessary.
(c) Although in Tarantool it is not likely, there are generic programs 
that might
be checking whether the result of a function is negative. For example, 
in ODBC
https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlrowcount-function?v,
a result of an equivalent of our ROW_COUNT() function can be -1.
... Therefore I am not enthusiastic about this change.

Re returning NUMBER from ROUND():
You say that the current return is 'integer', but when I try it,
I get 'double'. I think this current behaviour is acceptable.
However, if ROUND(1) returns INTEGER, that is good too --
the general idea, not a law, would be
"if the result data type can be the same as the input data type(s),
let it be the same".

Re types of arguments:
You suggest that TRIM() etc. must have 'string' arguments,
but currently they can be 'varbinary' and I don't see why that is bad.
You suggest that CHAR() must have 'unsigned' argument,
but currently it can be some other type, and well, *maybe* that is bad.
I don't object to strictness, but worry that I might have to document
"sometimes we do an implicit cast, some other times we are strict".

Re BLOB instead of string. This is related to the fact that
TRIM() etc. currently do not need to have 'string' arguments,
they can have 'varbinary' arguments. I admit that many (maybe
most) other vendors expect character strings in such cases.
But the standard suggests that something very similar is allowed,
the DB2 manual says it is allowed,
the MariaDB manual is not explicit but here I show it is allowed:
"
mariadb>SELECT HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY)));
OK 1 rows affected (0.0 seconds)
+--------------------------------------------+
| HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY)))  |
+--------------------------------------------+
| 94                                         |
+--------------------------------------------+
"
I believe that Tarantool should continue to allow varbinary arguments.

Re MAP and ARRAY types "in the near future":
I think we must define "near future".
Currently in SQL we do not even have the Lua DECIMAL or UUID data types.
Kirill Yukhin made the issue Implement DECIMAL data type #4415
on August 8 2019, saying
"After DECIMAL type was introduced to the core, its time to implement 
this type in SQL frontend."
We are nearly at August 8 2020, so apparently it takes more than one 
year to put
a data type in SQL even though it is already in the core.
(
So I think that maps and arrays, which I think are more difficult,
will not exist in SQL for two years. I am not worried.
However, it is interesting to imagine
UPPER(array of strings) -- should we return upper of all elements?
UPPER(map) -- should we return upper of both the key and the value?
and so on.
I believe Lua non-scalar values should be flattened in SQL,
so perhaps the questions can all be avoided.

Peter Gulutzan



More information about the Tarantool-discussions mailing list