[Tarantool-discussions] The result type and argument types of the built-in SQL functions.
Mergen Imeev
imeevma at tarantool.org
Tue Jul 28 14:28:14 MSK 2020
Hi!
On Mon, Jul 27, 2020 at 01:39:29PM -0600, Peter Gulutzan wrote:
> Hi,
>
>
> On 2020-07-27 6:24 a.m., Mergen Imeev wrote:
<cut>
> >
>
> Re your table of "current" result data types. I think we must define
> "current".
> In version 2.4, SELECT TYPEOF(LENGTH('')); returns 'integer'.
When I say function result type, I mean this:
tarantool> SELECT LENGTH('');
---
- metadata:
- name: COLUMN_1
type: integer
rows:
- [0]
...
As you can see, it says that the result type of the LENGTH() function is
INTEGER. However, as you said, the type of the value we got is actually
UNSIGNED. I will fix this after we come to an agreement.
> In version 2.6, SELECT TYPEOF(LENGTH('')); returns 'unsigned'.
> In other words, somebody has already made changes, for tarantool-master.
> However, I did not document that the return will be 'integer' so this is
> a "change in behaviour" but not a "change in documented behaviour".
>
> And I think we must define 'result type'.
> You write that GREATEST etc. return 'scalar'.
> But of course SELECT TYPEOF(GREATEST(1,'a')); returns 'string'.
> So I assume you do not mean that the result value has the Tarantool 'scalar'
> type, you only mean that the result value will be anything that
> Tarantool/SQL currently supports.
I agree that this is true for the value that was received. However, result type
functions are more like column types:
tarantool> SELECT GREATEST(1,'a');
---
- metadata:
- name: COLUMN_1
type: scalar
rows:
- ['a']
...
>
> Re returning UNSIGNED instead of INTEGER for ...LENGTH(), ROW_COUNT(), etc.
> (a) This data type is not standard and is not built-in for most major DBMSs.
> Even in MySQL
> https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html
> UNSIGNED is an attribute of a data type, as in BIGINT UNSIGNED, not a data
> type.
> (b) The maximum INTEGER value is the same as the maximum UNSIGNED value,
> 18446744073709551615, so the change is not necessary.
> (c) Although in Tarantool it is not likely, there are generic programs that
> might
> be checking whether the result of a function is negative. For example, in
> ODBC
> https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlrowcount-function?v,
> a result of an equivalent of our ROW_COUNT() function can be -1.
> ... Therefore I am not enthusiastic about this change.
I have no objection here.
>
> Re returning NUMBER from ROUND():
> You say that the current return is 'integer', but when I try it,
> I get 'double'. I think this current behaviour is acceptable.
I talked about this:
tarantool> SELECT ROUND(1.2345, 2);
---
- metadata:
- name: COLUMN_1
type: integer
rows:
- [1.23]
...
As you can see, the result is not INTEGER, even if it is written that it is
INTEGER.
> However, if ROUND(1) returns INTEGER, that is good too --
> the general idea, not a law, would be
> "if the result data type can be the same as the input data type(s),
> let it be the same".
I believe ROUND (a) and ROUND (a, 0) should return INTEGER. I still think that
we should consider NUMBER as the return type of a function. However, this does
not mean that the type of the values we get as a result will be NUMBER. It will
be either INTEGER or DOUBLE.
Also, I have an important question: why don't we treat NUMBER the same way we
treat SCALAR? Before adding the DOUBLE type, it was possible to say that NUMBER
contains INTEGER and real values. So it was in the same position as INTEGER and
UNSIGNED right now. However, after adding DOUBLE, all numeric values can be
either INTEGER or DOUBLE, so we don't need NUMBER values. I suggest that we
allow NUMBER to be the column type, but there should be no NUMBER values. What
do you think about this?
>
> Re types of arguments:
> You suggest that TRIM() etc. must have 'string' arguments,
> but currently they can be 'varbinary' and I don't see why that is bad.
> You suggest that CHAR() must have 'unsigned' argument,
> but currently it can be some other type, and well, *maybe* that is bad.
> I don't object to strictness, but worry that I might have to document
> "sometimes we do an implicit cast, some other times we are strict".
I believe the values given as arguments should follow the "IMPLICIT CAST FOR
ASSIGNMENT" rules. I plan to use the same mechanism, so I don't see any problems
here. For exapmle, ROUND(1.234, 2.5) will work the same as ROUND(1.234, 2).
>
> Re BLOB instead of string. This is related to the fact that
> TRIM() etc. currently do not need to have 'string' arguments,
> they can have 'varbinary' arguments. I admit that many (maybe
> most) other vendors expect character strings in such cases.
> But the standard suggests that something very similar is allowed,
> the DB2 manual says it is allowed,
> the MariaDB manual is not explicit but here I show it is allowed:
> "
> mariadb>SELECT HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY)));
> OK 1 rows affected (0.0 seconds)
> +--------------------------------------------+
> | HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY))) |
> +--------------------------------------------+
> | 94 |
> +--------------------------------------------+
> "
> I believe that Tarantool should continue to allow varbinary arguments.
So you mean that all functions that can accept STRING must accept VARBINARY?
Do you think we should be thinking about a new type that should contain STRING
and VARBINARY the same way NUMBER contains INTEGER and DOUBLE? If you agree with
this, can you suggest a name for the new type?
>
> Re MAP and ARRAY types "in the near future":
> I think we must define "near future".
> Currently in SQL we do not even have the Lua DECIMAL or UUID data types.
> Kirill Yukhin made the issue Implement DECIMAL data type #4415
> on August 8 2019, saying
> "After DECIMAL type was introduced to the core, its time to implement this
> type in SQL frontend."
> We are nearly at August 8 2020, so apparently it takes more than one year to
> put
> a data type in SQL even though it is already in the core.
> (
> So I think that maps and arrays, which I think are more difficult,
> will not exist in SQL for two years. I am not worried.
It is actually in plans for the next release, however I won't argue with you
here.
> However, it is interesting to imagine
> UPPER(array of strings) -- should we return upper of all elements?
> UPPER(map) -- should we return upper of both the key and the value?
> and so on.
I didn't even thought about these cases. I mean, if function should accept
STRING and we will give her an ARRAY, that it should throw an error. After all
ARRAY is not STRING.
> I believe Lua non-scalar values should be flattened in SQL,
> so perhaps the questions can all be avoided.
>
> Peter Gulutzan
>
Could you take another look at the tables?
Also, here's what the function definition would look like after my patches:
tarantool> box.execute([[select "name", "param_list", "returns", "aggregate" from "_func" where "language" = 'SQL_BUILTIN' order by "name" LIMIT 10;]])
---
- metadata:
- name: name
type: string
- name: param_list
type: array
- name: returns
type: string
- name: aggregate
type: string
rows:
- ['ABS', ['number'], 'number', 'none']
- ['AVG', ['number'], 'number', 'group']
- ['CEIL', [], 'any', 'none']
- ['CEILING', [], 'any', 'none']
- ['CHAR', ['unsigned'], 'string', 'none']
- ['CHARACTER_LENGTH', ['scalar'], 'integer', 'none']
- ['CHAR_LENGTH', ['scalar'], 'integer', 'none']
- ['COALESCE', ['scalar'], 'scalar', 'none']
- ['COUNT', ['scalar'], 'integer', 'group']
- ['CURRENT_DATE', [], 'any', 'none']
...
I used SCALAR instead of STRING, so we may use VARBINARY instead of STRING.
Below is an updated table of function result types. Remember that this is not
always the type of the value that we get from the function. However, all values
that we receive as a result of executing the function must be of the specified
type.
FUNCTION NAME CURRENT SUGGESTED
abs number number
avg number double
char string string
character_length integer integer
char_length integer integer
coalesce scalar scalar
count integer integer
greatest scalar scalar
group_concat string string
hex string string
ifnull integer scalar
least scalar scalar
length integer integer
like integer boolean
likelihood boolean scalar
likely boolean scalar
lower string string
max scalar scalar
min scalar scalar
nullif scalar scalar
position integer integer
printf string string
quote string string
random integer integer
randomblob varbinary varbinary
replace string string
round integer number
row_count integer integer
soundex string string
substr string string
sum number number
total number double
trim string string
typeof string string
unicode string integer
unlikely boolean scalar
upper string string
version string string
zeroblob varbinary varbinary
Below is an updated table of function argument types. Note that the IMPLICIT
CAST FOR ASSIGNMENT rules will be applied before values are passed to the
function as arguments. This means that if the function takes STRING and we give
it an INTEGER value, we will get an error even before we call the function.
FUNCTION NAME TYPES OF ARGUMENTS
abs number
avg number
char* unsigned
character_length scalar
char_length scalar
coalesce scalar
count scalar
greatest scalar
group_concat scalar, scalar
hex scalar
ifnull scalar, scalar
least* scalar
length scalar
like scalar, scalar, scalar
likelihood scalar, double
likely scalar
lower scalar
max scalar
min scalar
nullif scalar, scalar
position scalar, scalar
printf* scalar
quote scalar
randomblob unsigned
replace scalar, scalar, scalar
round double, unsigned
soundex scalar
substr scalar, integer, integer
sum number
total number
trim* scalar
typeof scalar
unicode scalar
unlikely scalar
upper scalar
zeroblob unsigned
I still believe that not all functions that accept STRING have to to accept
VARBINARY. But that's up to you.
More information about the Tarantool-discussions
mailing list