From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from rhino.ch-server.com (rhino.ch-server.com [209.59.190.103]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 1A24B445320 for ; Mon, 27 Jul 2020 22:39:35 +0300 (MSK) References: <20200727122429.GA49280@tarantool.org> From: Peter Gulutzan Message-ID: <0b155dec-3a58-a8fd-3aa8-343bf47e1e69@ocelot.ca> Date: Mon, 27 Jul 2020 13:39:29 -0600 MIME-Version: 1.0 In-Reply-To: <20200727122429.GA49280@tarantool.org> Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Transfer-Encoding: 8bit Content-Language: en-US Subject: Re: [Tarantool-discussions] The result type and argument types of the built-in SQL functions. List-Id: Tarantool development process List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Mergen Imeev , korablev@tarantool.org, v.shpilevoy@tarantool.org, tsafin@tarantool.org, sergos@tarantool.org, tarantool-discussions@dev.tarantool.org Hi, On 2020-07-27 6:24 a.m., Mergen Imeev wrote: > Hi, Peter! > > I would like to ask you a few questions about the result type and argument > types of the SQL built-in functions. > > I suggest changing the result types of some functions. A table with the current > result type and the suggested result type is below. > > FUNCTION NAME        CURRENT        SUGGESTED > abs            number        number > avg            number        double > char            string        string > character_length    integer        unsigned > char_length        integer        unsigned > coalesce        scalar        scalar > count            integer        unsigned > greatest        scalar        scalar > group_concat        string        string > hex            string        string > ifnull            integer        scalar > least            scalar        scalar > length            integer        unsigned > like            integer        boolean > likelihood        boolean        scalar > likely            boolean        scalar > lower            string        string > max            scalar        scalar > min            scalar        scalar > nullif            scalar        scalar > position        integer        unsigned > printf            string        string > quote            string        string > random            integer        integer > randomblob        varbinary    varbinary > replace            string        string > round            integer        number > row_count        integer        unsigned > soundex            string        string > substr            string        string > sum            number        number > total            number        double > trim            string        string > typeof            string        string > unicode            string        unsigned > unlikely        boolean        scalar > upper            string        string > version            string        string > zeroblob        varbinary    varbinary > > > The second question is about the types of arguments to built-in functions. > > I suggest this: > > FUNCTION NAME        TYPES OF ARGUMENTS > abs            number > avg            number > char*            unsigned > character_length    string > char_length        string > coalesce        scalar > count            scalar > greatest        scalar > group_concat        scalar, scalar > hex            scalar > ifnull            scalar, scalar > least*            scalar > length            string > like            string, string, string > likelihood        scalar, double > likely            scalar > lower            string > max            scalar > min            scalar > nullif            scalar, scalar > position        string, string > printf*            scalar > quote            scalar > randomblob        unsigned > replace            string, string, string > round            double, unsigned > soundex            string > substr            string, integer, integer > sum            number > total            number > trim*            string > typeof            scalar > unicode            string > unlikely        scalar > upper            string > zeroblob        unsigned > > * - all arguments must be of this type. > > > Also, we have to decide on BLOB instead of STRING. Last time you wrote that we > should allow BLOB instead of STRING, but I think it will be rather inconvenient, > because in this case we have to write SCALAR instead of STRING in function > definition and check the type of the argument inside the function. Because of > this, it will be a little incompatible with the definition of a function that > will be placed in the '_func' system space. I mean, the definition will state > that it accepts SCALAR, but in reality it will only accept STRING and BLOB. > > So, I think we should disallow BLOB instead of STRING, or decide in which > functions we allow BLOB instead of STRING. > > > And one more question. I think we are going to add MAP and ARRAY types in SQL in > the near future, so it might be a good idea to write ANY instead of SCALAR for > some of these functions. What do you think about this? > Re your table of "current" result data types. I think we must define "current". In version 2.4, SELECT TYPEOF(LENGTH('')); returns 'integer'. In version 2.6, SELECT TYPEOF(LENGTH('')); returns 'unsigned'. In other words, somebody has already made changes, for tarantool-master. However, I did not document that the return will be 'integer' so this is a "change in behaviour" but not a "change in documented behaviour". And I think we must define 'result type'. You write that GREATEST etc. return 'scalar'. But of course SELECT TYPEOF(GREATEST(1,'a')); returns 'string'. So I assume you do not mean that the result value has the Tarantool 'scalar' type, you only mean that the result value will be anything that Tarantool/SQL currently supports. Re returning UNSIGNED instead of INTEGER for ...LENGTH(), ROW_COUNT(), etc. (a) This data type is not standard and is not built-in for most major DBMSs. Even in MySQL https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html UNSIGNED is an attribute of a data type, as in BIGINT UNSIGNED, not a data type. (b) The maximum INTEGER value is the same as the maximum UNSIGNED value, 18446744073709551615, so the change is not necessary. (c) Although in Tarantool it is not likely, there are generic programs that might be checking whether the result of a function is negative. For example, in ODBC https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlrowcount-function?v, a result of an equivalent of our ROW_COUNT() function can be -1. ... Therefore I am not enthusiastic about this change. Re returning NUMBER from ROUND(): You say that the current return is 'integer', but when I try it, I get 'double'. I think this current behaviour is acceptable. However, if ROUND(1) returns INTEGER, that is good too -- the general idea, not a law, would be "if the result data type can be the same as the input data type(s), let it be the same". Re types of arguments: You suggest that TRIM() etc. must have 'string' arguments, but currently they can be 'varbinary' and I don't see why that is bad. You suggest that CHAR() must have 'unsigned' argument, but currently it can be some other type, and well, *maybe* that is bad. I don't object to strictness, but worry that I might have to document "sometimes we do an implicit cast, some other times we are strict". Re BLOB instead of string. This is related to the fact that TRIM() etc. currently do not need to have 'string' arguments, they can have 'varbinary' arguments. I admit that many (maybe most) other vendors expect character strings in such cases. But the standard suggests that something very similar is allowed, the DB2 manual says it is allowed, the MariaDB manual is not explicit but here I show it is allowed: " mariadb>SELECT HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY))); OK 1 rows affected (0.0 seconds) +--------------------------------------------+ | HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY)))  | +--------------------------------------------+ | 94                                         | +--------------------------------------------+ " I believe that Tarantool should continue to allow varbinary arguments. Re MAP and ARRAY types "in the near future": I think we must define "near future". Currently in SQL we do not even have the Lua DECIMAL or UUID data types. Kirill Yukhin made the issue Implement DECIMAL data type #4415 on August 8 2019, saying "After DECIMAL type was introduced to the core, its time to implement this type in SQL frontend." We are nearly at August 8 2020, so apparently it takes more than one year to put a data type in SQL even though it is already in the core. ( So I think that maps and arrays, which I think are more difficult, will not exist in SQL for two years. I am not worried. However, it is interesting to imagine UPPER(array of strings) -- should we return upper of all elements? UPPER(map) -- should we return upper of both the key and the value? and so on. I believe Lua non-scalar values should be flattened in SQL, so perhaps the questions can all be avoided. Peter Gulutzan