From: Peter Gulutzan <pgulutzan@ocelot.ca> To: Mergen Imeev <imeevma@tarantool.org>, korablev@tarantool.org, v.shpilevoy@tarantool.org, tsafin@tarantool.org, sergos@tarantool.org, tarantool-discussions@dev.tarantool.org Subject: Re: [Tarantool-discussions] The result type and argument types of the built-in SQL functions. Date: Mon, 27 Jul 2020 13:39:29 -0600 [thread overview] Message-ID: <0b155dec-3a58-a8fd-3aa8-343bf47e1e69@ocelot.ca> (raw) In-Reply-To: <20200727122429.GA49280@tarantool.org> Hi, On 2020-07-27 6:24 a.m., Mergen Imeev wrote: > Hi, Peter! > > I would like to ask you a few questions about the result type and argument > types of the SQL built-in functions. > > I suggest changing the result types of some functions. A table with the current > result type and the suggested result type is below. > > FUNCTION NAME CURRENT SUGGESTED > abs number number > avg number double > char string string > character_length integer unsigned > char_length integer unsigned > coalesce scalar scalar > count integer unsigned > greatest scalar scalar > group_concat string string > hex string string > ifnull integer scalar > least scalar scalar > length integer unsigned > like integer boolean > likelihood boolean scalar > likely boolean scalar > lower string string > max scalar scalar > min scalar scalar > nullif scalar scalar > position integer unsigned > printf string string > quote string string > random integer integer > randomblob varbinary varbinary > replace string string > round integer number > row_count integer unsigned > soundex string string > substr string string > sum number number > total number double > trim string string > typeof string string > unicode string unsigned > unlikely boolean scalar > upper string string > version string string > zeroblob varbinary varbinary > > > The second question is about the types of arguments to built-in functions. > > I suggest this: > > FUNCTION NAME TYPES OF ARGUMENTS > abs number > avg number > char* unsigned > character_length string > char_length string > coalesce scalar > count scalar > greatest scalar > group_concat scalar, scalar > hex scalar > ifnull scalar, scalar > least* scalar > length string > like string, string, string > likelihood scalar, double > likely scalar > lower string > max scalar > min scalar > nullif scalar, scalar > position string, string > printf* scalar > quote scalar > randomblob unsigned > replace string, string, string > round double, unsigned > soundex string > substr string, integer, integer > sum number > total number > trim* string > typeof scalar > unicode string > unlikely scalar > upper string > zeroblob unsigned > > * - all arguments must be of this type. > > > Also, we have to decide on BLOB instead of STRING. Last time you wrote that we > should allow BLOB instead of STRING, but I think it will be rather inconvenient, > because in this case we have to write SCALAR instead of STRING in function > definition and check the type of the argument inside the function. Because of > this, it will be a little incompatible with the definition of a function that > will be placed in the '_func' system space. I mean, the definition will state > that it accepts SCALAR, but in reality it will only accept STRING and BLOB. > > So, I think we should disallow BLOB instead of STRING, or decide in which > functions we allow BLOB instead of STRING. > > > And one more question. I think we are going to add MAP and ARRAY types in SQL in > the near future, so it might be a good idea to write ANY instead of SCALAR for > some of these functions. What do you think about this? > Re your table of "current" result data types. I think we must define "current". In version 2.4, SELECT TYPEOF(LENGTH('')); returns 'integer'. In version 2.6, SELECT TYPEOF(LENGTH('')); returns 'unsigned'. In other words, somebody has already made changes, for tarantool-master. However, I did not document that the return will be 'integer' so this is a "change in behaviour" but not a "change in documented behaviour". And I think we must define 'result type'. You write that GREATEST etc. return 'scalar'. But of course SELECT TYPEOF(GREATEST(1,'a')); returns 'string'. So I assume you do not mean that the result value has the Tarantool 'scalar' type, you only mean that the result value will be anything that Tarantool/SQL currently supports. Re returning UNSIGNED instead of INTEGER for ...LENGTH(), ROW_COUNT(), etc. (a) This data type is not standard and is not built-in for most major DBMSs. Even in MySQL https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html UNSIGNED is an attribute of a data type, as in BIGINT UNSIGNED, not a data type. (b) The maximum INTEGER value is the same as the maximum UNSIGNED value, 18446744073709551615, so the change is not necessary. (c) Although in Tarantool it is not likely, there are generic programs that might be checking whether the result of a function is negative. For example, in ODBC https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlrowcount-function?v, a result of an equivalent of our ROW_COUNT() function can be -1. ... Therefore I am not enthusiastic about this change. Re returning NUMBER from ROUND(): You say that the current return is 'integer', but when I try it, I get 'double'. I think this current behaviour is acceptable. However, if ROUND(1) returns INTEGER, that is good too -- the general idea, not a law, would be "if the result data type can be the same as the input data type(s), let it be the same". Re types of arguments: You suggest that TRIM() etc. must have 'string' arguments, but currently they can be 'varbinary' and I don't see why that is bad. You suggest that CHAR() must have 'unsigned' argument, but currently it can be some other type, and well, *maybe* that is bad. I don't object to strictness, but worry that I might have to document "sometimes we do an implicit cast, some other times we are strict". Re BLOB instead of string. This is related to the fact that TRIM() etc. currently do not need to have 'string' arguments, they can have 'varbinary' arguments. I admit that many (maybe most) other vendors expect character strings in such cases. But the standard suggests that something very similar is allowed, the DB2 manual says it is allowed, the MariaDB manual is not explicit but here I show it is allowed: " mariadb>SELECT HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY))); OK 1 rows affected (0.0 seconds) +--------------------------------------------+ | HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY))) | +--------------------------------------------+ | 94 | +--------------------------------------------+ " I believe that Tarantool should continue to allow varbinary arguments. Re MAP and ARRAY types "in the near future": I think we must define "near future". Currently in SQL we do not even have the Lua DECIMAL or UUID data types. Kirill Yukhin made the issue Implement DECIMAL data type #4415 on August 8 2019, saying "After DECIMAL type was introduced to the core, its time to implement this type in SQL frontend." We are nearly at August 8 2020, so apparently it takes more than one year to put a data type in SQL even though it is already in the core. ( So I think that maps and arrays, which I think are more difficult, will not exist in SQL for two years. I am not worried. However, it is interesting to imagine UPPER(array of strings) -- should we return upper of all elements? UPPER(map) -- should we return upper of both the key and the value? and so on. I believe Lua non-scalar values should be flattened in SQL, so perhaps the questions can all be avoided. Peter Gulutzan
next prev parent reply other threads:[~2020-07-27 19:39 UTC|newest] Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-07-27 12:24 Mergen Imeev 2020-07-27 19:39 ` Peter Gulutzan [this message] 2020-07-28 11:28 ` Mergen Imeev 2020-07-29 16:47 ` Peter Gulutzan 2020-07-29 8:50 ` Timur Safin 2020-07-29 17:25 ` Peter Gulutzan 2020-07-29 8:54 ` Timur Safin 2020-07-29 9:34 ` Mergen Imeev
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=0b155dec-3a58-a8fd-3aa8-343bf47e1e69@ocelot.ca \ --to=pgulutzan@ocelot.ca \ --cc=imeevma@tarantool.org \ --cc=korablev@tarantool.org \ --cc=sergos@tarantool.org \ --cc=tarantool-discussions@dev.tarantool.org \ --cc=tsafin@tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-discussions] The result type and argument types of the built-in SQL functions.' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox