From i.kosarev at tarantool.org  Thu Jul  2 17:22:49 2020
From: i.kosarev at tarantool.org (Ilya Kosarev)
Date: Thu, 02 Jul 2020 17:22:49 +0300
Subject: [Tarantool-discussions] [dev] [rfc] iproto connections processing improvements
In-Reply-To: <20200630114713.GB272113@atlas>
References: <1590760078.650797476@f437.i.mail.ru> <1593515892.514490810@f382.i.mail.ru> <20200630114713.GB272113@atlas>
Message-ID: <1593699769.684917525@f473.i.mail.ru>

As a result of a private discussion, here are the steps to be implemented:

1. The greeting should be done by iproto alone. This means session creation has to be moved to a later point (after iproto_msg_decode). Thus iproto has to be able to reach iproto_msg_decode without tx assistance. iproto should also be able to finish the connection itself where possible (the connection being rejected by iproto).
2. Introduce a state machine managed from tx. tx should be able to enable different iproto states depending on the tx work phase, for example, to reject all connections during a secondary index build.
3. To be more specific, we need to be able to classify different types of connections, for example, replica connection vs client connection. This means we need to add a specific flag for replica authentication and prioritize it if needed, depending on the iproto state.
4. The new approach to connection handling means we need to reconsider client behavior: a specific error for this rejection type, reconnection on timeout.

--
Ilya Kosarev

>Tuesday, 30 June 2020, 14:47 +03:00 from Konstantin Osipov:
>
>* Ilya Kosarev < i.kosarev at tarantool.org > [20/06/30 14:19]:
>>
>> [...]
>> https://github.com/tarantool/tarantool/issues/3776
>> https://github.com/tarantool/tarantool/issues/4646
>> https://github.com/tarantool/tarantool/issues/4910
>>
>> [...] vtab/[...]
>>
>> [...] iproto_msg_decode [...]
>[...]
>
>--
>Konstantin Osipov, Moscow, Russia

From imeevma at tarantool.org  Mon Jul  6 16:52:23 2020
From: imeevma at tarantool.org (Mergen Imeev)
Date: Mon, 6 Jul 2020 16:52:23 +0300
Subject: [Tarantool-discussions] Check implicit cast between strings and numbers
Message-ID: <2a8578c7-10f5-fd38-409e-4e996f07b865@tarantool.org>

Hi,

Peter, could you please take a look at my branch and say if
something is wrong with the implicit casts between strings and
numerics. I mean, there shouldn't be any.

Branch: imeevma/gh-3809-disallow-imlicit-cast-from-string-to-nums

There should be no implicit casts for assignment after this patch,
except for the following:
1) Any type can be implicitly cast to ANY.
2) Any scalar type can be implicitly cast to SCALAR.
3) Any numeric type can be implicitly cast to NUMBER.
4) In some cases, numbers can be implicitly converted to another
number type. The rules can be seen in the commit message of the
patches.

For comparison, we say that any numbers can be compared with each
other. However, you may find some cases where the result is not
what it should be. This will be fixed in another issue.

Also, comparison with SCALAR does not work correctly. This will
also be fixed later after we decide how this should work.
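The four assignment rules above can be read as a small decision table. The sketch below is only an illustration in Python, not code from the branch: the function name, the type sets, and the choice to return None for rule 4 (whose details live in the commit messages and are not reproduced here) are all assumptions.

```python
# Hypothetical model of the assignment rules listed above; not Tarantool code.
# Assumption: these names mirror the Tarantool/SQL type names used in the thread.
NUMERIC_TYPES = {"integer", "unsigned", "double", "number"}
SCALAR_TYPES = NUMERIC_TYPES | {"string", "varbinary", "boolean", "scalar"}


def implicit_assignment_cast(source, target):
    """Return True if the cast is allowed, False if rejected,
    and None when rule 4 (number to number) would have to decide."""
    if source == target:
        return True                       # same type, no cast needed
    if target == "any":
        return True                       # rule 1
    if target == "scalar":
        return source in SCALAR_TYPES     # rule 2
    if target == "number":
        return source in NUMERIC_TYPES    # rule 3
    if source in NUMERIC_TYPES and target in NUMERIC_TYPES:
        return None                       # rule 4: not modeled in this sketch
    return False                          # e.g. string -> integer is rejected


if __name__ == "__main__":
    for pair in [("string", "integer"), ("integer", "number"), ("double", "integer")]:
        print(pair, implicit_assignment_cast(*pair))
```

Under these assumptions a string assigned to an INTEGER column is rejected, which is the behaviour the patch is meant to enforce, while number-to-number conversions are deliberately left open.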
From pgulutzan at ocelot.ca  Mon Jul  6 22:51:48 2020
From: pgulutzan at ocelot.ca (Peter Gulutzan)
Date: Mon, 6 Jul 2020 13:51:48 -0600
Subject: [Tarantool-discussions] Check implicit cast between strings and numbers
In-Reply-To: <2a8578c7-10f5-fd38-409e-4e996f07b865@tarantool.org>
References: <2a8578c7-10f5-fd38-409e-4e996f07b865@tarantool.org>
Message-ID: <1924f068-8ec2-ad91-569a-7eeecaad9767@ocelot.ca>

Hi,

On 2020-07-06 7:52 a.m., Mergen Imeev wrote:
> Hi,
>
> Peter, could you please take a look at my branch and say if
> something is wrong with the implicit casts between strings and
> numerics. I mean, there shouldn't be any.
>
> Branch: imeevma/gh-3809-disallow-imlicit-cast-from-string-to-nums
>
> There should be no implicit casts for assignment after this patch,
> except for the following:
> 1) Any type can be implicitly cast to ANY.
> 2) Any scalar type can be implicitly cast to SCALAR.
> 3) Any numeric type can be implicitly cast to NUMBER.
> 4) In some cases, numbers can be implicitly converted to another
> number type. The rules can be seen in the commit message of the
> patches.
>
> For comparison, we say that any numbers can be compared with each
> other. However, you may find some cases where the result is not
> what it should be. This will be fixed in another issue.
>
> Also, comparison with SCALAR does not work correctly. This will
> also be fixed later after we decide how this should work.
>

Hi,

I looked. I found nothing significant.
Some of my statements here may be repetitions of earlier statements.

Some tests with arithmetic:
These are examples in the version 2.4 manual:
box.execute([[select '7' + '7' /* result is 14, metadata = scalar */;]])
box.execute([[SELECT 5 / '5' /* result is 1 */;]])
I assume that this will be fixed soon.

Some tests of functions:
box.execute([[SELECT ABS('') /* result is 0 */;]])
box.execute([[SELECT CAST('' AS DOUBLE) /* result is error */;]])
box.execute([[SELECT CHAR('') /* result is '\0' */;]])
box.execute([[SELECT GREATEST('',1e500,X'00',0,CAST(100 AS DOUBLE));]])
box.execute([[SELECT LENGTH(1234.56) /* result is 7 */;]])
box.execute([[SELECT LIKELIHOOD('a' = 'b', '0.1') /* result is error */;]])
box.execute([[SELECT LOWER(5) /* result is '5' */;]])
box.execute([[SELECT POSITION(14, 3.14159) /* result is error */;]])
box.execute([[SELECT QUOTE(5) /* result is 5 */;]])
box.execute([[SELECT RANDOMBLOB('') /* result is null */;]])
box.execute([[SELECT ROUND('44.7') /* result is 45 */;]])
box.execute([[SELECT SOUNDEX(14) /* result is ?000 */;]])
box.execute([[SELECT SUBSTR(5,5,5) /* result is '' */;]])
box.execute([[SELECT TRIM('1' from 121) /* result is 2 */;]])
box.execute([[SELECT UNICODE(0) /* result is 48 */;]])
box.execute([[SELECT UPPER(5) /* result is '5' */;]])
box.execute([[SELECT ZEROBLOB('1') /* result is '/0' */;]])
I see that sometimes strings are accepted where numbers are expected.
I see that sometimes numbers are accepted where strings are expected.
Maybe this isn't always bad, but maybe you intend to make changes.

A test with -nan:
ffi = require('ffi')
box.execute([[DROP TABLE t;]])
box.execute([[CREATE TABLE t (s1 INTEGER PRIMARY KEY, s2 DOUBLE, s3 INTEGER);]])
box.space.T:insert{1, ffi.cast('double',math.sqrt(-1)),0}
box.execute([[UPDATE t SET s3 = s2;]])
This succeeds, column s3 becomes NULL.
I guess this is an "implicit cast" of -nan to NULL, I think that is okay.

A test with foreign keys:
box.execute([[DROP TABLE t2;]])
box.execute([[DROP TABLE t1;]])
box.execute([[CREATE TABLE t1 (s1 DOUBLE PRIMARY KEY);]])
box.execute([[CREATE TABLE t2 (s1 INTEGER PRIMARY KEY REFERENCES t1);]])
This fails.
The standard requirement is
"The declared type of each referencing column shall be comparable
to the declared type of the corresponding referenced column."
So if it is easy to declare that the statement is legal,
you would be very slightly more compliant with the standard.

A test with a numeric target:
"
tarantool> box.execute([[CREATE TABLE t (s1 INTEGER PRIMARY KEY);]])
---
- row_count: 1
...

tarantool> box.execute([[INSERT INTO t VALUES ('0');]])
---
- null
- 'Type mismatch: can not convert ''0'' (type: text) to integer'
...
"
Fine, but it might be better if the error message said "type: string".

A test with a constraint:
box.execute([[DROP TABLE t4;]])
box.execute([[CREATE TABLE t4 (s1 INTEGER PRIMARY KEY);]])
box.execute([[ALTER TABLE t4 ADD CONSTRAINT c CHECK (s1 <> '');]])
box.execute([[INSERT INTO t4 VALUES (0);]])
The result is an error. Fine.
I wonder whether it would be easy to check during ALTER rather than
during INSERT.

A test with LIMIT:
n = -1
box.execute([[SELECT 5 LIMIT ?;]],{n})
n = 1
s = box.prepare([[SELECT 5 LIMIT ?;]])
box.execute(s.stmt_id,{n})
n = -1
box.execute(s.stmt_id,{n})
This is a situation that I worried about. It works as planned. Fine.

As you know, we do not agree about SCALAR.

If you are sure that this behaviour change will be in version 2.5,
then please order me or Elena Shebunyaeva to put a warning in
version 2.4 release notes
https://www.tarantool.io/en/doc/2.4/whats_new/
or in the SQL section which mentions "implicit casting" several times.

Peter Gulutzan

From imeevma at tarantool.org  Wed Jul  8 19:58:19 2020
From: imeevma at tarantool.org (Mergen Imeev)
Date: Wed, 8 Jul 2020 19:58:19 +0300
Subject: [Tarantool-discussions] Check implicit cast between strings and numbers
In-Reply-To: <1924f068-8ec2-ad91-569a-7eeecaad9767@ocelot.ca>
References: <2a8578c7-10f5-fd38-409e-4e996f07b865@tarantool.org> <1924f068-8ec2-ad91-569a-7eeecaad9767@ocelot.ca>
Message-ID: <857a0e74-cb78-4b1e-4704-5502b72473f8@tarantool.org>

Hi,

Thank you for testing! I added type checks for the arguments of the
functions, could you look once more?

There are some functions in which we support both STRING and BLOB
as arguments, for example substr(). Do you think we should leave
them as they are now, or can we change them so that they support
only STRING arguments?

From pgulutzan at ocelot.ca  Wed Jul  8 21:58:40 2020
From: pgulutzan at ocelot.ca (Peter Gulutzan)
Date: Wed, 8 Jul 2020 12:58:40 -0600
Subject: [Tarantool-discussions] Check implicit cast between strings and numbers
In-Reply-To: <857a0e74-cb78-4b1e-4704-5502b72473f8@tarantool.org>
References: <2a8578c7-10f5-fd38-409e-4e996f07b865@tarantool.org> <1924f068-8ec2-ad91-569a-7eeecaad9767@ocelot.ca> <857a0e74-cb78-4b1e-4704-5502b72473f8@tarantool.org>
Message-ID: <643ea7d2-97f9-a301-3350-999e21afd2d6@ocelot.ca>

Hi,

On 2020-07-08 10:58 a.m., Mergen Imeev wrote:
> Hi,
>
> Thank you for testing! I added check types of arguments of the
> functions, could you look once more?
>
> There are some functions in which we support both STRING and BLOB
> as arguments, for example substr(). Do you think we should leave
> them as they are now, or can we change them so that they support
> only STRING arguments?

I think those functions are
SELECT HEX('?'), HEX(CAST('?' AS VARBINARY));
SELECT LENGTH('?'), LENGTH(CAST('?' AS VARBINARY));
SELECT LOWER('?'), LOWER(CAST('?' AS VARBINARY));
SELECT POSITION('1', '?1'), POSITION('1', CAST('?1' AS VARBINARY));
SELECT PRINTF('%s', '?'), PRINTF('%s', CAST('?' AS VARBINARY));
SELECT QUOTE('?'), QUOTE(CAST('?' AS VARBINARY));
SELECT REPLACE('1?1','1','2'), REPLACE(CAST('1?1' AS VARBINARY),'1','2');
SELECT SOUNDEX('?'), SOUNDEX(CAST('?' AS VARBINARY));
SELECT SUBSTR('1?1',2,1), SUBSTR(CAST('1?1' AS VARBINARY),2,1);
SELECT TRIM(TRAILING '1' FROM '1?1'), TRIM(TRAILING '1' FROM CAST('1?1' AS VARBINARY));
SELECT UNICODE('?'), UNICODE(CAST('?' AS VARBINARY));
SELECT UPPER('?'), UPPER(CAST('?' AS VARBINARY));
They all work now. I think that is good, we should leave them as they are now.

That does not mean they are all perfect, though.
Here are complaints, which are not relevant to your question
but which we should look at someday.

As you can see in the comments on issue#4145
sql: remake string value functions
https://github.com/tarantool/tarantool/issues/4145
sometimes functions with multiple parameters can have multiple data types.

As you can see from the comments on issue#3929
Length functions for SQL character strings
https://github.com/tarantool/tarantool/issues/3929
I objected to the idea that CHARACTER_LENGTH() should be applied to
VARBINARY values, but it was a firm order from Konstantin Osipov.
If we accepted that these functions should look for bytes (rather than
characters) when the values are VARBINARY, then some of the results
would differ.

Peter Gulutzan

From imeevma at tarantool.org  Mon Jul 27 15:24:29 2020
From: imeevma at tarantool.org (Mergen Imeev)
Date: Mon, 27 Jul 2020 15:24:29 +0300
Subject: [Tarantool-discussions] The result type and argument types of the built-in SQL functions.
Message-ID: <20200727122429.GA49280@tarantool.org>

Hi, Peter!

I would like to ask you a few questions about the result type and argument
types of the SQL built-in functions.

I suggest changing the result types of some functions. A table with the current
result type and the suggested result type is below.

FUNCTION NAME       CURRENT     SUGGESTED
abs                 number      number
avg                 number      double
char                string      string
character_length    integer     unsigned
char_length         integer     unsigned
coalesce            scalar      scalar
count               integer     unsigned
greatest            scalar      scalar
group_concat        string      string
hex                 string      string
ifnull              integer     scalar
least               scalar      scalar
length              integer     unsigned
like                integer     boolean
likelihood          boolean     scalar
likely              boolean     scalar
lower               string      string
max                 scalar      scalar
min                 scalar      scalar
nullif              scalar      scalar
position            integer     unsigned
printf              string      string
quote               string      string
random              integer     integer
randomblob          varbinary   varbinary
replace             string      string
round               integer     number
row_count           integer     unsigned
soundex             string      string
substr              string      string
sum                 number      number
total               number      double
trim                string      string
typeof              string      string
unicode             string      unsigned
unlikely            boolean     scalar
upper               string      string
version             string      string
zeroblob            varbinary   varbinary


The second question is about the types of arguments to built-in functions.

I suggest this:

FUNCTION NAME       TYPES OF ARGUMENTS
abs                 number
avg                 number
char*               unsigned
character_length    string
char_length         string
coalesce            scalar
count               scalar
greatest            scalar
group_concat        scalar, scalar
hex                 scalar
ifnull              scalar, scalar
least*              scalar
length              string
like                string, string, string
likelihood          scalar, double
likely              scalar
lower               string
max                 scalar
min                 scalar
nullif              scalar, scalar
position            string, string
printf*             scalar
quote               scalar
randomblob          unsigned
replace             string, string, string
round               double, unsigned
soundex             string
substr              string, integer, integer
sum                 number
total               number
trim*               string
typeof              scalar
unicode             string
unlikely            scalar
upper               string
zeroblob            unsigned

* - all arguments must be of this type.


Also, we have to decide on BLOB instead of STRING. Last time you wrote that we
should allow BLOB instead of STRING, but I think it will be rather inconvenient,
because in this case we have to write SCALAR instead of STRING in function
definition and check the type of the argument inside the function.
Because of this, it will be a little incompatible with the definition of a
function that will be placed in the '_func' system space. I mean, the
definition will state that it accepts SCALAR, but in reality it will only
accept STRING and BLOB.

So, I think we should disallow BLOB instead of STRING, or decide in which
functions we allow BLOB instead of STRING.


And one more question. I think we are going to add MAP and ARRAY types in SQL in
the near future, so it might be a good idea to write ANY instead of SCALAR for
some of these functions. What do you think about this?

From pgulutzan at ocelot.ca  Mon Jul 27 22:39:29 2020
From: pgulutzan at ocelot.ca (Peter Gulutzan)
Date: Mon, 27 Jul 2020 13:39:29 -0600
Subject: [Tarantool-discussions] The result type and argument types of the built-in SQL functions.
In-Reply-To: <20200727122429.GA49280@tarantool.org>
References: <20200727122429.GA49280@tarantool.org>
Message-ID: <0b155dec-3a58-a8fd-3aa8-343bf47e1e69@ocelot.ca>

Hi,

Re your table of "current" result data types. I think we must define "current".
In version 2.4, SELECT TYPEOF(LENGTH('')); returns 'integer'.
In version 2.6, SELECT TYPEOF(LENGTH('')); returns 'unsigned'.
In other words, somebody has already made changes, for tarantool-master.
However, I did not document that the return will be 'integer' so this is
a "change in behaviour" but not a "change in documented behaviour".

And I think we must define 'result type'.
You write that GREATEST etc. return 'scalar'.
But of course SELECT TYPEOF(GREATEST(1,'a')); returns 'string'.
So I assume you do not mean that the result value has the Tarantool 'scalar'
type, you only mean that the result value will be anything that
Tarantool/SQL currently supports.

Re returning UNSIGNED instead of INTEGER for ...LENGTH(), ROW_COUNT(), etc.
(a) This data type is not standard and is not built-in for most major DBMSs.
Even in MySQL
https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html
UNSIGNED is an attribute of a data type, as in BIGINT UNSIGNED, not a data type.
(b) The maximum INTEGER value is the same as the maximum UNSIGNED value,
18446744073709551615, so the change is not necessary.
(c) Although in Tarantool it is not likely, there are generic programs that
might be checking whether the result of a function is negative. For example,
in ODBC
https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlrowcount-function?v,
a result of an equivalent of our ROW_COUNT() function can be -1.
...
Therefore I am not enthusiastic about this change.

Re returning NUMBER from ROUND():
You say that the current return is 'integer', but when I try it,
I get 'double'. I think this current behaviour is acceptable.
However, if ROUND(1) returns INTEGER, that is good too --
the general idea, not a law, would be
"if the result data type can be the same as the input data type(s),
let it be the same".

Re types of arguments:
You suggest that TRIM() etc. must have 'string' arguments,
but currently they can be 'varbinary' and I don't see why that is bad.
You suggest that CHAR() must have 'unsigned' argument,
but currently it can be some other type, and well, *maybe* that is bad.
I don't object to strictness, but worry that I might have to document
"sometimes we do an implicit cast, some other times we are strict".

Re BLOB instead of string. This is related to the fact that
TRIM() etc. currently do not need to have 'string' arguments,
they can have 'varbinary' arguments. I admit that many (maybe
most) other vendors expect character strings in such cases.
But the standard suggests that something very similar is allowed,
the DB2 manual says it is allowed,
the MariaDB manual is not explicit but here I show it is allowed:
"
mariadb>SELECT HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY)));
OK 1 rows affected (0.0 seconds)
+--------------------------------------------+
| HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY)))  |
+--------------------------------------------+
| 94                                         |
+--------------------------------------------+
"
I believe that Tarantool should continue to allow varbinary arguments.

Re MAP and ARRAY types "in the near future":
I think we must define "near future".
Currently in SQL we do not even have the Lua DECIMAL or UUID data types.
Kirill Yukhin made the issue Implement DECIMAL data type #4415
on August 8 2019, saying
"After DECIMAL type was introduced to the core, its time to implement this
type in SQL frontend."
We are nearly at August 8 2020, so apparently it takes more than one year to
put a data type in SQL even though it is already in the core.
So I think that maps and arrays, which I think are more difficult,
will not exist in SQL for two years. I am not worried.
However, it is interesting to imagine
UPPER(array of strings) -- should we return upper of all elements?
UPPER(map) -- should we return upper of both the key and the value?
and so on.
I believe Lua non-scalar values should be flattened in SQL,
so perhaps the questions can all be avoided.

Peter Gulutzan

From imeevma at tarantool.org  Tue Jul 28 14:28:14 2020
From: imeevma at tarantool.org (Mergen Imeev)
Date: Tue, 28 Jul 2020 14:28:14 +0300
Subject: [Tarantool-discussions] The result type and argument types of the built-in SQL functions.
In-Reply-To: <0b155dec-3a58-a8fd-3aa8-343bf47e1e69@ocelot.ca>
References: <20200727122429.GA49280@tarantool.org> <0b155dec-3a58-a8fd-3aa8-343bf47e1e69@ocelot.ca>
Message-ID: <20200728112814.GA4061@tarantool.org>

Hi!

On Mon, Jul 27, 2020 at 01:39:29PM -0600, Peter Gulutzan wrote:
> Hi,
>
> On 2020-07-27 6:24 a.m., Mergen Imeev wrote:
>
> Re your table of "current" result data types. I think we must define
> "current".
> In version 2.4, SELECT TYPEOF(LENGTH('')); returns 'integer'.

When I say function result type, I mean this:

tarantool> SELECT LENGTH('');
---
- metadata:
  - name: COLUMN_1
    type: integer
  rows:
  - [0]
...

As you can see, it says that the result type of the LENGTH() function is
INTEGER. However, as you said, the type of the value we got is actually
UNSIGNED. I will fix this after we come to an agreement.

> In version 2.6, SELECT TYPEOF(LENGTH('')); returns 'unsigned'.
> In other words, somebody has already made changes, for tarantool-master.
> However, I did not document that the return will be 'integer' so this is
> a "change in behaviour" but not a "change in documented behaviour".
>
> And I think we must define 'result type'.
> You write that GREATEST etc. return 'scalar'.
> But of course SELECT TYPEOF(GREATEST(1,'a')); returns 'string'.
> So I assume you do not mean that the result value has the Tarantool 'scalar'
> type, you only mean that the result value will be anything that
> Tarantool/SQL currently supports.

I agree that this is true for the value that was received. However, function
result types are more like column types:

tarantool> SELECT GREATEST(1,'a');
---
- metadata:
  - name: COLUMN_1
    type: scalar
  rows:
  - ['a']
...

>
> Re returning UNSIGNED instead of INTEGER for ...LENGTH(), ROW_COUNT(), etc.
> (a) This data type is not standard and is not built-in for most major DBMSs.
> Even in MySQL
> https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html
> UNSIGNED is an attribute of a data type, as in BIGINT UNSIGNED, not a data
> type.
> (b) The maximum INTEGER value is the same as the maximum UNSIGNED value,
> 18446744073709551615, so the change is not necessary.
> (c) Although in Tarantool it is not likely, there are generic programs that
> might be checking whether the result of a function is negative. For example,
> in ODBC
> https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlrowcount-function?v,
> a result of an equivalent of our ROW_COUNT() function can be -1.
> ...
> Therefore I am not enthusiastic about this change.

I have no objection here.

>
> Re returning NUMBER from ROUND():
> You say that the current return is 'integer', but when I try it,
> I get 'double'. I think this current behaviour is acceptable.

I talked about this:

tarantool> SELECT ROUND(1.2345, 2);
---
- metadata:
  - name: COLUMN_1
    type: integer
  rows:
  - [1.23]
...

As you can see, the result is not INTEGER, even though the metadata says
that it is INTEGER.

> However, if ROUND(1) returns INTEGER, that is good too --
> the general idea, not a law, would be
> "if the result data type can be the same as the input data type(s),
> let it be the same".

I believe ROUND(a) and ROUND(a, 0) should return INTEGER. I still think that
we should consider NUMBER as the return type of the function. However, this
does not mean that the type of the values we get as a result will be NUMBER.
It will be either INTEGER or DOUBLE.

Also, I have an important question: why don't we treat NUMBER the same way we
treat SCALAR? Before adding the DOUBLE type, it was possible to say that NUMBER
contains INTEGER and real values. So it was in the same position as INTEGER
and UNSIGNED right now. However, after adding DOUBLE, all numeric values can
be either INTEGER or DOUBLE, so we don't need NUMBER values. I suggest that we
allow NUMBER to be the column type, but there should be no NUMBER values. What
do you think about this?

>
> Re types of arguments:
> You suggest that TRIM() etc. must have 'string' arguments,
> but currently they can be 'varbinary' and I don't see why that is bad.
> You suggest that CHAR() must have 'unsigned' argument,
> but currently it can be some other type, and well, *maybe* that is bad.
> I don't object to strictness, but worry that I might have to document
> "sometimes we do an implicit cast, some other times we are strict".

I believe the values given as arguments should follow the "IMPLICIT CAST FOR
ASSIGNMENT" rules.
I plan to use the same mechanism, so I don't see any problems
here. For example, ROUND(1.234, 2.5) will work the same as ROUND(1.234, 2).

>
> Re BLOB instead of string. This is related to the fact that
> TRIM() etc. currently do not need to have 'string' arguments,
> they can have 'varbinary' arguments. I admit that many (maybe
> most) other vendors expect character strings in such cases.
> But the standard suggests that something very similar is allowed,
> the DB2 manual says it is allowed,
> the MariaDB manual is not explicit but here I show it is allowed:
> "
> mariadb>SELECT HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY)));
> OK 1 rows affected (0.0 seconds)
> +--------------------------------------------+
> | HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY)))  |
> +--------------------------------------------+
> | 94                                         |
> +--------------------------------------------+
> "
> I believe that Tarantool should continue to allow varbinary arguments.

So you mean that all functions that can accept STRING must accept VARBINARY?

Do you think we should be thinking about a new type that should contain STRING
and VARBINARY the same way NUMBER contains INTEGER and DOUBLE? If you agree with
this, can you suggest a name for the new type?

>
> Re MAP and ARRAY types "in the near future":
> I think we must define "near future".
> Currently in SQL we do not even have the Lua DECIMAL or UUID data types.
> Kirill Yukhin made the issue Implement DECIMAL data type #4415
> on August 8 2019, saying
> "After DECIMAL type was introduced to the core, its time to implement this
> type in SQL frontend."
> We are nearly at August 8 2020, so apparently it takes more than one year to
> put a data type in SQL even though it is already in the core.
> (
> So I think that maps and arrays, which I think are more difficult,
> will not exist in SQL for two years. I am not worried.

It is actually planned for the next release; however, I won't argue with you
here.
> However, it is interesting to imagine
> UPPER(array of strings) -- should we return upper of all elements?
> UPPER(map) -- should we return upper of both the key and the value?
> and so on.

I didn't even think about these cases. I mean, if a function should accept
STRING and we give it an ARRAY, then it should throw an error. After all, an
ARRAY is not a STRING.

> I believe Lua non-scalar values should be flattened in SQL,
> so perhaps the questions can all be avoided.
>
> Peter Gulutzan
>

Could you take another look at the tables?

Also, here's what the function definition would look like after my patches:

tarantool> box.execute([[select "name", "param_list", "returns", "aggregate" from "_func" where "language" = 'SQL_BUILTIN' order by "name" LIMIT 10;]])
---
- metadata:
  - name: name
    type: string
  - name: param_list
    type: array
  - name: returns
    type: string
  - name: aggregate
    type: string
  rows:
  - ['ABS', ['number'], 'number', 'none']
  - ['AVG', ['number'], 'number', 'group']
  - ['CEIL', [], 'any', 'none']
  - ['CEILING', [], 'any', 'none']
  - ['CHAR', ['unsigned'], 'string', 'none']
  - ['CHARACTER_LENGTH', ['scalar'], 'integer', 'none']
  - ['CHAR_LENGTH', ['scalar'], 'integer', 'none']
  - ['COALESCE', ['scalar'], 'scalar', 'none']
  - ['COUNT', ['scalar'], 'integer', 'group']
  - ['CURRENT_DATE', [], 'any', 'none']
...

I used SCALAR instead of STRING, so we may use VARBINARY instead of STRING.

Below is an updated table of function result types. Remember that this is not
always the type of the value that we get from the function. However, all values
that we receive as a result of executing the function must be of the specified
type.
FUNCTION NAME       CURRENT     SUGGESTED
abs                 number      number
avg                 number      double
char                string      string
character_length    integer     integer
char_length         integer     integer
coalesce            scalar      scalar
count               integer     integer
greatest            scalar      scalar
group_concat        string      string
hex                 string      string
ifnull              integer     scalar
least               scalar      scalar
length              integer     integer
like                integer     boolean
likelihood          boolean     scalar
likely              boolean     scalar
lower               string      string
max                 scalar      scalar
min                 scalar      scalar
nullif              scalar      scalar
position            integer     integer
printf              string      string
quote               string      string
random              integer     integer
randomblob          varbinary   varbinary
replace             string      string
round               integer     number
row_count           integer     integer
soundex             string      string
substr              string      string
sum                 number      number
total               number      double
trim                string      string
typeof              string      string
unicode             string      integer
unlikely            boolean     scalar
upper               string      string
version             string      string
zeroblob            varbinary   varbinary

Below is an updated table of function argument types. Note that the IMPLICIT
CAST FOR ASSIGNMENT rules will be applied before values are passed to the
function as arguments. This means that if the function takes STRING and we give
it an INTEGER value, we will get an error even before we call the function.

FUNCTION NAME       TYPES OF ARGUMENTS
abs                 number
avg                 number
char*               unsigned
character_length    scalar
char_length         scalar
coalesce            scalar
count               scalar
greatest            scalar
group_concat        scalar, scalar
hex                 scalar
ifnull              scalar, scalar
least*              scalar
length              scalar
like                scalar, scalar, scalar
likelihood          scalar, double
likely              scalar
lower               scalar
max                 scalar
min                 scalar
nullif              scalar, scalar
position            scalar, scalar
printf*             scalar
quote               scalar
randomblob          unsigned
replace             scalar, scalar, scalar
round               double, unsigned
soundex             scalar
substr              scalar, integer, integer
sum                 number
total               number
trim*               scalar
typeof              scalar
unicode             scalar
unlikely            scalar
upper               scalar
zeroblob            unsigned

I still believe that not all functions that accept STRING have to accept
VARBINARY. But that's up to you.
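
[Editor's sketch] The argument check described above (declared types are matched against the supplied value types before the function is called) can be modelled roughly as follows. This is a minimal Python sketch, not Tarantool code. The names `CONTAINS`, `ARG_TYPES`, `check_arg` and `check_call` are hypothetical, the assumption that INTEGER contains UNSIGNED is inferred from the discussion, and the real number-to-number conversion rules (which live in the patch commit messages) are not modelled: the sketch only reports that a conversion would be needed.

```python
# Hypothetical model of the type containment discussed in this thread.
NUMERIC = {"number", "integer", "unsigned", "double"}
SCALAR = NUMERIC | {"scalar", "string", "varbinary", "boolean"}
CONTAINS = {
    "any": SCALAR | {"any"},
    "scalar": SCALAR,
    "number": NUMERIC,
    "integer": {"integer", "unsigned"},  # assumption: INTEGER contains UNSIGNED
}

# A few rows copied from the argument table above.
ARG_TYPES = {
    "abs": ["number"],
    "char": ["unsigned"],
    "length": ["scalar"],
    "round": ["double", "unsigned"],
    "substr": ["scalar", "integer", "integer"],
}


def check_arg(declared, actual):
    """Classify one argument: 'ok', 'numeric conversion' or 'error'."""
    if actual in CONTAINS.get(declared, {declared}):
        return "ok"
    if declared in NUMERIC and actual in NUMERIC:
        # The thread says numbers convert "in some cases"; details not modelled.
        return "numeric conversion"
    return "error"


def check_call(name, actual_types):
    """Check supplied argument types pairwise (arity is not checked)."""
    return [check_arg(d, a) for d, a in zip(ARG_TYPES[name], actual_types)]


print(check_call("round", ["double", "double"]))  # like ROUND(1.234, 2.5)
print(check_call("length", ["varbinary"]))        # like LENGTH(X'41')
print(check_call("char", ["string"]))             # like CHAR('a')
```

Under these assumptions, a VARBINARY value passes wherever the table says `scalar`, which is the effect of declaring SCALAR instead of STRING, while a STRING given to a parameter declared `unsigned` is rejected before the call.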
From tsafin at tarantool.org Wed Jul 29 11:50:09 2020 From: tsafin at tarantool.org (Timur Safin) Date: Wed, 29 Jul 2020 11:50:09 +0300 Subject: [Tarantool-discussions] The result type and argument types of the built-in SQL functions. In-Reply-To: <0b155dec-3a58-a8fd-3aa8-343bf47e1e69@ocelot.ca> References: <20200727122429.GA49280@tarantool.org> <0b155dec-3a58-a8fd-3aa8-343bf47e1e69@ocelot.ca> Message-ID: <018501d66585$443d4490$ccb7cdb0$@tarantool.org> : From: Peter Gulutzan : Subject: Re: The result type and argument types of the built-in SQL : functions. : ... : Re returning UNSIGNED instead of INTEGER for ...LENGTH(), ROW_COUNT(), : etc. : (a) This data type is not standard and is not built-in for most major : DBMSs. : Even in MySQL : https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html : UNSIGNED is an attribute of a data type, as in BIGINT UNSIGNED, not a : data type. : (b) The maximum INTEGER value is the same as the maximum UNSIGNED value, : 18446744073709551615, so the change is not necessary. Agreed that UNSIGNED is unnecessary and should be covered by INTEGER type. : : Re returning NUMBER from ROUND(): : You say that the current return is 'integer', but when I try it, : I get 'double'. I think this current behaviour is acceptable. : However, if ROUND(1) returns INTEGER, that is good too -- : the general idea, not a law, would be : "if the result data type can be the same as the input data type(s), : let it be the same". I'd vote for NUMBER in the places where we need either integer or double. NUMBER looks more Lua-ish (and we do want to have consistent behavior between Lua and SQL as much as possible), i.e. if you have double value which happens to be fraction-less then it will be normalized and look as integer. : : Re types of arguments: : You suggest that TRIM() etc. must have 'string' arguments, : but currently they can be 'varbinary' and I don't see why that is bad. Yup, this is part of blob vs string question. : Re BLOB instead of string. 
This is related to the fact that
: TRIM() etc. currently do not need to have 'string' arguments,
: they can have 'varbinary' arguments. I admit that many (maybe
: most) other vendors expect character strings in such cases.

I do agree that there is no single place where a string should behave any
differently from varbinary. Blob is string, guid is string, so wherever a
string is acceptable the others should be acceptable too.

:
: Re MAP and ARRAY types "in the near future":
: I think we must define "near future".
: Currently in SQL we do not even have the Lua DECIMAL or UUID data types.
: Kirill Yukhin made the issue Implement DECIMAL data type #4415
: on August 8 2019, saying
: "After DECIMAL type was introduced to the core, its time to implement
: this type in SQL frontend."
: We are nearly at August 8 2020, so apparently it takes more than one
: year to put
: a data type in SQL even though it is already in the core.

But should we do anything here, if it's a blob with underlying string
storage? Is it in any way different from varbinary?

: (
: So I think that maps and arrays, which I think are more difficult,
: will not exist in SQL for two years. I am not worried.

I feel your pain, but I hope your estimate here is a little bit
too conservative :)

: However, it is interesting to imagine
: UPPER(array of strings) -- should we return upper of all elements?
: UPPER(map) -- should we return upper of both the key and the value?
: and so on.
: I believe Lua non-scalar values should be flattened in SQL,
: so perhaps the questions can all be avoided.
:
: Peter Gulutzan

I believe Lua non-scalar values should be incompatible with scalar context and
should generate runtime errors. But it's too early to discuss this.

Timur

From tsafin at tarantool.org Wed Jul 29 11:54:12 2020
From: tsafin at tarantool.org (Timur Safin)
Date: Wed, 29 Jul 2020 11:54:12 +0300
Subject: [Tarantool-discussions] The result type and argument types of the built-in SQL functions.
In-Reply-To: <20200727122429.GA49280@tarantool.org>
References: <20200727122429.GA49280@tarantool.org>
Message-ID: <018701d66585$d4c081d0$7e418570$@tarantool.org>

In general, I agree that UNSIGNED is unnecessary, but the rest of the table
is pretty much ok with me. (Though the blob vs string discussion opened
elsewhere is important.)

With one small note below...

: From: Mergen Imeev
: Subject: The result type and argument types of the built-in SQL functions.
:
: Hi, Peter!
:
: I would like to ask you a few questions about the result type and argument
: types of the SQL built-in functions.
:
: I suggest changing the result types of some functions. A table with the
: current result type and the suggested result type is below.
:
: FUNCTION NAME       CURRENT     SUGGESTED
: abs                 number      number
: avg                 number      double
: char                string      string
: character_length    integer     unsigned
...
: soundex             string      string
: substr              string      string
: sum                 number      number
: total               number      double

Why should total be different from sum? (I'd use the same number type)

Regards,
Timur

From imeevma at tarantool.org Wed Jul 29 12:34:50 2020
From: imeevma at tarantool.org (Mergen Imeev)
Date: Wed, 29 Jul 2020 12:34:50 +0300
Subject: [Tarantool-discussions] The result type and argument types of the built-in SQL functions.
In-Reply-To: <018701d66585$d4c081d0$7e418570$@tarantool.org>
References: <20200727122429.GA49280@tarantool.org> <018701d66585$d4c081d0$7e418570$@tarantool.org>
Message-ID: <20200729093450.GA14590@tarantool.org>

On Wed, Jul 29, 2020 at 11:54:12AM +0300, Timur Safin wrote:
>
> In general, I agree that UNSIGNED is unnecessary, but the rest of the table
> is pretty much ok with me. (Though the blob vs string discussion opened
> elsewhere is important.)
>
> With one small note below...
>
> : From: Mergen Imeev
> : Subject: The result type and argument types of the built-in SQL functions.
> :
> : Hi, Peter!
> :
> : I would like to ask you a few questions about the result type and argument
> : types of the SQL built-in functions.
> :
> : I suggest changing the result types of some functions. A table with the
> : current result type and the suggested result type is below.
> :
> : FUNCTION NAME       CURRENT     SUGGESTED
> : abs                 number      number
> : avg                 number      double
> : char                string      string
> : character_length    integer     unsigned
> ...
> : soundex             string      string
> : substr              string      string
> : sum                 number      number
> : total               number      double
>
> Why should total be different from sum? (I'd use the same number type)
>

In SQLite TOTAL always returns DOUBLE. SUM may return INTEGER if all values are
of INTEGER type.

>
> Regards,
> Timur
>

From pgulutzan at ocelot.ca Wed Jul 29 19:47:01 2020
From: pgulutzan at ocelot.ca (Peter Gulutzan)
Date: Wed, 29 Jul 2020 10:47:01 -0600
Subject: [Tarantool-discussions] The result type and argument types of the built-in SQL functions.
In-Reply-To: <20200728112814.GA4061@tarantool.org>
References: <20200727122429.GA49280@tarantool.org> <0b155dec-3a58-a8fd-3aa8-343bf47e1e69@ocelot.ca> <20200728112814.GA4061@tarantool.org>
Message-ID:

Hi,

I am aware that Timur Safin has already replied,
but I will answer one thing at a time.

On 2020-07-28 5:28 a.m., Mergen Imeev wrote:
> Hi!
>
> On Mon, Jul 27, 2020 at 01:39:29PM -0600, Peter Gulutzan wrote:
>> Hi,
>>
>>
>> On 2020-07-27 6:24 a.m., Mergen Imeev wrote:
>
>>>
>>
>> Re your table of "current" result data types. I think we must define
>> "current".
>> In version 2.4, SELECT TYPEOF(LENGTH('')); returns 'integer'.
> When I say function result type, I mean this:
>
> tarantool> SELECT LENGTH('');
> ---
> - metadata:
>   - name: COLUMN_1
>     type: integer
>   rows:
>   - [0]
> ...
>
> As you can see, it says that the result type of the LENGTH() function is
> INTEGER. However, as you said, the type of the value we got is actually
> UNSIGNED. I will fix this after we come to an agreement.
>

Okay, I guess you mean that the metadata type will be the same
as TYPEOF, but that might mean they both say 'integer' or that
might mean that they both say 'unsigned'. Fine for now.

>> In version 2.6, SELECT TYPEOF(LENGTH('')); returns 'unsigned'.
>> In other words, somebody has already made changes, for tarantool-master.
>> However, I did not document that the return will be 'integer' so this is
>> a "change in behaviour" but not a "change in documented behaviour".
>>
>> And I think we must define 'result type'.
>> You write that GREATEST etc. return 'scalar'.
>> But of course SELECT TYPEOF(GREATEST(1,'a')); returns 'string'.
>> So I assume you do not mean that the result value has the Tarantool 'scalar'
>> type, you only mean that the result value will be anything that
>> Tarantool/SQL currently supports.
> I agree that this is true for the value that was received. However, function
> result types are more like column types:
>
> tarantool> SELECT GREATEST(1,'a');
> ---
> - metadata:
>   - name: COLUMN_1
>     type: scalar
>   rows:
>   - ['a']
> ...
>

You are right that sometimes a 'metadata' type is like a column type,
that is, it must be scalar when it might contain more than one
primitive type. However, in the case of GREATEST, I do not see that this is
necessary, there is only one value and it is a STRING.

>>
>> Re returning UNSIGNED instead of INTEGER for ...LENGTH(), ROW_COUNT(), etc.
>> (a) This data type is not standard and is not built-in for most major DBMSs.
>> Even in MySQL
>> https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html
>> UNSIGNED is an attribute of a data type, as in BIGINT UNSIGNED, not a data
>> type.
>> (b) The maximum INTEGER value is the same as the maximum UNSIGNED value,
>> 18446744073709551615, so the change is not necessary.
>> (c) Although in Tarantool it is not likely, there are generic programs that
>> might
>> be checking whether the result of a function is negative.
For example, in
>> ODBC
>> https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlrowcount-function?v,
>> a result of an equivalent of our ROW_COUNT() function can be -1.
>> ... Therefore I am not enthusiastic about this change.
> I have no objection here.
>

Does anyone else have an objection here?
If not, does that mean the returned data type will be INTEGER?

>>
>> Re returning NUMBER from ROUND():
>> You say that the current return is 'integer', but when I try it,
>> I get 'double'. I think this current behaviour is acceptable.
> I talked about this:
>
> tarantool> SELECT ROUND(1.2345, 2);
> ---
> - metadata:
>   - name: COLUMN_1
>     type: integer
>   rows:
>   - [1.23]
> ...
>
> As you can see, the result is not INTEGER, even though the metadata says it is
> INTEGER.
>

Oh. So this is a bug?

>> However, if ROUND(1) returns INTEGER, that is good too --
>> the general idea, not a law, would be
>> "if the result data type can be the same as the input data type(s),
>> let it be the same".
> I believe ROUND(a) and ROUND(a, 0) should return INTEGER. I still think that
> we should consider NUMBER as the return type of the function. However, this does
> not mean that the type of the values we get as a result will be NUMBER. It will
> be either INTEGER or DOUBLE.
>
> Also, I have an important question: why don't we treat NUMBER the same way we
> treat SCALAR? Before adding the DOUBLE type, it was possible to say that NUMBER
> contains INTEGER and real values. So it was in the same position as INTEGER and
> UNSIGNED right now. However, after adding DOUBLE, all numeric values can be
> either INTEGER or DOUBLE, so we don't need NUMBER values. I suggest that we
> allow NUMBER to be the column type, but there should be no NUMBER values. What
> do you think about this?
>
>>
>> Re types of arguments:
>> You suggest that TRIM() etc. must have 'string' arguments,
>> but currently they can be 'varbinary' and I don't see why that is bad.
>> You suggest that CHAR() must have 'unsigned' argument,
>> but currently it can be some other type, and well, *maybe* that is bad.
>> I don't object to strictness, but worry that I might have to document
>> "sometimes we do an implicit cast, some other times we are strict".
> I believe the values given as arguments should follow the "IMPLICIT CAST FOR
> ASSIGNMENT" rules. I plan to use the same mechanism, so I don't see any problems
> here. For example, ROUND(1.234, 2.5) will work the same as ROUND(1.234, 2).
>
>>
>> Re BLOB instead of string. This is related to the fact that
>> TRIM() etc. currently do not need to have 'string' arguments,
>> they can have 'varbinary' arguments. I admit that many (maybe
>> most) other vendors expect character strings in such cases.
>> But the standard suggests that something very similar is allowed,
>> the DB2 manual says it is allowed,
>> the MariaDB manual is not explicit but here I show it is allowed:
>> "
>> mariadb>SELECT HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY)));
>> OK 1 rows affected (0.0 seconds)
>> +--------------------------------------------+
>> | HEX(TRIM(X'D0' FROM CAST('Д' AS BINARY)))  |
>> +--------------------------------------------+
>> | 94                                         |
>> +--------------------------------------------+
>> "
>> I believe that Tarantool should continue to allow varbinary arguments.
> So you mean that all functions that can accept STRING must accept VARBINARY?
>
> Do you think we should be thinking about a new type that should contain STRING
> and VARBINARY the same way NUMBER contains INTEGER and DOUBLE? If you agree with
> this, can you suggest a name for the new type?
>

Let's look at them. Version 2.6.

SELECT HEX('A'),HEX(X'41');
It works now.
It's in the manual: "may be either a string or a byte sequence".
I do not see why it should fail.

SELECT LENGTH('Д'), LENGTH(X'D094'),CHAR_LENGTH('Д'),CHAR_LENGTH(X'D094');
It works now.
It's in the manual: "Return the number of characters in the expression,
or the number of bytes in the expression. It depends on the data type: ...".
I do not see why it should fail.
Of course, CHAR_LENGTH(X'D094') returns the number of bytes
not the number of characters, and I know that is odd,
but it was K. Osipov's order.

SELECT LOWER('I'),LOWER(X'49');
It fails now.
The manual says the required syntax is "LOWER(string-expression)".
I suppose that, since there are no characters, we could say that
LOWER(X'49') = X'49'. But I do not see that that would help any users.
The same considerations apply for SELECT UPPER('a'),UPPER(X'61');.
By the way, we are assuming that 'I' is UTF-8 with a DUCET collation,
but LOWER('I' COLLATE "binary") = 'i'. I think that makes no sense,
but it is convenient for users -- if this "bug" were fixed, then
they would have to specify a non-default collation (because "binary"
is default). Am I making sense?

SELECT POSITION('A', 'ДA'),POSITION(X'41', X'D09441');
It works now.
It's in the manual: "The data types of the expressions must be either
STRING or VARBINARY. If the expressions have data type STRING, then
the result is the character position. If the expressions have data
type VARBINARY, then the result is the byte position."
I do not see why it should fail.

SELECT PRINTF('%d',5),PRINTF(X'2564',5);
It works now.
The manual does not clearly say that it should work, but says
that the first argument should be a "string expression".
I doubt that any user will use VARBINARY, but ...
I do not see why it should fail.

CREATE TABLE u (s1 VARCHAR(1) PRIMARY KEY, s2 VARBINARY);
INSERT INTO u VALUES ('A',X'41');
SELECT QUOTE(s1),QUOTE(s2) FROM u;
It works now. But the result is odd:
tarantool>SELECT QUOTE(s1),QUOTE(s2) FROM u;
OK 1 rows selected (0.0 seconds)
+----------+----------+
| COLUMN_1 | COLUMN_2 |
+----------+----------+
| 'A'      | X'41'    |
+----------+----------+
Surely putting quote marks i.e.
X'27' around X'41' should result in VARBINARY X'274127'. I think this is a bug. But ... I do not see why it should fail. By the way, the manual is wrong. It says the argument should be "string-literal". But, as the example shows, it can be a column value. I think this is a bug too. I will put it on my little "to do" list. SELECT REPLACE('A','A','C'),REPLACE(X'41',X'41',X'43'); It works now. It is in the manual: "The expressions should all have data type STRING or VARBINARY." I do not see why it should fail. SELECT SOUNDEX('A'),SOUNDEX(X'41'); It fails now. The manual does not say that it should fail, but does say "The algorithm works with characters in the Latin alphabet and works best with English words." I suppose that, since there are no chaacters, we could say that SOUNDEX(X'41') = any junk. But I do not see that that would help any users. SELECT SUBSTR('?AB',3),SUBSTR(X'D09441', 3); It works now. It is in the manual: "If expression-1 has data type VARBINARY rather than data type STRING, then positioning and counting is by bytes rather than by characters." I do not see why it should fail. TRIM() It works now. Earlier I showed a result from MariaDB. Here is a result from Tarantool for the same query. " tarantool>SELECT HEX(TRIM(X'D0' FROM CAST('?' AS VARBINARY))); OK 1 rows selected (0.0 seconds) +----------+ | COLUMN_1 | +----------+ | 94?????? | +----------+ " I had to say VARBINARY not BINARY, anyway, the result is the same. It is in the manual: "The expressions should have data type STRING or VARBINARY." I do not see why it should fail. UPPER() See above re LOWER(). Summary: the current behaviour is documented and is good, except: (1) CHAR_LENGTH(varbinary-string) and PRINTF(varbinary-string...) ??? works but are useless. (2) LOWER(varbinary-string) and UPPER(varbinary-string) ??? don't work, but they could work, but would be useless. ??? Also I think they are slightly buggy but for the user's benefit. (3) QUOTE(varbinary-string) ??? 
    works but I think it is slightly buggy.

I suppose you have heard the English expression:
If it is not broken, then fix it anyway.
I understand; I work that way myself.
But I bet that you have better things to do.

>>
>> Re MAP and ARRAY types "in the near future":
>> I think we must define "near future".
>> Currently in SQL we do not even have the Lua DECIMAL or UUID data types.
>> Kirill Yukhin made the issue Implement DECIMAL data type #4415
>> on August 8 2019, saying
>> "After DECIMAL type was introduced to the core, its time to implement this
>> type in SQL frontend."
>> We are nearly at August 8 2020, so apparently it takes more than one year to
>> put
>> a data type in SQL even though it is already in the core.
>> (
>> So I think that maps and arrays, which I think are more difficult,
>> will not exist in SQL for two years. I am not worried.
> It is actually planned for the next release; however, I won't argue with you
> here.
>

Thank you for not arguing, but I am not as polite as you, so I will add this.
I looked at issue#4763 sql: introduce type
https://github.com/tarantool/tarantool/issues/4763
The last thing I see is "kyukhin added this to the wishlist milestone on Mar 6".
It is not about supporting a MAP data type in SQL, it is about adding a
function that can read a Lua map. We can already do something like that,
as I mentioned to Nikita Pettik.

>> However, it is interesting to imagine
>> UPPER(array of strings) -- should we return upper of all elements?
>> UPPER(map) -- should we return upper of both the key and the value?
>> and so on.
> I didn't even think about these cases. I mean, if a function should accept
> STRING and we give it an ARRAY, then it should throw an error. After all, an
> ARRAY is not a STRING.
>
>> I believe Lua non-scalar values should be flattened in SQL,
>> so perhaps the questions can all be avoided.
>>
>> Peter Gulutzan
>>
>
> Could you take another look at the tables?
>
> Also, here's what the function definition would look like after my patches:
>
> tarantool> box.execute([[select "name", "param_list", "returns", "aggregate" from "_func" where "language" = 'SQL_BUILTIN' order by "name" LIMIT 10;]])
> ---
> - metadata:
>   - name: name
>     type: string
>   - name: param_list
>     type: array
>   - name: returns
>     type: string
>   - name: aggregate
>     type: string
>   rows:
>   - ['ABS', ['number'], 'number', 'none']
>   - ['AVG', ['number'], 'number', 'group']
>   - ['CEIL', [], 'any', 'none']
>   - ['CEILING', [], 'any', 'none']
>   - ['CHAR', ['unsigned'], 'string', 'none']
>   - ['CHARACTER_LENGTH', ['scalar'], 'integer', 'none']
>   - ['CHAR_LENGTH', ['scalar'], 'integer', 'none']
>   - ['COALESCE', ['scalar'], 'scalar', 'none']
>   - ['COUNT', ['scalar'], 'integer', 'group']
>   - ['CURRENT_DATE', [], 'any', 'none']
> ...
>
> I used SCALAR instead of STRING, so we may use VARBINARY instead of STRING.
>

I suppose that you cannot say 'string or varbinary' in this output, right?

>
> Below is an updated table of function result types. Remember that this is not
> always the type of the value that we get from the function. However, all values
> that we receive as a result of executing the function must be of the specified
> type.
>
>
> FUNCTION NAME       CURRENT     SUGGESTED
> abs                 number      number
> avg                 number      double
> char                string      string
> character_length    integer     integer
> char_length         integer     integer
> coalesce            scalar      scalar
> count               integer     integer
> greatest            scalar      scalar
> group_concat        string      string
> hex                 string      string
> ifnull              integer     scalar
> least               scalar      scalar
> length              integer     integer
> like                integer     boolean
> likelihood          boolean     scalar
> likely              boolean     scalar
> lower               string      string
> max                 scalar      scalar
> min                 scalar      scalar
> nullif              scalar      scalar
> position            integer     integer
> printf              string      string
> quote               string      string
> random              integer     integer
> randomblob          varbinary   varbinary
> replace             string      string
> round               integer     number
> row_count           integer     integer
> soundex             string      string
> substr              string      string
> sum                 number      number
> total               number      double
> trim                string      string
> typeof              string      string
> unicode             string      integer
> unlikely            boolean     scalar
> upper               string      string
> version             string      string
> zeroblob            varbinary   varbinary
>
>
> Below is an updated table of function argument types. Note that the IMPLICIT
> CAST FOR ASSIGNMENT rules will be applied before values are passed to the
> function as arguments. This means that if the function takes STRING and we give
> it an INTEGER value, we will get an error even before we call the function.
>
> FUNCTION NAME       TYPES OF ARGUMENTS
> abs                 number
> avg                 number
> char*               unsigned
> character_length    scalar
> char_length         scalar
> coalesce            scalar
> count               scalar
> greatest            scalar
> group_concat        scalar, scalar
> hex                 scalar
> ifnull              scalar, scalar
> least*              scalar
> length              scalar
> like                scalar, scalar, scalar
> likelihood          scalar, double
> likely              scalar
> lower               scalar
> max                 scalar
> min                 scalar
> nullif              scalar, scalar
> position            scalar, scalar
> printf*             scalar
> quote               scalar
> randomblob          unsigned
> replace             scalar, scalar, scalar
> round               double, unsigned
> soundex             scalar
> substr              scalar, integer, integer
> sum                 number
> total               number
> trim*               scalar
> typeof              scalar
> unicode             scalar
> unlikely            scalar
> upper               scalar
> zeroblob            unsigned
>
>
> I still believe that not all functions that accept STRING have to accept
> VARBINARY. But that's up to you.
>

I gave a very long answer about each function that I think is relevant.
I did not know that it is up to me -- if I am now the mail.ru boss,
nobody warned me.

Peter Gulutzan

From pgulutzan at ocelot.ca Wed Jul 29 20:25:56 2020
From: pgulutzan at ocelot.ca (Peter Gulutzan)
Date: Wed, 29 Jul 2020 11:25:56 -0600
Subject: [Tarantool-discussions] The result type and argument types of the built-in SQL functions.
In-Reply-To: <018501d66585$443d4490$ccb7cdb0$@tarantool.org>
References: <20200727122429.GA49280@tarantool.org> <0b155dec-3a58-a8fd-3aa8-343bf47e1e69@ocelot.ca> <018501d66585$443d4490$ccb7cdb0$@tarantool.org>
Message-ID: <09c281c8-784a-9f7f-ae62-0ca5f7d7351e@ocelot.ca>

Hi,

On 2020-07-29 2:50 a.m., Timur Safin wrote:
>
>
> : From: Peter Gulutzan
> : Subject: Re: The result type and argument types of the built-in SQL
> : functions.
> :
> ...
> : Re returning UNSIGNED instead of INTEGER for ...LENGTH(), ROW_COUNT(),
> : etc.
> : (a) This data type is not standard and is not built-in for most major
> : DBMSs.
> : Even in MySQL
> : https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html
> : UNSIGNED is an attribute of a data type, as in BIGINT UNSIGNED, not a
> : data type.
> : (b) The maximum INTEGER value is the same as the maximum UNSIGNED value,
> : 18446744073709551615, so the change is not necessary.
>
> Agreed that UNSIGNED is unnecessary and should be covered by INTEGER type.
> > : > : Re returning NUMBER from ROUND(): > : You say that the current return is 'integer', but when I try it, > : I get 'double'. I think this current behaviour is acceptable. > : However, if ROUND(1) returns INTEGER, that is good too -- > : the general idea, not a law, would be > : "if the result data type can be the same as the input data type(s), > : let it be the same". > > I'd vote for NUMBER in the places where we need either integer or double. > NUMBER looks more Lua-ish (and we do want to have consistent behavior > between Lua and SQL as much as possible), i.e. if you have double value > which happens to be fraction-less then it will be normalized and > look as integer. > Okay, we vote differently. > : > : Re types of arguments: > : You suggest that TRIM() etc. must have 'string' arguments, > : but currently they can be 'varbinary' and I don't see why that is bad. > > Yup, this is part of blob vs string question. > > : Re BLOB instead of string. This is related to the fact that > : TRIM() etc. currently do not need to have 'string' arguments, > : they can have 'varbinary' arguments. I admit that many (maybe > : most) other vendors expect character strings in such cases. > > I do agree that that there are no single place where string > should be behave anyhow different than varbinary. Blob is string, > guid is string, thus in any case where string is acceptable should > be others acceptable. > > I think that we agree, but gave a longer answer in an earlier email. You mention guid, which reminds me about the UUID data type. Someday we will have to decide ... remembering that all number values sort together, in a column defined as SCALAR, which contains STRING and VARBINARY and UUID values, do the UUID and STRING values sort together, or do the UUID and VARBINARY values sort together, or do the UUID values come after all the STRING and VARBINARY values? > : > : Re MAP and ARRAY types "in the near future": > : I think we must define "near future". 
> : Currently in SQL we do not even have the Lua DECIMAL or UUID data types. > : Kirill Yukhin made the issue Implement DECIMAL data type #4415 > : on August 8 2019, saying > : "After DECIMAL type was introduced to the core, its time to implement > : this type in SQL frontend." > : We are nearly at August 8 2020, so apparently it takes more than one > : year to put > : a data type in SQL even though it is already in the core. > > But should we do anything here - if it's blob with underlying string > storage? Is it anyhow different to varbinary? > > I am not sure what you refer to. Do you mean that ARRAY and MAP should be treated as blobs? Would there be any change to what box.execute([[SELECT * FROM "_space";]]) returns? > : ( > : So I think that maps and arrays, which I think are more difficult, > : will not exist in SQL for two years. I am not worried. > > I feel your pain, but (hope) your estimations here are a little bit > too conservative :) > I explained in an earlier email what I thought about issue#4763. But I am far away (maybe I am the only Tarantool-related worker outside the Moscow area?) so of course you know about many things that I do not know about. > : However, it is interesting to imagine > : UPPER(array of strings) -- should we return upper of all elements? > : UPPER(map) -- should we return upper of both the key and the value? > : and so on. > : I believe Lua non-scalar values should be flattened in SQL, > : so perhaps the questions can all be avoided. > : > : Peter Gulutzan > > I believe Lua non-scalar should be incompatible with scalar context and > should generate runtime errors. But it's too early to discuss this though. > > Timur Peter Gulutzan
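
[Editor's sketch] The byte-versus-character distinction that runs through this thread (LENGTH, POSITION and TRIM on STRING versus VARBINARY) can be reproduced outside Tarantool. The following is a minimal Python sketch under one assumption: the character in the archived examples is U+0414 (Д), whose UTF-8 encoding is the byte pair D0 94 quoted in the thread. Python `str` stands in for STRING and `bytes` for VARBINARY, and the variable names are illustrative only.

```python
text = "Д"                  # assumed character; stands in for a STRING value
raw = text.encode("utf-8")  # stands in for the VARBINARY value X'D094'

# The UTF-8 bytes of the character, as hex.
print(raw.hex().upper())

# Character count versus byte count, like LENGTH('Д') and LENGTH(X'D094').
print(len(text), len(raw))

# 1-based position of 'A': counted in characters for a string,
# in bytes for a byte sequence, like the POSITION() examples.
print((text + "A").find("A") + 1)
print((raw + b"A").find(b"A") + 1)

# Stripping the byte D0 from both ends leaves the single byte 94,
# matching the HEX(TRIM(X'D0' FROM ...)) results shown in the thread.
print(raw.strip(b"\xd0").hex())
```

The same input therefore gives length 1 as a string and 2 as bytes, which is why the thread treats the VARBINARY results as byte positions and byte counts.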