Tarantool discussions archive
 help / color / mirror / Atom feed
* [Tarantool-discussions] Check implicit cast between strings and numbers
@ 2020-07-06 13:52 Mergen Imeev
  2020-07-06 19:51 ` Peter Gulutzan
  0 siblings, 1 reply; 4+ messages in thread
From: Mergen Imeev @ 2020-07-06 13:52 UTC (permalink / raw)
  To: tarantool-discussions, Peter Gulutzan, Nikita Pettik,
	Vladislav Shpilevoy, tsafin

Hi,

Peter, could you please take a look at my branch and say if
something is wrong with the implicit casts between strings and
numerics. I mean, there shouldn't be any.

Branch: imeevma/gh-3809-disallow-imlicit-cast-from-string-to-nums

There should be no implicit casts for assignment after this patch,
except for the following:
1) Any type can be implicitly cast to ANY.
2) Any scalar type can be implicitly cast to SCALAR.
3) Any numeric type can be implicitly cast to NUMBER.
4) In some cases, numbers can be implicitly converted to another
number type. The rules can be seen in the commit message of the
patches.

For comparison, we say that any numbers can be compared with each
other. However, you may find some cases where the result is not
what it should be. This will be fixed in another issue.

Also, comparison with SCALAR does not work correctly. This will
also be fixed later after we decide how this should work.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-discussions] Check implicit cast between strings and numbers
  2020-07-06 13:52 [Tarantool-discussions] Check implicit cast between strings and numbers Mergen Imeev
@ 2020-07-06 19:51 ` Peter Gulutzan
  2020-07-08 16:58   ` Mergen Imeev
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Gulutzan @ 2020-07-06 19:51 UTC (permalink / raw)
  To: Mergen Imeev, tarantool-discussions, Nikita Pettik,
	Vladislav Shpilevoy, tsafin

Hi,


On 2020-07-06 7:52 a.m., Mergen Imeev wrote:
> Hi,
>
> Peter, could you please take a look at my branch and say if
> something is wrong with the implicit casts between strings and
> numerics. I mean, there shouldn't be any.
>
> Branch: imeevma/gh-3809-disallow-imlicit-cast-from-string-to-nums
>
> There should be no implicit casts for assignment after this patch,
> except for the following:
> 1) Any type can be implicitly cast to ANY.
> 2) Any scalar type can be implicitly cast to SCALAR.
> 3) Any numeric type can be implicitly cast to NUMBER.
> 4) In some cases, numbers can be implicitly converted to another
> number type. The rules can be seen in the commit message of the
> patches.
>
> For comparison, we say that any numbers can be compared with each
> other. However, you may find some cases where the result is not
> what it should be. This will be fixed in another issue.
>
> Also, comparison with SCALAR does not work correctly. This will
> also be fixed later after we decide how this should work.
>
Hi,

I looked. I found nothing significant.
Some of my statements here may be repeitions of earlier statements.

Some tests with arithmetic: These are examples in the version 2.4 manual:
box.execute([[select '7' + '7' /* result is 14, metadata = scalar */;]])
box.execute([[SELECT 5 / '5' /* result is 1 */;]])
I assume that this will be fixed soon.

Some tests of functions:
box.execute([[SELECT ABS('') /* result is 0 */;]])
box.execute([[SELECT CAST('' AS DOUBLE) /* result is error */;]])
box.execute([[SELECT CHAR('') /* result is '\0' */;]])
box.execute([[SELECT GREATEST('',1e500,X'00',0,CAST(100 AS DOUBLE));]])
box.execute([[SELECT LENGTH(1234.56) /* result is 7 */;]])
box.execute([[SELECT LIKELIHOOD('a' = 'b', '0.1') /* result is error */;]])
box.execute([[SELECT LOWER(5) /* result is '5' */;]])
box.execute([[SELECT POSITION(14, 3.14159) /* result is error */;]])
box.execute([[SELECT QUOTE(5) /* result is 5 */;]])
box.execute([[SELECT RANDOMBLOB('') /* result is null */;]])
box.execute([[SELECT ROUND('44.7') /* result is 45 */;]])
box.execute([[SELECT SOUNDEX(14) /* result is */;;}
box.execute([[SELECT SOUNDEX(14) /* result is ?000 */;]])
box.execute([[SELECT SUBSTR(5,5,5) /* result is '' */;]])
box.execute([[SELECT TRIM('1' from 121) /* result is 2 */;]])
box.execute([[SELECT UNICODE(0) /* result is 48 */;]])
box.execute([[SELECT UPPER(5) /* result is '5' */;]])
box.execute([[SELECT ZEROBLOB('1') /* result is '/0' */;]])
I see that sometimes strings are accepted where numbers are expected.
I see that sometimes numbers are accepted where strings are expected.
Maybe this isn't always bad, but maybe you intend to make changes.

A test with -nan:
box.execute([[DROP TABLE t;]])
box.execute([[CREATE TABLE t (s1 INTEGER PRIMARY KEY, s2 DOUBLE, s3 
INTEGER);]])
box.space.T:insert{1, ffi.cast('double',math.sqrt(-1)),0}
box.execute([[UPDATE t SET s3 = s2;]])
This succeeds, column s3 becomes NULL.
I guess this is an "implicit cast" of -nan to NULL, I think that is okay.

A test with foreign keys:
box.execute([[DROP TABLE t2;]])
box.execute([[DROP TABLE t1;]])
box.execute([[CREATE TABLE t1 (s1 DOUBLE PRIMARY KEY);]])
box.execute([[CREATE TABLE t2 (s1 INTEGER PRIMARY KEY REFERENCES t1);]])
This fails. The standard requirement is
"The declared type of each referencing column shall be comparable
to the declared type of the corresponding referenced column."
So if it is easy to declare that the statement is legal,
you would be very slightly more compliant with the standard.

A test with a numeric target:
"
tarantool> box.execute([[CREATE TABLE t (s1 INTEGER PRIMARY KEY);]])
---
- row_count: 1
...

tarantool> box.execute([[INSERT INTO t VALUES ('0');]])
---
- null
- 'Type mismatch: can not convert ''0'' (type: text) to integer'
...
"
Fine, but it might be better if the error message said "type: string".

A test with a constraint:
box.execute([[DROP TABLE t4;]])
box.execute([[CREATE TABLE t4 (s1 INTEGER PRIMARY KEY);]])
box.execute([[ALTER TABLE t4 ADD CONSTRAINT c CHECK (s1 <> '');]])
box.execute([[INSERT INTO t4 VALUES (0);]])
The result is an error. Fine.
I wonder whether it would be easy to check during ALTER rather than 
during INSERT.

A test with LIMIT:
n = -1
box.execute([[SELECT 5 LIMIT ?;]],{n})
n = 1
s = box.prepare([[SELECT 5 LIMIT ?;]])
box.execute(s.stmt_id,{n})
n = -1
box.execute(s.stmt_id,{n})
This is a situation that I worried about. It works as planned. Fine.

As you know, we do not agree about SCALAR.

If you are sure that this behaviour change will be in version 2.5,
then please order me or Elena Shebunyaeva to put a warning in
version 2.4 release notes
https://www.tarantool.io/en/doc/2.4/whats_new/
or in the SQL section which mentions "implicit casting" several times.

Peter Gulutzan

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-discussions] Check implicit cast between strings and numbers
  2020-07-06 19:51 ` Peter Gulutzan
@ 2020-07-08 16:58   ` Mergen Imeev
  2020-07-08 18:58     ` Peter Gulutzan
  0 siblings, 1 reply; 4+ messages in thread
From: Mergen Imeev @ 2020-07-08 16:58 UTC (permalink / raw)
  To: Peter Gulutzan, tarantool-discussions, Nikita Pettik,
	Vladislav Shpilevoy, tsafin

Hi,

Thank you for testing! I added check types of arguments of the
functions, could you look once more?

There are some functions in which we support both STRING and BLOB
as arguments, for example substr(). Do you think we should leave
them as they are now, or can we change them so that they support
only STRING arguments?

On 06.07.2020 22:51, Peter Gulutzan wrote:
> Hi,
>
>
> On 2020-07-06 7:52 a.m., Mergen Imeev wrote:
>> Hi,
>>
>> Peter, could you please take a look at my branch and say if
>> something is wrong with the implicit casts between strings and
>> numerics. I mean, there shouldn't be any.
>>
>> Branch: imeevma/gh-3809-disallow-imlicit-cast-from-string-to-nums
>>
>> There should be no implicit casts for assignment after this patch,
>> except for the following:
>> 1) Any type can be implicitly cast to ANY.
>> 2) Any scalar type can be implicitly cast to SCALAR.
>> 3) Any numeric type can be implicitly cast to NUMBER.
>> 4) In some cases, numbers can be implicitly converted to another
>> number type. The rules can be seen in the commit message of the
>> patches.
>>
>> For comparison, we say that any numbers can be compared with each
>> other. However, you may find some cases where the result is not
>> what it should be. This will be fixed in another issue.
>>
>> Also, comparison with SCALAR does not work correctly. This will
>> also be fixed later after we decide how this should work.
>>
> Hi,
>
> I looked. I found nothing significant.
> Some of my statements here may be repeitions of earlier statements.
>
> Some tests with arithmetic: These are examples in the version 2.4 manual:
> box.execute([[select '7' + '7' /* result is 14, metadata = scalar */;]])
> box.execute([[SELECT 5 / '5' /* result is 1 */;]])
> I assume that this will be fixed soon.
>
> Some tests of functions:
> box.execute([[SELECT ABS('') /* result is 0 */;]])
> box.execute([[SELECT CAST('' AS DOUBLE) /* result is error */;]])
> box.execute([[SELECT CHAR('') /* result is '\0' */;]])
> box.execute([[SELECT GREATEST('',1e500,X'00',0,CAST(100 AS DOUBLE));]])
> box.execute([[SELECT LENGTH(1234.56) /* result is 7 */;]])
> box.execute([[SELECT LIKELIHOOD('a' = 'b', '0.1') /* result is error 
> */;]])
> box.execute([[SELECT LOWER(5) /* result is '5' */;]])
> box.execute([[SELECT POSITION(14, 3.14159) /* result is error */;]])
> box.execute([[SELECT QUOTE(5) /* result is 5 */;]])
> box.execute([[SELECT RANDOMBLOB('') /* result is null */;]])
> box.execute([[SELECT ROUND('44.7') /* result is 45 */;]])
> box.execute([[SELECT SOUNDEX(14) /* result is */;;}
> box.execute([[SELECT SOUNDEX(14) /* result is ?000 */;]])
> box.execute([[SELECT SUBSTR(5,5,5) /* result is '' */;]])
> box.execute([[SELECT TRIM('1' from 121) /* result is 2 */;]])
> box.execute([[SELECT UNICODE(0) /* result is 48 */;]])
> box.execute([[SELECT UPPER(5) /* result is '5' */;]])
> box.execute([[SELECT ZEROBLOB('1') /* result is '/0' */;]])
> I see that sometimes strings are accepted where numbers are expected.
> I see that sometimes numbers are accepted where strings are expected.
> Maybe this isn't always bad, but maybe you intend to make changes.
>
> A test with -nan:
> box.execute([[DROP TABLE t;]])
> box.execute([[CREATE TABLE t (s1 INTEGER PRIMARY KEY, s2 DOUBLE, s3 
> INTEGER);]])
> box.space.T:insert{1, ffi.cast('double',math.sqrt(-1)),0}
> box.execute([[UPDATE t SET s3 = s2;]])
> This succeeds, column s3 becomes NULL.
> I guess this is an "implicit cast" of -nan to NULL, I think that is okay.
>
> A test with foreign keys:
> box.execute([[DROP TABLE t2;]])
> box.execute([[DROP TABLE t1;]])
> box.execute([[CREATE TABLE t1 (s1 DOUBLE PRIMARY KEY);]])
> box.execute([[CREATE TABLE t2 (s1 INTEGER PRIMARY KEY REFERENCES t1);]])
> This fails. The standard requirement is
> "The declared type of each referencing column shall be comparable
> to the declared type of the corresponding referenced column."
> So if it is easy to declare that the statement is legal,
> you would be very slightly more compliant with the standard.
>
> A test with a numeric target:
> "
> tarantool> box.execute([[CREATE TABLE t (s1 INTEGER PRIMARY KEY);]])
> ---
> - row_count: 1
> ...
>
> tarantool> box.execute([[INSERT INTO t VALUES ('0');]])
> ---
> - null
> - 'Type mismatch: can not convert ''0'' (type: text) to integer'
> ...
> "
> Fine, but it might be better if the error message said "type: string".
>
> A test with a constraint:
> box.execute([[DROP TABLE t4;]])
> box.execute([[CREATE TABLE t4 (s1 INTEGER PRIMARY KEY);]])
> box.execute([[ALTER TABLE t4 ADD CONSTRAINT c CHECK (s1 <> '');]])
> box.execute([[INSERT INTO t4 VALUES (0);]])
> The result is an error. Fine.
> I wonder whether it would be easy to check during ALTER rather than 
> during INSERT.
>
> A test with LIMIT:
> n = -1
> box.execute([[SELECT 5 LIMIT ?;]],{n})
> n = 1
> s = box.prepare([[SELECT 5 LIMIT ?;]])
> box.execute(s.stmt_id,{n})
> n = -1
> box.execute(s.stmt_id,{n})
> This is a situation that I worried about. It works as planned. Fine.
>
> As you know, we do not agree about SCALAR.
>
> If you are sure that this behaviour change will be in version 2.5,
> then please order me or Elena Shebunyaeva to put a warning in
> version 2.4 release notes
> https://www.tarantool.io/en/doc/2.4/whats_new/
> or in the SQL section which mentions "implicit casting" several times.
>
> Peter Gulutzan
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-discussions] Check implicit cast between strings and numbers
  2020-07-08 16:58   ` Mergen Imeev
@ 2020-07-08 18:58     ` Peter Gulutzan
  0 siblings, 0 replies; 4+ messages in thread
From: Peter Gulutzan @ 2020-07-08 18:58 UTC (permalink / raw)
  To: Mergen Imeev, tarantool-discussions, Nikita Pettik,
	Vladislav Shpilevoy, tsafin

Hi,

On 2020-07-08 10:58 a.m., Mergen Imeev wrote:
 > Hi,
 >
 > Thank you for testing! I added check types of arguments of the
 > functions, could you look once more?
 >
 > There are some functions in which we support both STRING and BLOB
 > as arguments, for example substr(). Do you think we should leave
 > them as they are now, or can we change them so that they support
 > only STRING arguments?
<cut>

I think those functions are
SELECT HEX('Г'), HEX(CAST('Г' AS VARBINARY));
SELECT LENGTH('Г'), LENGTH(CAST('Г' AS VARBINARY));
SELECT LOWER('Г'), LOWER(CAST('Г' AS VARBINARY));
SELECT POSITION('1', 'Г1'), POSITION('1', CAST('Г1' AS VARBINARY));
SELECT PRINTF('%s', 'Г'), PRINTF('%s', CAST('Г' AS VARBINARY));
SELECT QUOTE('Г'), QUOTE(CAST('Г' AS VARBINARY));
SELECT REPLACE('1Г1','1','2'), REPLACE(CAST('1Г1' AS VARBINARY),'1','2');
SELECT SOUNDEX('Г'), SOUNDEX(CAST('Г' AS VARBINARY));
SELECT SUBSTR('1Г1',2,1), SUBSTR(CAST('1Г1' AS VARBINARY),2,1);
SELECT TRIM(TRAILING '1' FROM '1Г1'), TRIM(TRAILING '1' FROM CAST('1Г1' 
AS VARBINARY));
SELECT UNICODE('Г'), UNICODE(CAST('Г' AS VARBINARY));
SELECT UPPER('Г'), UPPER(CAST('Г' AS VARBINARY));

They all work now.
I think that is good, we should leave them as they are now.

That does not mean they are all perfect, though.
Here are complaints, which are not relevant to
your question but which we should look at someday.

As you can see in the comments on issue#4145
sql: remake string value functions
https://github.com/tarantool/tarantool/issues/4145
sometimes functions with multiple parameters can
have multiple data types.

As you can see from the comments on issue#3929
Length functions for SQL character strings
https://github.com/tarantool/tarantool/issues/3929
As you know, I objected to the idea that CHARACTER_LENGTH()
should be applied to VARBINARY values, but
it was a firm order from Konstantin Osipov.

If we accepted that these functions should look
for bytes (rather than characters) when the values
are VARBINARY, then some of the results would differ.


Peter Gulutzan

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-07-08 18:58 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-06 13:52 [Tarantool-discussions] Check implicit cast between strings and numbers Mergen Imeev
2020-07-06 19:51 ` Peter Gulutzan
2020-07-08 16:58   ` Mergen Imeev
2020-07-08 18:58     ` Peter Gulutzan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox