[Tarantool-discussions] Implicit cast for assignment between numeric types and type mismatch error description.

Tarantool discussions archive
 help / color / mirror / Atom feed

* [Tarantool-discussions] Implicit cast for assignment between numeric types and type mismatch error description.
@ 2020-06-22 11:52 Mergen Imeev
  2020-06-22 18:43 ` Peter Gulutzan
  0 siblings, 1 reply; 6+ messages in thread
From: Mergen Imeev @ 2020-06-22 11:52 UTC (permalink / raw)
  To: pgulutzan, tarantool-discussions, korablev, Vladislav Shpilevoy, tsafin

Hello Peter!

I have two questions.

The first. I am sorry to ask about this again, but I want to
discuss once again the implicit cast for assignment between
numeric types. I wrote earlier that I plan to use C rules for
implicit casting. However, after implementation, this looks
terribly wrong for me. Temporarily, I decided to allow implicit
casting only if the number after conversion from one type to
another using C rules and back remains unchanged. For example, a
double 50.0 can be implicitly cast to an integer 50, but a
double 50.5 cannot be cast to an integer, and an integer 2^60-1
cannot be cast to a double.

However, I am not sure if this is correct. Could you suggest rules
for implicit casting to assign numeric types?

The second question is about type mismatch error text.

We currently have two types of type mismatch errors.

The first:
- 'Type mismatch: can not convert some_text to integer'

The second:
- 'Type mismatch: can not convert text to integer'

In the first case, we use the value in the error description, in
the second we use the type.

Now we have decided on the same opinion.

We have 3 options:
1) use value in error description:
Type mismatch: can not convert 'some_text' to integer
2) use type in error description:
Type mismatch: can not convert text to integer
3) use both in error description:
Type mismatch: can not convert 'some_text'(text) to integer

The first and third options look a little unsafe, since the values
will be shown in the logs. However, they provide more information
than the second option.

Which one do you think is the best? Or maybe you can offer another
option?

In addition, we must remember that at some point the description
becomes the same as in Lua (issue #5074):
'Tuple field 1 type does not match one required by operation:
expected integer'

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for assignment between numeric types and type mismatch error description.
  2020-06-22 11:52 [Tarantool-discussions] Implicit cast for assignment between numeric types and type mismatch error description Mergen Imeev
@ 2020-06-22 18:43 ` Peter Gulutzan
  2020-06-23 16:15   ` Mergen Imeev
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Gulutzan @ 2020-06-22 18:43 UTC (permalink / raw)
  To: Mergen Imeev, tarantool-discussions, korablev,
	Vladislav Shpilevoy, tsafin

Hi,

On 2020-06-22 5:52 a.m., Mergen Imeev wrote:
 > Hello Peter!
 >
 > I have two questions.
 >
 > The first. I am sorry to ask about this again, but I want to
 > discuss once again the implicit cast for assignment between
 > numeric types. I wrote earlier that I plan to use C rules for
 > implicit casting. However, after implementation, this looks
 > terribly wrong for me. Temporarily, I decided to allow implicit
 > casting only if the number after conversion from one type to
 > another using C rules and back remains unchanged. For example, a
 > double 50.0 can be implicitly cast to an integer 50, but a
 > double 50.5 cannot be cast to an integer, and an integer 2^60-1
 > cannot be cast to a double.
 >
 > However, I am not sure if this is correct. Could you suggest rules
 > for implicit casting to assign numeric types?
 >

Re "double 50.5"
On 2020-04-30 in thread (CCed to several people and to dev)
"Re: [Tarantool-discussions] Implicit cast for ASSIGNMENT"
I read:
Mergen Imeev> I think that the rules for implicit casting when assigning 
value
Mergen Imeev> of numeric type must be the same as in C.
Peter Gulutzan> You mean there should be truncation not rounding?
Mergen Imeev> Yes.
Therefore I expected
box.execute([[CREATE TABLE t0 (s1 INTEGER PRIMARY KEY);]])
box.execute([[INSERT INTO t0 VALUES (?);]], {50.5})
would be okay, result = 50.

Re "integer 2^60-1":
Currently I can say:
box.execute([[CREATE TABLE t (s1 DOUBLE PRIMARY KEY);]])
box.execute([[INSERT INTO t VALUES (?);]], 2^60 - 1)
and is okay today, result = 2^60 - 1 (I think).

I suggest the rules that were discussed earlier:
Loss of significant digits (overflow) is a runtime error.
Loss of insignificant digits is not an error.
I think this should be possible because explicit casts work:
box.execute([[SELECT CAST(? AS INTEGER);]],{2^64}) -- result = error
box.execute([[SELECT CAST(50.5 AS INTEGER);]]) -- result = 50

 > The second question is about type mismatch error text.
 >
 > We currently have two types of type mismatch errors.
 >
 > The first:
 > - 'Type mismatch: can not convert some_text to integer'
 >
 > The second:
 > - 'Type mismatch: can not convert text to integer'
 >
 > In the first case, we use the value in the error description, in
 > the second we use the type.
 >
 > Now we have decided on the same opinion.
 >
 > We have 3 options:
 > 1) use value in error description:
 > Type mismatch: can not convert 'some_text' to integer
 > 2) use type in error description:
 > Type mismatch: can not convert text to integer
 > 3) use both in error description:
 > Type mismatch: can not convert 'some_text'(text) to integer
 >
 > The first and third options look a little unsafe, since the values
 > will be shown in the logs. However, they provide more information
 > than the second option.
 >
 > Which one do you think is the best? Or maybe you can offer another
 > option?
 >
 > In addition, we must remember that at some point the description
 > becomes the same as in Lua (issue #5074):
 > 'Tuple field 1 type does not match one required by operation:
 > expected integer'
 >

Re 'Type mismatch: can not convert text to integer':

You use "type", but that is not as specific as it could be.
Why not "data type"?

You use "can not", but in other error messages
I see "cannot", for example
"cannot create trigger on system table" (trigger.c)
Why be inconsistent?

You use "text", but that is a synonym, not what typeof() returns.
Why not "string"?

You use "convert", but look at the title of this email thread.
Why not "implicitly cast"?
Actually "conversion" is better now, it is in the manual.
But I could change the manual if people wanted "cast".

Re 'Type mismatch: can not convert some_text to integer'

I do not know what your plan is, when some_text is extremely
long, or when the value has non-displayable bytes / characters.

I do not know what your plan is, when the source is not
is not a literal value, but instead is ? or is a
column name or is a function result. I guess the answer
is still "the value", but I might be confused if the
error message mentions the value but the value does not
appear in the SQL statement.

You are concentrating on making it clear what the source is.
Why not make it clear what the target is?

I think that the other way to be clear about the problem is:
return the character number of the token that is bad.
But that is not a request about the message content.

Re 'Type mismatch: can not convert 'some_text'(text) to integer'

To a beginner this is helpful, to an experienced user it
is redundant. I will guess that most users, after only a
few days, will have enough experience that they know a
literal inside single quotes is a string. So the question is:
is this so obvious that it is irritating?
I guess not.
So I like this option.

Of course I also like some texts in the SQL standard document,
"data exception -- numeric value out of range" and
"data exception -- invalid character value for cast",
but I don't like them so much that I want to fight about them.

Peter Gulutzan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for assignment between numeric types and type mismatch error description.
  2020-06-22 18:43 ` Peter Gulutzan
@ 2020-06-23 16:15   ` Mergen Imeev
  2020-06-23 19:42     ` Peter Gulutzan
  0 siblings, 1 reply; 6+ messages in thread
From: Mergen Imeev @ 2020-06-23 16:15 UTC (permalink / raw)
  To: Peter Gulutzan, tarantool-discussions, korablev,
	Vladislav Shpilevoy, tsafin

Hi!

On 22.06.2020 21:43, Peter Gulutzan wrote:
> Hi,
>
> On 2020-06-22 5:52 a.m., Mergen Imeev wrote:
> > Hello Peter!
> >
> > I have two questions.
> >
> > The first. I am sorry to ask about this again, but I want to
> > discuss once again the implicit cast for assignment between
> > numeric types. I wrote earlier that I plan to use C rules for
> > implicit casting. However, after implementation, this looks
> > terribly wrong for me. Temporarily, I decided to allow implicit
> > casting only if the number after conversion from one type to
> > another using C rules and back remains unchanged. For example, a
> > double 50.0 can be implicitly cast to an integer 50, but a
> > double 50.5 cannot be cast to an integer, and an integer 2^60-1
> > cannot be cast to a double.
> >
> > However, I am not sure if this is correct. Could you suggest rules
> > for implicit casting to assign numeric types?
> >
>
> Re "double 50.5"
> On 2020-04-30 in thread (CCed to several people and to dev)
> "Re: [Tarantool-discussions] Implicit cast for ASSIGNMENT"
> I read:
> Mergen Imeev> I think that the rules for implicit casting when 
> assigning value
> Mergen Imeev> of numeric type must be the same as in C.
> Peter Gulutzan> You mean there should be truncation not rounding?
> Mergen Imeev> Yes.
> Therefore I expected
> box.execute([[CREATE TABLE t0 (s1 INTEGER PRIMARY KEY);]])
> box.execute([[INSERT INTO t0 VALUES (?);]], {50.5})
> would be okay, result = 50.
>
> Re "integer 2^60-1":
> Currently I can say:
> box.execute([[CREATE TABLE t (s1 DOUBLE PRIMARY KEY);]])
> box.execute([[INSERT INTO t VALUES (?);]], 2^60 - 1)
> and is okay today, result = 2^60 - 1 (I think).
>
> I suggest the rules that were discussed earlier:
> Loss of significant digits (overflow) is a runtime error.
> Loss of insignificant digits is not an error.
> I think this should be possible because explicit casts work:
> box.execute([[SELECT CAST(? AS INTEGER);]],{2^64}) -- result = error
> box.execute([[SELECT CAST(50.5 AS INTEGER);]]) -- result = 50
>
Thanks for the answer. However, I now have another question:
should we apply these rules for cases like this:
SELECT * FROM t LIMIT 2.5;

I mean, we should throw an error here or execute it like this:
SELECT * FROM t LIMIT 2;


> > The second question is about type mismatch error text.
> >
> > We currently have two types of type mismatch errors.
> >
> > The first:
> > - 'Type mismatch: can not convert some_text to integer'
> >
> > The second:
> > - 'Type mismatch: can not convert text to integer'
> >
> > In the first case, we use the value in the error description, in
> > the second we use the type.
> >
> > Now we have decided on the same opinion.
> >
> > We have 3 options:
> > 1) use value in error description:
> > Type mismatch: can not convert 'some_text' to integer
> > 2) use type in error description:
> > Type mismatch: can not convert text to integer
> > 3) use both in error description:
> > Type mismatch: can not convert 'some_text'(text) to integer
> >
> > The first and third options look a little unsafe, since the values
> > will be shown in the logs. However, they provide more information
> > than the second option.
> >
> > Which one do you think is the best? Or maybe you can offer another
> > option?
> >
> > In addition, we must remember that at some point the description
> > becomes the same as in Lua (issue #5074):
> > 'Tuple field 1 type does not match one required by operation:
> > expected integer'
> >
>
> Re 'Type mismatch: can not convert text to integer':
>
> You use "type", but that is not as specific as it could be.
> Why not "data type"?
>
> You use "can not", but in other error messages
> I see "cannot", for example
> "cannot create trigger on system table" (trigger.c)
> Why be inconsistent?
>
> You use "text", but that is a synonym, not what typeof() returns.
> Why not "string"?
>
> You use "convert", but look at the title of this email thread.
> Why not "implicitly cast"?
> Actually "conversion" is better now, it is in the manual.
> But I could change the manual if people wanted "cast".
>
> Re 'Type mismatch: can not convert some_text to integer'
>
> I do not know what your plan is, when some_text is extremely
> long, or when the value has non-displayable bytes / characters.
>
> I do not know what your plan is, when the source is not
> is not a literal value, but instead is ? or is a
> column name or is a function result. I guess the answer
> is still "the value", but I might be confused if the
> error message mentions the value but the value does not
> appear in the SQL statement.
>
> You are concentrating on making it clear what the source is.
> Why not make it clear what the target is?
>
> I think that the other way to be clear about the problem is:
> return the character number of the token that is bad.
> But that is not a request about the message content.
>
> Re 'Type mismatch: can not convert 'some_text'(text) to integer'
>
> To a beginner this is helpful, to an experienced user it
> is redundant. I will guess that most users, after only a
> few days, will have enough experience that they know a
> literal inside single quotes is a string. So the question is:
> is this so obvious that it is irritating?
> I guess not.
> So I like this option.
>
> Of course I also like some texts in the SQL standard document,
> "data exception -- numeric value out of range" and
> "data exception -- invalid character value for cast",
> but I don't like them so much that I want to fight about them.
>
> Peter Gulutzan
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for assignment between numeric types and type mismatch error description.
  2020-06-23 16:15   ` Mergen Imeev
@ 2020-06-23 19:42     ` Peter Gulutzan
  2020-06-25  8:02       ` Mergen Imeev
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Gulutzan @ 2020-06-23 19:42 UTC (permalink / raw)
  To: Mergen Imeev, tarantool-discussions, korablev,
	Vladislav Shpilevoy, tsafin

Hi,

On 2020-06-23 10:15 a.m., Mergen Imeev wrote:
<cut>
 > Thanks for the answer. However, I now have another question:
 > should we apply these rules for cases like this:
 > SELECT * FROM t LIMIT 2.5;
 >
 > I mean, we should throw an error here or execute it like this:
 > SELECT * FROM t LIMIT 2;
<cut>

In the SQL-standard equivalent of LIMIT n,
which is FETCH FIRST n ROWS, that is a syntax
error because
"The declared type of <fetch first row count>
shall be an exact numeric with scale 0 (zero)."
For us it is a runtime error ...
"
tarantool> n = 0.000000001
---
...
tarantool> box.execute([[SELECT 5 LIMIT ?;]], {n})
---
- null
- 'Failed to execute SQL statement: Only positive
   integers are allowed in the LIMIT
   clause'
...
"
and you are proposing that there should be no error.
On the other hand,
box.execute([[CREATE TABLE t (s1 VARCHAR(2.0) PRIMARY KEY);]])
causes a "Syntax error", and you are not proposing
that we change that.
Another case is
box.execute([[SELECT SUBSTR('abcde',2.99);]])
which is legal, 2.99 is truncated to 2.

Therefore, I think your proposal means:
If n is not an integer,
and that is not detected as a syntax error,
then there should be no error or warning.
And I think the precedent of substr() means:
there should be truncation not rounding.

If I have understood correctly, then I agree.

Peter Gulutzan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for assignment between numeric types and type mismatch error description.
  2020-06-23 19:42     ` Peter Gulutzan
@ 2020-06-25  8:02       ` Mergen Imeev
  2020-06-25 19:39         ` Peter Gulutzan
  0 siblings, 1 reply; 6+ messages in thread
From: Mergen Imeev @ 2020-06-25  8:02 UTC (permalink / raw)
  To: Peter Gulutzan, tarantool-discussions, korablev,
	Vladislav Shpilevoy, tsafin

Hi,

On 23.06.2020 22:42, Peter Gulutzan wrote:
> Hi,
>
> On 2020-06-23 10:15 a.m., Mergen Imeev wrote:
> <cut>
> > Thanks for the answer. However, I now have another question:
> > should we apply these rules for cases like this:
> > SELECT * FROM t LIMIT 2.5;
> >
> > I mean, we should throw an error here or execute it like this:
> > SELECT * FROM t LIMIT 2;
> <cut>
>
> In the SQL-standard equivalent of LIMIT n,
> which is FETCH FIRST n ROWS, that is a syntax
> error because
> "The declared type of <fetch first row count>
> shall be an exact numeric with scale 0 (zero)."
> For us it is a runtime error ...
Should we generate a syntax error here instead of a runtime error?
However, in this case, we will not be able to execute something
like this:
SELECT * FROM t LIMIT 1 + 2;

I will fill an issue if you think so.

Actually, generating an error here seems less painful than making
an integer from a double.

> "
> tarantool> n = 0.000000001
> ---
> ...
> tarantool> box.execute([[SELECT 5 LIMIT ?;]], {n})
> ---
> - null
> - 'Failed to execute SQL statement: Only positive
>   integers are allowed in the LIMIT
>   clause'
> ...
> "
> and you are proposing that there should be no error.
No, I think the error should be here. However, I'm not sure, so I
asked you.

> On the other hand,
> box.execute([[CREATE TABLE t (s1 VARCHAR(2.0) PRIMARY KEY);]])
> causes a "Syntax error", and you are not proposing
> that we change that.
> Another case is
> box.execute([[SELECT SUBSTR('abcde',2.99);]])
> which is legal, 2.99 is truncated to 2.

I think this is worth fixing. Or is it better the way it is now?


>
> Therefore, I think your proposal means:
> If n is not an integer,
> and that is not detected as a syntax error,
> then there should be no error or warning.
> And I think the precedent of substr() means:
> there should be truncation not rounding.
>
> If I have understood correctly, then I agree.
>
> Peter Gulutzan
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for assignment between numeric types and type mismatch error description.
  2020-06-25  8:02       ` Mergen Imeev
@ 2020-06-25 19:39         ` Peter Gulutzan
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Gulutzan @ 2020-06-25 19:39 UTC (permalink / raw)
  To: Mergen Imeev, tarantool-discussions, korablev,
	Vladislav Shpilevoy, tsafin

Hi,

On 2020-06-25 2:02 a.m., Mergen Imeev wrote:
 > Hi,
 >
 > On 23.06.2020 22:42, Peter Gulutzan wrote:
 >> Hi,
 >>
 >> On 2020-06-23 10:15 a.m., Mergen Imeev wrote:
 >> <cut>
 >> > Thanks for the answer. However, I now have another question:
 >> > should we apply these rules for cases like this:
 >> > SELECT * FROM t LIMIT 2.5;
 >> >
 >> > I mean, we should throw an error here or execute it like this:
 >> > SELECT * FROM t LIMIT 2;
 >> <cut>
 >>
 >> In the SQL-standard equivalent of LIMIT n,
 >> which is FETCH FIRST n ROWS, that is a syntax
 >> error because
 >> "The declared type of <fetch first row count>
 >> shall be an exact numeric with scale 0 (zero)."
 >> For us it is a runtime error ...
 > Should we generate a syntax error here instead of a runtime error?
 > However, in this case, we will not be able to execute something
 > like this:
 > SELECT * FROM t LIMIT 1 + 2;
 >
 > I will fill an issue if you think so.
 >
 > Actually, generating an error here seems less painful than making
 > an integer from a double.
 >
 >> "
 >> tarantool> n = 0.000000001
 >> ---
 >> ...
 >> tarantool> box.execute([[SELECT 5 LIMIT ?;]], {n})
 >> ---
 >> - null
 >> - 'Failed to execute SQL statement: Only positive
 >>   integers are allowed in the LIMIT
 >>   clause'
 >> ...
 >> "
 >> and you are proposing that there should be no error.
 > No, I think the error should be here. However, I'm not sure, so I
 > asked you.
 >
 >> On the other hand,
 >> box.execute([[CREATE TABLE t (s1 VARCHAR(2.0) PRIMARY KEY);]])
 >> causes a "Syntax error", and you are not proposing
 >> that we change that.
 >> Another case is
 >> box.execute([[SELECT SUBSTR('abcde',2.99);]])
 >> which is legal, 2.99 is truncated to 2.
 >
 > I think this is worth fixing. Or is it better the way it is now?
 >
 >
 >>
 >> Therefore, I think your proposal means:
 >> If n is not an integer,
 >> and that is not detected as a syntax error,
 >> then there should be no error or warning.
 >> And I think the precedent of substr() means:
 >> there should be truncation not rounding.
 >>
 >> If I have understood correctly, then I agree.
 >>
 >> Peter Gulutzan
 >>
 >>

I did not understand correctly, but now I do.

Your example SELECT * FROM t LIMIT 1 + 2; must
be allowed because it is documented behaviour.
The manual says, in section "LIMIT clause":
"Expressions may contain integers and arithmetic
operators or functions, for example ABS(-3 / 1).
However, the result must be an integer value
greater than or equal to zero."
Therefore a syntax error check could only look
for a single literal value, and would be
unnecessary because the runtime check would
continue to exist.

Currently this is legal:
SELECT char(1.1),
        randomblob(1.1),
        substr('a', 1.1, 1.1),
        zeroblob(1.1)
        GROUP BY 1.1
        ORDER BY 1.1;
But this is not legal:
SELECT 1.1
        LIMIT 1.1
        OFFSET 1.1;
That is not consistent handling of cases
where (as far as I can see) the only
sensible arguments are unsigned integers.
Therefore, although this might cause a
million-row insertion to fail on the
millionth row, 1.1 should cause an error.
A very-low-priority bug.

Peter Gulutzan

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-06-25 19:39 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-22 11:52 [Tarantool-discussions] Implicit cast for assignment between numeric types and type mismatch error description Mergen Imeev
2020-06-22 18:43 ` Peter Gulutzan
2020-06-23 16:15   ` Mergen Imeev
2020-06-23 19:42     ` Peter Gulutzan
2020-06-25  8:02       ` Mergen Imeev
2020-06-25 19:39         ` Peter Gulutzan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox