[Tarantool-discussions] Implicit cast in SQL
Mergen Imeev
imeevma at tarantool.org
Sun Jan 12 15:40:00 MSK 2020
Hi,
thank you for answering.
On Sat, Jan 11, 2020 at 02:49:41PM -0700, Peter Gulutzan wrote:
> Hi,
>
> On 2020-01-11 6:33 a.m., Mergen Imeev wrote:
> > Hi all,
> > I have a question to discuss. I know that we have discussed this
> > issue quite a few times.
>
> Very true! I thought that most of these matters were settled.
> Some of the discussions were with K. Osipov or CCed to dev,
> for example the long Re: [dev] cast thread.
> Should they be on the email "To:" list?
>
I remember this, I also remember that at some point it was
decided that "we don’t have time now, so let's postpone the
discussion to a later date." I think it was somewhere in
the thread "Check types of arguments of functions." So, I
think the time has come.
> > But, since we have a few open issues on
> > github, I think the final answer has not yet been found.
>
> I see two open issues on github say implicit cast:
> #3809 sql: implicit type cast problems during assignment
> https://github.com/tarantool/tarantool/issues/3809
> #4230 sql: implicit type cast during comparison
> https://github.com/tarantool/tarantool/issues/4230
> Are there others that I have missed?
>
I think that there is at least two more:
"#4159: sql: length takes not-string values"
https://github.com/tarantool/tarantool/issues/4159
"#4470: SQL arithmetic types are broken"
https://github.com/tarantool/tarantool/issues/4470
> > I want to
> > completely close the issue this time.
> >
> > I would like to start with an implicit cast. Implicit cast
> > consists of implicit type conversion for COMPARISON and implicit
> > type conversion for ASSIGNMENT.
>
> I will compare your suggestion with the chart that I wrote for
> the documentation, which was the result of my understanding from
> prior discussions. The documentation has not been published,
> but can be seen in section Data type conversion, lines 1632 and following,
> in
> https://github.com/tarantool/doc/blob/pgulutzan-2.3/doc/reference/reference_sql/sql.rst
>
> > Currently, all types can be divided into four categories:
> > numeric (UNSIGNED, INTEGER, NUMBER, DOUBLE), string (STRING),
> > binary (VARBINARY) and boolean (BOOLEAN). We also have a scalar
> > type, but any value of a scalar type has some subtype that can be
> > placed in one of these four categories.
> >
>
> Please don't call them "subtypes". I proposed that we should regard
> SCALAR as a supertype (and therefore the rules for the supertype
> would apply for the subtype), but that was rejected.
>
Ok, I won't call them 'subtypes' anymore.
> > So, I suggest this:
> > Implicit type conversion for COMPARISON:
> > 1) types of the same category can be implicitly cast to each
> > other;
>
> Or "implicitly cast to the type whose range includes both values".
>
> > 2) types of different categories cannot be implicitly cast to
> > each other.
> >
>
> The chart does not agree.
>
> * The chart says A (always) numeric can be compared to string.
> But I was fooled, I did not look in both directions.
> Now S (sometimes) a numeric can be compared to a string.
> For example, 1 = 'a' is legal, 'a' = 1 is illegal.
> But anyway, it is currently not always true that types of different
> categories cannot be implicitly cast to each other.
>
> Incidentally, I believe this behaviour is bad, I argued against
> implicit casting of numbers and strings for either comparison
> or assignment. But other people, probably the majority,
> argued that it was good, so it is the current behaviour.
>
> Also, in SCALAR columns, comparisons of values of different
> data types are allowed for the sake of ORDER BY (booleans
> are less than strings etc.) but not allowed for the sake of
> WHERE (searches for booleans cause an error if there are
> strings in the column etc.). This is, I believe, a bad thing.
>
I suggest starting with a comparison. It is possible that
I previously thought that implicit casting is something
good. But not now. For some reason, I began to believe
that implicit casting is evil.
First, I would like to make some definitions. Let's say that type
A can be naturally cast to type B if and only if any value of type
A does not change the internal representation when casting to type
B.
Currently, in SQL, natural casts are used in the following three
situations:
1) UNSIGNED can be naturally cast to INTEGER;
2) any numeric type can be naturally cast to a NUMBER;
3) any scalar type can be naturally cast to a SCALAR.
So, keeping this definition in mind, I propose these rules for
comparison:
1) If both values are of numeric types, they can be compared.
2) If one value has a type that can be naturally cast to the type
of another value, then they can be compared.
3) If none of the previous rules can be applied to values, then
they cannot be compared.
So, for the type set currently available in SQL, I mean the
following:
1) You can compare two values of numeric types. No implicit cast
is required. Special function used.
2) A value of type BOOLEAN can only be compared with a value of
type BOOLEAN. No implicit cast is required.
3) A value of type STRING can only be compared with a value of
type STRING. No implicit cast is required.
4) A value of type VARBINARY can only be compared with a value of
type VARBINARY. No implicit cast is required.
5) The value of any scalar type can be compared with the value of
the SCALAR type. Here the first value is naturally cast to SCALAR.
6) In the other situations values cannot be compared with each
other.
If I'm not mistaken, something similar is used in the non-SQL part
of Tarantool.
What do you think about this?
> > Implicit type conversion for ASSIGNMENT:
> > 1) If a user-defined cast function (UDCF) is defined, then this
> > function is used;
> > 2) If UDCF is not defined, then types of the same category can be
> > implicitly cast to each other, and types of different categories
> > cannot be implicitly cast to each other.
> >
>
> The chart does not agree.
>
> * The chart says A (always) numeric can be assigned to string,
> and S (sometimes) string can be assigned to number.
>
> * The chart says numeric to numeric implicit conversion is S
> (sometimes allowed) not A (always allowed) because of range
> differences, for example -1 will not fit in UNSIGNED.
> Of course, you can reply that the cast is allowed but the
> assignment is not allowed, if you prefer.
>
> I have assumed that function calls may involve implicit assignment.
>
> > I think this is what ANSI says.
> >
> > We currently do not have such a thing as UDCF. I suggest to fill
> > an issue and use the same rules for ASSIGNMENT as for COMPARISON.
> > In fact, I think we can create a special option for assigning
> > UCDF during the rework of implicit type conversion for ASSIGNMENT.
> >
> > What do you think about this?
> >
>
> When you say user-defined casts, do you really mean that
> Tarantool's users will be given facilities that allow them
> to write the code that handles comparison and assignment?
I meant this, but only for assignment. Do we need any
implicit casts other than natural in comparison?
> I think this might be hard, but will wait till I see the issue.
>
> Peter Gulutzan
>
More information about the Tarantool-discussions
mailing list