Tarantool discussions archive
 help / color / mirror / Atom feed
* [Tarantool-discussions] Implicit cast for COMPARISON
@ 2020-01-21 11:11 Mergen Imeev
  2020-01-21 18:09 ` Peter Gulutzan
  0 siblings, 1 reply; 18+ messages in thread
From: Mergen Imeev @ 2020-01-21 11:11 UTC (permalink / raw)
  To: tarantool-discussions

Hi all,
I would like once again ask a question about implicit cast for
COMPARISON. Currently the chart of implicit cast for COMPARISON
looks this way:
            To BOOL | To INT | To DBL | To STR | To BIN
---------------------------------------------------------
From BOOL | A       | -      | -      | -      | -
From INT  | -       | A      | A      | A      | -
From DBL  | -       | A      | A      | A      | -
From STR  | -       | S      | S      | A      | -
From BIN  | -       | -      | -      | -      | A

I suggested to change this chart so that is should look this way:
            To BOOL | To NUM | To STR | To BIN
-----------------------------------------------
From BOOL | A       | -      | -      | -
From NUM  | -       | A      | -      | -
From STR  | -       | -      | A      | -
From BIN  | -       | -      | -      | A

At the moment the only thing that different in these charts is
impicit cast from STRING to number and from number to STRING.

I asked the Russian community in Telegram about whether to allow
this implicit cast for COMPARISON. At the moment, 31 people have
voted, and 25 of them have voted for dropping this implicit cast.

So, once again I suggest using these rules:
1) The values of numeric types are comparable without any implicit
cast.
2) If the type of one of the values contains the type of another
value, then they are comparable.
3) In any other cases, the values are not comparable.

About "one type contains another type": we can say that INTEGER
contains UNSIGNED; NUMBER contains INTEGER, UNSIGNED, DOUBLE;
SCALAR contains all available in SQL types.

Since all numeric types are comparable, my suggestion means that a
value of any type can be compared with a value of type SCALAR. And
in this comparison, the same rules will be used as in no-SQL
Tarantool. For example, any value of type STRING is greater than
any value of type INTEGER.

What do you think about this?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-01-21 11:11 [Tarantool-discussions] Implicit cast for COMPARISON Mergen Imeev
@ 2020-01-21 18:09 ` Peter Gulutzan
  2020-01-22 14:13   ` Mergen Imeev
  2020-01-22 16:43   ` Peter Gulutzan
  0 siblings, 2 replies; 18+ messages in thread
From: Peter Gulutzan @ 2020-01-21 18:09 UTC (permalink / raw)
  To: Mergen Imeev, tarantool-discussions

Hi,

On 2020-01-21 4:11 a.m., Mergen Imeev wrote:
 > Hi all,
 > I would like once again ask a question about implicit cast for
 > COMPARISON. Currently the chart of implicit cast for COMPARISON
 > looks this way:
 >             To BOOL | To INT | To DBL | To STR | To BIN
 > ---------------------------------------------------------
 > From BOOL | A       | -      | -      | -      | -
 > From INT  | -       | A      | A      | A      | -
 > From DBL  | -       | A      | A      | A      | -
 > From STR  | -       | S      | S      | A      | -
 > From BIN  | -       | -      | -      | -      | A
 >
 > I suggested to change this chart so that is should look this way:
 >             To BOOL | To NUM | To STR | To BIN
 > -----------------------------------------------
 > From BOOL | A       | -      | -      | -
 > From NUM  | -       | A      | -      | -
 > From STR  | -       | -      | A      | -
 > From BIN  | -       | -      | -      | A
 >
 > At the moment the only thing that different in these charts is
 > impicit cast from STRING to number and from number to STRING.
 >
 > I asked the Russian community in Telegram about whether to allow
 > this implicit cast for COMPARISON. At the moment, 31 people have
 > voted, and 25 of them have voted for dropping this implicit cast.
 >
 > So, once again I suggest using these rules:
 > 1) The values of numeric types are comparable without any implicit
 > cast.
 > 2) If the type of one of the values contains the type of another
 > value, then they are comparable.
 > 3) In any other cases, the values are not comparable.
 >
 > About "one type contains another type": we can say that INTEGER
 > contains UNSIGNED; NUMBER contains INTEGER, UNSIGNED, DOUBLE;
 > SCALAR contains all available in SQL types.
 >
 > Since all numeric types are comparable, my suggestion means that a
 > value of any type can be compared with a value of type SCALAR. And
 > in this comparison, the same rules will be used as in no-SQL
 > Tarantool. For example, any value of type STRING is greater than
 > any value of type INTEGER.
 >
 > What do you think about this?
 >

I believe that the words "compared with a value of type SCALAR"
are technically incorrect -- columns can be defined as having a
SCALAR type but the values in those columns are not SCALAR.
But (I hope) that detail probably does not affect the proposition.

This would reverse what some developers decided earlier
in dev threads "Clarify the rule" and "[dev] cast".
There would be no implicit casts, the SCALAR rules would apply.
Therefore:

SELECT 1 = '1';
Currently, returns: TRUE.
After fix, returns: FALSE.

CREATE TABLE t (s1 SCALAR PRIMARY KEY);
INSERT INTO t VALUES (FALSE),('A');
SELECT COUNT(*) FROM t WHERE s1 <> 'A';
Currently, returns: Type mismatch: can not convert text to boolean.
After fix, returns: 1.

CREATE TABLE t (s1 INTEGER PRIMARY KEY);
INSERT INTO t VALUES (1);
SELECT COUNT(*) FROM t WHERE s1 <> 'A';
Currently, returns: Type mismatch: can not convert text to boolean.
After fix, returns: 1.

If the above example is correct, it differs from NoSQL behaviour:
box.schema.space.create('T')
box.space.T:create_index('I')
box.space.T:insert{1}
box.space.T:select('A')
Currently, returns: Supplied key type of part 0 does not match index 
part type
After fix, returns: Supplied key type of part 0 does not match index 
part type
(No change because the proposed fix is for SQL not Lua)

And, since there is no change suggested for assignment,
implicit cast would still occur. Therefore:

CREATE TABLE t (s1 INTEGER PRIMARY KEY);
INSERT INTO t VALUES ('1');
Currently, returns: no error
After fix, returns: no error
(No change because assignments cause implicit cast)

SELECT UPPER(FALSE) FROM t;
Currently, returns: 'FALSE' (string)
After fix, returns: 'FALSE' (string)
(No change because functions can cause assignments)

I think that the proposed change is good.

Peter Gulutzan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-01-21 18:09 ` Peter Gulutzan
@ 2020-01-22 14:13   ` Mergen Imeev
  2020-01-22 14:24     ` Nikita Pettik
  2020-01-30 18:51     ` Konstantin Osipov
  2020-01-22 16:43   ` Peter Gulutzan
  1 sibling, 2 replies; 18+ messages in thread
From: Mergen Imeev @ 2020-01-22 14:13 UTC (permalink / raw)
  To: Peter Gulutzan; +Cc: tarantool-discussions

Hi! Thanks for answering.

On Tue, Jan 21, 2020 at 11:09:25AM -0700, Peter Gulutzan wrote:
> Hi,
> 
> On 2020-01-21 4:11 a.m., Mergen Imeev wrote:
> > Hi all,
> > I would like once again ask a question about implicit cast for
> > COMPARISON. Currently the chart of implicit cast for COMPARISON
> > looks this way:
> >             To BOOL | To INT | To DBL | To STR | To BIN
> > ---------------------------------------------------------
> > From BOOL | A       | -      | -      | -      | -
> > From INT  | -       | A      | A      | A      | -
> > From DBL  | -       | A      | A      | A      | -
> > From STR  | -       | S      | S      | A      | -
> > From BIN  | -       | -      | -      | -      | A
> >
> > I suggested to change this chart so that is should look this way:
> >             To BOOL | To NUM | To STR | To BIN
> > -----------------------------------------------
> > From BOOL | A       | -      | -      | -
> > From NUM  | -       | A      | -      | -
> > From STR  | -       | -      | A      | -
> > From BIN  | -       | -      | -      | A
> >
> > At the moment the only thing that different in these charts is
> > impicit cast from STRING to number and from number to STRING.
> >
> > I asked the Russian community in Telegram about whether to allow
> > this implicit cast for COMPARISON. At the moment, 31 people have
> > voted, and 25 of them have voted for dropping this implicit cast.
> >
> > So, once again I suggest using these rules:
> > 1) The values of numeric types are comparable without any implicit
> > cast.
> > 2) If the type of one of the values contains the type of another
> > value, then they are comparable.
> > 3) In any other cases, the values are not comparable.
> >
> > About "one type contains another type": we can say that INTEGER
> > contains UNSIGNED; NUMBER contains INTEGER, UNSIGNED, DOUBLE;
> > SCALAR contains all available in SQL types.
> >
> > Since all numeric types are comparable, my suggestion means that a
> > value of any type can be compared with a value of type SCALAR. And
> > in this comparison, the same rules will be used as in no-SQL
> > Tarantool. For example, any value of type STRING is greater than
> > any value of type INTEGER.
> >
> > What do you think about this?
> >
> 
> I believe that the words "compared with a value of type SCALAR"
> are technically incorrect -- columns can be defined as having a
> SCALAR type but the values in those columns are not SCALAR.
> But (I hope) that detail probably does not affect the proposition.
> 
Sorry, but I suggested a different approach: each value has
its own type, and SCALAR is one of the types. This means
that each value in a column of type SCALAR is of type
SCALAR. If we compare a value of type SCALAR with a value
of any other type, we will apply the SCALAR rules. If not
both values have numeric types and none of them has SCALAR
type, we get an error.

> This would reverse what some developers decided earlier
> in dev threads "Clarify the rule" and "[dev] cast".
> There would be no implicit casts, the SCALAR rules would apply.
> Therefore:
> 
> SELECT 1 = '1';
> Currently, returns: TRUE.
> After fix, returns: FALSE.
> 
No, after fix it will throw an error.

> CREATE TABLE t (s1 SCALAR PRIMARY KEY);
> INSERT INTO t VALUES (FALSE),('A');
> SELECT COUNT(*) FROM t WHERE s1 <> 'A';
> Currently, returns: Type mismatch: can not convert text to boolean.
> After fix, returns: 1.
> 
This is true.

> CREATE TABLE t (s1 INTEGER PRIMARY KEY);
> INSERT INTO t VALUES (1);
> SELECT COUNT(*) FROM t WHERE s1 <> 'A';
> Currently, returns: Type mismatch: can not convert text to boolean.
> After fix, returns: 1.
> 
Here we will see an error.

> If the above example is correct, it differs from NoSQL behaviour:
> box.schema.space.create('T')
> box.space.T:create_index('I')
> box.space.T:insert{1}
> box.space.T:select('A')
> Currently, returns: Supplied key type of part 0 does not match index part
> type
> After fix, returns: Supplied key type of part 0 does not match index part
> type
> (No change because the proposed fix is for SQL not Lua)
> 
> And, since there is no change suggested for assignment,
> implicit cast would still occur. Therefore:
> 
> CREATE TABLE t (s1 INTEGER PRIMARY KEY);
> INSERT INTO t VALUES ('1');
> Currently, returns: no error
> After fix, returns: no error
> (No change because assignments cause implicit cast)
> 
> SELECT UPPER(FALSE) FROM t;
> Currently, returns: 'FALSE' (string)
> After fix, returns: 'FALSE' (string)
> (No change because functions can cause assignments)
> 
> I think that the proposed change is good.
> 
> Peter Gulutzan
> 
> 
Still, I think the idea of using SCALAR rules is viable.
But I suggest adding this as one of the session settings.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-01-22 14:13   ` Mergen Imeev
@ 2020-01-22 14:24     ` Nikita Pettik
  2020-01-30 18:51     ` Konstantin Osipov
  1 sibling, 0 replies; 18+ messages in thread
From: Nikita Pettik @ 2020-01-22 14:24 UTC (permalink / raw)
  To: Mergen Imeev; +Cc: tarantool-discussions

On 22 Jan 17:13, Mergen Imeev wrote:
> Hi! Thanks for answering.
> 
> On Tue, Jan 21, 2020 at 11:09:25AM -0700, Peter Gulutzan wrote:
> > Hi,
> > 
> > On 2020-01-21 4:11 a.m., Mergen Imeev wrote:
> > If the above example is correct, it differs from NoSQL behaviour:
> > box.schema.space.create('T')
> > box.space.T:create_index('I')
> > box.space.T:insert{1}
> > box.space.T:select('A')
> > Currently, returns: Supplied key type of part 0 does not match index part
> > type
> > After fix, returns: Supplied key type of part 0 does not match index part
> > type
> > (No change because the proposed fix is for SQL not Lua)
> > 
> > And, since there is no change suggested for assignment,
> > implicit cast would still occur. Therefore:
> > 
> > CREATE TABLE t (s1 INTEGER PRIMARY KEY);
> > INSERT INTO t VALUES ('1');
> > Currently, returns: no error
> > After fix, returns: no error
> > (No change because assignments cause implicit cast)
> > 
> > SELECT UPPER(FALSE) FROM t;
> > Currently, returns: 'FALSE' (string)
> > After fix, returns: 'FALSE' (string)
> > (No change because functions can cause assignments)
> > 
> > I think that the proposed change is good.
> > 
> > Peter Gulutzan
> > 
> > 
> Still, I think the idea of using SCALAR rules is viable.
> But I suggest adding this as one of the session settings.

IMHO pretty bad idea: too much effort is required to support it;
on the other hand, I can't come up with real use case for this
feature. Configurable type system is an evil.
 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-01-21 18:09 ` Peter Gulutzan
  2020-01-22 14:13   ` Mergen Imeev
@ 2020-01-22 16:43   ` Peter Gulutzan
  2020-01-30 18:52     ` Konstantin Osipov
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Gulutzan @ 2020-01-22 16:43 UTC (permalink / raw)
  To: Mergen Imeev, tarantool-discussions

Hi,

On 2020-01-21 11:09 a.m., Peter Gulutzan wrote:
<cut>
 > I think that the proposed change is good.
I withdraw that remark. I misunderstood the proposal.

Peter Gulutzan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-01-22 14:13   ` Mergen Imeev
  2020-01-22 14:24     ` Nikita Pettik
@ 2020-01-30 18:51     ` Konstantin Osipov
  1 sibling, 0 replies; 18+ messages in thread
From: Konstantin Osipov @ 2020-01-30 18:51 UTC (permalink / raw)
  To: Mergen Imeev; +Cc: tarantool-discussions

* Mergen Imeev <imeevma@tarantool.org> [20/01/22 17:18]:
> Still, I think the idea of using SCALAR rules is viable.
> But I suggest adding this as one of the session settings.

This is fundamental part of the language. It should not depend
on a session setting - this makes it very easy to shoot oneself in
the foot.



-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-01-22 16:43   ` Peter Gulutzan
@ 2020-01-30 18:52     ` Konstantin Osipov
  2020-02-03 11:35       ` Mergen Imeev
  0 siblings, 1 reply; 18+ messages in thread
From: Konstantin Osipov @ 2020-01-30 18:52 UTC (permalink / raw)
  To: Peter Gulutzan; +Cc: tarantool-discussions

* Peter Gulutzan <pgulutzan@ocelot.ca> [20/01/22 19:47]:
> Hi,
> 
> On 2020-01-21 11:09 a.m., Peter Gulutzan wrote:
> <cut>
> > I think that the proposed change is good.
> I withdraw that remark. I misunderstood the proposal.

I side with PeterG.

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-01-30 18:52     ` Konstantin Osipov
@ 2020-02-03 11:35       ` Mergen Imeev
  2020-02-03 15:02         ` Konstantin Osipov
  2020-02-04 18:50         ` Nikita Pettik
  0 siblings, 2 replies; 18+ messages in thread
From: Mergen Imeev @ 2020-02-03 11:35 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-discussions

On Thu, Jan 30, 2020 at 09:52:16PM +0300, Konstantin Osipov wrote:
> * Peter Gulutzan <pgulutzan@ocelot.ca> [20/01/22 19:47]:
> > Hi,
> > 
> > On 2020-01-21 11:09 a.m., Peter Gulutzan wrote:
> > <cut>
> > > I think that the proposed change is good.
> > I withdraw that remark. I misunderstood the proposal.
> 
> I side with PeterG.
> 
> -- 
> Konstantin Osipov, Moscow, Russia

Hi,

Let me clarify: you think that during comparison it makes sense
that STRING is implicitly converted to numbers?

If so, then let's think about it. I think we have some questions
to discuss in this case:
1) I think it makes sense that numeric types can be compared
without any conversion. Do you agree? We have a special function
that implements a comparison between integers and floating point
numbers. If you do not agree with me, then make your suggestion.
2) STRING can be implicitly cast to a number. In case it can be
cast to INTEGER, it will be INTEGER. In case it can be cast to
DOUBLE, it will be DOUBLE. Do you agree? Should we return to this
issue after the implementation of DECIMAL?
3) Can STRING be implicitly cast to BOOLEAN?
4) Can STRING be implicitly cast to BINARY?
5) In case STRING cannot be implicitly cast to the type of the
other operand, should we allow implicitly casting the other
operand to STRING? For example, from "'123r' > 124" move to
"'123r' > '124'"?
6) Do you agree that only implicit casting from/to STRING is
allowed? I mean that nothing else can be implicitly cast during a
comparison with any other type if one of the types does not
contain the other.
7) We need to clarify the rules when comparing SCALAR values. I
think we cannot use the Tarantool rules here, as the Tarantool
rules indicate that “100 < '2' == true”, but we decided that
"100 > '2' == true", since '2' implicitly cast to 2. Could you
suggest the rules that we should use here?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-02-03 11:35       ` Mergen Imeev
@ 2020-02-03 15:02         ` Konstantin Osipov
  2020-02-04 18:50         ` Nikita Pettik
  1 sibling, 0 replies; 18+ messages in thread
From: Konstantin Osipov @ 2020-02-03 15:02 UTC (permalink / raw)
  To: Mergen Imeev; +Cc: tarantool-discussions

* Mergen Imeev <imeevma@tarantool.org> [20/02/03 14:39]:
> > > On 2020-01-21 11:09 a.m., Peter Gulutzan wrote:
> > > <cut>
> > > > I think that the proposed change is good.
> > > I withdraw that remark. I misunderstood the proposal.
> > 
> > I side with PeterG.
> > 
> > -- 
> > Konstantin Osipov, Moscow, Russia
> 
> Hi,
> 
> Let me clarify: you think that during comparison it makes sense
> that STRING is implicitly converted to numbers?


The issue is not with any specific problem, which may require a
solution, the issue is with how the solution is packaged in your
proposal.


Shall strings be implicitly converted to numbers or numbers
implicitly converted to strings during comparison? Probably not.
Especially, if, as you pointed out, there are other rules for
SCALAR or ANY bags of types, for which implicit conversion will
not taking place.


Shall we use the rules you suggested? I am afraid not, first of
all because it's unclear whether this is the end of the story.

While PeterG, I believe, would advocate strict adherence to ANSI,
my take is that we should not give existing postgresql or oracle
users too much trouble here. The reason I don't include MySQL on
the list is that I know that mysql has historically been way more
messy with its implicit conversion rules, so I'd not copy it here. 

Can we begin fixing this one thing at a time? This would help
everyone quicker find balance between these priorities.
E.g. can we make
sure that all numeric types are comparable (as it seems already
everyone's desire), comparison rules are
clear to everyone and everyone agrees with them, fix all bugs and
cover it with tests?

Then we can do the same with string/blob types. 

Then we can look at comparison and implicit conversion rules
between type families, and also carefully study existing vendors,
so we don't break too much existing PostgreSQL or MySQL queries
with more strict comparison rules or funny comparison order.

> If so, then let's think about it. I think we have some questions
> to discuss in this case:
> 1) I think it makes sense that numeric types can be compared
> without any conversion. Do you agree? We have a special function
> that implements a comparison between integers and floating point
> numbers. If you do not agree with me, then make your suggestion.

I agree it makes sense, but to be on the safe side, I'd also check 
how existing vendors handle it - so that we don't introduce some
obvious inconvenience.

> 2) STRING can be implicitly cast to a number. In case it can be
> cast to INTEGER, it will be INTEGER. In case it can be cast to
> DOUBLE, it will be DOUBLE. Do you agree? Should we return to this
> issue after the implementation of DECIMAL?

I don't know. I'd say DECIMAL is a game changer. I'd say nothing
should be implicitly cast to floating point type once there is an
exact type.
Besides, there are collate/locale issues. Are we going to handle
"1 234 567"? What about "1,234,567"?

> 3) Can STRING be implicitly cast to BOOLEAN?

Look at vendors and ANSI, deal with one problem at a time. It's
not a major issue. The biggest issue is string -> number, and 
it's best to deal with it after decimal is part of SQL.

> 4) Can STRING be implicitly cast to BINARY?

Ditto, look at vendors and ANSI, move into an own patch..

The same applies  to the rest of your questions: there is no user
value in us trying to provide what we believe is a "better"
semantics over the existing vendors and ansi in such small
details. Even if we think it's safe. 


> 5) In case STRING cannot be implicitly cast to the type of the
> other operand, should we allow implicitly casting the other
> operand to STRING? For example, from "'123r' > 124" move to
> "'123r' > '124'"?


> 6) Do you agree that only implicit casting from/to STRING is
> allowed? I mean that nothing else can be implicitly cast during a
> comparison with any other type if one of the types does not
> contain the other.
> 7) We need to clarify the rules when comparing SCALAR values. I
> think we cannot use the Tarantool rules here, as the Tarantool
> rules indicate that “100 < '2' == true”, but we decided that
> "100 > '2' == true", since '2' implicitly cast to 2. Could you
> suggest the rules that we should use here?
> 

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-02-03 11:35       ` Mergen Imeev
  2020-02-03 15:02         ` Konstantin Osipov
@ 2020-02-04 18:50         ` Nikita Pettik
  2020-02-05  2:57           ` Peter Gulutzan
                             ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Nikita Pettik @ 2020-02-04 18:50 UTC (permalink / raw)
  To: Mergen Imeev; +Cc: tarantool-discussions

On 03 Feb 14:35, Mergen Imeev wrote:
> On Thu, Jan 30, 2020 at 09:52:16PM +0300, Konstantin Osipov wrote:
> > * Peter Gulutzan <pgulutzan@ocelot.ca> [20/01/22 19:47]:
> > > Hi,
> > > 
> > > On 2020-01-21 11:09 a.m., Peter Gulutzan wrote:
> > > <cut>
> > > > I think that the proposed change is good.
> > > I withdraw that remark. I misunderstood the proposal.
> > 
> > I side with PeterG.
> > 
> > -- 
> > Konstantin Osipov, Moscow, Russia
> 
> Hi,
> 
> Let me clarify: you think that during comparison it makes sense
> that STRING is implicitly converted to numbers?
> 
> If so, then let's think about it. I think we have some questions
> to discuss in this case:
> 1) I think it makes sense that numeric types can be compared
> without any conversion. Do you agree? We have a special function
> that implements a comparison between integers and floating point
> numbers. If you do not agree with me, then make your suggestion.
> 2) STRING can be implicitly cast to a number. In case it can be
> cast to INTEGER, it will be INTEGER. In case it can be cast to
> DOUBLE, it will be DOUBLE. Do you agree?

Does it mean that '1.0' is converted to integer when it is compared
with number?

> Should we return to this
> issue after the implementation of DECIMAL?
> 3) Can STRING be implicitly cast to BOOLEAN?

No, it can't.

> 4) Can STRING be implicitly cast to BINARY?

No, it can't

> 5) In case STRING cannot be implicitly cast to the type of the
> other operand, should we allow implicitly casting the other
> operand to STRING? For example, from "'123r' > 124" move to
> "'123r' > '124'"?

No, we shouldn't. What is more, now comparison operations are not
commutative:

tarantool> select '123r' = 124
---
- null
- 'Type mismatch: can not convert 123r to numeric'
...

tarantool> select 124 = '123r'
---
- metadata:
  - name: 124 = '123r'
    type: boolean
  rows:
  - [false]
...

Both comparisons should result in error.

> 6) Do you agree that only implicit casting from/to STRING is
> allowed? I mean that nothing else can be implicitly cast during a
> comparison with any other type if one of the types does not
> contain the other.

What about scalar/any types?

> 7) We need to clarify the rules when comparing SCALAR values. I
> think we cannot use the Tarantool rules here, as the Tarantool
> rules indicate that “100 < '2' == true”, but we decided that
> "100 > '2' == true", since '2' implicitly cast to 2. Could you
> suggest the rules that we should use here?

There's already existing solution: while fetching value from space,
we preserve its initial field type. For SCALAR values we may use one
rules, for values fetched from INTEGER/STRING fields - apply another ones.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-02-04 18:50         ` Nikita Pettik
@ 2020-02-05  2:57           ` Peter Gulutzan
  2020-02-07 14:24             ` Nikita Pettik
  2020-02-05  7:52           ` Mergen Imeev
  2020-02-06 12:41           ` Mergen Imeev
  2 siblings, 1 reply; 18+ messages in thread
From: Peter Gulutzan @ 2020-02-05  2:57 UTC (permalink / raw)
  To: Nikita Pettik, Mergen Imeev; +Cc: tarantool-discussions

Hi,

On 2020-02-04 11:50 a.m., Nikita Pettik wrote:

 > On 03 Feb 14:35, Mergen Imeev wrote:
<cut>
 >> 7) We need to clarify the rules when comparing SCALAR values. I
 >> think we cannot use the Tarantool rules here, as the Tarantool
 >> rules indicate that “100 < '2' == true”, but we decided that
 >> "100 > '2' == true", since '2' implicitly cast to 2. Could you
 >> suggest the rules that we should use here?
 >
 > There's already existing solution: while fetching value from space,
 > we preserve its initial field type. For SCALAR values we may use one
 > rules, for values fetched from INTEGER/STRING fields - apply another 
ones.

I do not think it is always so simple.

First, with regard to the implicit cast part of the question,
the current situation is:
CREATE TABLE t (A SCALAR PRIMARY KEY, B INTEGER);
INSERT INTO t VALUES (1,1);
SELECT * FROM t WHERE A = B AND A < '0' AND B > '0';
Result: 1 row.
It looks odd, but that is what happens if the SCALAR rules
and the non-SCALAR rules can fit in the same statement.
This could be solved by making implicit cast illegal.

But you still want to have two sets of rules, and (behaviour change) 
values can be SCALAR.

Suppose CREATE TABLE t (scalar_column SCALAR PRIMARY KEY,
                         non_scalar_column INT);

CAST(non_scalar_column AS SCALAR) result data type is SCALAR?

scalar_column < non_scalar_column is legal?

scalar_column < 1 /* data type of 1 is INTEGER */ ... is legal?

SUM(scalar_column) is SCALAR?

scalar_column + non_scalar_column is SCALAR?

SELECT scalar_column UNION SELECT non_scalar_column is SCALAR?

UPDATE t SET non_scalar_column = scalar_column is legal?

Peter Gulutzan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-02-04 18:50         ` Nikita Pettik
  2020-02-05  2:57           ` Peter Gulutzan
@ 2020-02-05  7:52           ` Mergen Imeev
  2020-02-06 12:41           ` Mergen Imeev
  2 siblings, 0 replies; 18+ messages in thread
From: Mergen Imeev @ 2020-02-05  7:52 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-discussions

On Tue, Feb 04, 2020 at 09:50:11PM +0300, Nikita Pettik wrote:
> On 03 Feb 14:35, Mergen Imeev wrote:
> > On Thu, Jan 30, 2020 at 09:52:16PM +0300, Konstantin Osipov wrote:
> > > * Peter Gulutzan <pgulutzan@ocelot.ca> [20/01/22 19:47]:
> > > > Hi,
> > > > 
> > > > On 2020-01-21 11:09 a.m., Peter Gulutzan wrote:
> > > > <cut>
> > > > > I think that the proposed change is good.
> > > > I withdraw that remark. I misunderstood the proposal.
> > > 
> > > I side with PeterG.
> > > 
> > > -- 
> > > Konstantin Osipov, Moscow, Russia
> > 
> > Hi,
> > 
> > Let me clarify: you think that during comparison it makes sense
> > that STRING is implicitly converted to numbers?
> > 
> > If so, then let's think about it. I think we have some questions
> > to discuss in this case:
> > 1) I think it makes sense that numeric types can be compared
> > without any conversion. Do you agree? We have a special function
> > that implements a comparison between integers and floating point
> > numbers. If you do not agree with me, then make your suggestion.
> > 2) STRING can be implicitly cast to a number. In case it can be
> > cast to INTEGER, it will be INTEGER. In case it can be cast to
> > DOUBLE, it will be DOUBLE. Do you agree?
> 
> Does it mean that '1.0' is converted to integer when it is compared
> with number?
> 
No. I meant to apply the current rules.

> > Should we return to this
> > issue after the implementation of DECIMAL?
> > 3) Can STRING be implicitly cast to BOOLEAN?
> 
> No, it can't.
> 
> > 4) Can STRING be implicitly cast to BINARY?
> 
> No, it can't
> 
> > 5) In case STRING cannot be implicitly cast to the type of the
> > other operand, should we allow implicitly casting the other
> > operand to STRING? For example, from "'123r' > 124" move to
> > "'123r' > '124'"?
> 
> No, we shouldn't. What is more, now comparison operations are not
> commutative:
> 
> tarantool> select '123r' = 124
> ---
> - null
> - 'Type mismatch: can not convert 123r to numeric'
> ...
> 
> tarantool> select 124 = '123r'
> ---
> - metadata:
>   - name: 124 = '123r'
>     type: boolean
>   rows:
>   - [false]
> ...
> 
> Both comparisons should result in error.
> 
> > 6) Do you agree that only implicit casting from/to STRING is
> > allowed? I mean that nothing else can be implicitly cast during a
> > comparison with any other type if one of the types does not
> > contain the other.
> 
> What about scalar/any types?
> 
String "if one of the types does not contain the other."
means that any scalar type can be cast to SCALAR and
any type can be cast to ANY.

> > 7) We need to clarify the rules when comparing SCALAR values. I
> > think we cannot use the Tarantool rules here, as the Tarantool
> > rules indicate that “100 < '2' == true”, but we decided that
> > "100 > '2' == true", since '2' implicitly cast to 2. Could you
> > suggest the rules that we should use here?
> 
> There's already existing solution: while fetching value from space,
> we preserve its initial field type. For SCALAR values we may use one
> rules, for values fetched from INTEGER/STRING fields - apply another ones.
> 
I think it is confusing. Also, I agree with Peter here.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-02-04 18:50         ` Nikita Pettik
  2020-02-05  2:57           ` Peter Gulutzan
  2020-02-05  7:52           ` Mergen Imeev
@ 2020-02-06 12:41           ` Mergen Imeev
  2020-02-06 13:09             ` Mergen Imeev
  2 siblings, 1 reply; 18+ messages in thread
From: Mergen Imeev @ 2020-02-06 12:41 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-discussions

On Tue, Feb 04, 2020 at 09:50:11PM +0300, Nikita Pettik wrote:
<cut>

> > 7) We need to clarify the rules when comparing SCALAR values. I
> > think we cannot use the Tarantool rules here, as the Tarantool
> > rules indicate that “100 < '2' == true”, but we decided that
> > "100 > '2' == true", since '2' implicitly cast to 2. Could you
> > suggest the rules that we should use here?
> 
> There's already existing solution: while fetching value from space,
> we preserve its initial field type. For SCALAR values we may use one
> rules, for values fetched from INTEGER/STRING fields - apply another ones.
> 

Hi,
After some thinking, I came to the conclusion that your idea is
pretty good. I suggest this algorithm for comparison:
if (has NULL)
	<return NULL>
else if (same mp_type) \\MP_TYPE == MEM_TYPE here.
	<compare>
else if (both numeric)
	<compare>
else if (has SCALAR)
	<compare using SCALAR rules>
else if (STRING compared with numeric)
	<if possible cast string to number and compare, else error>
else
	<error>

In this case, if we still decide to remove the implicit cast from
STRING, we can do this quite easily.

Please note that this is a simplified version. For example, we do
not always need to return NULL if one of the operands is NULL.

In addition, this implementation means that we are moving the
implicit cast for comparison from OP_ApplyType to opcodes
responsible for comparing.

What do you think about this?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-02-06 12:41           ` Mergen Imeev
@ 2020-02-06 13:09             ` Mergen Imeev
  0 siblings, 0 replies; 18+ messages in thread
From: Mergen Imeev @ 2020-02-06 13:09 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-discussions

On Thu, Feb 06, 2020 at 03:41:06PM +0300, Mergen Imeev wrote:
> On Tue, Feb 04, 2020 at 09:50:11PM +0300, Nikita Pettik wrote:
> <cut>
> 
> > > 7) We need to clarify the rules when comparing SCALAR values. I
> > > think we cannot use the Tarantool rules here, as the Tarantool
> > > rules indicate that “100 < '2' == true”, but we decided that
> > > "100 > '2' == true", since '2' implicitly cast to 2. Could you
> > > suggest the rules that we should use here?
> > 
> > There's already existing solution: while fetching value from space,
> > we preserve its initial field type. For SCALAR values we may use one
> > rules, for values fetched from INTEGER/STRING fields - apply another ones.
> > 
> 
> Hi,
> After some thinking, I came to the conclusion that your idea is
> pretty good. I suggest this algorithm for comparison:
> if (has NULL)
> 	<return NULL>
> else if (same mp_type) \\MP_TYPE == MEM_TYPE here.
> 	<compare>
> else if (both numeric)
> 	<compare>
> else if (has SCALAR)
> 	<compare using SCALAR rules>
> else if (STRING compared with numeric)
> 	<if possible cast string to number and compare, else error>
> else
> 	<error>
> 
> In this case, if we still decide to remove the implicit cast from
> STRING, we can do this quite easily.
> 
> Please note that this is a simplified version. For example, we do
> not always need to return NULL if one of the operands is NULL.
> 
> In addition, this implementation means that we are moving the
> implicit cast for comparison from OP_ApplyType to opcodes
> responsible for comparing.
> 
> What do you think about this?
> 

Actually, currently it partly works this way:

tarantool> CREATE TABLE t(i INT PRIMARY KEY, a SCALAR);
---
- row_count: 1
...

tarantool> INSERT INTO t VALUES (1,1), (2,2), (3,3), (4,4), (5,5);
---
- row_count: 5
...

tarantool> SELECT COUNT(i) FROM t WHERE '2' > a;
---
- metadata:
  - name: COUNT(i)
    type: integer
  rows:
  - [5]
...


Still, not always:

tarantool> SELECT COUNT(i) FROM t WHERE '2' > a + 1;
---
- metadata:
  - name: COUNT(i)
    type: integer
  rows:
  - [0]
...


Also, it doesn't work for constants:

tarantool> select 2 > '1';
---
- metadata:
  - name: 2 > '1'
    type: boolean
  rows:
  - [true]
...

tarantool> select 2 > CAST('1' AS SCALAR);
---
- metadata:
  - name: 2 > CAST('1' AS SCALAR)
    type: boolean
  rows:
  - [true]
...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-02-05  2:57           ` Peter Gulutzan
@ 2020-02-07 14:24             ` Nikita Pettik
  2020-02-07 14:40               ` Konstantin Osipov
  2020-02-07 22:30               ` Peter Gulutzan
  0 siblings, 2 replies; 18+ messages in thread
From: Nikita Pettik @ 2020-02-07 14:24 UTC (permalink / raw)
  To: Peter Gulutzan; +Cc: tarantool-discussions

On 04 Feb 19:57, Peter Gulutzan wrote:
> Hi,
> 
> On 2020-02-04 11:50 a.m., Nikita Pettik wrote:
> 
> > On 03 Feb 14:35, Mergen Imeev wrote:
> <cut>
> >> 7) We need to clarify the rules when comparing SCALAR values. I
> >> think we cannot use the Tarantool rules here, as the Tarantool
> >> rules indicate that “100 < '2' == true”, but we decided that
> >> "100 > '2' == true", since '2' implicitly cast to 2. Could you
> >> suggest the rules that we should use here?
> >
> > There's already existing solution: while fetching value from space,
> > we preserve its initial field type. For SCALAR values we may use one
> > rules, for values fetched from INTEGER/STRING fields - apply another ones.
> 
> I do not think it is always so simple.
> 
> First, with regard to the implicit cast part of the question,
> the current situation is:
> CREATE TABLE t (A SCALAR PRIMARY KEY, B INTEGER);
> INSERT INTO t VALUES (1,1);
> SELECT * FROM t WHERE A = B AND A < '0' AND B > '0';
> Result: 1 row.
> It looks odd, but that is what happens if the SCALAR rules
> and the non-SCALAR rules can fit in the same statement.
> This could be solved by making implicit cast illegal.

Yes, I stick to the point that implicit cast should be illegal.
 
> But you still want to have two sets of rules, and (behaviour change) values
> can be SCALAR.
> 
> Suppose CREATE TABLE t (scalar_column SCALAR PRIMARY KEY,
>                         non_scalar_column INT);
> 
> CAST(non_scalar_column AS SCALAR) result data type is SCALAR?

Yes.
 
> scalar_column < non_scalar_column is legal?

Yes, comparison rules are supposed to be the same as in NoSQL Tarantool.

> scalar_column < 1 /* data type of 1 is INTEGER */ ... is legal?

Yes.

> SUM(scalar_column) is SCALAR?

No, it is not. Taking into account removing implicit cast between
numeric and string types, SUM() is assumed to be finished only in
case all arguments are of numeric type. What is more SUM() features
NUMBER return type (in its function definition).

> scalar_column + non_scalar_column is SCALAR?

No. If addition is allowed, type of resulting value will be numeric
(integer or float).

> SELECT scalar_column UNION SELECT non_scalar_column is SCALAR?

I guess yes (at least it seems to be reasonable to me).

> UPDATE t SET non_scalar_column = scalar_column is legal?

It depends on particular value of scalar_column.

To sum up, my proposal is to allow comparing scalar values with values of
any other type. On the other hand, result of such operations like
assignment or addition should depend on particular values of SCALAR
type.

Alternatively, we can abandon this idea and operate apply comparison
rules depending purely on mp_ types of particular scalar values. But
then rules in NoSQL and SQL will be different.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-02-07 14:24             ` Nikita Pettik
@ 2020-02-07 14:40               ` Konstantin Osipov
  2020-02-07 22:30               ` Peter Gulutzan
  1 sibling, 0 replies; 18+ messages in thread
From: Konstantin Osipov @ 2020-02-07 14:40 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-discussions

* Nikita Pettik <korablev@tarantool.org> [20/02/07 17:29]:
> Alternatively, we can abandon this idea and operate apply comparison
> rules depending purely on mp_ types of particular scalar values. But
> then rules in NoSQL and SQL will be different.

This would be worst thing of all. mp_type is an encoding format,
which, btw, I would like to ditch from in-memory tuple format
going forward (only keep for data on disk and wirte) - 'cause
compression benefits it gives do not outweigh the performance
slow down of variable representation or need to pack/unpack.

There should be a single set of data types for SQL and NoSQL and
the rules should be the same. If NoSQL types don't have some ansi
features, then these features should be added to nosql, and sql
should use nosql.

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-02-07 14:24             ` Nikita Pettik
  2020-02-07 14:40               ` Konstantin Osipov
@ 2020-02-07 22:30               ` Peter Gulutzan
  2020-02-11 13:32                 ` Mergen Imeev
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Gulutzan @ 2020-02-07 22:30 UTC (permalink / raw)
  To: Nikita Pettik; +Cc: tarantool-discussions

Hi,

On 2020-02-07 7:24 a.m., Nikita Pettik wrote:

 > <cut>
 > To sum up, my proposal is to allow comparing scalar values with values of
 > any other type. On the other hand, result of such operations like
 > assignment or addition should depend on particular values of SCALAR
 > type.
 >
 > Alternatively, we can abandon this idea and operate apply comparison
 > rules depending purely on mp_ types of particular scalar values. But
 > then rules in NoSQL and SQL will be different.
 >

I hope that you will abandon this idea,
but let us pretend that both these behaviour changes are accepted:
(1) No implicit cast
(2) Values can be SCALAR
with these implications or follow-ups.

1. A value is SCALAR if and only if:
It is an item in a column defined as SCALAR, or
It is the result of CAST(value AS SCALAR), or
It is the result LUA('return scalar-value') or similar Lua function.

2. If a value is SCALAR, then it has two types:
according to SQL, and according to Lua.
Example:
box.execute([[INSERT INTO b VALUES ('a');]])
box.execute([[CREATE TABLE b (s1 SCALAR PRIMARY KEY);]]
box.execute([[SELECT TYPEOF(s1) FROM b;]])
type(box.space.B:select('a')[1][1])
The SQL statement will return 'scalar',
the Lua statement will return 'string'.
You might decide to redefine or rename TYPEOF(),
but you cannot avoid that there are now two types,
and probably need a new built-in function PRIMITIVE_TYPE().

3. For deciding whether an operation "is legal",
we depend on defined type. For example, LENGTH(x)
is legal if defined type is STRING or VARBINARY
or SCALAR. For deciding whether an operation
"is an error", we depend on primitive type.
For example, if scalar_value = 5.5 then
LENGTH(scalar_value) is an error although it is legal.
Presumably the error is a "runtime error" although I
suppose it is possible to know in advance whether all
values of a scalar column have the same primitive type.

4. These situations are subject to "legal but runtime error"
because the scalar value's primitive type is expected to be a number:
Arithmetic operators + - * / % << >> & |
Functions ABS(), CHAR(), AVG(), SUM(), TOTAL().
The result is never SCALAR.

5. These situations are subject to "legal but runtime error"
because the scalar value's primitive type is expected to be a BOOLEAN:
Logical operators NOT AND OR.
The result is never SCALAR.
Example: if scalar_value is FALSE, then
SELECT scalar_value OR scalar_value is legal and
result data type is BOOLEAN (not SCALAR).
Example: if scalar_value is 'FALSE', then
SELECT NOT scalar_value is a runtime error (never cast).

6. These situations are subject to "legal but runtime error"
because the scalar value's primitive type is expected to be a STRING:
There are situations where a primitive value is expected to be a STRING:
Functions LOWER() PRINTF() QUOTE() REPLACE() SOUNDEX() TRIM() UNICODE() 
UPPER()
Clause COLLATE

7. These situations are subject to "legal but runtime error"
because the scalar value's primitive type is expected to be STRING or 
VARBINARY:
Situations where primitive value is expected to be STRING or VARBINARY:
Functions HEX() LENGTH() POSITION() SUBSTR()

8. The concatenation situation looks like a special case
because the scalar value's primitive type is expected to be STRING or 
VARBINARY:
If primitive types are not the same, error.
If primitive types are not either STRING or VARBINARY, error.
If scalar_value || scalar_value: okay, result is SCALAR.
If scalar_value || string_value: okay, result is STRING.
If scalar_value || varbinary_value: okay, result is VARBINARY.
This is just a guess, and it is slightly different from what you
decided for arithmetic operations.
The same rules apply for GROUP_CONCAT().

9. These situations are not subject to "legal but runtime error":
Functions GREATEST() LEAST()
Example: LEAST(FALSE, 1, 'x', X'44') is FALSE and result data type
is BOOLEAN. This is the current behaviour, and it means that we
follow SCALAR rules even though none of the operands is SCALAR.
The same rules apply for MIN() or MAX() if the operand is SCALAR.

10. Comparison operations involving a SCALAR value cause:
Comparison according to SCALAR rules, therefore always legal.
Comparison operations > = < <= > >= LIKE BETWEEN CASE
Implied comparisons UNIQUE DISTINCT GROUP ORDER UNION
Example:
CREATE TABLE t (s1 SCALAR PRIMARY KEY);
INSERT INTO t VALUES (FALSE), (1), ('1'), (X'31');
SELECT * FROM t WHERE s1 = '1' OR s1 > '1';
Currently this causes a "type mismatch" error.
But this error will be fixed if your plan is accepted.

11. Assignment to scalar from any data type is always legal.
Assignment from scalar to any data type is legal if and only
if the primitive type is compatible.

12 SELECT scalar_column UNION SELECT non_scalar_column is SCALAR.
(Also EXCEPT and INTERSECT)


In emails in March 2019 I wrote about compatibility
with other DBMSs and with ODBC. The thread was
"The rules for the scalar data type in SQL".
I'll quote the first email as a reminder,
some of it is still relevant. By the way K. Yukhin replied:
"As far as I understand SCALAR is a bag of types. No value
can ever have type SCALAR."



-------- Forwarded Message --------
Subject:     The rules for the scalar data type in SQL
Date:     Fri, 8 Mar 2019 16:31:02 -0700
From:     Peter Gulutzan <pgulutzan@ocelot.ca>
To:     Nikita Pettik <korablev@tarantool.org>
CC:     dev@tarantool.org


Hi,

What are the rules for the SCALAR data type in SQL?
The decisions have been made, but eventually I'll need to find out what 
they are.
So I'll guess and see whether anyone objects.

Tarantool had two choices:
Choice#1: SCALAR is a "primitive" data type, on the same level as 
INTEGER, VARCHAR, BLOB, etc.
Choice#2: SCALAR is a "complex" data type, two instances can have two 
different primitive data types.
Tarantool chose #2.
I could describe this with terms like "union as in C but without labels",
or "limited polymorphism", etc. but fear that using analogies would 
cause arguments.

I can think of only three SQL DBMSs that have something close to this:
Hive UNIONTYPE 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-UnionTypesunionUnionTypes
SQL Server SQL_VARIANT Re SQL Server SQL_VARIANT 
https://docs.microsoft.com/en-us/sql/t-sql/data-types/sql-variant-transact-sql?view=sql-server-2017
Oracle ANYDATA 
https://docs.oracle.com/database/121/ARPLS/t_anydat.htm#ARPLS077
This is good because Tarantool can claim to have a big feature which
other open-source DBMSs lack, that has resulted from its "NoSQL + SQL" 
heritage.
Although some of the same functionality is available via structured 
user-defined
types or JSON or XML, we're talking about a simple fully-predefined 
solution.

1. The name of the data type is SCALAR, not UNIONTYPE or SQL_VARIANT
or ANYDATA, to be compatible with Tarantool/NoSQL.
There are no other words in a SCALAR item's definition,
for example "SCALAR(5000)" is illegal and "SCALAR COLLATE x" is illegal
and Hive-like "SCALAR<integer,double>" is illegal.
SCALAR is a new reserved word.

2. Any item defined as SCALAR has an underlying primitive type.
For example, after
CREATE TABLE t (s1 SCALAR PRIMARY KEY);
INSERT INTO t VALUES (55),(X'41');
The underlying primitive type of the item in the first row is INTEGER
because literal 55 has data type INTEGER, and the underlying primitive
type in the second row is BLOB (when we start supporting BLOBs again).
An item's primitive type is far more important than its defined type.
Incidentally Tarantool might find the primitive type by looking at the
way MsgPack stores, but that is an implementation detail.

3. SCALAR is not a supertype. Before Choice#1 was accepted, I
tried to apply the logic of supertypes for further rules,
for example saying: "in integer-versus-real comparisons we can
coerce the integer to the real because real is a supertype,
and SCALAR is a supertype of both integer and real because all
their instances can be represented in SCALAR, therefore for
real-to-scalar comparisons we can coerce the real to a scalar".
But Choice#1 was rejected, so supertype logic is illegal.

4. There is no literal syntax which implies data type SCALAR.
Standard SQL indicates date/time/timestamp literals by preceding
them with a data type name e.g. DATE '1990-01-01', and we could
do the equivalent thing by saying SCALAR X'41' etc., but we won't
because it would be meaningless, the primitive type of SCALAR X'41'
would still be BLOB and that is what matters.

5. TYPEOF(x) is never SCALAR, it is always the
underlying data type. This is true even if x is null
(we still think there is a "data type" named "null").
In fact there is no function that is guaranteed to
return the defined data type.
For example, TYPEOF(CAST(1 AS SCALAR)); returns
INTEGER, not SCALAR. But sometimes I still see 'blob'.

6. For any operation that requires casting from an item defined
as SCALAR, the syntax is legal but the operation may fail at runtime.
At runtime, Tarantool detects the underlying primitive data type and 
applies the
rules for that. For example, if a definition is
CREATE TABLE t (s1 INT PRIMARY KEY, s2 SCALAR);
and within any row s2 = 'a', that is, its underlying primitive type is
varchar/char/text or whatever we decide to call character strings,
then UPDATE t SET s1 = s2; is illegal.
We usually do not know that in advance.

7. For any dyadic operation that requires implicit casting to an item 
defined
as SCALAR, the syntax is legal and the operation will not fail at runtime.
Take this situation: comparison with SCALAR and a primitive type.
The implicit cast is primitive-to-scalar not scalar-to-primitive.
For example, after
CREATE TABLE t (s1 SCALAR PRIMARY KEY);
INSERT INTO t VALUES (X'41');
SELECT * FROM t WHERE s1 > 'a';
The comparison is valid, because
we know the ordering of X'41' and 'a' in Tarantool/NoSQL
'scalar'. This is not because of supertyping (see Rule#3),
and we do not have implicit casting to SCALAR
(see Rule#6), so I think this is inconsistent, but it
looks better than an error.

8. The result data type of min/max/sum operation
on a column defined as SCALAR is SCALAR,
unless the result values is NULL. For example:
CREATE TABLE t (s1 INT, s2 SCALAR PRIMARY KEY);
INSERT INTO t VALUES (1,X'44'),(2,11),(3,1E4),(4,'a');
SELECT MIN(s2),MAX(s2),SUM(s2) FROM t;
The result is: - - [11, 'D', 10011].
That is only possible with Tarantool/NoSQL scalar rules,
but we have to ignore the illegal arithmetic (that X'44' + 11 is illegal).
Sometimes typeof(min-max-sum(s2)) is 'blob' but that's just a bug.

9. The result data type of a primitive combination is never
SCALAR, because we in effect use TYPEOF(item) not
the defined data type of the item.
(I use the word "combination" in the way that the
standard document uses it for section ""Result of
data type combinations".) Therefore for
MAX(1E308, 'a', 0, X'00')
which was the subject of an argument in issue#4032
https://github.com/tarantool/tarantool/issues/4032
There I used a "supertype" argument, and Nikita Pettik
rejected it, which is proper (see Rule#3).
I also admit that standard SQL wouldn't allow this.
However, since primitive-to-scalar comparisons are
legal, and because I believe the standard-SQL limitation
is solely due to lack of a SCALAR data type, I am not
convinced that MAX(1E308, 'a', 0, X'00') should cause
an error rather than a scalar result.

10. The result data type of union/except/intersect is
based on the primitive data type, but does not matter because
select x'41' union all select 5;
is legal anyway.

11. Casting SCALAR to char/varchar/text results in a char/varchar/text
with indefinite size and binary collation. Casting
SCALAR to any other primitive type results, as far as I
know, in no loss of information.

12. Probably for ODBC the type is binary, as for sql_variant.

Peter Gulutzan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Tarantool-discussions] Implicit cast for COMPARISON
  2020-02-07 22:30               ` Peter Gulutzan
@ 2020-02-11 13:32                 ` Mergen Imeev
  0 siblings, 0 replies; 18+ messages in thread
From: Mergen Imeev @ 2020-02-11 13:32 UTC (permalink / raw)
  To: Peter Gulutzan; +Cc: tarantool-discussions

Hi,
Thank you for the answer. My comments below.

On Fri, Feb 07, 2020 at 03:30:57PM -0700, Peter Gulutzan wrote:
> Hi,
> 
> On 2020-02-07 7:24 a.m., Nikita Pettik wrote:
> 
> > <cut>
> > To sum up, my proposal is to allow comparing scalar values with values of
> > any other type. On the other hand, result of such operations like
> > assignment or addition should depend on particular values of SCALAR
> > type.
> >
> > Alternatively, we can abandon this idea and operate apply comparison
> > rules depending purely on mp_ types of particular scalar values. But
> > then rules in NoSQL and SQL will be different.
> >
> 
> I hope that you will abandon this idea,
I am not sure that we can do this. For example:

tarantool> box.execute("select true in (1, '2', 3, '4');")
---
- metadata:
  - name: true in (1, '2', 3, '4')
    type: boolean
  rows:
  - [false]
...

Here we use an ephemeral space to store all the values from
a given vector. To search for "true" we use BOX, which
automatically applies SCALAR rules for comparison. (At the
moment, all columns of ephemeral spaces are of SCALAR type).

At the same time:

tarantool> box.execute("select true in (1, '2');")
---
- null
- 'Type mismatch: can not convert unsigned to boolean'
...

Here we have only two values for comparison, so we use OP_Eq
operation twice. This means that we apply SQL rules for
comparison.

> but let us pretend that both these behaviour changes are accepted:
> (1) No implicit cast
> (2) Values can be SCALAR
> with these implications or follow-ups.
> 
> 1. A value is SCALAR if and only if:
> It is an item in a column defined as SCALAR, or
> It is the result of CAST(value AS SCALAR), or
> It is the result LUA('return scalar-value') or similar Lua function.
> 
I think it is true.

> 2. If a value is SCALAR, then it has two types:
> according to SQL, and according to Lua.
> Example:
> box.execute([[INSERT INTO b VALUES ('a');]])
> box.execute([[CREATE TABLE b (s1 SCALAR PRIMARY KEY);]]
> box.execute([[SELECT TYPEOF(s1) FROM b;]])
> type(box.space.B:select('a')[1][1])
> The SQL statement will return 'scalar',
> the Lua statement will return 'string'.
> You might decide to redefine or rename TYPEOF(),
> but you cannot avoid that there are now two types,
> and probably need a new built-in function PRIMITIVE_TYPE().
> 
Not exacly:

tarantool> box.execute([[CREATE TABLE b (s1 SCALAR PRIMARY KEY);]])
---
- row_count: 1
...

tarantool> box.execute([[INSERT INTO b VALUES ('a');]])
---
- row_count: 1
...

tarantool> box.execute([[SELECT TYPEOF(s1) FROM b;]])
---
- metadata:
  - name: TYPEOF(s1)
    type: string
  rows:
  - ['string']
...

Function typeof() returns only 'string', 'integer',
'double', 'boolean' and 'varbinary'.

Though, I am not sure if this a feature or a bug.

> 3. For deciding whether an operation "is legal",
> we depend on defined type. For example, LENGTH(x)
> is legal if defined type is STRING or VARBINARY
> or SCALAR. For deciding whether an operation
> "is an error", we depend on primitive type.
> For example, if scalar_value = 5.5 then
> LENGTH(scalar_value) is an error although it is legal.
> Presumably the error is a "runtime error" although I
> suppose it is possible to know in advance whether all
> values of a scalar column have the same primitive type.
> 
If we ignore the implicit cast (and UDCF), then I think
you are right. But I do not think that we can always find
out in advance whether all the values in a column are of
the same type.

> 4. These situations are subject to "legal but runtime error"
> because the scalar value's primitive type is expected to be a number:
> Arithmetic operators + - * / % << >> & |
> Functions ABS(), CHAR(), AVG(), SUM(), TOTAL().
> The result is never SCALAR.
> 
I think it is true.

> 5. These situations are subject to "legal but runtime error"
> because the scalar value's primitive type is expected to be a BOOLEAN:
> Logical operators NOT AND OR.
> The result is never SCALAR.
> Example: if scalar_value is FALSE, then
> SELECT scalar_value OR scalar_value is legal and
> result data type is BOOLEAN (not SCALAR).
> Example: if scalar_value is 'FALSE', then
> SELECT NOT scalar_value is a runtime error (never cast).
> 
I think it is true. However, in case we implement UDCF user
can define implicit cast from STRING to BOOLEAN. I won't say
it anymore, but I think we have to keep this in mind.

> 6. These situations are subject to "legal but runtime error"
> because the scalar value's primitive type is expected to be a STRING:
> There are situations where a primitive value is expected to be a STRING:
> Functions LOWER() PRINTF() QUOTE() REPLACE() SOUNDEX() TRIM() UNICODE()
> UPPER()
> Clause COLLATE
> 
I think it is true.

> 7. These situations are subject to "legal but runtime error"
> because the scalar value's primitive type is expected to be STRING or
> VARBINARY:
> Situations where primitive value is expected to be STRING or VARBINARY:
> Functions HEX() LENGTH() POSITION() SUBSTR()
> 
I think it is true.

> 8. The concatenation situation looks like a special case
> because the scalar value's primitive type is expected to be STRING or
> VARBINARY:
> If primitive types are not the same, error.
> If primitive types are not either STRING or VARBINARY, error.
> If scalar_value || scalar_value: okay, result is SCALAR.
> If scalar_value || string_value: okay, result is STRING.
> If scalar_value || varbinary_value: okay, result is VARBINARY.
> This is just a guess, and it is slightly different from what you
> decided for arithmetic operations.
> The same rules apply for GROUP_CONCAT().
> 
Why not use the same rules as for arithmetic operations?
I mean:
1) If primitive types are not the same, error.
2) If primitive types are not either STRING or VARBINARY, error.
3) If scalar_value || scalar_value: okay, result is SCALAR.
4) If scalar_value || string_value: okay, result is SCALAR.
5) If scalar_value || varbinary_value: okay, result is SCALAR.

> 9. These situations are not subject to "legal but runtime error":
> Functions GREATEST() LEAST()
> Example: LEAST(FALSE, 1, 'x', X'44') is FALSE and result data type
> is BOOLEAN. This is the current behaviour, and it means that we
> follow SCALAR rules even though none of the operands is SCALAR.
> The same rules apply for MIN() or MAX() if the operand is SCALAR.
> 
I think that is true. The mechanics here are the same as in
the example shown at the beginning of the letter.

> 10. Comparison operations involving a SCALAR value cause:
> Comparison according to SCALAR rules, therefore always legal.
> Comparison operations > = < <= > >= LIKE BETWEEN CASE
> Implied comparisons UNIQUE DISTINCT GROUP ORDER UNION
> Example:
> CREATE TABLE t (s1 SCALAR PRIMARY KEY);
> INSERT INTO t VALUES (FALSE), (1), ('1'), (X'31');
> SELECT * FROM t WHERE s1 = '1' OR s1 > '1';
> Currently this causes a "type mismatch" error.
> But this error will be fixed if your plan is accepted.
> 
Here you can see classes and their order:

CLASS_NIL	= 0
CLASS_BOOL	= 1
CLASS_NUMBER	= 2
CLASS_STR	= 3
CLASS_BIN	= 4
CLASS_ARRAY	= 5
CLASS_MAP	= 6

It mean that any value of type BINARY more that any value
of type STRING and so on.

> 11. Assignment to scalar from any data type is always legal.
> Assignment from scalar to any data type is legal if and only
> if the primitive type is compatible.
> 
I think it is true.

> 12 SELECT scalar_column UNION SELECT non_scalar_column is SCALAR.
> (Also EXCEPT and INTERSECT)
> 
I think it is true.

> 
> In emails in March 2019 I wrote about compatibility
> with other DBMSs and with ODBC. The thread was
> "The rules for the scalar data type in SQL".
> I'll quote the first email as a reminder,
> some of it is still relevant. By the way K. Yukhin replied:
> "As far as I understand SCALAR is a bag of types. No value
> can ever have type SCALAR."
> 
> 
> 
> -------- Forwarded Message --------
> Subject:     The rules for the scalar data type in SQL
> Date:     Fri, 8 Mar 2019 16:31:02 -0700
> From:     Peter Gulutzan <pgulutzan@ocelot.ca>
> To:     Nikita Pettik <korablev@tarantool.org>
> CC:     dev@tarantool.org
> 
> 
> Hi,
> 
> What are the rules for the SCALAR data type in SQL?
> The decisions have been made, but eventually I'll need to find out what they
> are.
> So I'll guess and see whether anyone objects.
> 
> Tarantool had two choices:
> Choice#1: SCALAR is a "primitive" data type, on the same level as INTEGER,
> VARCHAR, BLOB, etc.
> Choice#2: SCALAR is a "complex" data type, two instances can have two
> different primitive data types.
> Tarantool chose #2.
> I could describe this with terms like "union as in C but without labels",
> or "limited polymorphism", etc. but fear that using analogies would cause
> arguments.
> 
> I can think of only three SQL DBMSs that have something close to this:
> Hive UNIONTYPE https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-UnionTypesunionUnionTypes
> SQL Server SQL_VARIANT Re SQL Server SQL_VARIANT https://docs.microsoft.com/en-us/sql/t-sql/data-types/sql-variant-transact-sql?view=sql-server-2017
> Oracle ANYDATA
> https://docs.oracle.com/database/121/ARPLS/t_anydat.htm#ARPLS077
> This is good because Tarantool can claim to have a big feature which
> other open-source DBMSs lack, that has resulted from its "NoSQL + SQL"
> heritage.
> Although some of the same functionality is available via structured
> user-defined
> types or JSON or XML, we're talking about a simple fully-predefined
> solution.
> 
> 1. The name of the data type is SCALAR, not UNIONTYPE or SQL_VARIANT
> or ANYDATA, to be compatible with Tarantool/NoSQL.
> There are no other words in a SCALAR item's definition,
> for example "SCALAR(5000)" is illegal and "SCALAR COLLATE x" is illegal
> and Hive-like "SCALAR<integer,double>" is illegal.
> SCALAR is a new reserved word.
> 
> 2. Any item defined as SCALAR has an underlying primitive type.
> For example, after
> CREATE TABLE t (s1 SCALAR PRIMARY KEY);
> INSERT INTO t VALUES (55),(X'41');
> The underlying primitive type of the item in the first row is INTEGER
> because literal 55 has data type INTEGER, and the underlying primitive
> type in the second row is BLOB (when we start supporting BLOBs again).
> An item's primitive type is far more important than its defined type.
> Incidentally Tarantool might find the primitive type by looking at the
> way MsgPack stores, but that is an implementation detail.
> 
> 3. SCALAR is not a supertype. Before Choice#1 was accepted, I
> tried to apply the logic of supertypes for further rules,
> for example saying: "in integer-versus-real comparisons we can
> coerce the integer to the real because real is a supertype,
> and SCALAR is a supertype of both integer and real because all
> their instances can be represented in SCALAR, therefore for
> real-to-scalar comparisons we can coerce the real to a scalar".
> But Choice#1 was rejected, so supertype logic is illegal.
> 
> 4. There is no literal syntax which implies data type SCALAR.
> Standard SQL indicates date/time/timestamp literals by preceding
> them with a data type name e.g. DATE '1990-01-01', and we could
> do the equivalent thing by saying SCALAR X'41' etc., but we won't
> because it would be meaningless, the primitive type of SCALAR X'41'
> would still be BLOB and that is what matters.
> 
> 5. TYPEOF(x) is never SCALAR, it is always the
> underlying data type. This is true even if x is null
> (we still think there is a "data type" named "null").
> In fact there is no function that is guaranteed to
> return the defined data type.
> For example, TYPEOF(CAST(1 AS SCALAR)); returns
> INTEGER, not SCALAR. But sometimes I still see 'blob'.
> 
> 6. For any operation that requires casting from an item defined
> as SCALAR, the syntax is legal but the operation may fail at runtime.
> At runtime, Tarantool detects the underlying primitive data type and applies
> the
> rules for that. For example, if a definition is
> CREATE TABLE t (s1 INT PRIMARY KEY, s2 SCALAR);
> and within any row s2 = 'a', that is, its underlying primitive type is
> varchar/char/text or whatever we decide to call character strings,
> then UPDATE t SET s1 = s2; is illegal.
> We usually do not know that in advance.
> 
> 7. For any dyadic operation that requires implicit casting to an item
> defined
> as SCALAR, the syntax is legal and the operation will not fail at runtime.
> Take this situation: comparison with SCALAR and a primitive type.
> The implicit cast is primitive-to-scalar not scalar-to-primitive.
> For example, after
> CREATE TABLE t (s1 SCALAR PRIMARY KEY);
> INSERT INTO t VALUES (X'41');
> SELECT * FROM t WHERE s1 > 'a';
> The comparison is valid, because
> we know the ordering of X'41' and 'a' in Tarantool/NoSQL
> 'scalar'. This is not because of supertyping (see Rule#3),
> and we do not have implicit casting to SCALAR
> (see Rule#6), so I think this is inconsistent, but it
> looks better than an error.
> 
> 8. The result data type of min/max/sum operation
> on a column defined as SCALAR is SCALAR,
> unless the result values is NULL. For example:
> CREATE TABLE t (s1 INT, s2 SCALAR PRIMARY KEY);
> INSERT INTO t VALUES (1,X'44'),(2,11),(3,1E4),(4,'a');
> SELECT MIN(s2),MAX(s2),SUM(s2) FROM t;
> The result is: - - [11, 'D', 10011].
> That is only possible with Tarantool/NoSQL scalar rules,
> but we have to ignore the illegal arithmetic (that X'44' + 11 is illegal).
> Sometimes typeof(min-max-sum(s2)) is 'blob' but that's just a bug.
> 
> 9. The result data type of a primitive combination is never
> SCALAR, because we in effect use TYPEOF(item) not
> the defined data type of the item.
> (I use the word "combination" in the way that the
> standard document uses it for section ""Result of
> data type combinations".) Therefore for
> MAX(1E308, 'a', 0, X'00')
> which was the subject of an argument in issue#4032
> https://github.com/tarantool/tarantool/issues/4032
> There I used a "supertype" argument, and Nikita Pettik
> rejected it, which is proper (see Rule#3).
> I also admit that standard SQL wouldn't allow this.
> However, since primitive-to-scalar comparisons are
> legal, and because I believe the standard-SQL limitation
> is solely due to lack of a SCALAR data type, I am not
> convinced that MAX(1E308, 'a', 0, X'00') should cause
> an error rather than a scalar result.
> 
> 10. The result data type of union/except/intersect is
> based on the primitive data type, but does not matter because
> select x'41' union all select 5;
> is legal anyway.
> 
> 11. Casting SCALAR to char/varchar/text results in a char/varchar/text
> with indefinite size and binary collation. Casting
> SCALAR to any other primitive type results, as far as I
> know, in no loss of information.
> 
> 12. Probably for ODBC the type is binary, as for sql_variant.
> 
> Peter Gulutzan
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-02-11 13:32 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-21 11:11 [Tarantool-discussions] Implicit cast for COMPARISON Mergen Imeev
2020-01-21 18:09 ` Peter Gulutzan
2020-01-22 14:13   ` Mergen Imeev
2020-01-22 14:24     ` Nikita Pettik
2020-01-30 18:51     ` Konstantin Osipov
2020-01-22 16:43   ` Peter Gulutzan
2020-01-30 18:52     ` Konstantin Osipov
2020-02-03 11:35       ` Mergen Imeev
2020-02-03 15:02         ` Konstantin Osipov
2020-02-04 18:50         ` Nikita Pettik
2020-02-05  2:57           ` Peter Gulutzan
2020-02-07 14:24             ` Nikita Pettik
2020-02-07 14:40               ` Konstantin Osipov
2020-02-07 22:30               ` Peter Gulutzan
2020-02-11 13:32                 ` Mergen Imeev
2020-02-05  7:52           ` Mergen Imeev
2020-02-06 12:41           ` Mergen Imeev
2020-02-06 13:09             ` Mergen Imeev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox