* [Tarantool-discussions] Implicit cast for COMPARISON @ 2020-01-21 11:11 Mergen Imeev 2020-01-21 18:09 ` Peter Gulutzan 0 siblings, 1 reply; 18+ messages in thread From: Mergen Imeev @ 2020-01-21 11:11 UTC (permalink / raw) To: tarantool-discussions Hi all, I would like once again ask a question about implicit cast for COMPARISON. Currently the chart of implicit cast for COMPARISON looks this way: To BOOL | To INT | To DBL | To STR | To BIN --------------------------------------------------------- From BOOL | A | - | - | - | - From INT | - | A | A | A | - From DBL | - | A | A | A | - From STR | - | S | S | A | - From BIN | - | - | - | - | A I suggested to change this chart so that is should look this way: To BOOL | To NUM | To STR | To BIN ----------------------------------------------- From BOOL | A | - | - | - From NUM | - | A | - | - From STR | - | - | A | - From BIN | - | - | - | A At the moment the only thing that different in these charts is impicit cast from STRING to number and from number to STRING. I asked the Russian community in Telegram about whether to allow this implicit cast for COMPARISON. At the moment, 31 people have voted, and 25 of them have voted for dropping this implicit cast. So, once again I suggest using these rules: 1) The values of numeric types are comparable without any implicit cast. 2) If the type of one of the values contains the type of another value, then they are comparable. 3) In any other cases, the values are not comparable. About "one type contains another type": we can say that INTEGER contains UNSIGNED; NUMBER contains INTEGER, UNSIGNED, DOUBLE; SCALAR contains all available in SQL types. Since all numeric types are comparable, my suggestion means that a value of any type can be compared with a value of type SCALAR. And in this comparison, the same rules will be used as in no-SQL Tarantool. For example, any value of type STRING is greater than any value of type INTEGER. What do you think about this? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-01-21 11:11 [Tarantool-discussions] Implicit cast for COMPARISON Mergen Imeev @ 2020-01-21 18:09 ` Peter Gulutzan 2020-01-22 14:13 ` Mergen Imeev 2020-01-22 16:43 ` Peter Gulutzan 0 siblings, 2 replies; 18+ messages in thread From: Peter Gulutzan @ 2020-01-21 18:09 UTC (permalink / raw) To: Mergen Imeev, tarantool-discussions Hi, On 2020-01-21 4:11 a.m., Mergen Imeev wrote: > Hi all, > I would like once again ask a question about implicit cast for > COMPARISON. Currently the chart of implicit cast for COMPARISON > looks this way: > To BOOL | To INT | To DBL | To STR | To BIN > --------------------------------------------------------- > From BOOL | A | - | - | - | - > From INT | - | A | A | A | - > From DBL | - | A | A | A | - > From STR | - | S | S | A | - > From BIN | - | - | - | - | A > > I suggested to change this chart so that is should look this way: > To BOOL | To NUM | To STR | To BIN > ----------------------------------------------- > From BOOL | A | - | - | - > From NUM | - | A | - | - > From STR | - | - | A | - > From BIN | - | - | - | A > > At the moment the only thing that different in these charts is > impicit cast from STRING to number and from number to STRING. > > I asked the Russian community in Telegram about whether to allow > this implicit cast for COMPARISON. At the moment, 31 people have > voted, and 25 of them have voted for dropping this implicit cast. > > So, once again I suggest using these rules: > 1) The values of numeric types are comparable without any implicit > cast. > 2) If the type of one of the values contains the type of another > value, then they are comparable. > 3) In any other cases, the values are not comparable. > > About "one type contains another type": we can say that INTEGER > contains UNSIGNED; NUMBER contains INTEGER, UNSIGNED, DOUBLE; > SCALAR contains all available in SQL types. > > Since all numeric types are comparable, my suggestion means that a > value of any type can be compared with a value of type SCALAR. And > in this comparison, the same rules will be used as in no-SQL > Tarantool. For example, any value of type STRING is greater than > any value of type INTEGER. > > What do you think about this? > I believe that the words "compared with a value of type SCALAR" are technically incorrect -- columns can be defined as having a SCALAR type but the values in those columns are not SCALAR. But (I hope) that detail probably does not affect the proposition. This would reverse what some developers decided earlier in dev threads "Clarify the rule" and "[dev] cast". There would be no implicit casts, the SCALAR rules would apply. Therefore: SELECT 1 = '1'; Currently, returns: TRUE. After fix, returns: FALSE. CREATE TABLE t (s1 SCALAR PRIMARY KEY); INSERT INTO t VALUES (FALSE),('A'); SELECT COUNT(*) FROM t WHERE s1 <> 'A'; Currently, returns: Type mismatch: can not convert text to boolean. After fix, returns: 1. CREATE TABLE t (s1 INTEGER PRIMARY KEY); INSERT INTO t VALUES (1); SELECT COUNT(*) FROM t WHERE s1 <> 'A'; Currently, returns: Type mismatch: can not convert text to boolean. After fix, returns: 1. If the above example is correct, it differs from NoSQL behaviour: box.schema.space.create('T') box.space.T:create_index('I') box.space.T:insert{1} box.space.T:select('A') Currently, returns: Supplied key type of part 0 does not match index part type After fix, returns: Supplied key type of part 0 does not match index part type (No change because the proposed fix is for SQL not Lua) And, since there is no change suggested for assignment, implicit cast would still occur. Therefore: CREATE TABLE t (s1 INTEGER PRIMARY KEY); INSERT INTO t VALUES ('1'); Currently, returns: no error After fix, returns: no error (No change because assignments cause implicit cast) SELECT UPPER(FALSE) FROM t; Currently, returns: 'FALSE' (string) After fix, returns: 'FALSE' (string) (No change because functions can cause assignments) I think that the proposed change is good. Peter Gulutzan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-01-21 18:09 ` Peter Gulutzan @ 2020-01-22 14:13 ` Mergen Imeev 2020-01-22 14:24 ` Nikita Pettik 2020-01-30 18:51 ` Konstantin Osipov 2020-01-22 16:43 ` Peter Gulutzan 1 sibling, 2 replies; 18+ messages in thread From: Mergen Imeev @ 2020-01-22 14:13 UTC (permalink / raw) To: Peter Gulutzan; +Cc: tarantool-discussions Hi! Thanks for answering. On Tue, Jan 21, 2020 at 11:09:25AM -0700, Peter Gulutzan wrote: > Hi, > > On 2020-01-21 4:11 a.m., Mergen Imeev wrote: > > Hi all, > > I would like once again ask a question about implicit cast for > > COMPARISON. Currently the chart of implicit cast for COMPARISON > > looks this way: > > To BOOL | To INT | To DBL | To STR | To BIN > > --------------------------------------------------------- > > From BOOL | A | - | - | - | - > > From INT | - | A | A | A | - > > From DBL | - | A | A | A | - > > From STR | - | S | S | A | - > > From BIN | - | - | - | - | A > > > > I suggested to change this chart so that is should look this way: > > To BOOL | To NUM | To STR | To BIN > > ----------------------------------------------- > > From BOOL | A | - | - | - > > From NUM | - | A | - | - > > From STR | - | - | A | - > > From BIN | - | - | - | A > > > > At the moment the only thing that different in these charts is > > impicit cast from STRING to number and from number to STRING. > > > > I asked the Russian community in Telegram about whether to allow > > this implicit cast for COMPARISON. At the moment, 31 people have > > voted, and 25 of them have voted for dropping this implicit cast. > > > > So, once again I suggest using these rules: > > 1) The values of numeric types are comparable without any implicit > > cast. > > 2) If the type of one of the values contains the type of another > > value, then they are comparable. > > 3) In any other cases, the values are not comparable. > > > > About "one type contains another type": we can say that INTEGER > > contains UNSIGNED; NUMBER contains INTEGER, UNSIGNED, DOUBLE; > > SCALAR contains all available in SQL types. > > > > Since all numeric types are comparable, my suggestion means that a > > value of any type can be compared with a value of type SCALAR. And > > in this comparison, the same rules will be used as in no-SQL > > Tarantool. For example, any value of type STRING is greater than > > any value of type INTEGER. > > > > What do you think about this? > > > > I believe that the words "compared with a value of type SCALAR" > are technically incorrect -- columns can be defined as having a > SCALAR type but the values in those columns are not SCALAR. > But (I hope) that detail probably does not affect the proposition. > Sorry, but I suggested a different approach: each value has its own type, and SCALAR is one of the types. This means that each value in a column of type SCALAR is of type SCALAR. If we compare a value of type SCALAR with a value of any other type, we will apply the SCALAR rules. If not both values have numeric types and none of them has SCALAR type, we get an error. > This would reverse what some developers decided earlier > in dev threads "Clarify the rule" and "[dev] cast". > There would be no implicit casts, the SCALAR rules would apply. > Therefore: > > SELECT 1 = '1'; > Currently, returns: TRUE. > After fix, returns: FALSE. > No, after fix it will throw an error. > CREATE TABLE t (s1 SCALAR PRIMARY KEY); > INSERT INTO t VALUES (FALSE),('A'); > SELECT COUNT(*) FROM t WHERE s1 <> 'A'; > Currently, returns: Type mismatch: can not convert text to boolean. > After fix, returns: 1. > This is true. > CREATE TABLE t (s1 INTEGER PRIMARY KEY); > INSERT INTO t VALUES (1); > SELECT COUNT(*) FROM t WHERE s1 <> 'A'; > Currently, returns: Type mismatch: can not convert text to boolean. > After fix, returns: 1. > Here we will see an error. > If the above example is correct, it differs from NoSQL behaviour: > box.schema.space.create('T') > box.space.T:create_index('I') > box.space.T:insert{1} > box.space.T:select('A') > Currently, returns: Supplied key type of part 0 does not match index part > type > After fix, returns: Supplied key type of part 0 does not match index part > type > (No change because the proposed fix is for SQL not Lua) > > And, since there is no change suggested for assignment, > implicit cast would still occur. Therefore: > > CREATE TABLE t (s1 INTEGER PRIMARY KEY); > INSERT INTO t VALUES ('1'); > Currently, returns: no error > After fix, returns: no error > (No change because assignments cause implicit cast) > > SELECT UPPER(FALSE) FROM t; > Currently, returns: 'FALSE' (string) > After fix, returns: 'FALSE' (string) > (No change because functions can cause assignments) > > I think that the proposed change is good. > > Peter Gulutzan > > Still, I think the idea of using SCALAR rules is viable. But I suggest adding this as one of the session settings. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-01-22 14:13 ` Mergen Imeev @ 2020-01-22 14:24 ` Nikita Pettik 2020-01-30 18:51 ` Konstantin Osipov 1 sibling, 0 replies; 18+ messages in thread From: Nikita Pettik @ 2020-01-22 14:24 UTC (permalink / raw) To: Mergen Imeev; +Cc: tarantool-discussions On 22 Jan 17:13, Mergen Imeev wrote: > Hi! Thanks for answering. > > On Tue, Jan 21, 2020 at 11:09:25AM -0700, Peter Gulutzan wrote: > > Hi, > > > > On 2020-01-21 4:11 a.m., Mergen Imeev wrote: > > If the above example is correct, it differs from NoSQL behaviour: > > box.schema.space.create('T') > > box.space.T:create_index('I') > > box.space.T:insert{1} > > box.space.T:select('A') > > Currently, returns: Supplied key type of part 0 does not match index part > > type > > After fix, returns: Supplied key type of part 0 does not match index part > > type > > (No change because the proposed fix is for SQL not Lua) > > > > And, since there is no change suggested for assignment, > > implicit cast would still occur. Therefore: > > > > CREATE TABLE t (s1 INTEGER PRIMARY KEY); > > INSERT INTO t VALUES ('1'); > > Currently, returns: no error > > After fix, returns: no error > > (No change because assignments cause implicit cast) > > > > SELECT UPPER(FALSE) FROM t; > > Currently, returns: 'FALSE' (string) > > After fix, returns: 'FALSE' (string) > > (No change because functions can cause assignments) > > > > I think that the proposed change is good. > > > > Peter Gulutzan > > > > > Still, I think the idea of using SCALAR rules is viable. > But I suggest adding this as one of the session settings. IMHO pretty bad idea: too much effort is required to support it; on the other hand, I can't come up with real use case for this feature. Configurable type system is an evil. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-01-22 14:13 ` Mergen Imeev 2020-01-22 14:24 ` Nikita Pettik @ 2020-01-30 18:51 ` Konstantin Osipov 1 sibling, 0 replies; 18+ messages in thread From: Konstantin Osipov @ 2020-01-30 18:51 UTC (permalink / raw) To: Mergen Imeev; +Cc: tarantool-discussions * Mergen Imeev <imeevma@tarantool.org> [20/01/22 17:18]: > Still, I think the idea of using SCALAR rules is viable. > But I suggest adding this as one of the session settings. This is fundamental part of the language. It should not depend on a session setting - this makes it very easy to shoot oneself in the foot. -- Konstantin Osipov, Moscow, Russia ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-01-21 18:09 ` Peter Gulutzan 2020-01-22 14:13 ` Mergen Imeev @ 2020-01-22 16:43 ` Peter Gulutzan 2020-01-30 18:52 ` Konstantin Osipov 1 sibling, 1 reply; 18+ messages in thread From: Peter Gulutzan @ 2020-01-22 16:43 UTC (permalink / raw) To: Mergen Imeev, tarantool-discussions Hi, On 2020-01-21 11:09 a.m., Peter Gulutzan wrote: <cut> > I think that the proposed change is good. I withdraw that remark. I misunderstood the proposal. Peter Gulutzan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-01-22 16:43 ` Peter Gulutzan @ 2020-01-30 18:52 ` Konstantin Osipov 2020-02-03 11:35 ` Mergen Imeev 0 siblings, 1 reply; 18+ messages in thread From: Konstantin Osipov @ 2020-01-30 18:52 UTC (permalink / raw) To: Peter Gulutzan; +Cc: tarantool-discussions * Peter Gulutzan <pgulutzan@ocelot.ca> [20/01/22 19:47]: > Hi, > > On 2020-01-21 11:09 a.m., Peter Gulutzan wrote: > <cut> > > I think that the proposed change is good. > I withdraw that remark. I misunderstood the proposal. I side with PeterG. -- Konstantin Osipov, Moscow, Russia ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-01-30 18:52 ` Konstantin Osipov @ 2020-02-03 11:35 ` Mergen Imeev 2020-02-03 15:02 ` Konstantin Osipov 2020-02-04 18:50 ` Nikita Pettik 0 siblings, 2 replies; 18+ messages in thread From: Mergen Imeev @ 2020-02-03 11:35 UTC (permalink / raw) To: Konstantin Osipov; +Cc: tarantool-discussions On Thu, Jan 30, 2020 at 09:52:16PM +0300, Konstantin Osipov wrote: > * Peter Gulutzan <pgulutzan@ocelot.ca> [20/01/22 19:47]: > > Hi, > > > > On 2020-01-21 11:09 a.m., Peter Gulutzan wrote: > > <cut> > > > I think that the proposed change is good. > > I withdraw that remark. I misunderstood the proposal. > > I side with PeterG. > > -- > Konstantin Osipov, Moscow, Russia Hi, Let me clarify: you think that during comparison it makes sense that STRING is implicitly converted to numbers? If so, then let's think about it. I think we have some questions to discuss in this case: 1) I think it makes sense that numeric types can be compared without any conversion. Do you agree? We have a special function that implements a comparison between integers and floating point numbers. If you do not agree with me, then make your suggestion. 2) STRING can be implicitly cast to a number. In case it can be cast to INTEGER, it will be INTEGER. In case it can be cast to DOUBLE, it will be DOUBLE. Do you agree? Should we return to this issue after the implementation of DECIMAL? 3) Can STRING be implicitly cast to BOOLEAN? 4) Can STRING be implicitly cast to BINARY? 5) In case STRING cannot be implicitly cast to the type of the other operand, should we allow implicitly casting the other operand to STRING? For example, from "'123r' > 124" move to "'123r' > '124'"? 6) Do you agree that only implicit casting from/to STRING is allowed? I mean that nothing else can be implicitly cast during a comparison with any other type if one of the types does not contain the other. 7) We need to clarify the rules when comparing SCALAR values. I think we cannot use the Tarantool rules here, as the Tarantool rules indicate that “100 < '2' == true”, but we decided that "100 > '2' == true", since '2' implicitly cast to 2. Could you suggest the rules that we should use here? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-02-03 11:35 ` Mergen Imeev @ 2020-02-03 15:02 ` Konstantin Osipov 2020-02-04 18:50 ` Nikita Pettik 1 sibling, 0 replies; 18+ messages in thread From: Konstantin Osipov @ 2020-02-03 15:02 UTC (permalink / raw) To: Mergen Imeev; +Cc: tarantool-discussions * Mergen Imeev <imeevma@tarantool.org> [20/02/03 14:39]: > > > On 2020-01-21 11:09 a.m., Peter Gulutzan wrote: > > > <cut> > > > > I think that the proposed change is good. > > > I withdraw that remark. I misunderstood the proposal. > > > > I side with PeterG. > > > > -- > > Konstantin Osipov, Moscow, Russia > > Hi, > > Let me clarify: you think that during comparison it makes sense > that STRING is implicitly converted to numbers? The issue is not with any specific problem, which may require a solution, the issue is with how the solution is packaged in your proposal. Shall strings be implicitly converted to numbers or numbers implicitly converted to strings during comparison? Probably not. Especially, if, as you pointed out, there are other rules for SCALAR or ANY bags of types, for which implicit conversion will not taking place. Shall we use the rules you suggested? I am afraid not, first of all because it's unclear whether this is the end of the story. While PeterG, I believe, would advocate strict adherence to ANSI, my take is that we should not give existing postgresql or oracle users too much trouble here. The reason I don't include MySQL on the list is that I know that mysql has historically been way more messy with its implicit conversion rules, so I'd not copy it here. Can we begin fixing this one thing at a time? This would help everyone quicker find balance between these priorities. E.g. can we make sure that all numeric types are comparable (as it seems already everyone's desire), comparison rules are clear to everyone and everyone agrees with them, fix all bugs and cover it with tests? Then we can do the same with string/blob types. Then we can look at comparison and implicit conversion rules between type families, and also carefully study existing vendors, so we don't break too much existing PostgreSQL or MySQL queries with more strict comparison rules or funny comparison order. > If so, then let's think about it. I think we have some questions > to discuss in this case: > 1) I think it makes sense that numeric types can be compared > without any conversion. Do you agree? We have a special function > that implements a comparison between integers and floating point > numbers. If you do not agree with me, then make your suggestion. I agree it makes sense, but to be on the safe side, I'd also check how existing vendors handle it - so that we don't introduce some obvious inconvenience. > 2) STRING can be implicitly cast to a number. In case it can be > cast to INTEGER, it will be INTEGER. In case it can be cast to > DOUBLE, it will be DOUBLE. Do you agree? Should we return to this > issue after the implementation of DECIMAL? I don't know. I'd say DECIMAL is a game changer. I'd say nothing should be implicitly cast to floating point type once there is an exact type. Besides, there are collate/locale issues. Are we going to handle "1 234 567"? What about "1,234,567"? > 3) Can STRING be implicitly cast to BOOLEAN? Look at vendors and ANSI, deal with one problem at a time. It's not a major issue. The biggest issue is string -> number, and it's best to deal with it after decimal is part of SQL. > 4) Can STRING be implicitly cast to BINARY? Ditto, look at vendors and ANSI, move into an own patch.. The same applies to the rest of your questions: there is no user value in us trying to provide what we believe is a "better" semantics over the existing vendors and ansi in such small details. Even if we think it's safe. > 5) In case STRING cannot be implicitly cast to the type of the > other operand, should we allow implicitly casting the other > operand to STRING? For example, from "'123r' > 124" move to > "'123r' > '124'"? > 6) Do you agree that only implicit casting from/to STRING is > allowed? I mean that nothing else can be implicitly cast during a > comparison with any other type if one of the types does not > contain the other. > 7) We need to clarify the rules when comparing SCALAR values. I > think we cannot use the Tarantool rules here, as the Tarantool > rules indicate that “100 < '2' == true”, but we decided that > "100 > '2' == true", since '2' implicitly cast to 2. Could you > suggest the rules that we should use here? > -- Konstantin Osipov, Moscow, Russia ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-02-03 11:35 ` Mergen Imeev 2020-02-03 15:02 ` Konstantin Osipov @ 2020-02-04 18:50 ` Nikita Pettik 2020-02-05 2:57 ` Peter Gulutzan ` (2 more replies) 1 sibling, 3 replies; 18+ messages in thread From: Nikita Pettik @ 2020-02-04 18:50 UTC (permalink / raw) To: Mergen Imeev; +Cc: tarantool-discussions On 03 Feb 14:35, Mergen Imeev wrote: > On Thu, Jan 30, 2020 at 09:52:16PM +0300, Konstantin Osipov wrote: > > * Peter Gulutzan <pgulutzan@ocelot.ca> [20/01/22 19:47]: > > > Hi, > > > > > > On 2020-01-21 11:09 a.m., Peter Gulutzan wrote: > > > <cut> > > > > I think that the proposed change is good. > > > I withdraw that remark. I misunderstood the proposal. > > > > I side with PeterG. > > > > -- > > Konstantin Osipov, Moscow, Russia > > Hi, > > Let me clarify: you think that during comparison it makes sense > that STRING is implicitly converted to numbers? > > If so, then let's think about it. I think we have some questions > to discuss in this case: > 1) I think it makes sense that numeric types can be compared > without any conversion. Do you agree? We have a special function > that implements a comparison between integers and floating point > numbers. If you do not agree with me, then make your suggestion. > 2) STRING can be implicitly cast to a number. In case it can be > cast to INTEGER, it will be INTEGER. In case it can be cast to > DOUBLE, it will be DOUBLE. Do you agree? Does it mean that '1.0' is converted to integer when it is compared with number? > Should we return to this > issue after the implementation of DECIMAL? > 3) Can STRING be implicitly cast to BOOLEAN? No, it can't. > 4) Can STRING be implicitly cast to BINARY? No, it can't > 5) In case STRING cannot be implicitly cast to the type of the > other operand, should we allow implicitly casting the other > operand to STRING? For example, from "'123r' > 124" move to > "'123r' > '124'"? No, we shouldn't. What is more, now comparison operations are not commutative: tarantool> select '123r' = 124 --- - null - 'Type mismatch: can not convert 123r to numeric' ... tarantool> select 124 = '123r' --- - metadata: - name: 124 = '123r' type: boolean rows: - [false] ... Both comparisons should result in error. > 6) Do you agree that only implicit casting from/to STRING is > allowed? I mean that nothing else can be implicitly cast during a > comparison with any other type if one of the types does not > contain the other. What about scalar/any types? > 7) We need to clarify the rules when comparing SCALAR values. I > think we cannot use the Tarantool rules here, as the Tarantool > rules indicate that “100 < '2' == true”, but we decided that > "100 > '2' == true", since '2' implicitly cast to 2. Could you > suggest the rules that we should use here? There's already existing solution: while fetching value from space, we preserve its initial field type. For SCALAR values we may use one rules, for values fetched from INTEGER/STRING fields - apply another ones. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-02-04 18:50 ` Nikita Pettik @ 2020-02-05 2:57 ` Peter Gulutzan 2020-02-07 14:24 ` Nikita Pettik 2020-02-05 7:52 ` Mergen Imeev 2020-02-06 12:41 ` Mergen Imeev 2 siblings, 1 reply; 18+ messages in thread From: Peter Gulutzan @ 2020-02-05 2:57 UTC (permalink / raw) To: Nikita Pettik, Mergen Imeev; +Cc: tarantool-discussions Hi, On 2020-02-04 11:50 a.m., Nikita Pettik wrote: > On 03 Feb 14:35, Mergen Imeev wrote: <cut> >> 7) We need to clarify the rules when comparing SCALAR values. I >> think we cannot use the Tarantool rules here, as the Tarantool >> rules indicate that “100 < '2' == true”, but we decided that >> "100 > '2' == true", since '2' implicitly cast to 2. Could you >> suggest the rules that we should use here? > > There's already existing solution: while fetching value from space, > we preserve its initial field type. For SCALAR values we may use one > rules, for values fetched from INTEGER/STRING fields - apply another ones. I do not think it is always so simple. First, with regard to the implicit cast part of the question, the current situation is: CREATE TABLE t (A SCALAR PRIMARY KEY, B INTEGER); INSERT INTO t VALUES (1,1); SELECT * FROM t WHERE A = B AND A < '0' AND B > '0'; Result: 1 row. It looks odd, but that is what happens if the SCALAR rules and the non-SCALAR rules can fit in the same statement. This could be solved by making implicit cast illegal. But you still want to have two sets of rules, and (behaviour change) values can be SCALAR. Suppose CREATE TABLE t (scalar_column SCALAR PRIMARY KEY, non_scalar_column INT); CAST(non_scalar_column AS SCALAR) result data type is SCALAR? scalar_column < non_scalar_column is legal? scalar_column < 1 /* data type of 1 is INTEGER */ ... is legal? SUM(scalar_column) is SCALAR? scalar_column + non_scalar_column is SCALAR? SELECT scalar_column UNION SELECT non_scalar_column is SCALAR? UPDATE t SET non_scalar_column = scalar_column is legal? Peter Gulutzan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-02-05 2:57 ` Peter Gulutzan @ 2020-02-07 14:24 ` Nikita Pettik 2020-02-07 14:40 ` Konstantin Osipov 2020-02-07 22:30 ` Peter Gulutzan 0 siblings, 2 replies; 18+ messages in thread From: Nikita Pettik @ 2020-02-07 14:24 UTC (permalink / raw) To: Peter Gulutzan; +Cc: tarantool-discussions On 04 Feb 19:57, Peter Gulutzan wrote: > Hi, > > On 2020-02-04 11:50 a.m., Nikita Pettik wrote: > > > On 03 Feb 14:35, Mergen Imeev wrote: > <cut> > >> 7) We need to clarify the rules when comparing SCALAR values. I > >> think we cannot use the Tarantool rules here, as the Tarantool > >> rules indicate that “100 < '2' == true”, but we decided that > >> "100 > '2' == true", since '2' implicitly cast to 2. Could you > >> suggest the rules that we should use here? > > > > There's already existing solution: while fetching value from space, > > we preserve its initial field type. For SCALAR values we may use one > > rules, for values fetched from INTEGER/STRING fields - apply another ones. > > I do not think it is always so simple. > > First, with regard to the implicit cast part of the question, > the current situation is: > CREATE TABLE t (A SCALAR PRIMARY KEY, B INTEGER); > INSERT INTO t VALUES (1,1); > SELECT * FROM t WHERE A = B AND A < '0' AND B > '0'; > Result: 1 row. > It looks odd, but that is what happens if the SCALAR rules > and the non-SCALAR rules can fit in the same statement. > This could be solved by making implicit cast illegal. Yes, I stick to the point that implicit cast should be illegal. > But you still want to have two sets of rules, and (behaviour change) values > can be SCALAR. > > Suppose CREATE TABLE t (scalar_column SCALAR PRIMARY KEY, > non_scalar_column INT); > > CAST(non_scalar_column AS SCALAR) result data type is SCALAR? Yes. > scalar_column < non_scalar_column is legal? Yes, comparison rules are supposed to be the same as in NoSQL Tarantool. > scalar_column < 1 /* data type of 1 is INTEGER */ ... is legal? Yes. > SUM(scalar_column) is SCALAR? No, it is not. Taking into account removing implicit cast between numeric and string types, SUM() is assumed to be finished only in case all arguments are of numeric type. What is more SUM() features NUMBER return type (in its function definition). > scalar_column + non_scalar_column is SCALAR? No. If addition is allowed, type of resulting value will be numeric (integer or float). > SELECT scalar_column UNION SELECT non_scalar_column is SCALAR? I guess yes (at least it seems to be reasonable to me). > UPDATE t SET non_scalar_column = scalar_column is legal? It depends on particular value of scalar_column. To sum up, my proposal is to allow comparing scalar values with values of any other type. On the other hand, result of such operations like assignment or addition should depend on particular values of SCALAR type. Alternatively, we can abandon this idea and operate apply comparison rules depending purely on mp_ types of particular scalar values. But then rules in NoSQL and SQL will be different. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-02-07 14:24 ` Nikita Pettik @ 2020-02-07 14:40 ` Konstantin Osipov 2020-02-07 22:30 ` Peter Gulutzan 1 sibling, 0 replies; 18+ messages in thread From: Konstantin Osipov @ 2020-02-07 14:40 UTC (permalink / raw) To: Nikita Pettik; +Cc: tarantool-discussions * Nikita Pettik <korablev@tarantool.org> [20/02/07 17:29]: > Alternatively, we can abandon this idea and operate apply comparison > rules depending purely on mp_ types of particular scalar values. But > then rules in NoSQL and SQL will be different. This would be worst thing of all. mp_type is an encoding format, which, btw, I would like to ditch from in-memory tuple format going forward (only keep for data on disk and wirte) - 'cause compression benefits it gives do not outweigh the performance slow down of variable representation or need to pack/unpack. There should be a single set of data types for SQL and NoSQL and the rules should be the same. If NoSQL types don't have some ansi features, then these features should be added to nosql, and sql should use nosql. -- Konstantin Osipov, Moscow, Russia ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-02-07 14:24 ` Nikita Pettik 2020-02-07 14:40 ` Konstantin Osipov @ 2020-02-07 22:30 ` Peter Gulutzan 2020-02-11 13:32 ` Mergen Imeev 1 sibling, 1 reply; 18+ messages in thread From: Peter Gulutzan @ 2020-02-07 22:30 UTC (permalink / raw) To: Nikita Pettik; +Cc: tarantool-discussions Hi, On 2020-02-07 7:24 a.m., Nikita Pettik wrote: > <cut> > To sum up, my proposal is to allow comparing scalar values with values of > any other type. On the other hand, result of such operations like > assignment or addition should depend on particular values of SCALAR > type. > > Alternatively, we can abandon this idea and operate apply comparison > rules depending purely on mp_ types of particular scalar values. But > then rules in NoSQL and SQL will be different. > I hope that you will abandon this idea, but let us pretend that both these behaviour changes are accepted: (1) No implicit cast (2) Values can be SCALAR with these implications or follow-ups. 1. A value is SCALAR if and only if: It is an item in a column defined as SCALAR, or It is the result of CAST(value AS SCALAR), or It is the result LUA('return scalar-value') or similar Lua function. 2. If a value is SCALAR, then it has two types: according to SQL, and according to Lua. Example: box.execute([[INSERT INTO b VALUES ('a');]]) box.execute([[CREATE TABLE b (s1 SCALAR PRIMARY KEY);]] box.execute([[SELECT TYPEOF(s1) FROM b;]]) type(box.space.B:select('a')[1][1]) The SQL statement will return 'scalar', the Lua statement will return 'string'. You might decide to redefine or rename TYPEOF(), but you cannot avoid that there are now two types, and probably need a new built-in function PRIMITIVE_TYPE(). 3. For deciding whether an operation "is legal", we depend on defined type. For example, LENGTH(x) is legal if defined type is STRING or VARBINARY or SCALAR. For deciding whether an operation "is an error", we depend on primitive type. For example, if scalar_value = 5.5 then LENGTH(scalar_value) is an error although it is legal. Presumably the error is a "runtime error" although I suppose it is possible to know in advance whether all values of a scalar column have the same primitive type. 4. These situations are subject to "legal but runtime error" because the scalar value's primitive type is expected to be a number: Arithmetic operators + - * / % << >> & | Functions ABS(), CHAR(), AVG(), SUM(), TOTAL(). The result is never SCALAR. 5. These situations are subject to "legal but runtime error" because the scalar value's primitive type is expected to be a BOOLEAN: Logical operators NOT AND OR. The result is never SCALAR. Example: if scalar_value is FALSE, then SELECT scalar_value OR scalar_value is legal and result data type is BOOLEAN (not SCALAR). Example: if scalar_value is 'FALSE', then SELECT NOT scalar_value is a runtime error (never cast). 6. These situations are subject to "legal but runtime error" because the scalar value's primitive type is expected to be a STRING: There are situations where a primitive value is expected to be a STRING: Functions LOWER() PRINTF() QUOTE() REPLACE() SOUNDEX() TRIM() UNICODE() UPPER() Clause COLLATE 7. These situations are subject to "legal but runtime error" because the scalar value's primitive type is expected to be STRING or VARBINARY: Situations where primitive value is expected to be STRING or VARBINARY: Functions HEX() LENGTH() POSITION() SUBSTR() 8. The concatenation situation looks like a special case because the scalar value's primitive type is expected to be STRING or VARBINARY: If primitive types are not the same, error. If primitive types are not either STRING or VARBINARY, error. If scalar_value || scalar_value: okay, result is SCALAR. If scalar_value || string_value: okay, result is STRING. If scalar_value || varbinary_value: okay, result is VARBINARY. This is just a guess, and it is slightly different from what you decided for arithmetic operations. The same rules apply for GROUP_CONCAT(). 9. These situations are not subject to "legal but runtime error": Functions GREATEST() LEAST() Example: LEAST(FALSE, 1, 'x', X'44') is FALSE and result data type is BOOLEAN. This is the current behaviour, and it means that we follow SCALAR rules even though none of the operands is SCALAR. The same rules apply for MIN() or MAX() if the operand is SCALAR. 10. Comparison operations involving a SCALAR value cause: Comparison according to SCALAR rules, therefore always legal. Comparison operations > = < <= > >= LIKE BETWEEN CASE Implied comparisons UNIQUE DISTINCT GROUP ORDER UNION Example: CREATE TABLE t (s1 SCALAR PRIMARY KEY); INSERT INTO t VALUES (FALSE), (1), ('1'), (X'31'); SELECT * FROM t WHERE s1 = '1' OR s1 > '1'; Currently this causes a "type mismatch" error. But this error will be fixed if your plan is accepted. 11. Assignment to scalar from any data type is always legal. Assignment from scalar to any data type is legal if and only if the primitive type is compatible. 12 SELECT scalar_column UNION SELECT non_scalar_column is SCALAR. (Also EXCEPT and INTERSECT) In emails in March 2019 I wrote about compatibility with other DBMSs and with ODBC. The thread was "The rules for the scalar data type in SQL". I'll quote the first email as a reminder, some of it is still relevant. By the way K. Yukhin replied: "As far as I understand SCALAR is a bag of types. No value can ever have type SCALAR." -------- Forwarded Message -------- Subject: The rules for the scalar data type in SQL Date: Fri, 8 Mar 2019 16:31:02 -0700 From: Peter Gulutzan <pgulutzan@ocelot.ca> To: Nikita Pettik <korablev@tarantool.org> CC: dev@tarantool.org Hi, What are the rules for the SCALAR data type in SQL? The decisions have been made, but eventually I'll need to find out what they are. So I'll guess and see whether anyone objects. Tarantool had two choices: Choice#1: SCALAR is a "primitive" data type, on the same level as INTEGER, VARCHAR, BLOB, etc. Choice#2: SCALAR is a "complex" data type, two instances can have two different primitive data types. Tarantool chose #2. I could describe this with terms like "union as in C but without labels", or "limited polymorphism", etc. but fear that using analogies would cause arguments. I can think of only three SQL DBMSs that have something close to this: Hive UNIONTYPE https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-UnionTypesunionUnionTypes SQL Server SQL_VARIANT Re SQL Server SQL_VARIANT https://docs.microsoft.com/en-us/sql/t-sql/data-types/sql-variant-transact-sql?view=sql-server-2017 Oracle ANYDATA https://docs.oracle.com/database/121/ARPLS/t_anydat.htm#ARPLS077 This is good because Tarantool can claim to have a big feature which other open-source DBMSs lack, that has resulted from its "NoSQL + SQL" heritage. Although some of the same functionality is available via structured user-defined types or JSON or XML, we're talking about a simple fully-predefined solution. 1. The name of the data type is SCALAR, not UNIONTYPE or SQL_VARIANT or ANYDATA, to be compatible with Tarantool/NoSQL. There are no other words in a SCALAR item's definition, for example "SCALAR(5000)" is illegal and "SCALAR COLLATE x" is illegal and Hive-like "SCALAR<integer,double>" is illegal. SCALAR is a new reserved word. 2. Any item defined as SCALAR has an underlying primitive type. For example, after CREATE TABLE t (s1 SCALAR PRIMARY KEY); INSERT INTO t VALUES (55),(X'41'); The underlying primitive type of the item in the first row is INTEGER because literal 55 has data type INTEGER, and the underlying primitive type in the second row is BLOB (when we start supporting BLOBs again). An item's primitive type is far more important than its defined type. Incidentally Tarantool might find the primitive type by looking at the way MsgPack stores, but that is an implementation detail. 3. SCALAR is not a supertype. Before Choice#1 was accepted, I tried to apply the logic of supertypes for further rules, for example saying: "in integer-versus-real comparisons we can coerce the integer to the real because real is a supertype, and SCALAR is a supertype of both integer and real because all their instances can be represented in SCALAR, therefore for real-to-scalar comparisons we can coerce the real to a scalar". But Choice#1 was rejected, so supertype logic is illegal. 4. There is no literal syntax which implies data type SCALAR. Standard SQL indicates date/time/timestamp literals by preceding them with a data type name e.g. DATE '1990-01-01', and we could do the equivalent thing by saying SCALAR X'41' etc., but we won't because it would be meaningless, the primitive type of SCALAR X'41' would still be BLOB and that is what matters. 5. TYPEOF(x) is never SCALAR, it is always the underlying data type. This is true even if x is null (we still think there is a "data type" named "null"). In fact there is no function that is guaranteed to return the defined data type. For example, TYPEOF(CAST(1 AS SCALAR)); returns INTEGER, not SCALAR. But sometimes I still see 'blob'. 6. For any operation that requires casting from an item defined as SCALAR, the syntax is legal but the operation may fail at runtime. At runtime, Tarantool detects the underlying primitive data type and applies the rules for that. For example, if a definition is CREATE TABLE t (s1 INT PRIMARY KEY, s2 SCALAR); and within any row s2 = 'a', that is, its underlying primitive type is varchar/char/text or whatever we decide to call character strings, then UPDATE t SET s1 = s2; is illegal. We usually do not know that in advance. 7. For any dyadic operation that requires implicit casting to an item defined as SCALAR, the syntax is legal and the operation will not fail at runtime. Take this situation: comparison with SCALAR and a primitive type. The implicit cast is primitive-to-scalar not scalar-to-primitive. For example, after CREATE TABLE t (s1 SCALAR PRIMARY KEY); INSERT INTO t VALUES (X'41'); SELECT * FROM t WHERE s1 > 'a'; The comparison is valid, because we know the ordering of X'41' and 'a' in Tarantool/NoSQL 'scalar'. This is not because of supertyping (see Rule#3), and we do not have implicit casting to SCALAR (see Rule#6), so I think this is inconsistent, but it looks better than an error. 8. The result data type of min/max/sum operation on a column defined as SCALAR is SCALAR, unless the result values is NULL. For example: CREATE TABLE t (s1 INT, s2 SCALAR PRIMARY KEY); INSERT INTO t VALUES (1,X'44'),(2,11),(3,1E4),(4,'a'); SELECT MIN(s2),MAX(s2),SUM(s2) FROM t; The result is: - - [11, 'D', 10011]. That is only possible with Tarantool/NoSQL scalar rules, but we have to ignore the illegal arithmetic (that X'44' + 11 is illegal). Sometimes typeof(min-max-sum(s2)) is 'blob' but that's just a bug. 9. The result data type of a primitive combination is never SCALAR, because we in effect use TYPEOF(item) not the defined data type of the item. (I use the word "combination" in the way that the standard document uses it for section ""Result of data type combinations".) Therefore for MAX(1E308, 'a', 0, X'00') which was the subject of an argument in issue#4032 https://github.com/tarantool/tarantool/issues/4032 There I used a "supertype" argument, and Nikita Pettik rejected it, which is proper (see Rule#3). I also admit that standard SQL wouldn't allow this. However, since primitive-to-scalar comparisons are legal, and because I believe the standard-SQL limitation is solely due to lack of a SCALAR data type, I am not convinced that MAX(1E308, 'a', 0, X'00') should cause an error rather than a scalar result. 10. The result data type of union/except/intersect is based on the primitive data type, but does not matter because select x'41' union all select 5; is legal anyway. 11. Casting SCALAR to char/varchar/text results in a char/varchar/text with indefinite size and binary collation. Casting SCALAR to any other primitive type results, as far as I know, in no loss of information. 12. Probably for ODBC the type is binary, as for sql_variant. Peter Gulutzan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-02-07 22:30 ` Peter Gulutzan @ 2020-02-11 13:32 ` Mergen Imeev 0 siblings, 0 replies; 18+ messages in thread From: Mergen Imeev @ 2020-02-11 13:32 UTC (permalink / raw) To: Peter Gulutzan; +Cc: tarantool-discussions Hi, Thank you for the answer. My comments below. On Fri, Feb 07, 2020 at 03:30:57PM -0700, Peter Gulutzan wrote: > Hi, > > On 2020-02-07 7:24 a.m., Nikita Pettik wrote: > > > <cut> > > To sum up, my proposal is to allow comparing scalar values with values of > > any other type. On the other hand, result of such operations like > > assignment or addition should depend on particular values of SCALAR > > type. > > > > Alternatively, we can abandon this idea and operate apply comparison > > rules depending purely on mp_ types of particular scalar values. But > > then rules in NoSQL and SQL will be different. > > > > I hope that you will abandon this idea, I am not sure that we can do this. For example: tarantool> box.execute("select true in (1, '2', 3, '4');") --- - metadata: - name: true in (1, '2', 3, '4') type: boolean rows: - [false] ... Here we use an ephemeral space to store all the values from a given vector. To search for "true" we use BOX, which automatically applies SCALAR rules for comparison. (At the moment, all columns of ephemeral spaces are of SCALAR type). At the same time: tarantool> box.execute("select true in (1, '2');") --- - null - 'Type mismatch: can not convert unsigned to boolean' ... Here we have only two values for comparison, so we use OP_Eq operation twice. This means that we apply SQL rules for comparison. > but let us pretend that both these behaviour changes are accepted: > (1) No implicit cast > (2) Values can be SCALAR > with these implications or follow-ups. > > 1. A value is SCALAR if and only if: > It is an item in a column defined as SCALAR, or > It is the result of CAST(value AS SCALAR), or > It is the result LUA('return scalar-value') or similar Lua function. > I think it is true. > 2. If a value is SCALAR, then it has two types: > according to SQL, and according to Lua. > Example: > box.execute([[INSERT INTO b VALUES ('a');]]) > box.execute([[CREATE TABLE b (s1 SCALAR PRIMARY KEY);]] > box.execute([[SELECT TYPEOF(s1) FROM b;]]) > type(box.space.B:select('a')[1][1]) > The SQL statement will return 'scalar', > the Lua statement will return 'string'. > You might decide to redefine or rename TYPEOF(), > but you cannot avoid that there are now two types, > and probably need a new built-in function PRIMITIVE_TYPE(). > Not exacly: tarantool> box.execute([[CREATE TABLE b (s1 SCALAR PRIMARY KEY);]]) --- - row_count: 1 ... tarantool> box.execute([[INSERT INTO b VALUES ('a');]]) --- - row_count: 1 ... tarantool> box.execute([[SELECT TYPEOF(s1) FROM b;]]) --- - metadata: - name: TYPEOF(s1) type: string rows: - ['string'] ... Function typeof() returns only 'string', 'integer', 'double', 'boolean' and 'varbinary'. Though, I am not sure if this a feature or a bug. > 3. For deciding whether an operation "is legal", > we depend on defined type. For example, LENGTH(x) > is legal if defined type is STRING or VARBINARY > or SCALAR. For deciding whether an operation > "is an error", we depend on primitive type. > For example, if scalar_value = 5.5 then > LENGTH(scalar_value) is an error although it is legal. > Presumably the error is a "runtime error" although I > suppose it is possible to know in advance whether all > values of a scalar column have the same primitive type. > If we ignore the implicit cast (and UDCF), then I think you are right. But I do not think that we can always find out in advance whether all the values in a column are of the same type. > 4. These situations are subject to "legal but runtime error" > because the scalar value's primitive type is expected to be a number: > Arithmetic operators + - * / % << >> & | > Functions ABS(), CHAR(), AVG(), SUM(), TOTAL(). > The result is never SCALAR. > I think it is true. > 5. These situations are subject to "legal but runtime error" > because the scalar value's primitive type is expected to be a BOOLEAN: > Logical operators NOT AND OR. > The result is never SCALAR. > Example: if scalar_value is FALSE, then > SELECT scalar_value OR scalar_value is legal and > result data type is BOOLEAN (not SCALAR). > Example: if scalar_value is 'FALSE', then > SELECT NOT scalar_value is a runtime error (never cast). > I think it is true. However, in case we implement UDCF user can define implicit cast from STRING to BOOLEAN. I won't say it anymore, but I think we have to keep this in mind. > 6. These situations are subject to "legal but runtime error" > because the scalar value's primitive type is expected to be a STRING: > There are situations where a primitive value is expected to be a STRING: > Functions LOWER() PRINTF() QUOTE() REPLACE() SOUNDEX() TRIM() UNICODE() > UPPER() > Clause COLLATE > I think it is true. > 7. These situations are subject to "legal but runtime error" > because the scalar value's primitive type is expected to be STRING or > VARBINARY: > Situations where primitive value is expected to be STRING or VARBINARY: > Functions HEX() LENGTH() POSITION() SUBSTR() > I think it is true. > 8. The concatenation situation looks like a special case > because the scalar value's primitive type is expected to be STRING or > VARBINARY: > If primitive types are not the same, error. > If primitive types are not either STRING or VARBINARY, error. > If scalar_value || scalar_value: okay, result is SCALAR. > If scalar_value || string_value: okay, result is STRING. > If scalar_value || varbinary_value: okay, result is VARBINARY. > This is just a guess, and it is slightly different from what you > decided for arithmetic operations. > The same rules apply for GROUP_CONCAT(). > Why not use the same rules as for arithmetic operations? I mean: 1) If primitive types are not the same, error. 2) If primitive types are not either STRING or VARBINARY, error. 3) If scalar_value || scalar_value: okay, result is SCALAR. 4) If scalar_value || string_value: okay, result is SCALAR. 5) If scalar_value || varbinary_value: okay, result is SCALAR. > 9. These situations are not subject to "legal but runtime error": > Functions GREATEST() LEAST() > Example: LEAST(FALSE, 1, 'x', X'44') is FALSE and result data type > is BOOLEAN. This is the current behaviour, and it means that we > follow SCALAR rules even though none of the operands is SCALAR. > The same rules apply for MIN() or MAX() if the operand is SCALAR. > I think that is true. The mechanics here are the same as in the example shown at the beginning of the letter. > 10. Comparison operations involving a SCALAR value cause: > Comparison according to SCALAR rules, therefore always legal. > Comparison operations > = < <= > >= LIKE BETWEEN CASE > Implied comparisons UNIQUE DISTINCT GROUP ORDER UNION > Example: > CREATE TABLE t (s1 SCALAR PRIMARY KEY); > INSERT INTO t VALUES (FALSE), (1), ('1'), (X'31'); > SELECT * FROM t WHERE s1 = '1' OR s1 > '1'; > Currently this causes a "type mismatch" error. > But this error will be fixed if your plan is accepted. > Here you can see classes and their order: CLASS_NIL = 0 CLASS_BOOL = 1 CLASS_NUMBER = 2 CLASS_STR = 3 CLASS_BIN = 4 CLASS_ARRAY = 5 CLASS_MAP = 6 It mean that any value of type BINARY more that any value of type STRING and so on. > 11. Assignment to scalar from any data type is always legal. > Assignment from scalar to any data type is legal if and only > if the primitive type is compatible. > I think it is true. > 12 SELECT scalar_column UNION SELECT non_scalar_column is SCALAR. > (Also EXCEPT and INTERSECT) > I think it is true. > > In emails in March 2019 I wrote about compatibility > with other DBMSs and with ODBC. The thread was > "The rules for the scalar data type in SQL". > I'll quote the first email as a reminder, > some of it is still relevant. By the way K. Yukhin replied: > "As far as I understand SCALAR is a bag of types. No value > can ever have type SCALAR." > > > > -------- Forwarded Message -------- > Subject: The rules for the scalar data type in SQL > Date: Fri, 8 Mar 2019 16:31:02 -0700 > From: Peter Gulutzan <pgulutzan@ocelot.ca> > To: Nikita Pettik <korablev@tarantool.org> > CC: dev@tarantool.org > > > Hi, > > What are the rules for the SCALAR data type in SQL? > The decisions have been made, but eventually I'll need to find out what they > are. > So I'll guess and see whether anyone objects. > > Tarantool had two choices: > Choice#1: SCALAR is a "primitive" data type, on the same level as INTEGER, > VARCHAR, BLOB, etc. > Choice#2: SCALAR is a "complex" data type, two instances can have two > different primitive data types. > Tarantool chose #2. > I could describe this with terms like "union as in C but without labels", > or "limited polymorphism", etc. but fear that using analogies would cause > arguments. > > I can think of only three SQL DBMSs that have something close to this: > Hive UNIONTYPE https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-UnionTypesunionUnionTypes > SQL Server SQL_VARIANT Re SQL Server SQL_VARIANT https://docs.microsoft.com/en-us/sql/t-sql/data-types/sql-variant-transact-sql?view=sql-server-2017 > Oracle ANYDATA > https://docs.oracle.com/database/121/ARPLS/t_anydat.htm#ARPLS077 > This is good because Tarantool can claim to have a big feature which > other open-source DBMSs lack, that has resulted from its "NoSQL + SQL" > heritage. > Although some of the same functionality is available via structured > user-defined > types or JSON or XML, we're talking about a simple fully-predefined > solution. > > 1. The name of the data type is SCALAR, not UNIONTYPE or SQL_VARIANT > or ANYDATA, to be compatible with Tarantool/NoSQL. > There are no other words in a SCALAR item's definition, > for example "SCALAR(5000)" is illegal and "SCALAR COLLATE x" is illegal > and Hive-like "SCALAR<integer,double>" is illegal. > SCALAR is a new reserved word. > > 2. Any item defined as SCALAR has an underlying primitive type. > For example, after > CREATE TABLE t (s1 SCALAR PRIMARY KEY); > INSERT INTO t VALUES (55),(X'41'); > The underlying primitive type of the item in the first row is INTEGER > because literal 55 has data type INTEGER, and the underlying primitive > type in the second row is BLOB (when we start supporting BLOBs again). > An item's primitive type is far more important than its defined type. > Incidentally Tarantool might find the primitive type by looking at the > way MsgPack stores, but that is an implementation detail. > > 3. SCALAR is not a supertype. Before Choice#1 was accepted, I > tried to apply the logic of supertypes for further rules, > for example saying: "in integer-versus-real comparisons we can > coerce the integer to the real because real is a supertype, > and SCALAR is a supertype of both integer and real because all > their instances can be represented in SCALAR, therefore for > real-to-scalar comparisons we can coerce the real to a scalar". > But Choice#1 was rejected, so supertype logic is illegal. > > 4. There is no literal syntax which implies data type SCALAR. > Standard SQL indicates date/time/timestamp literals by preceding > them with a data type name e.g. DATE '1990-01-01', and we could > do the equivalent thing by saying SCALAR X'41' etc., but we won't > because it would be meaningless, the primitive type of SCALAR X'41' > would still be BLOB and that is what matters. > > 5. TYPEOF(x) is never SCALAR, it is always the > underlying data type. This is true even if x is null > (we still think there is a "data type" named "null"). > In fact there is no function that is guaranteed to > return the defined data type. > For example, TYPEOF(CAST(1 AS SCALAR)); returns > INTEGER, not SCALAR. But sometimes I still see 'blob'. > > 6. For any operation that requires casting from an item defined > as SCALAR, the syntax is legal but the operation may fail at runtime. > At runtime, Tarantool detects the underlying primitive data type and applies > the > rules for that. For example, if a definition is > CREATE TABLE t (s1 INT PRIMARY KEY, s2 SCALAR); > and within any row s2 = 'a', that is, its underlying primitive type is > varchar/char/text or whatever we decide to call character strings, > then UPDATE t SET s1 = s2; is illegal. > We usually do not know that in advance. > > 7. For any dyadic operation that requires implicit casting to an item > defined > as SCALAR, the syntax is legal and the operation will not fail at runtime. > Take this situation: comparison with SCALAR and a primitive type. > The implicit cast is primitive-to-scalar not scalar-to-primitive. > For example, after > CREATE TABLE t (s1 SCALAR PRIMARY KEY); > INSERT INTO t VALUES (X'41'); > SELECT * FROM t WHERE s1 > 'a'; > The comparison is valid, because > we know the ordering of X'41' and 'a' in Tarantool/NoSQL > 'scalar'. This is not because of supertyping (see Rule#3), > and we do not have implicit casting to SCALAR > (see Rule#6), so I think this is inconsistent, but it > looks better than an error. > > 8. The result data type of min/max/sum operation > on a column defined as SCALAR is SCALAR, > unless the result values is NULL. For example: > CREATE TABLE t (s1 INT, s2 SCALAR PRIMARY KEY); > INSERT INTO t VALUES (1,X'44'),(2,11),(3,1E4),(4,'a'); > SELECT MIN(s2),MAX(s2),SUM(s2) FROM t; > The result is: - - [11, 'D', 10011]. > That is only possible with Tarantool/NoSQL scalar rules, > but we have to ignore the illegal arithmetic (that X'44' + 11 is illegal). > Sometimes typeof(min-max-sum(s2)) is 'blob' but that's just a bug. > > 9. The result data type of a primitive combination is never > SCALAR, because we in effect use TYPEOF(item) not > the defined data type of the item. > (I use the word "combination" in the way that the > standard document uses it for section ""Result of > data type combinations".) Therefore for > MAX(1E308, 'a', 0, X'00') > which was the subject of an argument in issue#4032 > https://github.com/tarantool/tarantool/issues/4032 > There I used a "supertype" argument, and Nikita Pettik > rejected it, which is proper (see Rule#3). > I also admit that standard SQL wouldn't allow this. > However, since primitive-to-scalar comparisons are > legal, and because I believe the standard-SQL limitation > is solely due to lack of a SCALAR data type, I am not > convinced that MAX(1E308, 'a', 0, X'00') should cause > an error rather than a scalar result. > > 10. The result data type of union/except/intersect is > based on the primitive data type, but does not matter because > select x'41' union all select 5; > is legal anyway. > > 11. Casting SCALAR to char/varchar/text results in a char/varchar/text > with indefinite size and binary collation. Casting > SCALAR to any other primitive type results, as far as I > know, in no loss of information. > > 12. Probably for ODBC the type is binary, as for sql_variant. > > Peter Gulutzan > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-02-04 18:50 ` Nikita Pettik 2020-02-05 2:57 ` Peter Gulutzan @ 2020-02-05 7:52 ` Mergen Imeev 2020-02-06 12:41 ` Mergen Imeev 2 siblings, 0 replies; 18+ messages in thread From: Mergen Imeev @ 2020-02-05 7:52 UTC (permalink / raw) To: Nikita Pettik; +Cc: tarantool-discussions On Tue, Feb 04, 2020 at 09:50:11PM +0300, Nikita Pettik wrote: > On 03 Feb 14:35, Mergen Imeev wrote: > > On Thu, Jan 30, 2020 at 09:52:16PM +0300, Konstantin Osipov wrote: > > > * Peter Gulutzan <pgulutzan@ocelot.ca> [20/01/22 19:47]: > > > > Hi, > > > > > > > > On 2020-01-21 11:09 a.m., Peter Gulutzan wrote: > > > > <cut> > > > > > I think that the proposed change is good. > > > > I withdraw that remark. I misunderstood the proposal. > > > > > > I side with PeterG. > > > > > > -- > > > Konstantin Osipov, Moscow, Russia > > > > Hi, > > > > Let me clarify: you think that during comparison it makes sense > > that STRING is implicitly converted to numbers? > > > > If so, then let's think about it. I think we have some questions > > to discuss in this case: > > 1) I think it makes sense that numeric types can be compared > > without any conversion. Do you agree? We have a special function > > that implements a comparison between integers and floating point > > numbers. If you do not agree with me, then make your suggestion. > > 2) STRING can be implicitly cast to a number. In case it can be > > cast to INTEGER, it will be INTEGER. In case it can be cast to > > DOUBLE, it will be DOUBLE. Do you agree? > > Does it mean that '1.0' is converted to integer when it is compared > with number? > No. I meant to apply the current rules. > > Should we return to this > > issue after the implementation of DECIMAL? > > 3) Can STRING be implicitly cast to BOOLEAN? > > No, it can't. > > > 4) Can STRING be implicitly cast to BINARY? > > No, it can't > > > 5) In case STRING cannot be implicitly cast to the type of the > > other operand, should we allow implicitly casting the other > > operand to STRING? For example, from "'123r' > 124" move to > > "'123r' > '124'"? > > No, we shouldn't. What is more, now comparison operations are not > commutative: > > tarantool> select '123r' = 124 > --- > - null > - 'Type mismatch: can not convert 123r to numeric' > ... > > tarantool> select 124 = '123r' > --- > - metadata: > - name: 124 = '123r' > type: boolean > rows: > - [false] > ... > > Both comparisons should result in error. > > > 6) Do you agree that only implicit casting from/to STRING is > > allowed? I mean that nothing else can be implicitly cast during a > > comparison with any other type if one of the types does not > > contain the other. > > What about scalar/any types? > String "if one of the types does not contain the other." means that any scalar type can be cast to SCALAR and any type can be cast to ANY. > > 7) We need to clarify the rules when comparing SCALAR values. I > > think we cannot use the Tarantool rules here, as the Tarantool > > rules indicate that “100 < '2' == true”, but we decided that > > "100 > '2' == true", since '2' implicitly cast to 2. Could you > > suggest the rules that we should use here? > > There's already existing solution: while fetching value from space, > we preserve its initial field type. For SCALAR values we may use one > rules, for values fetched from INTEGER/STRING fields - apply another ones. > I think it is confusing. Also, I agree with Peter here. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-02-04 18:50 ` Nikita Pettik 2020-02-05 2:57 ` Peter Gulutzan 2020-02-05 7:52 ` Mergen Imeev @ 2020-02-06 12:41 ` Mergen Imeev 2020-02-06 13:09 ` Mergen Imeev 2 siblings, 1 reply; 18+ messages in thread From: Mergen Imeev @ 2020-02-06 12:41 UTC (permalink / raw) To: Nikita Pettik; +Cc: tarantool-discussions On Tue, Feb 04, 2020 at 09:50:11PM +0300, Nikita Pettik wrote: <cut> > > 7) We need to clarify the rules when comparing SCALAR values. I > > think we cannot use the Tarantool rules here, as the Tarantool > > rules indicate that “100 < '2' == true”, but we decided that > > "100 > '2' == true", since '2' implicitly cast to 2. Could you > > suggest the rules that we should use here? > > There's already existing solution: while fetching value from space, > we preserve its initial field type. For SCALAR values we may use one > rules, for values fetched from INTEGER/STRING fields - apply another ones. > Hi, After some thinking, I came to the conclusion that your idea is pretty good. I suggest this algorithm for comparison: if (has NULL) <return NULL> else if (same mp_type) \\MP_TYPE == MEM_TYPE here. <compare> else if (both numeric) <compare> else if (has SCALAR) <compare using SCALAR rules> else if (STRING compared with numeric) <if possible cast string to number and compare, else error> else <error> In this case, if we still decide to remove the implicit cast from STRING, we can do this quite easily. Please note that this is a simplified version. For example, we do not always need to return NULL if one of the operands is NULL. In addition, this implementation means that we are moving the implicit cast for comparison from OP_ApplyType to opcodes responsible for comparing. What do you think about this? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Tarantool-discussions] Implicit cast for COMPARISON 2020-02-06 12:41 ` Mergen Imeev @ 2020-02-06 13:09 ` Mergen Imeev 0 siblings, 0 replies; 18+ messages in thread From: Mergen Imeev @ 2020-02-06 13:09 UTC (permalink / raw) To: Nikita Pettik; +Cc: tarantool-discussions On Thu, Feb 06, 2020 at 03:41:06PM +0300, Mergen Imeev wrote: > On Tue, Feb 04, 2020 at 09:50:11PM +0300, Nikita Pettik wrote: > <cut> > > > > 7) We need to clarify the rules when comparing SCALAR values. I > > > think we cannot use the Tarantool rules here, as the Tarantool > > > rules indicate that “100 < '2' == true”, but we decided that > > > "100 > '2' == true", since '2' implicitly cast to 2. Could you > > > suggest the rules that we should use here? > > > > There's already existing solution: while fetching value from space, > > we preserve its initial field type. For SCALAR values we may use one > > rules, for values fetched from INTEGER/STRING fields - apply another ones. > > > > Hi, > After some thinking, I came to the conclusion that your idea is > pretty good. I suggest this algorithm for comparison: > if (has NULL) > <return NULL> > else if (same mp_type) \\MP_TYPE == MEM_TYPE here. > <compare> > else if (both numeric) > <compare> > else if (has SCALAR) > <compare using SCALAR rules> > else if (STRING compared with numeric) > <if possible cast string to number and compare, else error> > else > <error> > > In this case, if we still decide to remove the implicit cast from > STRING, we can do this quite easily. > > Please note that this is a simplified version. For example, we do > not always need to return NULL if one of the operands is NULL. > > In addition, this implementation means that we are moving the > implicit cast for comparison from OP_ApplyType to opcodes > responsible for comparing. > > What do you think about this? > Actually, currently it partly works this way: tarantool> CREATE TABLE t(i INT PRIMARY KEY, a SCALAR); --- - row_count: 1 ... tarantool> INSERT INTO t VALUES (1,1), (2,2), (3,3), (4,4), (5,5); --- - row_count: 5 ... tarantool> SELECT COUNT(i) FROM t WHERE '2' > a; --- - metadata: - name: COUNT(i) type: integer rows: - [5] ... Still, not always: tarantool> SELECT COUNT(i) FROM t WHERE '2' > a + 1; --- - metadata: - name: COUNT(i) type: integer rows: - [0] ... Also, it doesn't work for constants: tarantool> select 2 > '1'; --- - metadata: - name: 2 > '1' type: boolean rows: - [true] ... tarantool> select 2 > CAST('1' AS SCALAR); --- - metadata: - name: 2 > CAST('1' AS SCALAR) type: boolean rows: - [true] ... ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2020-02-11 13:32 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-01-21 11:11 [Tarantool-discussions] Implicit cast for COMPARISON Mergen Imeev 2020-01-21 18:09 ` Peter Gulutzan 2020-01-22 14:13 ` Mergen Imeev 2020-01-22 14:24 ` Nikita Pettik 2020-01-30 18:51 ` Konstantin Osipov 2020-01-22 16:43 ` Peter Gulutzan 2020-01-30 18:52 ` Konstantin Osipov 2020-02-03 11:35 ` Mergen Imeev 2020-02-03 15:02 ` Konstantin Osipov 2020-02-04 18:50 ` Nikita Pettik 2020-02-05 2:57 ` Peter Gulutzan 2020-02-07 14:24 ` Nikita Pettik 2020-02-07 14:40 ` Konstantin Osipov 2020-02-07 22:30 ` Peter Gulutzan 2020-02-11 13:32 ` Mergen Imeev 2020-02-05 7:52 ` Mergen Imeev 2020-02-06 12:41 ` Mergen Imeev 2020-02-06 13:09 ` Mergen Imeev
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox