From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from rhino.ch-server.com (rhino.ch-server.com [209.59.190.103]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 6EB4E46970E for ; Sat, 8 Feb 2020 01:31:01 +0300 (MSK) References: <20200121111108.GA18881@tarantool.org> <20200130185216.GC26109@atlas> <20200203113519.GA9896@tarantool.org> <20200204185011.GF1049@tarantool.org> <20200207142446.GA1110@tarantool.org> From: Peter Gulutzan Message-ID: <5cc6a434-18d2-c149-aa62-e5fa097b0882@ocelot.ca> Date: Fri, 7 Feb 2020 15:30:57 -0700 MIME-Version: 1.0 In-Reply-To: <20200207142446.GA1110@tarantool.org> Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Transfer-Encoding: 8bit Content-Language: en-US Subject: Re: [Tarantool-discussions] Implicit cast for COMPARISON List-Id: Tarantool development process List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Nikita Pettik Cc: tarantool-discussions@dev.tarantool.org Hi, On 2020-02-07 7:24 a.m., Nikita Pettik wrote: > > To sum up, my proposal is to allow comparing scalar values with values of > any other type. On the other hand, result of such operations like > assignment or addition should depend on particular values of SCALAR > type. > > Alternatively, we can abandon this idea and operate apply comparison > rules depending purely on mp_ types of particular scalar values. But > then rules in NoSQL and SQL will be different. > I hope that you will abandon this idea, but let us pretend that both these behaviour changes are accepted: (1) No implicit cast (2) Values can be SCALAR with these implications or follow-ups. 1. A value is SCALAR if and only if: It is an item in a column defined as SCALAR, or It is the result of CAST(value AS SCALAR), or It is the result LUA('return scalar-value') or similar Lua function. 2. If a value is SCALAR, then it has two types: according to SQL, and according to Lua. Example: box.execute([[INSERT INTO b VALUES ('a');]]) box.execute([[CREATE TABLE b (s1 SCALAR PRIMARY KEY);]] box.execute([[SELECT TYPEOF(s1) FROM b;]]) type(box.space.B:select('a')[1][1]) The SQL statement will return 'scalar', the Lua statement will return 'string'. You might decide to redefine or rename TYPEOF(), but you cannot avoid that there are now two types, and probably need a new built-in function PRIMITIVE_TYPE(). 3. For deciding whether an operation "is legal", we depend on defined type. For example, LENGTH(x) is legal if defined type is STRING or VARBINARY or SCALAR. For deciding whether an operation "is an error", we depend on primitive type. For example, if scalar_value = 5.5 then LENGTH(scalar_value) is an error although it is legal. Presumably the error is a "runtime error" although I suppose it is possible to know in advance whether all values of a scalar column have the same primitive type. 4. These situations are subject to "legal but runtime error" because the scalar value's primitive type is expected to be a number: Arithmetic operators + - * / % << >> & | Functions ABS(), CHAR(), AVG(), SUM(), TOTAL(). The result is never SCALAR. 5. These situations are subject to "legal but runtime error" because the scalar value's primitive type is expected to be a BOOLEAN: Logical operators NOT AND OR. The result is never SCALAR. Example: if scalar_value is FALSE, then SELECT scalar_value OR scalar_value is legal and result data type is BOOLEAN (not SCALAR). Example: if scalar_value is 'FALSE', then SELECT NOT scalar_value is a runtime error (never cast). 6. These situations are subject to "legal but runtime error" because the scalar value's primitive type is expected to be a STRING: There are situations where a primitive value is expected to be a STRING: Functions LOWER() PRINTF() QUOTE() REPLACE() SOUNDEX() TRIM() UNICODE() UPPER() Clause COLLATE 7. These situations are subject to "legal but runtime error" because the scalar value's primitive type is expected to be STRING or VARBINARY: Situations where primitive value is expected to be STRING or VARBINARY: Functions HEX() LENGTH() POSITION() SUBSTR() 8. The concatenation situation looks like a special case because the scalar value's primitive type is expected to be STRING or VARBINARY: If primitive types are not the same, error. If primitive types are not either STRING or VARBINARY, error. If scalar_value || scalar_value: okay, result is SCALAR. If scalar_value || string_value: okay, result is STRING. If scalar_value || varbinary_value: okay, result is VARBINARY. This is just a guess, and it is slightly different from what you decided for arithmetic operations. The same rules apply for GROUP_CONCAT(). 9. These situations are not subject to "legal but runtime error": Functions GREATEST() LEAST() Example: LEAST(FALSE, 1, 'x', X'44') is FALSE and result data type is BOOLEAN. This is the current behaviour, and it means that we follow SCALAR rules even though none of the operands is SCALAR. The same rules apply for MIN() or MAX() if the operand is SCALAR. 10. Comparison operations involving a SCALAR value cause: Comparison according to SCALAR rules, therefore always legal. Comparison operations > = < <= > >= LIKE BETWEEN CASE Implied comparisons UNIQUE DISTINCT GROUP ORDER UNION Example: CREATE TABLE t (s1 SCALAR PRIMARY KEY); INSERT INTO t VALUES (FALSE), (1), ('1'), (X'31'); SELECT * FROM t WHERE s1 = '1' OR s1 > '1'; Currently this causes a "type mismatch" error. But this error will be fixed if your plan is accepted. 11. Assignment to scalar from any data type is always legal. Assignment from scalar to any data type is legal if and only if the primitive type is compatible. 12 SELECT scalar_column UNION SELECT non_scalar_column is SCALAR. (Also EXCEPT and INTERSECT) In emails in March 2019 I wrote about compatibility with other DBMSs and with ODBC. The thread was "The rules for the scalar data type in SQL". I'll quote the first email as a reminder, some of it is still relevant. By the way K. Yukhin replied: "As far as I understand SCALAR is a bag of types. No value can ever have type SCALAR." -------- Forwarded Message -------- Subject:     The rules for the scalar data type in SQL Date:     Fri, 8 Mar 2019 16:31:02 -0700 From:     Peter Gulutzan To:     Nikita Pettik CC:     dev@tarantool.org Hi, What are the rules for the SCALAR data type in SQL? The decisions have been made, but eventually I'll need to find out what they are. So I'll guess and see whether anyone objects. Tarantool had two choices: Choice#1: SCALAR is a "primitive" data type, on the same level as INTEGER, VARCHAR, BLOB, etc. Choice#2: SCALAR is a "complex" data type, two instances can have two different primitive data types. Tarantool chose #2. I could describe this with terms like "union as in C but without labels", or "limited polymorphism", etc. but fear that using analogies would cause arguments. I can think of only three SQL DBMSs that have something close to this: Hive UNIONTYPE https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-UnionTypesunionUnionTypes SQL Server SQL_VARIANT Re SQL Server SQL_VARIANT https://docs.microsoft.com/en-us/sql/t-sql/data-types/sql-variant-transact-sql?view=sql-server-2017 Oracle ANYDATA https://docs.oracle.com/database/121/ARPLS/t_anydat.htm#ARPLS077 This is good because Tarantool can claim to have a big feature which other open-source DBMSs lack, that has resulted from its "NoSQL + SQL" heritage. Although some of the same functionality is available via structured user-defined types or JSON or XML, we're talking about a simple fully-predefined solution. 1. The name of the data type is SCALAR, not UNIONTYPE or SQL_VARIANT or ANYDATA, to be compatible with Tarantool/NoSQL. There are no other words in a SCALAR item's definition, for example "SCALAR(5000)" is illegal and "SCALAR COLLATE x" is illegal and Hive-like "SCALAR" is illegal. SCALAR is a new reserved word. 2. Any item defined as SCALAR has an underlying primitive type. For example, after CREATE TABLE t (s1 SCALAR PRIMARY KEY); INSERT INTO t VALUES (55),(X'41'); The underlying primitive type of the item in the first row is INTEGER because literal 55 has data type INTEGER, and the underlying primitive type in the second row is BLOB (when we start supporting BLOBs again). An item's primitive type is far more important than its defined type. Incidentally Tarantool might find the primitive type by looking at the way MsgPack stores, but that is an implementation detail. 3. SCALAR is not a supertype. Before Choice#1 was accepted, I tried to apply the logic of supertypes for further rules, for example saying: "in integer-versus-real comparisons we can coerce the integer to the real because real is a supertype, and SCALAR is a supertype of both integer and real because all their instances can be represented in SCALAR, therefore for real-to-scalar comparisons we can coerce the real to a scalar". But Choice#1 was rejected, so supertype logic is illegal. 4. There is no literal syntax which implies data type SCALAR. Standard SQL indicates date/time/timestamp literals by preceding them with a data type name e.g. DATE '1990-01-01', and we could do the equivalent thing by saying SCALAR X'41' etc., but we won't because it would be meaningless, the primitive type of SCALAR X'41' would still be BLOB and that is what matters. 5. TYPEOF(x) is never SCALAR, it is always the underlying data type. This is true even if x is null (we still think there is a "data type" named "null"). In fact there is no function that is guaranteed to return the defined data type. For example, TYPEOF(CAST(1 AS SCALAR)); returns INTEGER, not SCALAR. But sometimes I still see 'blob'. 6. For any operation that requires casting from an item defined as SCALAR, the syntax is legal but the operation may fail at runtime. At runtime, Tarantool detects the underlying primitive data type and applies the rules for that. For example, if a definition is CREATE TABLE t (s1 INT PRIMARY KEY, s2 SCALAR); and within any row s2 = 'a', that is, its underlying primitive type is varchar/char/text or whatever we decide to call character strings, then UPDATE t SET s1 = s2; is illegal. We usually do not know that in advance. 7. For any dyadic operation that requires implicit casting to an item defined as SCALAR, the syntax is legal and the operation will not fail at runtime. Take this situation: comparison with SCALAR and a primitive type. The implicit cast is primitive-to-scalar not scalar-to-primitive. For example, after CREATE TABLE t (s1 SCALAR PRIMARY KEY); INSERT INTO t VALUES (X'41'); SELECT * FROM t WHERE s1 > 'a'; The comparison is valid, because we know the ordering of X'41' and 'a' in Tarantool/NoSQL 'scalar'. This is not because of supertyping (see Rule#3), and we do not have implicit casting to SCALAR (see Rule#6), so I think this is inconsistent, but it looks better than an error. 8. The result data type of min/max/sum operation on a column defined as SCALAR is SCALAR, unless the result values is NULL. For example: CREATE TABLE t (s1 INT, s2 SCALAR PRIMARY KEY); INSERT INTO t VALUES (1,X'44'),(2,11),(3,1E4),(4,'a'); SELECT MIN(s2),MAX(s2),SUM(s2) FROM t; The result is: - - [11, 'D', 10011]. That is only possible with Tarantool/NoSQL scalar rules, but we have to ignore the illegal arithmetic (that X'44' + 11 is illegal). Sometimes typeof(min-max-sum(s2)) is 'blob' but that's just a bug. 9. The result data type of a primitive combination is never SCALAR, because we in effect use TYPEOF(item) not the defined data type of the item. (I use the word "combination" in the way that the standard document uses it for section ""Result of data type combinations".) Therefore for MAX(1E308, 'a', 0, X'00') which was the subject of an argument in issue#4032 https://github.com/tarantool/tarantool/issues/4032 There I used a "supertype" argument, and Nikita Pettik rejected it, which is proper (see Rule#3). I also admit that standard SQL wouldn't allow this. However, since primitive-to-scalar comparisons are legal, and because I believe the standard-SQL limitation is solely due to lack of a SCALAR data type, I am not convinced that MAX(1E308, 'a', 0, X'00') should cause an error rather than a scalar result. 10. The result data type of union/except/intersect is based on the primitive data type, but does not matter because select x'41' union all select 5; is legal anyway. 11. Casting SCALAR to char/varchar/text results in a char/varchar/text with indefinite size and binary collation. Casting SCALAR to any other primitive type results, as far as I know, in no loss of information. 12. Probably for ODBC the type is binary, as for sql_variant. Peter Gulutzan