[Tarantool-patches] [PATCH v2 2/3] sql: updated explicit conversion table

Mergen Imeev imeevma at tarantool.org
Sun Jun 20 21:52:53 MSK 2021


Thank you for the fixes you did. See my 27 comments below.

1. In general I do not think that it is good idea to squash all previous 
three
patches to one patch. I think that the first of previous patches was 
quite good,
the second was useless and the third had some issues, but now all these 
patches
are squashed.

On Fri, Jun 11, 2021 at 10:48:12AM +0300, Timur Safin wrote:
> * fixes for `BOOLEAN` expressions in explicit converstion tables
> 
> We need to modify explicit casts table according to the RFC developed
> previously in /doc/rfc/5910-consistent-sql-lua-types.md. This patch
> introduces changes where BOOLEAN type involved, thus, for simplicity
> sake, we mark unchanged cells in the table below as '.'
2. I still do not think that path to RFC is needed here. Also, what 
about '.'?

> 
> Since now on BOOLEAN will be only compatible with itself and STRINGs
> (and recursively with SCALAR, which includes both those types). We
> remove all other possible combinations which used to be allowed before.
> 
> We updated all relevant tests we already have for some combinations.
> 
> * introduce `ANY` as target for explicit conversions
> 
> For consistency sake it was decided to provide
> `CAST(anything TO ANY)` as allowed, but still noop.
> 
3. I still do not like that you added cast to unsupported field.

4. Why typeof() of value cast to any returns 'any'?
tarantool> box.execute('select typeof(cast(1 AS any));')
---
- metadata:
   - name: COLUMN_1
     type: string
   rows:
   - ['any']
...

tarantool> box.execute('select typeof(cast(1 AS scalar));')
---
- metadata:
   - name: COLUMN_1
     type: string
   rows:
   - ['integer']
...


> Note, that there is no direct way in SQL to create literal and expression
> of type `ANY`, thus there is no defined CAST from ANY to anything
> else, only to `ANY`.
> 
> * We had to rename %wildcard token to ANYTHING, since we needed
>   token ANY for real life usage.
> 
5. Just a thought - why you cannot use 'ANYTHING' for field type ANY?
Or 'ANY_KW', how it was already done for INTEGER?

> * As a byproduct, we fixed #6010 to make cast to unsigned behaving reasonably
> 
6. I think this change is good enough to have its own patch.

> * To make sure that all consistency checks are systematic, we have introduced
>   special test for checking conversion rules - e_casts.test.lua. This
>   patch introduces explicit table part:
> 
>   * e_casts.test.lua is generating expressions of `CAST(input AS output)`
>     form. All combinations check whether we expectedly succeeded or failed,
>     given the explicit conversion table from RFC 5910-consistent-sql-lua-types.md;
> 
>   * At the moment we skip DECIMALs, ARRAYs and MAPs as not yet
>     fully supported. Need to be revisited later;
> 
>   * NB! there is `-verbose` mode one may activate to enable more detailed
>     reporting during debugging.
> 
7. I used './test-run.py sql-tap/e_casts.test.lua --verbose -j1' and saw 
this:
[001] sql-tap/e_casts.test.lua                        memtx [001] TAP 
version 13
[001] 1..3
[001]     # e_casts - check consistency of implicit conversion table
[001]     1..169
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[any,any] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[any,unsigned] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[any,string] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[any,double] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[any,integer] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[any,boolean] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[any,varbinary] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[any,number] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[any,decimal] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[any,uuid] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[any,array] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[any,map] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[any,scalar] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[unsigned,any] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[unsigned,unsigned] 2 ~= 2
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[unsigned,string] 2 ~= 1
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[unsigned,double] 2 ~= 1
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[unsigned,integer] 2 ~= 1
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[unsigned,boolean] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[unsigned,varbinary] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[unsigned,number] 2 ~= 1
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[unsigned,decimal] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[unsigned,uuid] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[unsigned,array] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[unsigned,map] nil ~= nil
[001]     ok - /home/mergen/work/dev/test/sql-tap/e_casts.test.lua:245 
[unsigned,scalar] 2 ~= 1
...

What is it?

> Relates to #5910, #6009
> Part of #4470
> Fixes #6010
> ---
>  extra/mkkeywordhash.c           |   3 +-
>  src/box/sql/mem.c               |  81 +++-----
>  src/box/sql/parse.y             |   9 +-
>  test/sql-tap/cse.test.lua       |  12 +-
>  test/sql-tap/e_casts.test.lua   | 340 ++++++++++++++++++++++++++++++++
>  test/sql-tap/e_select1.test.lua |   2 +-
>  test/sql-tap/in1.test.lua       |   8 +-
>  test/sql-tap/in4.test.lua       |   2 +-
>  test/sql-tap/misc3.test.lua     |   2 +-
>  test/sql/boolean.result         |  38 +---
>  test/sql/types.result           |  28 ++-
>  11 files changed, 406 insertions(+), 119 deletions(-)
>  create mode 100755 test/sql-tap/e_casts.test.lua
> 
> diff --git a/extra/mkkeywordhash.c b/extra/mkkeywordhash.c
> index 0d998506c..00f65ce8e 100644
> --- a/extra/mkkeywordhash.c
> +++ b/extra/mkkeywordhash.c
> @@ -60,6 +60,7 @@ static Keyword aKeywordTable[] = {
>    { "ALL",                    "TK_ALL",         true  },
>    { "ALTER",                  "TK_ALTER",       true  },
>    { "ANALYZE",                "TK_STANDARD",    true  },
> +  { "ANY",                    "TK_ANY",         true  },
>    { "AND",                    "TK_AND",         true  },
>    { "AS",                     "TK_AS",          true  },
>    { "ASC",                    "TK_ASC",         true  },
> @@ -179,7 +180,7 @@ static Keyword aKeywordTable[] = {
>    { "WITH",                   "TK_WITH",        true  },
>    { "WHEN",                   "TK_WHEN",        true  },
>    { "WHERE",                  "TK_WHERE",       true  },
> -  { "ANY",                    "TK_STANDARD",    true  },
> +  { "ANYTHING",               "TK_STANDARD",    true  },
>    { "ASENSITIVE",             "TK_STANDARD",    true  },
>    { "BLOB",                   "TK_STANDARD",    true  },
>    { "BINARY",                 "TK_ID",          true  },
> diff --git a/src/box/sql/mem.c b/src/box/sql/mem.c
> index 6f3bf52e5..8bb6b672c 100644
> --- a/src/box/sql/mem.c
> +++ b/src/box/sql/mem.c
> @@ -574,17 +574,6 @@ int_to_str0(struct Mem *mem)
>  	return mem_copy_str0(mem, str);
>  }
>  
> -static inline int
> -int_to_bool(struct Mem *mem)
> -{
> -	assert((mem->type & (MEM_TYPE_INT | MEM_TYPE_UINT)) != 0);
> -	mem->u.b = mem->u.i != 0;
> -	mem->type = MEM_TYPE_BOOL;
> -	assert(mem->flags == 0);
> -	mem->field_type = FIELD_TYPE_BOOLEAN;
> -	return 0;
> -}
> -
>  static inline int
>  str_to_str0(struct Mem *mem)
>  {
> @@ -810,28 +799,6 @@ double_to_str0(struct Mem *mem)
>  	return 0;
>  }
>  
> -static inline int
> -double_to_bool(struct Mem *mem)
> -{
> -	assert(mem->type == MEM_TYPE_DOUBLE);
> -	mem->u.b = mem->u.r != 0.;
> -	mem->type = MEM_TYPE_BOOL;
> -	assert(mem->flags == 0);
> -	mem->field_type = FIELD_TYPE_BOOLEAN;
> -	return 0;
> -}
> -
> -static inline int
> -bool_to_int(struct Mem *mem)
> -{
> -	assert(mem->type == MEM_TYPE_BOOL);
> -	mem->u.u = (uint64_t)mem->u.b;
> -	mem->type = MEM_TYPE_UINT;
> -	assert(mem->flags == 0);
> -	mem->field_type = FIELD_TYPE_UNSIGNED;
> -	return 0;
> -}
> -
>  static inline int
>  bool_to_str0(struct Mem *mem)
>  {
> @@ -872,18 +839,37 @@ uuid_to_bin(struct Mem *mem)
>  	return mem_copy_bin(mem, (char *)&mem->u.uuid, UUID_LEN);
>  }
>  
> +int
> +mem_to_uint(struct Mem *mem)
> +{
> +	assert(mem->type < MEM_TYPE_INVALID);
> +	if (mem->type == MEM_TYPE_UINT)
> +		return 0;
> +	if (mem->type == MEM_TYPE_INT) {
> +		mem_set_uint(mem, (uint64_t)mem->u.i);
8. Could you explain to me, what it this? Since when -1 can be cast to 
UNSIGNED?
Where it is described? Why this is allowed?
tarantool> box.execute('select CAST(-1 as unsigned);')
---
- metadata:
   - name: COLUMN_1
     type: unsigned
   rows:
   - [18446744073709551615]
...


> +		return 0;
> +	}
> +	if ((mem->type & (MEM_TYPE_STR | MEM_TYPE_BIN)) != 0)
> +		return bytes_to_uint(mem);
> +	if (mem->type == MEM_TYPE_DOUBLE)
> +		return double_to_uint(mem);
> +	return -1;
> +}
> +
>  int
>  mem_to_int(struct Mem *mem)
>  {
>  	assert(mem->type < MEM_TYPE_INVALID);
> -	if ((mem->type & (MEM_TYPE_INT | MEM_TYPE_UINT)) != 0)
> +	if (mem->type == MEM_TYPE_INT)
> +		return 0;
> +	if (mem->type == MEM_TYPE_UINT) {
> +		mem_set_int(mem, (int64_t)mem->u.u, false);
9. So, you want to set unsigned value as integer. Could you tell me what 
does
this change do?

>  		return 0;
> +	}
>  	if ((mem->type & (MEM_TYPE_STR | MEM_TYPE_BIN)) != 0)
>  		return bytes_to_int(mem);
>  	if (mem->type == MEM_TYPE_DOUBLE)
>  		return double_to_int(mem);
> -	if (mem->type == MEM_TYPE_BOOL)
> -		return bool_to_int(mem);
>  	return -1;
>  }
>  
> @@ -919,8 +905,6 @@ mem_to_number(struct Mem *mem)
>  	assert(mem->type < MEM_TYPE_INVALID);
>  	if (mem_is_num(mem))
>  		return 0;
> -	if (mem->type == MEM_TYPE_BOOL)
> -		return bool_to_int(mem);
>  	if ((mem->type & (MEM_TYPE_STR | MEM_TYPE_BIN)) != 0) {
>  		if (bytes_to_int(mem) == 0)
>  			return 0;
> @@ -994,19 +978,7 @@ mem_cast_explicit(struct Mem *mem, enum field_type type)
>  	}
>  	switch (type) {
>  	case FIELD_TYPE_UNSIGNED:
> -		switch (mem->type) {
> -		case MEM_TYPE_UINT:
> -			return 0;
> -		case MEM_TYPE_STR:
> -		case MEM_TYPE_BIN:
> -			return bytes_to_uint(mem);
> -		case MEM_TYPE_DOUBLE:
> -			return double_to_int(mem);
> -		case MEM_TYPE_BOOL:
> -			return bool_to_int(mem);
> -		default:
> -			return -1;
> -		}
> +		return mem_to_uint(mem);
>  	case FIELD_TYPE_STRING:
>  		return mem_to_str(mem);
>  	case FIELD_TYPE_DOUBLE:
> @@ -1017,13 +989,8 @@ mem_cast_explicit(struct Mem *mem, enum field_type type)
>  		switch (mem->type) {
10. Just a thought. I see no harm to have this switch here, but it has 
only two
options. Why not change it to 'if'?

>  		case MEM_TYPE_BOOL:
>  			return 0;
> -		case MEM_TYPE_INT:
> -		case MEM_TYPE_UINT:
> -			return int_to_bool(mem);
>  		case MEM_TYPE_STR:
>  			return str_to_bool(mem);
> -		case MEM_TYPE_DOUBLE:
> -			return double_to_bool(mem);
>  		default:
>  			return -1;
>  		}
> @@ -1049,6 +1016,8 @@ mem_cast_explicit(struct Mem *mem, enum field_type type)
>  		if ((mem->type & (MEM_TYPE_MAP | MEM_TYPE_ARRAY)) != 0)
>  			return -1;
>  		return 0;
> +	case FIELD_TYPE_ANY:
> +		return 0;
>  	default:
>  		break;
>  	}
> diff --git a/src/box/sql/parse.y b/src/box/sql/parse.y
> index bd041e862..fbff6ffa7 100644
> --- a/src/box/sql/parse.y
> +++ b/src/box/sql/parse.y
> @@ -270,7 +270,7 @@ columnlist ::= tcons.
>    QUERY KEY OFFSET RAISE RELEASE REPLACE RESTRICT
>    RENAME CTIME_KW IF ENABLE DISABLE UUID
>    .
> -%wildcard ANY.
> +%wildcard ANYTHING.
>  
>  
>  // And "ids" is an identifer-or-string.
> @@ -1113,7 +1113,7 @@ expr(A) ::= expr(A) COLLATE id(C). {
>    A.zEnd = &C.z[C.n];
>  }
>  
> -expr(A) ::= CAST(X) LP expr(E) AS typedef(T) RP(Y). {
> +expr(A) ::= CAST(X) LP expr(E) AS cast_typedef(T) RP(Y). {
>    spanSet(&A,&X,&Y); /*A-overwrites-X*/
>    A.pExpr = sql_expr_new_dequoted(pParse->db, TK_CAST, NULL);
>    if (A.pExpr == NULL) {
> @@ -1887,3 +1887,8 @@ number_typedef(A) ::= UNSIGNED . { A.type = FIELD_TYPE_UNSIGNED; }
>   *   (void) C;
>   *}
>   */
> +
> +// cast_typedef is almost typedef but with ANY type enabled
> +%type cast_typedef { struct type_def }
> +cast_typedef(A) ::= typedef(T) . { A = T; }
> +cast_typedef(A) ::= ANY . { A.type = FIELD_TYPE_ANY; }
> diff --git a/test/sql-tap/cse.test.lua b/test/sql-tap/cse.test.lua
> index e09f955a0..932f83545 100755
> --- a/test/sql-tap/cse.test.lua
> +++ b/test/sql-tap/cse.test.lua
> @@ -31,7 +31,7 @@ test:do_test(
>              INSERT INTO t1 VALUES(2,21,22,23,24,25);
>          ]]
>          return test:execsql [[
> -            SELECT b, -b, ~b, NOT CAST(b AS BOOLEAN), NOT NOT CAST(b AS BOOLEAN), b-b, b+b, b*b, b/b, b FROM t1
> +            SELECT b, -b, ~b, NOT (b <> 0), NOT NOT (b <> 0), b - b, b + b, b * b, b / b, b FROM t1
>          ]]
>      end, {
>          -- <cse-1.1>
> @@ -102,7 +102,7 @@ test:do_execsql_test(
>  test:do_execsql_test(
>      "cse-1.6.3",
>      [[
> -        SELECT CASE WHEN CAST(b AS BOOLEAN) THEN d WHEN CAST(e AS BOOLEAN)  THEN f ELSE 999 END, b, c, d FROM t1
> +        SELECT CASE WHEN b <> 0 THEN d WHEN e <> 0 THEN f ELSE 999 END, b, c, d FROM t1
>      ]], {
>          -- <cse-1.6.3>
>          13, 11, 12, 13, 23, 21, 22, 23
> @@ -112,7 +112,7 @@ test:do_execsql_test(
>  test:do_execsql_test(
>      "cse-1.6.4",
>      [[
> -        SELECT b, c, d, CASE WHEN CAST(b AS BOOLEAN) THEN d WHEN CAST(e AS BOOLEAN) THEN f ELSE 999 END FROM t1
> +        SELECT b, c, d, CASE WHEN b <> 0 THEN d WHEN e <> 0 THEN f ELSE 999 END FROM t1
>      ]], {
>          -- <cse-1.6.4>
>          11, 12, 13, 13, 21, 22, 23, 23
> @@ -122,7 +122,7 @@ test:do_execsql_test(
>  test:do_execsql_test(
>      "cse-1.6.5",
>      [[
> -        SELECT b, c, d, CASE WHEN false THEN d WHEN CAST(e AS BOOLEAN) THEN f ELSE 999 END FROM t1
> +        SELECT b, c, d, CASE WHEN false THEN d WHEN e <> 0 THEN f ELSE 999 END FROM t1
>      ]], {
>          -- <cse-1.6.5>
>          11, 12, 13, 15, 21, 22, 23, 25
> @@ -132,7 +132,7 @@ test:do_execsql_test(
>  test:do_execsql_test(
>      "cse-1.7",
>      [[
> -        SELECT a, -a, ~a, NOT CAST(a AS BOOLEAN), NOT NOT CAST(a AS BOOLEAN), a-a, a+a, a*a, a/a, a FROM t1
> +        SELECT a, -a, ~a, NOT (a <> 0), NOT NOT (a <> 0), a - a, a + a, a * a, a / a, a FROM t1
>      ]], {
>          -- <cse-1.7>
>          1, -1 ,-2, false, true, 0, 2, 1, 1, 1, 2, -2, -3, false, true, 0, 4, 4, 1, 2
> @@ -152,7 +152,7 @@ test:do_execsql_test(
>  test:do_execsql_test(
>      "cse-1.9",
>      [[
> -        SELECT NOT CAST(b AS BOOLEAN), ~b, NOT NOT CAST(b AS BOOLEAN), b FROM t1
> +        SELECT NOT (b <> 0), ~b, NOT NOT (b <> 0), b FROM t1
>      ]], {
>          -- <cse-1.9>
>          false, -12, true, 11, false, -22, true, 21
> diff --git a/test/sql-tap/e_casts.test.lua b/test/sql-tap/e_casts.test.lua
> new file mode 100755
> index 000000000..32d7e8e0c
> --- /dev/null
> +++ b/test/sql-tap/e_casts.test.lua
> @@ -0,0 +1,340 @@
> +#!/usr/bin/env tarantool
> +local tap = require("tap")
> +local test = tap.test("errno")
> +
> +test:plan(1)
> +
> +local yaml = require("yaml")
> +local ffi = require("ffi")
> +
> +local verbose = 0
> +
> +if arg[1] == '-v' or arg[1] == '--verbose' then
> +    verbose = 1
> +end
> +
> +ffi.cdef [[
> +    enum field_type {
> +        FIELD_TYPE_ANY = 0,
> +        FIELD_TYPE_UNSIGNED,
> +        FIELD_TYPE_STRING,
> +        FIELD_TYPE_NUMBER,
> +        FIELD_TYPE_DOUBLE,
> +        FIELD_TYPE_INTEGER,
> +        FIELD_TYPE_BOOLEAN,
> +        FIELD_TYPE_VARBINARY,
> +        FIELD_TYPE_SCALAR,
> +        FIELD_TYPE_DECIMAL,
> +        FIELD_TYPE_UUID,
> +        FIELD_TYPE_ARRAY,
> +        FIELD_TYPE_MAP,
> +        field_type_MAX
> +    };
> +]]
11. Since this cdef is only copy-paste from current actual enum, I still see
not reason to keep it here. Or you could use one from module.h, if you want
to keep it.

> +
> +-- Date/time/interval types to be uncommented and used
> +-- once corresponding box implementation completed
> +local t_any = ffi.C.FIELD_TYPE_ANY
> +local t_unsigned = ffi.C.FIELD_TYPE_UNSIGNED
> +local t_string = ffi.C.FIELD_TYPE_STRING
> +local t_number = ffi.C.FIELD_TYPE_NUMBER
> +local t_double = ffi.C.FIELD_TYPE_DOUBLE
> +local t_integer = ffi.C.FIELD_TYPE_INTEGER
> +local t_boolean = ffi.C.FIELD_TYPE_BOOLEAN
> +local t_varbinary = ffi.C.FIELD_TYPE_VARBINARY
> +local t_scalar = ffi.C.FIELD_TYPE_SCALAR
> +local t_decimal = ffi.C.FIELD_TYPE_DECIMAL
> +-- local t_date = -1
> +-- local t_time = -2
> +-- local t_timestamp = -3
> +-- local t_interval = -4
> +local t_uuid = ffi.C.FIELD_TYPE_UUID
> +local t_array = ffi.C.FIELD_TYPE_ARRAY
> +local t_map = ffi.C.FIELD_TYPE_MAP
> +
> +local proper_order = {
> +    t_any,
> +    t_unsigned,
> +    t_string,
> +    t_double,
> +    t_integer,
> +    t_boolean,
> +    t_varbinary,
> +    t_number,
> +    t_decimal,
> +    t_uuid,
> +    -- t_date,
> +    -- t_time,
> +    -- t_timestamp,
> +    -- t_interval,
> +    t_array,
> +    t_map,
> +    t_scalar,
> +}
> +
> +local type_names = {
> +    [t_any]       = 'any',
> +    [t_unsigned]  = 'unsigned',
> +    [t_string]    = 'string',
> +    [t_double]    = 'double',
> +    [t_integer]   = 'integer',
> +    [t_boolean]   = 'boolean',
> +    [t_varbinary] = 'varbinary',
> +    [t_number]    = 'number',
> +    [t_decimal]   = 'decimal',
> +    [t_uuid]      = 'uuid',
> +    -- [t_date]      = 'date',
> +    -- [t_time]      = 'time',
> +    -- [t_timestamp] = 'timestamp',
> +    -- [t_interval]  = 'interval',
> +    [t_array]     = 'array',
> +    [t_map]       = 'map',
> +    [t_scalar]    = 'scalar',
> +}
> +
> +-- not all types implemented/enabled at the moment
> +-- but we do keep their projected status in the
> +-- spec table
> +local enabled_type = {
> +    [t_any]       = false, -- there is no way in SQL to instantiate ANY type expression
> +    [t_unsigned]  = true,
> +    [t_string]    = true,
> +    [t_double]    = true,
> +    [t_integer]   = true,
> +    [t_boolean]   = true,
> +    [t_varbinary] = true,
> +    [t_number]    = true,
> +    [t_decimal]   = false,
> +    [t_uuid]      = true,
> +    -- [t_date]     = false,
> +    -- [t_time]     = false,
> +    -- [t_timestamp]= false,
> +    -- [t_interval] = False,
> +    [t_array]     = false,
> +    [t_map]       = false,
> +    [t_scalar]    = true,
> +}
> +
> +-- Enabled types which may be targets for explicit casts
> +local enabled_type_cast = table.deepcopy(enabled_type)
> +enabled_type_cast[t_any] = true
> +
> +-- table of _TSV_ (tab separated values)
> +-- copied from sql-lua-tables-v5.xls
> +local explicit_casts_table_spec = {
> +    [t_any] =     {"Y", "S", "Y", "S", "S", "S", "S", "S", "S", "S", "S", "S", "S"},
> +    [t_unsigned]= {"Y", "Y", "Y", "Y", "Y", "" , "" , "Y", "Y", "" , "" , "" , "Y"},
> +    [t_string] =  {"Y", "S", "Y", "S", "S", "S", "Y", "S", "S", "S", "S", "S", "Y"},
> +    [t_double] =  {"Y", "S", "Y", "Y", "S", "" , "" , "Y", "Y", "" , "" , "" , "Y"},
> +    [t_integer] = {"Y", "S", "Y", "Y", "Y", "" , "" , "Y", "Y", "" , "" , "" , "Y"},
> +    [t_boolean] = {"Y", "" , "Y", "" , "" , "Y", "" , "" , "" , "" , "" , "" , "Y"},
> +    [t_varbinary]={"Y", "" , "Y", "N", "" , "" , "Y", "" , "" , "S", "" , "" , "Y"},
> +    [t_number] =  {"Y", "S", "Y", "Y", "S", "" , "" , "Y", "Y", "" , "" , "" , "Y"},
> +    [t_decimal] = {"Y", "S", "Y", "S", "S", "" , "" , "Y", "Y", "" , "" , "" , "Y"},
> +    [t_uuid] =    {"Y", "" , "Y", "" , "" , "" , "Y", "" , "" , "Y", "" , "" , "Y"},
> +    [t_array] =   {"Y", "N", "Y", "N", "N", "N", "" , "" , "" , "" , "Y", "" , "N"},
> +    [t_map] =     {"Y", "N", "Y", "N", "N", "N", "" , "" , "" , "" , "" , "Y", "N"},
> +    [t_scalar] =  {"Y", "S", "Y", "S", "S", "S", "S", "S", "S", "S", "" , "" , "Y"},
> +}
12. Again, why there is 'N' and '' both in table? Could you show me where in
RFC we have this table that contains both '' and 'N'?

> +
> +local c_no = 0
> +local c_maybe = 1
> +local c_yes = 2
> +
> +local function normalize_cast(v)
> +    local xlat =  {
> +        ['Y'] = c_yes,
> +        ['S'] = c_maybe,
> +        ['N'] = c_no,
> +    }
> +    return xlat[v ~= nil and v or 'N']
> +end
> +
> +local function human_cast(v)
> +    local xlat = {
> +        [c_yes] = 'Y',
> +        [c_maybe] = 'S',
> +        [c_no] = ' '
> +    }
> +    return xlat[v ~= nil and v or c_no]
> +end
> +
> +local function load_casts_spec(spec_table, enabled_from, enabled_to)
> +    local casts = {}
> +    for _, t_from  in ipairs(proper_order) do
> +        casts[t_from] = {}
> +        for j, t_to  in ipairs(proper_order) do
> +            if enabled_from[t_from] and enabled_to[t_to] then
> +                casts[t_from][t_to] = normalize_cast(spec_table[t_from][j])
> +            end
> +        end
> +    end
> +    return casts
> +end
> +
> +local function label_for(from, to, title)
> +    local parent_frame = debug.getinfo(2, "nSl")
> +    local filename = parent_frame.source:sub(1,1) == "@" and parent_frame.source:sub(2)
> +    local line = parent_frame.currentline
> +    return string.format("%s:%d [%s,%s] %s", filename, line,
> +                         type_names[from], type_names[to], title)
> +end
> +
> +local function show_casts_table(table)
> +    local max_len = #"12. varbinary" + 1
> +
> +    -- show banner
> +    local col_names = ''
> +    for _, t_val in ipairs(proper_order) do
> +        col_names = col_names .. string.format("%2d |", t_val)
> +    end
> +    col_names = string.sub(col_names, 1, #col_names-1)
> +    print(string.format("%"..max_len.."s|%s|", "", col_names))
> +    -- show splitter
> +    local banner = '+---+---+---+---+---+---+---+---+---+---+---+---+---+'
> +    print(string.format("%"..max_len.."s%s", "", banner))
> +
> +    for _, from in ipairs(proper_order) do
> +        local line = ''
> +        for _, to in ipairs(proper_order) do
> +            line = line .. string.format("%2s |", human_cast(table[from][to]))
> +        end
> +        line = string.sub(line, 1, #line-1)
> +        local s = string.format("%2d.%10s |%s|", from, type_names[from], line)
> +        print(s)
> +    end
> +    print(string.format("%"..max_len.."s%s", "", banner))
> +end
13. Since you use '-verbose' mode only in your debug, can you move these
functions out from here? I still do not see any use of this code.

> +
> +local explicit_casts = load_casts_spec(explicit_casts_table_spec, enabled_type, enabled_type_cast)
> +
> +if verbose > 0 then
> +    show_casts_table(explicit_casts)
> +end
> +
> +local function merge_tables(...)
> +    local n = select('#', ...)
> +    local tables = {...}
> +    local result = {}
> +
> +    for i=1,n do
> +        local t = tables[i]
> +        assert(type(tables[i]) == 'table')
> +        for _, v in pairs(t) do
> +            table.insert(result, v)
> +        end
> +    end
> +    return result
> +end
> +
> +local gen_type_samples = {
> +        [t_unsigned]  = {"0", "1", "2"},
14. I believe you once said something about testing values that are close to
all limits of numeric types. Why I cannot see them here?

> +        [t_integer]   = {"-678", "-1", "0", "1", "2", "3", "678"},
> +        [t_double]    = {"0.0", "123.4", "-567.8"},
> +        [t_string]    = {"''", "'1'", "'123.4'", "'-1.5'", "'abc'", "'def'",
> +                        "'TRUE'", "'FALSE'", "'22222222-1111-1111-1111-111111111111'"},
> +        [t_boolean]   = {"false", "true", "null"},
> +        [t_varbinary] = {"X'312E3233'", "X'2D392E3837'", "X'302E30303031'",
> +                        "x'22222222111111111111111111111111'"},
> +        [t_uuid]      = {'uuid()'},
> +}
> +
> +local function gen_type_exprs(type)
> +    if type == t_number then
> +        return merge_tables(gen_type_samples[t_unsigned],
> +                            gen_type_samples[t_integer],
> +                            gen_type_samples[t_double])
> +    end
> +    if type == t_scalar then
> +        return merge_tables(gen_type_samples[t_unsigned],
> +                            gen_type_samples[t_integer],
> +                            gen_type_samples[t_double],
> +                            gen_type_samples[t_string],
> +                            gen_type_samples[t_boolean],
> +                            gen_type_samples[t_varbinary])
> +    end
> +    return gen_type_samples[type] or {}
> +end
> +
> +-- explicit
> +local function gen_explicit_cast_from_to(t_from, t_to)
> +    local queries = {}
> +    local from_exprs = gen_type_exprs(t_from)
> +    local to_typename = type_names[t_to]
> +    for _, expr in pairs(from_exprs) do
> +        table.insert(queries,
> +                     string.format([[ select cast(%s as %s); ]], expr, to_typename))
> +    end
> +    return queries
> +end
> +
> +local function catch_query(query)
> +    local result = {pcall(box.execute, query)}
> +
> +    if not result[1] or result[3] ~= nil then
> +        return false, result[3]
> +    end
> +    return true, result[2]
> +end
15. Since you use all these functions and do not want to remove them, I 
suggest
to add comments to them.

> +
> +-- 1. Check explicit casts table
> +local function test_check_explicit_casts(test)
> +    -- checking validity of all `CAST(from AS to)` combinations
> +    test:plan(322)
> +    for _, from in ipairs(proper_order) do
> +        for _, to in ipairs(proper_order) do
> +            -- skip ANY, DECIMAL, UUID, etc.
16. Why UUID is skipped? Also, why ANY is skipped? You can get values of 
type
ANY by using CAST(values AS ANY), I believe. Or I am wrong?

> +            if enabled_type[from] and enabled_type_cast[to] then
> +                local gen = gen_explicit_cast_from_to(from, to)
> +                local failures = {}
> +                local successes = {}
> +                local castable = false
> +                local expected = explicit_casts[from][to]
> +
> +                if verbose > 0 then
> +                    print(expected, yaml.encode(gen))
> +                end
> +
> +                for _, v in pairs(gen) do
> +                    local ok, result
> +                    ok, result = catch_query(v)
> +
> +                    if verbose > 0 then
> +                        print(string.format("V> ok = %s, result = %s, query = %s",
> +                            ok, result, v))
> +                    end
> +
> +                    local title  = string.format("%s => %s", v, human_cast(expected))
> +                    if expected == c_yes then
> +                        test:ok(true == ok, label_for(from, to, title))
> +                    elseif expected == c_no then
> +                        test:ok(false == ok, label_for(from, to, title))
> +                    else
> +                        -- we can't report immediately for c_maybe because some
> +                        -- cases allowed to fail, so postpone decision
> +                        if ok then
> +                            castable = true
> +                            table.insert(successes, {result, v})
> +                        else
> +                            table.insert(failures, {result, v})
> +                        end
> +                    end
> +                end
> +
> +                -- ok, we aggregated stats for c_maybe mode - check it now
> +                if expected == c_maybe then
> +                        local title  = string.format("%s => %s",
> +                                                    #gen and gen[1]..'...' or '',
> +                                                    human_cast(expected))
> +                        test:ok(castable, label_for(from, to, title),
> +                                failures)
> +                end
16. I already said my expectations about these tests. Since you know 
(and print)
all values used in the test, why cannot you determine if the cast should 
fail?
For example, this test is actually useless if `CAST(1.0 AS UNSIGNED)` 
fails, since
this cast is marked as 'sometimes'. If you do so, you can just print 'Y' 
or 'N'.
Also, why your tests does not check unallowed casts? Or, at least, they 
are not
shown.

> +            end
> +        end
> +    end
> +end
> +
> +test:test("e_casts - check explicit casts", test_check_explicit_casts)
> +
> +test:check()
> +os.exit()
> diff --git a/test/sql-tap/e_select1.test.lua b/test/sql-tap/e_select1.test.lua
> index ab0faa376..28ea1d82f 100755
> --- a/test/sql-tap/e_select1.test.lua
> +++ b/test/sql-tap/e_select1.test.lua
> @@ -910,7 +910,7 @@ test:do_select_tests(
>  
>          {"3", "SELECT sum(b+1) FROM z1 NATURAL LEFT JOIN z3", {-43.06}},
>          {"4", "SELECT sum(b+2) FROM z1 NATURAL LEFT JOIN z3", {-38.06}},
> -        {"5", "SELECT sum(CAST(b IS NOT NULL AS INTEGER)) FROM z1 NATURAL LEFT JOIN z3", {5}},
> +        {"5", "SELECT sum(CASE WHEN b IS NOT NULL THEN 1 ELSE 0 END) FROM z1 NATURAL LEFT JOIN z3", {5}},
>      })
>  
>  -- EVIDENCE-OF: R-26684-40576 Each non-aggregate expression in the
> diff --git a/test/sql-tap/in1.test.lua b/test/sql-tap/in1.test.lua
> index 4b51da6e8..da3c07768 100755
> --- a/test/sql-tap/in1.test.lua
> +++ b/test/sql-tap/in1.test.lua
> @@ -100,7 +100,7 @@ test:do_execsql_test(
>  test:do_execsql_test(
>      "in-1.7",
>      [[
> -        SELECT a+ 100*CAST((a BETWEEN 1 and 3) AS INTEGER) FROM t1 ORDER BY b
> +        SELECT a+ 100*(CASE WHEN (a BETWEEN 1 and 3) THEN 1 ELSE 0 END) FROM t1 ORDER BY b
17. Since you already change this line, add spaces before and after all
operations with two operands.

>      ]], {
>          -- <in-1.7>
>          101, 102, 103, 4, 5, 6, 7, 8, 9, 10
> @@ -157,7 +157,7 @@ test:do_execsql_test(
>  test:do_execsql_test(
>      "in-2.5",
>      [[
> -        SELECT a+100*(CAST(b IN (8,16,24) AS INTEGER)) FROM t1 ORDER BY b
> +        SELECT a+100*(CASE WHEN b IN (8,16,24) THEN 1 ELSE 0 END) FROM t1 ORDER BY b
18. Same.

>      ]], {
>          -- <in-2.5>
>          1, 2, 103, 104, 5, 6, 7, 8, 9, 10
> @@ -207,7 +207,7 @@ test:do_execsql_test(
>  test:do_execsql_test(
>      "in-2.10",
>      [[
> -        SELECT a FROM t1 WHERE LEAST(0, CAST(b IN (a,30) AS INT)) <> 0
> +        SELECT a FROM t1 WHERE LEAST(0, CASE WHEN (b IN (a,30)) THEN 1 ELSE 0 END) <> 0
19. I think a space should be after ',' in '(a,30)'.

>      ]], {
>          -- <in-2.10>
>  
> @@ -253,7 +253,7 @@ test:do_execsql_test(
>  test:do_execsql_test(
>      "in-3.3",
>      [[
> -        SELECT a + 100*(CAST(b IN (SELECT b FROM t1 WHERE a<5) AS INTEGER)) FROM t1 ORDER BY b
> +        SELECT a + 100*(CASE WHEN (b IN (SELECT b FROM t1 WHERE a<5)) THEN 1 ELSE 0 END) FROM t1 ORDER BY b
20. Same as 17.

>      ]], {
>          -- <in-3.3>
>          101, 102, 103, 104, 5, 6, 7, 8, 9, 10
> diff --git a/test/sql-tap/in4.test.lua b/test/sql-tap/in4.test.lua
> index 8442944b9..588daefec 100755
> --- a/test/sql-tap/in4.test.lua
> +++ b/test/sql-tap/in4.test.lua
> @@ -133,7 +133,7 @@ test:do_execsql_test(
>          SELECT b FROM t2 WHERE a IN (1.0, 2.1)
>      ]], {
>          -- <in4-2.6>
> -        "one"
> +        "one", "two"
21. Why the result changed?

>          -- </in4-2.6>
>      })
>  
> diff --git a/test/sql-tap/misc3.test.lua b/test/sql-tap/misc3.test.lua
> index 313484b5d..c2dc67355 100755
> --- a/test/sql-tap/misc3.test.lua
> +++ b/test/sql-tap/misc3.test.lua
> @@ -510,7 +510,7 @@ test:do_execsql_test(
>  test:do_execsql_test(
>      "misc-8.2",
>      [[
> -        SELECT count(*) FROM t3 WHERE 1+CAST((b IN ('abc','xyz')) AS INTEGER)==2
> +        SELECT count(*) FROM t3 WHERE b IN ('abc','xyz')
22. What does this test checks now and what it checked before?

>      ]], {
>          -- <misc-8.2>
>          2
> diff --git a/test/sql/boolean.result b/test/sql/boolean.result
> index 177a39fb9..b268eb2fe 100644
> --- a/test/sql/boolean.result
> +++ b/test/sql/boolean.result
> @@ -502,23 +502,13 @@ INSERT INTO t3 VALUES (4, false)
>  -- Check CAST from BOOLEAN to the other types.
>  SELECT cast(true AS INTEGER), cast(false AS INTEGER);
>   | ---
> - | - metadata:
> - |   - name: COLUMN_1
> - |     type: integer
> - |   - name: COLUMN_2
> - |     type: integer
> - |   rows:
> - |   - [1, 0]
> + | - null
> + | - 'Type mismatch: can not convert TRUE to integer'
23. Since all these CAST are now fails, I think it is better to divide this
query into three. Otherwise only the first cast is checked.

>   | ...
>  SELECT cast(true AS NUMBER), cast(false AS NUMBER);
>   | ---
> - | - metadata:
> - |   - name: COLUMN_1
> - |     type: number
> - |   - name: COLUMN_2
> - |     type: number
> - |   rows:
> - |   - [1, 0]
> + | - null
> + | - 'Type mismatch: can not convert TRUE to number'
24. Same.

>   | ...
>  -- gh-4462: ensure that text representation is uppercase.
>  SELECT cast(true AS TEXT), cast(false AS TEXT);
> @@ -545,25 +535,13 @@ SELECT cast(true AS BOOLEAN), cast(false AS BOOLEAN);
>  -- Check CAST to BOOLEAN from the other types.
>  SELECT cast(100 AS BOOLEAN), cast(1 AS BOOLEAN), cast(0 AS BOOLEAN);
>   | ---
> - | - metadata:
> - |   - name: COLUMN_1
> - |     type: boolean
> - |   - name: COLUMN_2
> - |     type: boolean
> - |   - name: COLUMN_3
> - |     type: boolean
> - |   rows:
> - |   - [true, true, false]
> + | - null
> + | - 'Type mismatch: can not convert 100 to boolean'
>   | ...
25. Same.

>  SELECT cast(0.123 AS BOOLEAN), cast(0.0 AS BOOLEAN);
>   | ---
> - | - metadata:
> - |   - name: COLUMN_1
> - |     type: boolean
> - |   - name: COLUMN_2
> - |     type: boolean
> - |   rows:
> - |   - [true, false]
> + | - null
> + | - 'Type mismatch: can not convert 0.123 to boolean'
>   | ...
26. Same.

>  SELECT cast('true' AS BOOLEAN), cast('false' AS BOOLEAN);
>   | ---
> diff --git a/test/sql/types.result b/test/sql/types.result
> index 687ca3b15..90a8bc5ec 100644
> --- a/test/sql/types.result
> +++ b/test/sql/types.result
> @@ -1035,11 +1035,8 @@ box.execute("SELECT CAST(18446744073709551615 AS SCALAR);")
>  ...
>  box.execute("SELECT CAST(18446744073709551615 AS BOOLEAN);")
>  ---
> -- metadata:
> -  - name: COLUMN_1
> -    type: boolean
> -  rows:
> -  - [true]
> +- null
> +- 'Type mismatch: can not convert 18446744073709551615 to boolean'
>  ...
>  box.execute("SELECT CAST('18446744073709551615' AS INTEGER);")
>  ---
> @@ -1084,8 +1081,11 @@ box.execute("SELECT CAST(123 AS UNSIGNED);")
>  ...
>  box.execute("SELECT CAST(-123 AS UNSIGNED);")
>  ---
> -- null
> -- 'Type mismatch: can not convert -123 to unsigned'
> +- metadata:
> +  - name: COLUMN_1
> +    type: unsigned
> +  rows:
> +  - [18446744073709551493]
>  ...
27. Where does such behaviour is described?

>  box.execute("SELECT CAST(1.5 AS UNSIGNED);")
>  ---
> @@ -1097,19 +1097,13 @@ box.execute("SELECT CAST(1.5 AS UNSIGNED);")
>  ...
>  box.execute("SELECT CAST(-1.5 AS UNSIGNED);")
>  ---
> -- metadata:
> -  - name: COLUMN_1
> -    type: unsigned
> -  rows:
> -  - [-1]
> +- null
> +- 'Type mismatch: can not convert -1.5 to unsigned'
>  ...
>  box.execute("SELECT CAST(true AS UNSIGNED);")
>  ---
> -- metadata:
> -  - name: COLUMN_1
> -    type: unsigned
> -  rows:
> -  - [1]
> +- null
> +- 'Type mismatch: can not convert TRUE to unsigned'
>  ...
>  box.execute("SELECT CAST('123' AS UNSIGNED);")
>  ---
> -- 
> 2.29.2
> 


More information about the Tarantool-patches mailing list