[Tarantool-patches] [PATCH 3/3] sql: introduced explicit casts test e_casts.test.lua
Timur Safin
tsafin at tarantool.org
Thu Jun 3 00:04:39 MSK 2021
: From: Mergen Imeev <imeevma at tarantool.org>
: Sent: Tuesday, June 1, 2021 5:03 PM
: To: Timur Safin <tsafin at tarantool.org>
: Cc: tarantool-patches at dev.tarantool.org
: Subject: Re: [PATCH 3/3] sql: introduced explicit casts test
: e_casts.test.lua
:
: Thank you for the patch. See 13 comments below.
:
: On Tue, May 25, 2021 at 12:01:02PM +0300, Timur Safin wrote:
: > * e_casts.test.lua is generating expressions of `cast(input as output)`
: > form. And checks whether we expectedly succeed or fail, given the
: > previously agreed excplicit conversion table from
: > /doc/rfc/5910-consistent-sql-lua-types.md;
: >
: > * At the moment we skip DECIMALs, UUIDs, ARRAYs and MAPs as not yet
: > fully supported. Need to be reenabled later;
: >
: > * there is `-verbose` mode one may activate to enable more detailed
: > reporting.
: 1. How to use this mode? I mean,
: ./test-run.py sql-tap/e_casts.test.lua --verbose
: gives a lot simpler result that I expect from this much of code.
For debugging of various failures I run it directly via script as simple as
that:
20:05 $ ~/bin/sql-tap-run.sh sql-tap/e_casts.test.lua -v | head -40
TAP version 13
1..3
| 0 | 1 | 2 | 4 | 5 | 6 | 7 | 3 | 9 |10 |11 |12 | 8 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+
0. any | | | | | | | | | | | | | |
1. unsigned | | Y | Y | Y | Y | | | Y | | | | | Y |
2. string | | S | Y | S | S | S | Y | S | | | | | Y |
4. double | | S | Y | Y | S | | | Y | | | | | Y |
5. integer | | S | Y | Y | Y | | | Y | | | | | Y |
6. boolean | | | Y | | | Y | | | | | | | Y |
7. varbinary | | | Y | | | | Y | | | | | | Y |
3. number | | S | Y | Y | S | | | Y | | | | | Y |
9. decimal | | | | | | | | | | | | | |
10. uuid | | | | | | | | | | | | | |
11. array | | | | | | | | | | | | | |
12. map | | | | | | | | | | | | | |
8. scalar | | S | Y | S | S | S | S | S | | | | | Y |
+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 4 | 5 | 6 | 7 | 3 | 9 |10 |11 |12 | 8 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+
0. any | | | | | | | | | | | | | |
1. unsigned | | Y | Y | Y | Y | | | Y | | | | | Y |
2. string | | S | Y | S | S | | Y | S | | | | | Y |
4. double | | S | Y | Y | S | | | Y | | | | | Y |
5. integer | | S | Y | Y | Y | | | Y | | | | | Y |
6. boolean | | | | | | Y | | | | | | | Y |
7. varbinary | | | Y | | | | Y | | | | | | Y |
3. number | | S | Y | Y | S | | | Y | | | | | Y |
9. decimal | | | | | | | | | | | | | |
10. uuid | | | | | | | | | | | | | |
11. array | | | | | | | | | | | | | |
12. map | | | | | | | | | | | | | |
8. scalar | | S | S | S | S | S | S | S | | | | | S |
+---+---+---+---+---+---+---+---+---+---+---+---+---+
# e_casts - check consistency of implicit conversion table
1..169
ok - sql-tap/e_casts.test.lua:244 [any,any] nil ~= nil
ok - sql-tap/e_casts.test.lua:244 [any,unsigned] nil ~= nil
ok - sql-tap/e_casts.test.lua:244 [any,string] nil ~= nil
ok - sql-tap/e_casts.test.lua:244 [any,double] nil ~= nil
...
I need to see those tables (explicit and implicit) visualized to make
sure I have them correctly entered and be compatible with written to RFC.
This output is incompatible (and unnecessary) for TAP mode, that's why it's
not activated by default.
:
: 2. What means e_casts? If it is for explicit casts, why not name it
: something
: like "explicit_cast.test.lua".
Because it's not planned to be only about explicit casts :)
[You'll see it's development in the next version of patchset]
And this strange e_* prefix is inherited from the times when I wanted
to resurrect e_expr.test.lua Now, having asked about it, I do realize
it makes no much sense and probably should be removed.
Will rename to sql-tap/casts.test.lua
:
: 3. As far as I remember, "sometimes" doesn't mean that CAST(1.0 as UNSIGNED)
: can return 1 or can throw an error. It means that there are rules that
: define
: when values of one type can be converted to values of another type. So,
: instead of checking failures for "maybe" we can check that "from" value
: should fail or not according to these rules and use proper test.
Yes, sometimes meant that some part of data might be safely converted, but
others will not. And here we should prepare proper set where some values
are valid for conversion and others are not. (See my answer below)
:
: Also, using this approach we can try to use randomized "from" values. Just
: thought.
I thought about it some day, but no, randomized data will introduce unnecessary
volatility to testing and will not add any extra value to this particular case.
A well prepared data-set provides here much better results, when we
actually check only compile time reactions, and not run-time.
Though, eventually this approach might be modified for fuzzing of this part
of code, and there randomization will be necessary, but not today.
:
: 4. Why you test only literals? Nothing indicates that values from spaces
: shouldn't be tested here.
At the moment of writing this patchset I've not yet figured out how to
conveniently deal with generated fields of expected types. Now, for implicit
conversion type I do have easy approach, and will add this case for run-time
explicit conversion of different type expressions.
:
: >
: > Relates to #5910, #6009
: > Part of #4407
: 5. Wrong issue number. Also, I still think that "relates to" is not
: needed here.
Yup thanks for the catch about 4470! [But I'd still keep relation to
consistent-type spec - that's part of larger work]
:
: > ---
: > test/sql-tap/e_casts.test.lua | 355 ++++++++++++++++++++++++++++++++++
: > 1 file changed, 355 insertions(+)
: > create mode 100755 test/sql-tap/e_casts.test.lua
: >
: > diff --git a/test/sql-tap/e_casts.test.lua b/test/sql-tap/e_casts.test.lua
: > new file mode 100755
: > index 000000000..eafebd5bc
: > --- /dev/null
: > +++ b/test/sql-tap/e_casts.test.lua
: > @@ -0,0 +1,355 @@
: > +#!/usr/bin/env tarantool
: > +local test = require("sqltester")
: 6. What point to use sqltester here is you do not use its functions? You
: can use
: 'tap' instead.
Good point - sqltester is not used here.
:
: > +test:plan(222)
: > +
: > +local yaml = require("yaml")
: > +local ffi = require("ffi")
: > +
: > +local verbose = 0
: > +
: > +if arg[1] == '-v' or arg[1] == '--verbose' then
: 7. How to use '-v' option?
:
: > + verbose = 1
: > +end
: > +
: > +ffi.cdef [[
: > + enum field_type {
: > + FIELD_TYPE_ANY = 0,
: > + FIELD_TYPE_UNSIGNED,
: > + FIELD_TYPE_STRING,
: > + FIELD_TYPE_NUMBER,
: > + FIELD_TYPE_DOUBLE,
: > + FIELD_TYPE_INTEGER,
: > + FIELD_TYPE_BOOLEAN,
: > + FIELD_TYPE_VARBINARY,
: > + FIELD_TYPE_SCALAR,
: > + FIELD_TYPE_DECIMAL,
: > + FIELD_TYPE_UUID,
: > + FIELD_TYPE_ARRAY,
: > + FIELD_TYPE_MAP,
: > + field_type_MAX
: > + };
: > +]]
: 8. Is there any point to use ffi here? Can't you use something simpler?
: For example:
: local types = {
: any = 0,
: unsigned = 1,
: ...
:
: Than instead of t_any you can use types.any and so on.
Originally the point was to just copy-paste the definition from Tarantool
code as-is, without any massaging or modifications.
[Ideally it should be extracted from the actual code src/box/field_def.h code.
And I'll probably do so, eventually, because field_def.h is prepared for such
extraction, and this place is marked with /** \cond public */
/** \endcond public */ boundaries, so enum could be easily extracted]
:
: If you start from 1 instead of 0 you can also use type_names instead of
: proper_order.
At the moment I don't consider that getting rid of ffi.cdef will simplify
anything or provide any extra value.
:
: > +
: > +-- Date/time/interval types to be uncommented and used
: > +-- once corresponding box implementation completed
: > +local t_any = ffi.C.FIELD_TYPE_ANY
: > +local t_unsigned = ffi.C.FIELD_TYPE_UNSIGNED
: > +local t_string = ffi.C.FIELD_TYPE_STRING
: > +local t_number = ffi.C.FIELD_TYPE_NUMBER
: > +local t_double = ffi.C.FIELD_TYPE_DOUBLE
: > +local t_integer = ffi.C.FIELD_TYPE_INTEGER
: > +local t_boolean = ffi.C.FIELD_TYPE_BOOLEAN
: > +local t_varbinary = ffi.C.FIELD_TYPE_VARBINARY
: > +local t_scalar = ffi.C.FIELD_TYPE_SCALAR
: > +local t_decimal = ffi.C.FIELD_TYPE_DECIMAL
: > +-- local t_date = -1
: > +-- local t_time = -2
: > +-- local t_timestamp = -3
: > +-- local t_interval = -4
: > +local t_uuid = ffi.C.FIELD_TYPE_UUID
: > +local t_array = ffi.C.FIELD_TYPE_ARRAY
: > +local t_map = ffi.C.FIELD_TYPE_MAP
: > +
: > +local proper_order = {
: > + t_any,
: > + t_unsigned,
: > + t_string,
: > + t_double,
: > + t_integer,
: > + t_boolean,
: > + t_varbinary,
: > + t_number,
: > + t_decimal,
: > + t_uuid,
: > + -- t_date,
: > + -- t_time,
: > + -- t_timestamp,
: > + -- t_interval,
: 9. I think it is better to check them for "wrong type name" or remove these
: types.
I don't quite get what is "check for 'wrong type name'"? Could you please elaborate?
(Those unused type will be added in the nearest times, in any case)
:
: > + t_array,
: > + t_map,
: > + t_scalar,
: > +}
: > +
: > +local type_names = {
: > + [t_any] = 'any',
: > + [t_unsigned] = 'unsigned',
: > + [t_string] = 'string',
: > + [t_double] = 'double',
: > + [t_integer] = 'integer',
: > + [t_boolean] = 'boolean',
: > + [t_varbinary] = 'varbinary',
: > + [t_number] = 'number',
: > + [t_decimal] = 'decimal',
: > + [t_uuid] = 'uuid',
: > + -- [t_date] = 'date',
: > + -- [t_time] = 'time',
: > + -- [t_timestamp] = 'timestamp',
: > + -- [t_interval] = 'interval',
: > + [t_array] = 'array',
: > + [t_map] = 'map',
: > + [t_scalar] = 'scalar',
: > +}
: > +
: > +-- not all types implemented/enabled at the moment
: > +-- but we do keep their projected status in the
: > +-- spec table
: > +local enabled_type = {
: > + [t_any] = false, -- there is no way in SQL to instantiate ANY
: type expression
: > + [t_unsigned] = true,
: > + [t_string] = true,
: > + [t_double] = true,
: > + [t_integer] = true,
: > + [t_boolean] = true,
: > + [t_varbinary] = true,
: > + [t_number] = true,
: > + [t_decimal] = false,
: > + [t_uuid] = false,
: > + -- [t_date] = false,
: > + -- [t_time] = false,
: > + -- [t_timestamp]= false,
: > + -- [t_interval] = False,
: > + [t_array] = false,
: > + [t_map] = false,
: > + [t_scalar] = true,
: > +}
: > +
: > +-- table of _TSV_ (tab separated values)
: > +-- copied from sql-lua-tables-v5.xls // TNT implicit today
: > +local explicit_casts_table_spec = {
: > + [t_any] = {"Y", "S", "Y", "S", "S", "S", "S", "S", "S", "S", "S",
: "S", "S"},
: > + [t_unsigned]= {"Y", "Y", "Y", "Y", "Y", "" , "" , "Y", "Y", "" , "" ,
: "" , "Y"},
: > + [t_string] = {"Y", "S", "Y", "S", "S", "S", "Y", "S", "S", "S", "S",
: "S", "Y"},
: > + [t_double] = {"Y", "S", "Y", "Y", "S", "" , "" , "Y", "Y", "" , "" ,
: "" , "Y"},
: > + [t_integer] = {"Y", "S", "Y", "Y", "Y", "" , "" , "Y", "Y", "" , "" ,
: "" , "Y"},
: > + [t_boolean] = {"Y", "" , "Y", "" , "" , "Y", "" , "" , "" , "" , "" ,
: "" , "Y"},
: > + [t_varbinary]={"Y", "" , "Y", "N", "" , "" , "Y", "" , "" , "S", "" ,
: "" , "Y"},
: > + [t_number] = {"Y", "S", "Y", "Y", "S", "" , "" , "Y", "Y", "" , "" ,
: "" , "Y"},
: > + [t_decimal] = {"Y", "S", "Y", "S", "S", "" , "" , "Y", "Y", "" , "" ,
: "" , "Y"},
: > + [t_uuid] = {"Y", "" , "Y", "" , "" , "" , "Y", "" , "" , "Y", "" ,
: "" , "Y"},
: > + [t_array] = {"Y", "N", "Y", "N", "N", "N", "" , "" , "" , "" , "Y",
: "" , "N"},
: > + [t_map] = {"Y", "N", "Y", "N", "N", "N", "" , "" , "" , "" , "" ,
: "Y", "N"},
: > + [t_scalar] = {"Y", "S", "Y", "S", "S", "S", "S", "S", "S", "S", "" ,
: "" , "Y"},
: > +}
: 10. Why there is 'N' and '' both in table? Can't be make it use only one
: of these?
Because this is actual copy-paste from original xls on which RFC was based on.
:)
[See attachments in https://github.com/tsafin/tarantool/wiki/SQL-Lua-Consistent-Type-Specification]
:
: > +
: > +-- local extra_checks = false
: > +local explicit_casts = {}
: > +-- local implicit_casts = {}
: > +
: > +-- implicit conversion table is considered consistent if
: > +-- it's sort of symmetric against diagonal
: > +-- (not necessary that always/sometimes are matching
: > +-- but at least something should be presented)
: > +
: > +--[[ local function check_table_consistency(table)
: > + for _, i in ipairs(proper_order) do
: > + local string = ''
: > + for _, j in ipairs(proper_order) do
: > + print(i, j)
: > + -- local item = implicit_casts[i][j]
: > + -- string = string .. (xlat[item] or ' ')
: > + end
: > + print(string)
: > + end
: > +end
: > +]]
: > +
: > + -- if there is enabled extra checks then check ocnsistency of input
: tables
: > + -- just to make sure their sanity
: > +--[[ if extra_checks then
: > + check_table_consistency()
: > + end
: > + ]]
: > +
: 11. Please drop commented part of code.
This is cleaned up in the current version, including implicit casts support,
see the next version of patchset.
:
: > +local c_no = 0
: > +local c_maybe = 1
: > +local c_yes = 2
: > +
: > +local function normalize_cast(v)
: > + local xlat = {
: > + ['Y'] = c_yes,
: > + ['S'] = c_maybe,
: > + ['N'] = c_no,
: > + }
: > + return xlat[v ~= nil and v or 'N']
: > +end
: > +
: > +local function human_cast(v)
: > + local xlat = {
: > + [c_yes] = 'Y',
: > + [c_maybe] = 'S',
: > + [c_no] = ' '
: > + }
: > + return xlat[v ~= nil and v or c_no]
: > +end
: > +
: > +local function load_casts_spec(spec_table)
: > + local casts = {}
: > + for i, t_from in ipairs(proper_order) do
: > + local row = spec_table[t_from]
: > + casts[t_from] = {}
: > + for j, t_to in ipairs(proper_order) do
: > + if enabled_type[t_from] and enabled_type[t_to] then
: > + casts[t_from][t_to] =
: normalize_cast(spec_table[t_from][j])
: > + end
: > + end
: > + end
: > + -- if there is enabled extra checks then check ocnsistency of input
: tables
: > + -- just to make sure their sanity
: > +--[[ if extra_checks then
: > + check_table_consistency()
: > + end ]]
: > +
: > + return casts
: > +end
: > +
: > +explicit_casts = load_casts_spec(explicit_casts_table_spec)
: > +
: > +if verbose > 0 then
: > + local function show_casts_table(table)
: 12. I believe word 'table' is reserved for in Lua. Not sure that it is good
: idea to overwrite this reserved word.
Apparently it's not reserved and we could reuse it as identifier in
local context. But thanks for reminding - that would be very confusing
the moment I'd need to use table.insert or something. (Strange that luacheck
which is always activated in my vscode editor did not complain here a lot)
:
: > + local max_len = #"12. varbinary" + 1
: > +
: > + -- show banner
: > + local col_names = ''
: > + for i, t_val in ipairs(proper_order) do
: > + col_names = col_names .. string.format("%2d |", t_val)
: > + end
: > + col_names = string.sub(col_names, 1, #col_names-1)
: > + print(string.format("%"..max_len.."s|%s|", "", col_names))
: > + -- show splitter
: > + local banner = '+---+---+---+---+---+---+---+---+---+---+---+---
: +---+'
: > + print(string.format("%"..max_len.."s%s", "", banner))
: > +
: > + for i, from in ipairs(proper_order) do
: > + local line = ''
: > + for j, to in ipairs(proper_order) do
: > + line = line .. string.format("%2s |",
: human_cast(table[from][to]))
: > + end
: > + line = string.sub(line, 1, #line-1)
: > + local s = string.format("%2d.%10s |%s|", from,
: type_names[from], line)
: > + print(s)
: > + end
: > + print(string.format("%"..max_len.."s%s", "", banner))
: > + end
: > +
: > + show_casts_table(explicit_casts)
: > +end
: > +
: > +local function merge_tables(...)
: > + local n = select('#', ...)
: > + local tables = {...}
: > + local result = {}
: > +
: > + for i=1,n do
: > + local t = tables[i]
: > + --print(yaml.encode(t))
: > + assert(type(tables[i]) == 'table')
: > + for j,v in pairs(t) do
: > + table.insert(result, v)
: > + end
: > + end
: > + return result
: > +end
: > +
: > +local gen_type_samples = {
: > + [t_unsigned] = {"0", "1", "2"},
: > + [t_integer] = {"-678", "-1", "0", "1", "2", "3", "678"},
: > + [t_double] = {"0.0", "123.4", "-567.8"},
: > + [t_string] = {"''", "'1'", "'abc'", "'def'", "'TRUE'",
: "'FALSE'"},
: > + [t_boolean] = {"false", "true", "null"},
: > + [t_varbinary] = {"X'312E3233'", "X'2D392E3837'",
: "X'302E30303031'"},
: > +}
: > +
: > +local function gen_type_exprs(type)
: > + if type == t_number then
: > + return merge_tables(gen_type_samples[t_unsigned],
: > + gen_type_samples[t_integer],
: > + gen_type_samples[t_double])
: > + end
: > + if type == t_scalar then
: > + return merge_tables(gen_type_samples[t_unsigned],
: > + gen_type_samples[t_integer],
: > + gen_type_samples[t_double],
: > + gen_type_samples[t_string],
: > + gen_type_samples[t_boolean],
: > + gen_type_samples[t_varbinary])
: > + end
: > + return gen_type_samples[type] or {}
: > +end
: 13. Do you need this function? You can just expand gen_type_samples
: using its
: code.
Function allows to avoid copy-paste for values from meta types such as
Number and scalar.
:
: This comment can be implemented for some other functions, for example
: the next
: one. I think it will make code simpler to understand.
[If I correctly understood your sentence] Nope, copy paste will make
code much more verbose, and less observable.
:
: > +
: > +local function gen_sql_cast_from_to(t_from, t_to)
: > + local queries = {}
: > + local from_exprs = gen_type_exprs(t_from)
: > + local to_typename = type_names[t_to]
: > + for _, expr in pairs(from_exprs) do
: > + local query = string.format([[ select cast(%s as %s); ]], expr,
: to_typename)
: > + table.insert(queries, query)
: > + end
: > + return queries
: > +end
: > +
: > +local function catch_query(query)
: > + local result = {pcall(box.execute, query)}
: > +
: > + if not result[1] or result[3] ~= nil then
: > + return false, result[3]
: > + end
: > + return true, result[2]
: > +end
: > +
: > +local function label_for(from, to, query)
: > + local parent_frame = debug.getinfo(2, "nSl")
: > + local filename = parent_frame.source:sub(1,1) == "@" and
: parent_frame.source:sub(2)
: > + local line = parent_frame.currentline
: > + return string.format("%s+%d:[%s,%s] %s", filename, line,
: > + type_names[from], type_names[to], query)
: > +end
: > +
: > +for i, from in ipairs(proper_order) do
: > + for j, to in ipairs(proper_order) do
: > + -- skip ANY, DECIMAL, UUID, etc.
: > + if enabled_type[from] and enabled_type[to] then
: > + local cell = explicit_casts[from][to]
: > + local gen = gen_sql_cast_from_to(from, to)
: > + local failures = {}
: > + local successes = {}
: > + local castable = false
: > + local expected = explicit_casts[from][to]
: > + if verbose > 0 then
: > + print(expected, yaml.encode(gen))
: > + end
: > + for i, v in pairs(gen) do
: > + local ok, result
: > + ok, result = catch_query(v)
: > + if verbose > 0 then
: > + print(string.format("ok = %s, result = %s, query =
: %s",
: > + ok, result, v))
: > +
: > + end
: > + -- print(v, 'ok'..yaml.encode(ok),
: 'result'..yaml.encode(result))
: > + if expected == c_yes then
: > + test:ok(true == ok, label_for(from, to, v))
: > + elseif expected == c_no then
: > + test:ok(false == ok, label_for(from, to, v))
: > + else
: > + -- we can't report immediately for c_maybe because some
: > + -- cases allowed to fail, so postpone decision
: > + if ok then
: > + castable = true
: > + table.insert(successes, {result, v})
: > + else
: > + table.insert(failures, {result, v})
: > + end
: > + end
: > + end
: > +
: > + -- ok, we aggregated stats for c_maybe mode - check it now
: > + if expected == c_maybe then
: > + test:ok(castable, label_for(from, to, #gen and gen[1]
: or ''),
: > + failures)
: > + end
: > + end
: > + end
: > +end
: > +
: > +
: > +test:finish_test()
: > --
: > 2.29.2
: >
More information about the Tarantool-patches
mailing list