[Tarantool-discussions] SQL built-in functions position

Peter Gulutzan pgulutzan at ocelot.ca
Tue Sep 29 22:22:20 MSK 2020


Hi,

On 2020-09-28 2:07 p.m., Vladislav Shpilevoy wrote:
> Hi!
>
> See my response in another email with 4 big reasons why
> storage of SQL-specific functions in _func is a bad idea.
>
> Also see responses on your comments in separate sections.
> I leave references below.


I cannot argue against your arguments about implementation,
and I acknowledged that using _func is not necessary,
but I extract a few things that you said that seem a bit odd to me.

 > Firstly, if a function has variable argument count, why the hell 
should they all
 > be of the same type? It is a silly restriction motivated entirely by SQL
 > specifics like SUM() taking all arguments of the same type, or 
LEAST(), or GREATEST()
 > and so on. That is very-SQL thing. I can't imagine when such a 
restriction may
 > be needed in other supported languages.

I can say
CREATE TABLE t (s1 SCALAR PRIMARY KEY);
INSERT INTO t VALUES (1), (CAST(2 AS UNSIGNED)), (3e3);
SELECT SUM(s1) FROM t;
That is, data types can differ provided that they are numbers.
I see that as a restriction that "makes sense".

 > For example, PRINTF().

I acknowledge that PRINTF() is ugly but it is different from other 
functions.
It is not a "very-SQL thing". It happened to be in SQLite.
https://sqlite.org/printf.html
So I believe that its flaws do not prove something about other functions.

 > Some of the functions are supposed to be used for aggregated values ...

When I wrote, I was only thinking about built-in scalar functions.
AVG COUNT GROUP_CONCAT MAX MIN SUM TOTAL are in _func,
but I thought they could be excluded from SELECT requests.
(Apparently I was wrong, as I'll explain below.)

 > If a function is sandboxed, so it does not affect other users, I 
don't see why
 > would it need restrictions. All built-ins are sandboxed.

Not all, I mentioned an exception (in a private issue).
And I did mention that "RANDOMBLOB with a huge value" might not be harmless.
But in my email I was saying that "in future maybe there will be built-in
functions that should require privileges". You asked for an example, but:
if I come up with any example that doesn't work in a sandbox, would you not
reply that it doesn't need to be built in?

 > ## Users benefit from seeing SQL-specific functions in _func? - No.

True, but not fair. A programmer who writes a client program can write
something that accesses _func and displays something that users can
comprehend. And information_schema.routines will probably be something
that reads _func and displays as an SQL table.

 > So what exactly are the SQL built-in functions so much needed in Lua
 > and C? Looking at the unreadable _func output below, I can't imagine
 > why somebody need any of these functions out of SQL.

You are looking at the date/time functions, which, again, are from SQLite.
My memory is vague but I understood K. Osipov did not want them documented
until we were sure we wanted users to use them.

I had a quick look at what is in _func in version 2.5 ...

tarantool>SELECT "name", "language", "is_sandboxed", "aggregate"
 >FROM "_func"
 >ORDER BY "name";
OK 67 rows selected (0.0 seconds)
+----------------------+-------------+--------------+-----------+
| name                 | language    | is_sandboxed | aggregate |
+----------------------+-------------+--------------+-----------+
| ABS                  | SQL_BUILTIN | FALSE        | none      |
| AVG                  | SQL_BUILTIN | FALSE        | none      |
| CEIL                 | SQL_BUILTIN | FALSE        | none      |
| CEILING              | SQL_BUILTIN | FALSE        | none      |
| CHAR                 | SQL_BUILTIN | FALSE        | none      |
| CHARACTER_LENGTH     | SQL_BUILTIN | FALSE        | none      |
| CHAR_LENGTH          | SQL_BUILTIN | FALSE        | none      |
| COALESCE             | SQL_BUILTIN | FALSE        | none      |
| COUNT                | SQL_BUILTIN | FALSE        | none      |
| CURRENT_DATE         | SQL_BUILTIN | FALSE        | none      |
| CURRENT_TIME         | SQL_BUILTIN | FALSE        | none      |
| CURRENT_TIMESTAMP    | SQL_BUILTIN | FALSE        | none      |
| DATE                 | SQL_BUILTIN | FALSE        | none      |
| DATETIME             | SQL_BUILTIN | FALSE        | none      |
| EVERY                | SQL_BUILTIN | FALSE        | none      |
| EXISTS               | SQL_BUILTIN | FALSE        | none      |
| EXP                  | SQL_BUILTIN | FALSE        | none      |
| EXTRACT              | SQL_BUILTIN | FALSE        | none      |
| FLOOR                | SQL_BUILTIN | FALSE        | none      |
| GREATER              | SQL_BUILTIN | FALSE        | none      |
| GREATEST             | SQL_BUILTIN | FALSE        | none      |
| GROUP_CONCAT         | SQL_BUILTIN | FALSE        | none      |
| HEX                  | SQL_BUILTIN | FALSE        | none      |
| IFNULL               | SQL_BUILTIN | FALSE        | none      |
| JULIANDAY            | SQL_BUILTIN | FALSE        | none      |
| LEAST                | SQL_BUILTIN | FALSE        | none      |
| LENGTH               | SQL_BUILTIN | FALSE        | none      |
| LESSER               | SQL_BUILTIN | FALSE        | none      |
| LIKE                 | SQL_BUILTIN | FALSE        | none      |
| LIKELIHOOD           | SQL_BUILTIN | FALSE        | none      |
| LIKELY               | SQL_BUILTIN | FALSE        | none      |
| LN                   | SQL_BUILTIN | FALSE        | none      |
| LOWER                | SQL_BUILTIN | FALSE        | none      |
| LUA                  | LUA         | FALSE        | none      |
| MAX                  | SQL_BUILTIN | FALSE        | none      |
| MIN                  | SQL_BUILTIN | FALSE        | none      |
| MOD                  | SQL_BUILTIN | FALSE        | none      |
| NULLIF               | SQL_BUILTIN | FALSE        | none      |
| OCTET_LENGTH         | SQL_BUILTIN | FALSE        | none      |
| POSITION             | SQL_BUILTIN | FALSE        | none      |
| POWER                | SQL_BUILTIN | FALSE        | none      |
| PRINTF               | SQL_BUILTIN | FALSE        | none      |
| QUOTE                | SQL_BUILTIN | FALSE        | none      |
| RANDOM               | SQL_BUILTIN | FALSE        | none      |
| RANDOMBLOB           | SQL_BUILTIN | FALSE        | none      |
| REPLACE              | SQL_BUILTIN | FALSE        | none      |
| ROUND                | SQL_BUILTIN | FALSE        | none      |
| ROW_COUNT            | SQL_BUILTIN | FALSE        | none      |
| SOME                 | SQL_BUILTIN | FALSE        | none      |
| SOUNDEX              | SQL_BUILTIN | FALSE        | none      |
| SQRT                 | SQL_BUILTIN | FALSE        | none      |
| STRFTIME             | SQL_BUILTIN | FALSE        | none      |
| SUBSTR               | SQL_BUILTIN | FALSE        | none      |
| SUM                  | SQL_BUILTIN | FALSE        | none      |
| TIME                 | SQL_BUILTIN | FALSE        | none      |
| TOTAL                | SQL_BUILTIN | FALSE        | none      |
| TRIM                 | SQL_BUILTIN | FALSE        | none      |
| TYPEOF               | SQL_BUILTIN | FALSE        | none      |
| UNICODE              | SQL_BUILTIN | FALSE        | none      |
| UNLIKELY             | SQL_BUILTIN | FALSE        | none      |
| UPPER                | SQL_BUILTIN | FALSE        | none      |
| VERSION              | SQL_BUILTIN | FALSE        | none      |
| ZEROBLOB             | SQL_BUILTIN | FALSE        | none      |
| _sql_stat_get        | SQL_BUILTIN | FALSE        | none      |
| _sql_stat_init       | SQL_BUILTIN | FALSE        | none      |
| _sql_stat_push       | SQL_BUILTIN | FALSE        | none      |
| box.schema.user.info | LUA         | FALSE        | none      |
+----------------------+-------------+--------------+-----------+

Well, this doesn't look entirely right, and that I guess means
that you are right -- at this moment.
I do not understand why is_sandboxed is always false.
I do not understand why aggregate is none for aggregate functions.
I have to acknowledge that many of these functions are undocumented
deliberately and we might not want users to seem them, think that
we will always support them, and add them to applications.
(Include: CEIL/CEILING CURRENT_DATE CURRENT_TIME CURRENT_TIMESTAMP
DATE DATETIME EVERY EXP EXTRACT FLOOR GREATER JULIANDAY LESSER LN
MOD POWER SOME STRFTIME TIME _sql_stat_get sql_stat_init
sql_stat_push box.schema.user.info.)
However, that only means that at this moment the information is bad.
It does not mean that the information should not exist.

I admit that you made points that I hadn't realized,
but stubbornly still stick with a belief in _func.

Peter Gulutzan



More information about the Tarantool-discussions mailing list