From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from rhino.ch-server.com (rhino.ch-server.com [209.59.190.103]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 7E08C469719 for ; Tue, 29 Sep 2020 22:22:26 +0300 (MSK) References: <66362762-8791-bea3-745f-afc1e3eaa199@tarantool.org> <9a788a90-f558-fc6c-1d28-2813e8b721f8@ocelot.ca> <57513bb5-3a4d-0c3c-720e-7e78634ecfe1@tarantool.org> From: Peter Gulutzan Message-ID: Date: Tue, 29 Sep 2020 13:22:20 -0600 MIME-Version: 1.0 In-Reply-To: <57513bb5-3a4d-0c3c-720e-7e78634ecfe1@tarantool.org> Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Transfer-Encoding: 8bit Content-Language: en-US Subject: Re: [Tarantool-discussions] SQL built-in functions position List-Id: Tarantool development process List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-discussions@dev.tarantool.org Hi, On 2020-09-28 2:07 p.m., Vladislav Shpilevoy wrote: > Hi! > > See my response in another email with 4 big reasons why > storage of SQL-specific functions in _func is a bad idea. > > Also see responses on your comments in separate sections. > I leave references below. I cannot argue against your arguments about implementation, and I acknowledged that using _func is not necessary, but I extract a few things that you said that seem a bit odd to me. > Firstly, if a function has variable argument count, why the hell should they all > be of the same type? It is a silly restriction motivated entirely by SQL > specifics like SUM() taking all arguments of the same type, or LEAST(), or GREATEST() > and so on. That is very-SQL thing. I can't imagine when such a restriction may > be needed in other supported languages. I can say CREATE TABLE t (s1 SCALAR PRIMARY KEY); INSERT INTO t VALUES (1), (CAST(2 AS UNSIGNED)), (3e3); SELECT SUM(s1) FROM t; That is, data types can differ provided that they are numbers. I see that as a restriction that "makes sense". > For example, PRINTF(). I acknowledge that PRINTF() is ugly but it is different from other functions. It is not a "very-SQL thing". It happened to be in SQLite. https://sqlite.org/printf.html So I believe that its flaws do not prove something about other functions. > Some of the functions are supposed to be used for aggregated values ... When I wrote, I was only thinking about built-in scalar functions. AVG COUNT GROUP_CONCAT MAX MIN SUM TOTAL are in _func, but I thought they could be excluded from SELECT requests. (Apparently I was wrong, as I'll explain below.) > If a function is sandboxed, so it does not affect other users, I don't see why > would it need restrictions. All built-ins are sandboxed. Not all, I mentioned an exception (in a private issue). And I did mention that "RANDOMBLOB with a huge value" might not be harmless. But in my email I was saying that "in future maybe there will be built-in functions that should require privileges". You asked for an example, but: if I come up with any example that doesn't work in a sandbox, would you not reply that it doesn't need to be built in? > ## Users benefit from seeing SQL-specific functions in _func? - No. True, but not fair. A programmer who writes a client program can write something that accesses _func and displays something that users can comprehend. And information_schema.routines will probably be something that reads _func and displays as an SQL table. > So what exactly are the SQL built-in functions so much needed in Lua > and C? Looking at the unreadable _func output below, I can't imagine > why somebody need any of these functions out of SQL. You are looking at the date/time functions, which, again, are from SQLite. My memory is vague but I understood K. Osipov did not want them documented until we were sure we wanted users to use them. I had a quick look at what is in _func in version 2.5 ... tarantool>SELECT "name", "language", "is_sandboxed", "aggregate" >FROM "_func" >ORDER BY "name"; OK 67 rows selected (0.0 seconds) +----------------------+-------------+--------------+-----------+ | name                 | language    | is_sandboxed | aggregate | +----------------------+-------------+--------------+-----------+ | ABS                  | SQL_BUILTIN | FALSE        | none      | | AVG                  | SQL_BUILTIN | FALSE        | none      | | CEIL                 | SQL_BUILTIN | FALSE        | none      | | CEILING              | SQL_BUILTIN | FALSE        | none      | | CHAR                 | SQL_BUILTIN | FALSE        | none      | | CHARACTER_LENGTH     | SQL_BUILTIN | FALSE        | none      | | CHAR_LENGTH          | SQL_BUILTIN | FALSE        | none      | | COALESCE             | SQL_BUILTIN | FALSE        | none      | | COUNT                | SQL_BUILTIN | FALSE        | none      | | CURRENT_DATE         | SQL_BUILTIN | FALSE        | none      | | CURRENT_TIME         | SQL_BUILTIN | FALSE        | none      | | CURRENT_TIMESTAMP    | SQL_BUILTIN | FALSE        | none      | | DATE                 | SQL_BUILTIN | FALSE        | none      | | DATETIME             | SQL_BUILTIN | FALSE        | none      | | EVERY                | SQL_BUILTIN | FALSE        | none      | | EXISTS               | SQL_BUILTIN | FALSE        | none      | | EXP                  | SQL_BUILTIN | FALSE        | none      | | EXTRACT              | SQL_BUILTIN | FALSE        | none      | | FLOOR                | SQL_BUILTIN | FALSE        | none      | | GREATER              | SQL_BUILTIN | FALSE        | none      | | GREATEST             | SQL_BUILTIN | FALSE        | none      | | GROUP_CONCAT         | SQL_BUILTIN | FALSE        | none      | | HEX                  | SQL_BUILTIN | FALSE        | none      | | IFNULL               | SQL_BUILTIN | FALSE        | none      | | JULIANDAY            | SQL_BUILTIN | FALSE        | none      | | LEAST                | SQL_BUILTIN | FALSE        | none      | | LENGTH               | SQL_BUILTIN | FALSE        | none      | | LESSER               | SQL_BUILTIN | FALSE        | none      | | LIKE                 | SQL_BUILTIN | FALSE        | none      | | LIKELIHOOD           | SQL_BUILTIN | FALSE        | none      | | LIKELY               | SQL_BUILTIN | FALSE        | none      | | LN                   | SQL_BUILTIN | FALSE        | none      | | LOWER                | SQL_BUILTIN | FALSE        | none      | | LUA                  | LUA         | FALSE        | none      | | MAX                  | SQL_BUILTIN | FALSE        | none      | | MIN                  | SQL_BUILTIN | FALSE        | none      | | MOD                  | SQL_BUILTIN | FALSE        | none      | | NULLIF               | SQL_BUILTIN | FALSE        | none      | | OCTET_LENGTH         | SQL_BUILTIN | FALSE        | none      | | POSITION             | SQL_BUILTIN | FALSE        | none      | | POWER                | SQL_BUILTIN | FALSE        | none      | | PRINTF               | SQL_BUILTIN | FALSE        | none      | | QUOTE                | SQL_BUILTIN | FALSE        | none      | | RANDOM               | SQL_BUILTIN | FALSE        | none      | | RANDOMBLOB           | SQL_BUILTIN | FALSE        | none      | | REPLACE              | SQL_BUILTIN | FALSE        | none      | | ROUND                | SQL_BUILTIN | FALSE        | none      | | ROW_COUNT            | SQL_BUILTIN | FALSE        | none      | | SOME                 | SQL_BUILTIN | FALSE        | none      | | SOUNDEX              | SQL_BUILTIN | FALSE        | none      | | SQRT                 | SQL_BUILTIN | FALSE        | none      | | STRFTIME             | SQL_BUILTIN | FALSE        | none      | | SUBSTR               | SQL_BUILTIN | FALSE        | none      | | SUM                  | SQL_BUILTIN | FALSE        | none      | | TIME                 | SQL_BUILTIN | FALSE        | none      | | TOTAL                | SQL_BUILTIN | FALSE        | none      | | TRIM                 | SQL_BUILTIN | FALSE        | none      | | TYPEOF               | SQL_BUILTIN | FALSE        | none      | | UNICODE              | SQL_BUILTIN | FALSE        | none      | | UNLIKELY             | SQL_BUILTIN | FALSE        | none      | | UPPER                | SQL_BUILTIN | FALSE        | none      | | VERSION              | SQL_BUILTIN | FALSE        | none      | | ZEROBLOB             | SQL_BUILTIN | FALSE        | none      | | _sql_stat_get        | SQL_BUILTIN | FALSE        | none      | | _sql_stat_init       | SQL_BUILTIN | FALSE        | none      | | _sql_stat_push       | SQL_BUILTIN | FALSE        | none      | | box.schema.user.info | LUA         | FALSE        | none      | +----------------------+-------------+--------------+-----------+ Well, this doesn't look entirely right, and that I guess means that you are right -- at this moment. I do not understand why is_sandboxed is always false. I do not understand why aggregate is none for aggregate functions. I have to acknowledge that many of these functions are undocumented deliberately and we might not want users to seem them, think that we will always support them, and add them to applications. (Include: CEIL/CEILING CURRENT_DATE CURRENT_TIME CURRENT_TIMESTAMP DATE DATETIME EVERY EXP EXTRACT FLOOR GREATER JULIANDAY LESSER LN MOD POWER SOME STRFTIME TIME _sql_stat_get sql_stat_init sql_stat_push box.schema.user.info.) However, that only means that at this moment the information is bad. It does not mean that the information should not exist. I admit that you made points that I hadn't realized, but stubbornly still stick with a belief in _func. Peter Gulutzan