Tarantool discussions archive
 help / color / mirror / Atom feed
* [Tarantool-discussions] SQL built-in functions position
@ 2020-09-27 15:18 Mergen Imeev
  2020-09-27 20:56 ` Peter Gulutzan
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Mergen Imeev @ 2020-09-27 15:18 UTC (permalink / raw)
  To: Peter Gulutzan, Vladislav Shpilevoy, Nikita Pettik, kyukhin,
	tsafin, sergos
  Cc: tarantool-discussions

Hi all. I have a question that I would like to discuss.

The question is about SQL built-in functions. At the moment these 
functions are
partially described in _func and partially in src/box/sql/func.c. I 
received two
completely different suggestions from my reviewers on what to do with these
functions:
1) Move definitions completely to _func. Remove definitions from func.c.
2) Move definitions completely to func.c. Remove definitions from _func.

In the first case, users will be able to see the function definitions. 
Also, in
the future, we may allow these functions to be called from Lua (although not
sure if this is necessary). The main idea is 'all functions have the same
interface'.

In the second case, the implementation is simpler, and we can more easily
implement some features, such as "virtual" functions. For users, the 
definition
can only be seen in the documentation. The main idea is 'SQL built-in 
functions
are part of SQL'.

Which of these approaches do you think is more beneficial to us?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-discussions] SQL built-in functions position
  2020-09-27 15:18 [Tarantool-discussions] SQL built-in functions position Mergen Imeev
@ 2020-09-27 20:56 ` Peter Gulutzan
  2020-09-28 20:07   ` Vladislav Shpilevoy
  2020-09-28 18:19 ` Nikita Pettik
  2020-09-28 20:07 ` Vladislav Shpilevoy
  2 siblings, 1 reply; 10+ messages in thread
From: Peter Gulutzan @ 2020-09-27 20:56 UTC (permalink / raw)
  To: Mergen Imeev, Vladislav Shpilevoy, Nikita Pettik, kyukhin,
	tsafin, sergos
  Cc: tarantool-discussions


Hi,

On 2020-09-27 9:18 a.m., Mergen Imeev wrote:
 > Hi all. I have a question that I would like to discuss.
 >
 > The question is about SQL built-in functions. At the moment these 
functions are
 > partially described in _func and partially in src/box/sql/func.c. I 
received two
 > completely different suggestions from my reviewers on what to do with 
these
 > functions:
 > 1) Move definitions completely to _func. Remove definitions from func.c.
 > 2) Move definitions completely to func.c. Remove definitions from _func.
 >
 > In the first case, users will be able to see the function 
definitions. Also, in
 > the future, we may allow these functions to be called from Lua 
(although not
 > sure if this is necessary). The main idea is 'all functions have the same
 > interface'.
 >
 > In the second case, the implementation is simpler, and we can more easily
 > implement some features, such as "virtual" functions. For users, the 
definition
 > can only be seen in the documentation. The main idea is 'SQL built-in 
functions
 > are part of SQL'.
 >
 > Which of these approaches do you think is more beneficial to us?
 >

I hope you will say _func.

The current built-in functions are harmless, except perhaps for 
RANDOMBLOB with a huge value.
However, in future maybe there will be built-in functions that should 
require privileges.
In that case, I believe, they will have to be in _func (and someday in 
_vfunc)
so that grant() will work for them.

I have tried to redirect the UPPER() function in order to violate 
security, thus:
"
tarantool> function UPPER(x) return x end
---
...
tarantool> box.schema.func.create('UPPER')
---
- error: Function 'UPPER' already exists
...
tarantool> box.schema.func.drop('UPPER')
---
- error: 'Can''t drop function 1: function is SQL built-in'
...
"
This is good behaviour and I think it works because UPPER() is in _func.

I did not document in the manual's SQL section that built-in functions will
be in _func, so removing them is not a regression from documented behaviour.
Also I acknowledge that they don't exist in MySQL/MariaDB
information_schema.routines.
But I think users benefit from being able to see them.


Peter Gulutzan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-discussions] SQL built-in functions position
  2020-09-27 15:18 [Tarantool-discussions] SQL built-in functions position Mergen Imeev
  2020-09-27 20:56 ` Peter Gulutzan
@ 2020-09-28 18:19 ` Nikita Pettik
  2020-09-28 20:07   ` Vladislav Shpilevoy
  2020-09-28 20:07 ` Vladislav Shpilevoy
  2 siblings, 1 reply; 10+ messages in thread
From: Nikita Pettik @ 2020-09-28 18:19 UTC (permalink / raw)
  To: Mergen Imeev; +Cc: tarantool-discussions

On 27 Sep 18:18, Mergen Imeev wrote:
> Hi all. I have a question that I would like to discuss.
> 
> The question is about SQL built-in functions. At the moment these functions
> are
> partially described in _func and partially in src/box/sql/func.c. I received
> two
> completely different suggestions from my reviewers on what to do with these
> functions:
> 1) Move definitions completely to _func. Remove definitions from func.c.

That's my proposal. It makes name collisions check simple, provides unified
interface to invoke built-in and non-built-in functions, allows to grant
and verify priveleges in the same way and so forth. Built-ins are already
declaraed in _func, so reverting this thing would result in another one
unnecessary schema change and upgrade (so I doubt that implementation would
be somehow 'simpler'). Finally part of functions can turn out to be really
usefull in Lua someday such as date()/time(). So to me the choice is kind
of obvious..

> 2) Move definitions completely to func.c. Remove definitions from _func.
> 
> In the first case, users will be able to see the function definitions. Also,
> in
> the future, we may allow these functions to be called from Lua (although not
> sure if this is necessary). The main idea is 'all functions have the same
> interface'.
> 
> In the second case, the implementation is simpler, and we can more easily
> implement some features, such as "virtual" functions. For users, the
> definition
> can only be seen in the documentation. The main idea is 'SQL built-in
> functions
> are part of SQL'.
> 
> Which of these approaches do you think is more beneficial to us?
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-discussions] SQL built-in functions position
  2020-09-27 15:18 [Tarantool-discussions] SQL built-in functions position Mergen Imeev
  2020-09-27 20:56 ` Peter Gulutzan
  2020-09-28 18:19 ` Nikita Pettik
@ 2020-09-28 20:07 ` Vladislav Shpilevoy
  2020-10-01 14:46   ` Kirill Yukhin
  2 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-28 20:07 UTC (permalink / raw)
  To: Mergen Imeev, Peter Gulutzan, Nikita Pettik, kyukhin, tsafin, sergos
  Cc: tarantool-discussions

Hi!

On 27.09.2020 17:18, Mergen Imeev wrote:
> Hi all. I have a question that I would like to discuss.
> 
> The question is about SQL built-in functions. At the moment these functions are
> partially described in _func and partially in src/box/sql/func.c. I received two
> completely different suggestions from my reviewers on what to do with these
> functions:
> 1) Move definitions completely to _func. Remove definitions from func.c.
> 2) Move definitions completely to func.c. Remove definitions from _func.
> 
> In the first case, users will be able to see the function definitions. Also, in
> the future, we may allow these functions to be called from Lua (although not
> sure if this is necessary). The main idea is 'all functions have the same
> interface'.
> 
> In the second case, the implementation is simpler, and we can more easily
> implement some features, such as "virtual" functions. For users, the definition
> can only be seen in the documentation. The main idea is 'SQL built-in functions
> are part of SQL'.
> 
> Which of these approaches do you think is more beneficial to us?

I am the one who proposed the second option. I proposed it because of
several reasons.

==================================================
## Reason 1

The implementation is ugly by design. Since SQL functions are strictly typed,
and some of them may take more than one type, we were forced to implement some
kind of function overload by mangling function names. We are not trying to
implement C++ here, it goes against the _func basic schema.

To workaround that there were added a new ugly option: is_overloaded. Overloaded
means that there is still one function but declared twice, just to check types.
For example, LENGTH and LENGTH_VARBINARY, TRIM, TRIM_VARBINARY, and so on. That
leads to an issue, that we have not existing functions in _func. For example:

	tarantool> box.space._func.index.name:select({'LENGTH_VARBINARY'})
	---
	- - [68, 1, 'LENGTH_VARBINARY', 1, 'SQL_BUILTIN', '', 'function', ['varbinary'], 'integer',
	    'none', 'none', true, false, true, ['SQL'], {'is_overloaded': true}, '', '2020-08-14
	      16:27:52', '2020-08-14 16:27:52']
	...

	tarantool> box.execute("SELECT LENGTH_VARBINARY('abc')")
	---
	- null
	- Function 'LENGTH_VARBINARY' does not exist
	...

Doesn't this look bad? That is -1 point to the argument about 'it is good to
have function visible in _func'. They are visible, but it does not mean anything.

====================
## Reason 2

SQL has vararg functions - the ones which take unknown number of arguments. That
led to addition of one another ugly option: has_vararg. When it is true, all
arguments after last declared argument are forced to have the same type as the
last argument. This looks wrong from all possible sides.

Firstly, if a function has variable argument count, why the hell should they all
be of the same type? It is a silly restriction motivated entirely by SQL
specifics like SUM() taking all arguments of the same type, or LEAST(), or GREATEST()
and so on. That is very-SQL thing. I can't imagine when such a restriction may
be needed in other supported languages.

Secondly, like with the option 'is_overloaded' it complicates output of _func and
makes it harder for a user to create a function, when he sees so many irrelevant
options, needed for SQL only. Just take a look at the _func output in the end of
this email to see how bad is it [1].

Thirdly, it simply does not work for functions which are really vararg, and not
some kind of aggregates, ironically. For example, PRINTF(). To workaround the
workaround-option has_vararg such functions simply turn off their type checks by
making the last argument SCALAR. Well, this looks really bad. It would become a
bit simpler if we would split that option in two - has_vararg to make an
equivalent of ... in C with values of any type, and is_aggregate to check all
values have the same type. But still ugly.

====================
## Reason 3

SQL built-in functions are a part of the language. When we store them separately,
we risk to get SQL implementation different from the built-in SQL functions schema.
For example, we will add a new built-in function, and a user will upgrade, but
won't be able to use it until he upgrades the schema, and we will need to support
that in SQL implementation - that some built-in functions actually may not exist.

====================
## Reason 4

Some of the functions are supposed to be used for aggregated values, and this makes
their implementation not reusable in other languages. That in turn makes their presence
in the common _func space, used for all functions, irrelevant. I am talking about
SUM(), COUNT(), AVG(), probably there are more.

================================================================================
Now talking of the points I received about how good would it be to have these
functions in _func.

====================
## Built-in functions require privileges? - No.

Cite from Peter's email:

	The current built-in functions are harmless, except perhaps for RANDOMBLOB with a huge value.
	However, in future maybe there will be built-in functions that should require privileges.
	In that case, I believe, they will have to be in _func (and someday in _vfunc)
	so that grant() will work for them. 

Cite from Nikita's email:

	allows to grant and verify priveleges in the same way and so forth

What are examples of such built-in functions? Built-in functions are a part
of the language. Restricting access to SQL built-ins is like restricting access
to 'next()', 'string.upper()', 'os.time()' and other functions in Lua. Or
restricting usage of 'operator+' in C++. They are a part of the language. If
something needs restrictions, it does not seem to be a part of the language.
If a function is sandboxed, so it does not affect other users, I don't see why
would it need restrictions. All built-ins are sandboxed.

====================
## Built-in functions prevent duplicates in _func? - No.

Cite from Nikita's email:

	It makes name collisions check simple

This is a matter of a few code lines to add that check to _func triggers
to check new function name in all languages, if necessary. Or we could just make
the functions hash do the checking. So we won't even need to involve SQL-specific
code.

That is by the way questionable. I would think more if we want to forbid
definition of functions clashing with built-ins. That would be a 'legal' overload,
if we would look into _func first, and then into built-ins, but I don't really
know about that. Didn't think of it properly yet.

====================
## Storage in _func does not change _func schema and documentation? - No.

Cite from Peter's email:

	I did not document in the manual's SQL section that built-in functions will
	be in _func, so removing them is not a regression from documented behaviour. 

It is good that their presence in _func is not documented. Because even if
it would, and we would go for _func extension, we would change _func schema
and options anyway, because of the new options 'is_overloaded' and 'has_vararg',
and new fake functions with _VARBINARY suffix.

Cite from Nikita's email:

	Built-ins are already declaraed in _func, so reverting this thing would
	result in another one unnecessary schema change and upgrade (so I doubt that
	implementation would be somehow 'simpler')

This is also wrong. Both versions change _func. But the version with extending
_func makes it bigger, adds new ugly SQL-specific options, and adds not existing
functions with fake names. In the version about functions-as-part-of-language _func
is cleared from these functions, and nothing more happens with the schema. So the
patch will be simpler for sure, even though may get bigger due to more removals.

Besides, their removal from _func would also allow to get rid of the crutch with
language 'SQL_BUILTIN' in _func. It will be present in C, but won't exist in _func.
This is not a language. Languages are SQL and Lua. SQL_BUILTIN was a crutch from
the beginning, to make the functions work anyhow.

====================
## Users benefit from seeing SQL-specific functions in _func? - No.

Look at [1]. The output format is hardly readable. Not only because of the
new options (partially because of them), but also because 1) some functions
here don't really exist - LENGTH_VARBINARY, POSITION_VARBINARY, TRIM_VARBINARY,
SUBSTR_VARBINARY, 2) because of lots of other fields of _func, which make
the output super hard to read and understand what is what, even now.

_func is a dead end if someone wants to see function definitions. For that it
would be better to introduce a new pretty printer, somewhere in box.func maybe.

====================
## Storage in _func unifies functions? - _func output is unified, yes. Other things - no.

That is the only unification. Code still is going to separate them. By
introduction of the new _func options you just move the complexity from a
private place related to SQL only to the public space, related to the whole
_func and other languages.

====================
## Reuse SQL functions in Lua and other languages? - No.

Cite from Nikita's email:

	Finally part of functions can turn out to be really
	usefull in Lua someday such as date()/time()

It is not a secret, that all the languages we support already have
date/time functions. In Lua these are 'os.date', 'os.time', 'fiber.clock',
'fiber.time' and more. In C these are all the standard C functions such
as 'time()', 'gettimeofdat()', 'clock_gettime()', 'clock()', and more. Also
lots of functions for time formatting, but I don't remember their names.

So what exactly are the SQL built-in functions so much needed in Lua
and C? Looking at the unreadable _func output below, I can't imagine
why somebody need any of these functions out of SQL.

================================================================================
## References:

[1] Space _func select.

  - [2, 1, 'TRIM', 1, 'SQL_BUILTIN', '', 'function', ['string', 'unsigned', 'string'],
    'string', 'none', 'none', true, false, true, ['SQL'], {'has_overload': true, 'has_vararg': true},
    '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [3, 1, 'TYPEOF', 1, 'SQL_BUILTIN', '', 'function', ['scalar'], 'string', 'none',
    'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [4, 1, 'PRINTF', 1, 'SQL_BUILTIN', '', 'function', ['scalar'], 'string', 'none',
    'none', true, false, true, ['SQL'], {'has_vararg': true}, '', '2020-08-14 16:27:52',
    '2020-08-14 16:27:52']
  - [5, 1, 'UNICODE', 1, 'SQL_BUILTIN', '', 'function', ['string'], 'string', 'none',
    'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [6, 1, 'CHAR', 1, 'SQL_BUILTIN', '', 'function', ['unsigned'], 'string', 'none',
    'none', true, false, true, ['SQL'], {'has_vararg': true}, '', '2020-08-14 16:27:52',
    '2020-08-14 16:27:52']
  - [7, 1, 'HEX', 1, 'SQL_BUILTIN', '', 'function', ['scalar'], 'string', 'none',
    'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [8, 1, 'VERSION', 1, 'SQL_BUILTIN', '', 'function', [], 'string', 'none', 'none',
    true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [9, 1, 'QUOTE', 1, 'SQL_BUILTIN', '', 'function', ['scalar'], 'string', 'none',
    'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [10, 1, 'REPLACE', 1, 'SQL_BUILTIN', '', 'function', ['string', 'string', 'string'],
    'string', 'none', 'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52',
    '2020-08-14 16:27:52']
  - [11, 1, 'SUBSTR', 1, 'SQL_BUILTIN', '', 'function', ['string', 'integer', 'integer'],
    'string', 'none', 'none', true, false, true, ['SQL'], {'has_overload': true, 'has_vararg': true},
    '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [12, 1, 'GROUP_CONCAT', 1, 'SQL_BUILTIN', '', 'function', ['scalar', 'scalar'],
    'string', 'group', 'none', false, false, true, ['SQL'], {'has_vararg': true},
    '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [13, 1, 'JULIANDAY', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none',
    false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [14, 1, 'DATE', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none', false,
    false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [15, 1, 'TIME', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none', false,
    false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [16, 1, 'DATETIME', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none',
    false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [17, 1, 'STRFTIME', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none',
    false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [18, 1, 'CURRENT_TIME', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none',
    false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [19, 1, 'CURRENT_TIMESTAMP', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none',
    'none', false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [20, 1, 'CURRENT_DATE', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none',
    false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [21, 1, 'LENGTH', 1, 'SQL_BUILTIN', '', 'function', ['string'], 'integer', 'none',
    'none', true, false, true, ['SQL'], {'has_overload': true}, '', '2020-08-14 16:27:52',
    '2020-08-14 16:27:52']
  - [22, 1, 'POSITION', 1, 'SQL_BUILTIN', '', 'function', ['string', 'string'], 'integer',
    'none', 'none', true, false, true, ['SQL'], {'has_overload': true}, '', '2020-08-14
      16:27:52', '2020-08-14 16:27:52']
  - [23, 1, 'ROUND', 1, 'SQL_BUILTIN', '', 'function', ['double', 'unsigned'], 'double',
    'none', 'none', true, false, true, ['SQL'], {'has_vararg': true}, '', '2020-08-14
      16:27:52', '2020-08-14 16:27:52']
  - [24, 1, 'UPPER', 1, 'SQL_BUILTIN', '', 'function', ['string'], 'string', 'none',
    'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [25, 1, 'LOWER', 1, 'SQL_BUILTIN', '', 'function', ['string'], 'string', 'none',
    'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [26, 1, 'IFNULL', 1, 'SQL_BUILTIN', '', 'function', ['scalar', 'scalar'], 'scalar',
    'none', 'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [27, 1, 'RANDOM', 1, 'SQL_BUILTIN', '', 'function', [], 'integer', 'none', 'none',
    false, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [28, 1, 'CEIL', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none', false,
    false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [29, 1, 'CEILING', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none',
    false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [30, 1, 'CHARACTER_LENGTH', 1, 'SQL_BUILTIN', '', 'function', ['string'], 'integer',
    'none', 'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [31, 1, 'CHAR_LENGTH', 1, 'SQL_BUILTIN', '', 'function', ['string'], 'integer',
    'none', 'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [32, 1, 'FLOOR', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none',
    false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [33, 1, 'MOD', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none', false,
    false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [34, 1, 'OCTET_LENGTH', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none',
    false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [35, 1, 'ROW_COUNT', 1, 'SQL_BUILTIN', '', 'function', [], 'integer', 'none',
    'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [36, 1, 'COUNT', 1, 'SQL_BUILTIN', '', 'function', ['scalar'], 'integer', 'group',
    'none', false, false, true, ['SQL'], {'has_vararg': true}, '', '2020-08-14 16:27:52',
    '2020-08-14 16:27:52']
  - [37, 1, 'LIKE', 1, 'SQL_BUILTIN', '', 'function', ['string', 'string', 'string'],
    'boolean', 'none', 'none', true, false, true, ['SQL'], {'has_vararg': true}, '',
    '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [38, 1, 'ABS', 1, 'SQL_BUILTIN', '', 'function', ['number'], 'number', 'none',
    'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [39, 1, 'EXP', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none', false,
    false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [40, 1, 'LN', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none', false,
    false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [41, 1, 'POWER', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none',
    false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [42, 1, 'SQRT', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none', false,
    false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [43, 1, 'SUM', 1, 'SQL_BUILTIN', '', 'function', ['number'], 'number', 'group',
    'none', false, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [44, 1, 'TOTAL', 1, 'SQL_BUILTIN', '', 'function', ['number'], 'number', 'group',
    'none', false, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [45, 1, 'AVG', 1, 'SQL_BUILTIN', '', 'function', ['number'], 'number', 'group',
    'none', false, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [46, 1, 'RANDOMBLOB', 1, 'SQL_BUILTIN', '', 'function', ['unsigned'], 'varbinary',
    'none', 'none', false, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [47, 1, 'NULLIF', 1, 'SQL_BUILTIN', '', 'function', ['scalar', 'scalar'], 'scalar',
    'none', 'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [48, 1, 'ZEROBLOB', 1, 'SQL_BUILTIN', '', 'function', ['unsigned'], 'varbinary',
    'none', 'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [49, 1, 'MIN', 1, 'SQL_BUILTIN', '', 'function', ['scalar'], 'scalar', 'group',
    'none', false, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [50, 1, 'MAX', 1, 'SQL_BUILTIN', '', 'function', ['scalar'], 'scalar', 'group',
    'none', false, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [51, 1, 'COALESCE', 1, 'SQL_BUILTIN', '', 'function', ['scalar'], 'scalar', 'none',
    'none', true, false, true, ['SQL'], {'has_vararg': true}, '', '2020-08-14 16:27:52',
    '2020-08-14 16:27:52']
  - [52, 1, 'EVERY', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none',
    false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [53, 1, 'EXISTS', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none',
    false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [54, 1, 'EXTRACT', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none',
    false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [55, 1, 'SOME', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none', false,
    false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [56, 1, 'GREATER', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none',
    false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [57, 1, 'LESSER', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none', 'none',
    false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [58, 1, 'SOUNDEX', 1, 'SQL_BUILTIN', '', 'function', ['string'], 'string', 'none',
    'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [59, 1, 'LIKELIHOOD', 1, 'SQL_BUILTIN', '', 'function', ['scalar', 'double'],
    'scalar', 'none', 'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52',
    '2020-08-14 16:27:52']
  - [60, 1, 'LIKELY', 1, 'SQL_BUILTIN', '', 'function', ['scalar'], 'scalar', 'none',
    'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [61, 1, 'UNLIKELY', 1, 'SQL_BUILTIN', '', 'function', ['scalar'], 'scalar', 'none',
    'none', true, false, true, ['SQL'], {}, '', '2020-08-14 16:27:52', '2020-08-14
      16:27:52']
  - [62, 1, '_sql_stat_get', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none',
    'none', false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [63, 1, '_sql_stat_push', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none',
    'none', false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [64, 1, '_sql_stat_init', 1, 'SQL_BUILTIN', '', 'function', [], 'any', 'none',
    'none', false, false, true, [], {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [65, 1, 'LUA', 1, 'LUA', 'function(code) return assert(loadstring(code))() end',
    'function', ['string'], 'any', 'none', 'none', false, false, true, ['LUA', 'SQL'],
    {}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [66, 1, 'GREATEST', 1, 'SQL_BUILTIN', '', 'function', ['scalar'], 'scalar', 'none',
    'none', true, false, true, ['SQL'], {'has_vararg': true}, '', '2020-08-14 16:27:52',
    '2020-08-14 16:27:52']
  - [67, 1, 'LEAST', 1, 'SQL_BUILTIN', '', 'function', ['scalar'], 'scalar', 'none',
    'none', true, false, true, ['SQL'], {'has_vararg': true}, '', '2020-08-14 16:27:52',
    '2020-08-14 16:27:52']
  - [68, 1, 'LENGTH_VARBINARY', 1, 'SQL_BUILTIN', '', 'function', ['varbinary'], 'integer',
    'none', 'none', true, false, true, ['SQL'], {'is_overloaded': true}, '', '2020-08-14
      16:27:52', '2020-08-14 16:27:52']
  - [69, 1, 'POSITION_VARBINARY', 1, 'SQL_BUILTIN', '', 'function', ['varbinary',
      'varbinary'], 'integer', 'none', 'none', true, false, true, ['SQL'], {'is_overloaded': true},
    '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [70, 1, 'TRIM_VARBINARY', 1, 'SQL_BUILTIN', '', 'function', ['varbinary', 'unsigned',
      'varbinary'], 'string', 'none', 'none', true, false, true, ['SQL'], {'is_overloaded': true,
      'has_vararg': true}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
  - [71, 1, 'SUBSTR_VARBINARY', 1, 'SQL_BUILTIN', '', 'function', ['varbinary', 'integer',
      'integer'], 'string', 'none', 'none', true, false, true, ['SQL'], {'is_overloaded': true,
      'has_vararg': true}, '', '2020-08-14 16:27:52', '2020-08-14 16:27:52']
...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-discussions] SQL built-in functions position
  2020-09-27 20:56 ` Peter Gulutzan
@ 2020-09-28 20:07   ` Vladislav Shpilevoy
  2020-09-29 19:22     ` Peter Gulutzan
  0 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-28 20:07 UTC (permalink / raw)
  To: Peter Gulutzan, Mergen Imeev, Nikita Pettik, kyukhin, tsafin, sergos
  Cc: tarantool-discussions

Hi!

See my response in another email with 4 big reasons why
storage of SQL-specific functions in _func is a bad idea.

Also see responses on your comments in separate sections.
I leave references below.

> The current built-in functions are harmless, except perhaps for RANDOMBLOB with a huge value.
> However, in future maybe there will be built-in functions that should require privileges.
> In that case, I believe, they will have to be in _func (and someday in _vfunc)
> so that grant() will work for them.

See "## Built-in functions require privileges?".

> I have tried to redirect the UPPER() function in order to violate security, thus:
> "
> tarantool> function UPPER(x) return x end
> ---
> ...
> tarantool> box.schema.func.create('UPPER')
> ---
> - error: Function 'UPPER' already exists
> ...
> tarantool> box.schema.func.drop('UPPER')
> ---
> - error: 'Can''t drop function 1: function is SQL built-in'
> ...
> "
> This is good behaviour and I think it works because UPPER() is in _func.

See "## Built-in functions prevent duplicates in _func?".

> I did not document in the manual's SQL section that built-in functions will
> be in _func, so removing them is not a regression from documented behaviour.

See "## Storage in _func does not change _func schema and documentation?".

> But I think users benefit from being able to see them.

See "## Users benefit from seeing SQL-specific functions in _func?".

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-discussions] SQL built-in functions position
  2020-09-28 18:19 ` Nikita Pettik
@ 2020-09-28 20:07   ` Vladislav Shpilevoy
  0 siblings, 0 replies; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-28 20:07 UTC (permalink / raw)
  To: Nikita Pettik, Mergen Imeev; +Cc: tarantool-discussions

Hi!

See my response in another email with 4 big reasons why
storage of SQL-specific functions in _func is a bad idea.

Also see responses on your comments in separate sections.
I leave references below.

On 28.09.2020 20:19, Nikita Pettik wrote:
> On 27 Sep 18:18, Mergen Imeev wrote:
>> Hi all. I have a question that I would like to discuss.
>>
>> The question is about SQL built-in functions. At the moment these functions
>> are
>> partially described in _func and partially in src/box/sql/func.c. I received
>> two
>> completely different suggestions from my reviewers on what to do with these
>> functions:
>> 1) Move definitions completely to _func. Remove definitions from func.c.
> 
> That's my proposal. It makes name collisions check simple,

See "## Built-in functions prevent duplicates in _func?".

> provides unified interface to invoke built-in and non-built-in functions,

See "## Storage in _func unifies functions?".

> allows to grant and verify priveleges in the same way and so forth.

See "## Built-in functions require privileges?".

> Built-ins are already declaraed in _func, so reverting this thing would result> in another one unnecessary schema change and upgrade (so I doubt that implementation would
> be somehow 'simpler').

See "## Storage in _func does not change _func schema and documentation?".

> Finally part of functions can turn out to be really
> usefull in Lua someday such as date()/time().

See "## Reuse SQL functions in Lua and other languages?".

> So to me the choice is kind
> of obvious..

It still is not to me.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-discussions] SQL built-in functions position
  2020-09-28 20:07   ` Vladislav Shpilevoy
@ 2020-09-29 19:22     ` Peter Gulutzan
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Gulutzan @ 2020-09-29 19:22 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-discussions

Hi,

On 2020-09-28 2:07 p.m., Vladislav Shpilevoy wrote:
> Hi!
>
> See my response in another email with 4 big reasons why
> storage of SQL-specific functions in _func is a bad idea.
>
> Also see responses on your comments in separate sections.
> I leave references below.


I cannot argue against your arguments about implementation,
and I acknowledged that using _func is not necessary,
but I extract a few things that you said that seem a bit odd to me.

 > Firstly, if a function has variable argument count, why the hell 
should they all
 > be of the same type? It is a silly restriction motivated entirely by SQL
 > specifics like SUM() taking all arguments of the same type, or 
LEAST(), or GREATEST()
 > and so on. That is very-SQL thing. I can't imagine when such a 
restriction may
 > be needed in other supported languages.

I can say
CREATE TABLE t (s1 SCALAR PRIMARY KEY);
INSERT INTO t VALUES (1), (CAST(2 AS UNSIGNED)), (3e3);
SELECT SUM(s1) FROM t;
That is, data types can differ provided that they are numbers.
I see that as a restriction that "makes sense".

 > For example, PRINTF().

I acknowledge that PRINTF() is ugly but it is different from other 
functions.
It is not a "very-SQL thing". It happened to be in SQLite.
https://sqlite.org/printf.html
So I believe that its flaws do not prove something about other functions.

 > Some of the functions are supposed to be used for aggregated values ...

When I wrote, I was only thinking about built-in scalar functions.
AVG COUNT GROUP_CONCAT MAX MIN SUM TOTAL are in _func,
but I thought they could be excluded from SELECT requests.
(Apparently I was wrong, as I'll explain below.)

 > If a function is sandboxed, so it does not affect other users, I 
don't see why
 > would it need restrictions. All built-ins are sandboxed.

Not all, I mentioned an exception (in a private issue).
And I did mention that "RANDOMBLOB with a huge value" might not be harmless.
But in my email I was saying that "in future maybe there will be built-in
functions that should require privileges". You asked for an example, but:
if I come up with any example that doesn't work in a sandbox, would you not
reply that it doesn't need to be built in?

 > ## Users benefit from seeing SQL-specific functions in _func? - No.

True, but not fair. A programmer who writes a client program can write
something that accesses _func and displays something that users can
comprehend. And information_schema.routines will probably be something
that reads _func and displays as an SQL table.

 > So what exactly are the SQL built-in functions so much needed in Lua
 > and C? Looking at the unreadable _func output below, I can't imagine
 > why somebody need any of these functions out of SQL.

You are looking at the date/time functions, which, again, are from SQLite.
My memory is vague but I understood K. Osipov did not want them documented
until we were sure we wanted users to use them.

I had a quick look at what is in _func in version 2.5 ...

tarantool>SELECT "name", "language", "is_sandboxed", "aggregate"
 >FROM "_func"
 >ORDER BY "name";
OK 67 rows selected (0.0 seconds)
+----------------------+-------------+--------------+-----------+
| name                 | language    | is_sandboxed | aggregate |
+----------------------+-------------+--------------+-----------+
| ABS                  | SQL_BUILTIN | FALSE        | none      |
| AVG                  | SQL_BUILTIN | FALSE        | none      |
| CEIL                 | SQL_BUILTIN | FALSE        | none      |
| CEILING              | SQL_BUILTIN | FALSE        | none      |
| CHAR                 | SQL_BUILTIN | FALSE        | none      |
| CHARACTER_LENGTH     | SQL_BUILTIN | FALSE        | none      |
| CHAR_LENGTH          | SQL_BUILTIN | FALSE        | none      |
| COALESCE             | SQL_BUILTIN | FALSE        | none      |
| COUNT                | SQL_BUILTIN | FALSE        | none      |
| CURRENT_DATE         | SQL_BUILTIN | FALSE        | none      |
| CURRENT_TIME         | SQL_BUILTIN | FALSE        | none      |
| CURRENT_TIMESTAMP    | SQL_BUILTIN | FALSE        | none      |
| DATE                 | SQL_BUILTIN | FALSE        | none      |
| DATETIME             | SQL_BUILTIN | FALSE        | none      |
| EVERY                | SQL_BUILTIN | FALSE        | none      |
| EXISTS               | SQL_BUILTIN | FALSE        | none      |
| EXP                  | SQL_BUILTIN | FALSE        | none      |
| EXTRACT              | SQL_BUILTIN | FALSE        | none      |
| FLOOR                | SQL_BUILTIN | FALSE        | none      |
| GREATER              | SQL_BUILTIN | FALSE        | none      |
| GREATEST             | SQL_BUILTIN | FALSE        | none      |
| GROUP_CONCAT         | SQL_BUILTIN | FALSE        | none      |
| HEX                  | SQL_BUILTIN | FALSE        | none      |
| IFNULL               | SQL_BUILTIN | FALSE        | none      |
| JULIANDAY            | SQL_BUILTIN | FALSE        | none      |
| LEAST                | SQL_BUILTIN | FALSE        | none      |
| LENGTH               | SQL_BUILTIN | FALSE        | none      |
| LESSER               | SQL_BUILTIN | FALSE        | none      |
| LIKE                 | SQL_BUILTIN | FALSE        | none      |
| LIKELIHOOD           | SQL_BUILTIN | FALSE        | none      |
| LIKELY               | SQL_BUILTIN | FALSE        | none      |
| LN                   | SQL_BUILTIN | FALSE        | none      |
| LOWER                | SQL_BUILTIN | FALSE        | none      |
| LUA                  | LUA         | FALSE        | none      |
| MAX                  | SQL_BUILTIN | FALSE        | none      |
| MIN                  | SQL_BUILTIN | FALSE        | none      |
| MOD                  | SQL_BUILTIN | FALSE        | none      |
| NULLIF               | SQL_BUILTIN | FALSE        | none      |
| OCTET_LENGTH         | SQL_BUILTIN | FALSE        | none      |
| POSITION             | SQL_BUILTIN | FALSE        | none      |
| POWER                | SQL_BUILTIN | FALSE        | none      |
| PRINTF               | SQL_BUILTIN | FALSE        | none      |
| QUOTE                | SQL_BUILTIN | FALSE        | none      |
| RANDOM               | SQL_BUILTIN | FALSE        | none      |
| RANDOMBLOB           | SQL_BUILTIN | FALSE        | none      |
| REPLACE              | SQL_BUILTIN | FALSE        | none      |
| ROUND                | SQL_BUILTIN | FALSE        | none      |
| ROW_COUNT            | SQL_BUILTIN | FALSE        | none      |
| SOME                 | SQL_BUILTIN | FALSE        | none      |
| SOUNDEX              | SQL_BUILTIN | FALSE        | none      |
| SQRT                 | SQL_BUILTIN | FALSE        | none      |
| STRFTIME             | SQL_BUILTIN | FALSE        | none      |
| SUBSTR               | SQL_BUILTIN | FALSE        | none      |
| SUM                  | SQL_BUILTIN | FALSE        | none      |
| TIME                 | SQL_BUILTIN | FALSE        | none      |
| TOTAL                | SQL_BUILTIN | FALSE        | none      |
| TRIM                 | SQL_BUILTIN | FALSE        | none      |
| TYPEOF               | SQL_BUILTIN | FALSE        | none      |
| UNICODE              | SQL_BUILTIN | FALSE        | none      |
| UNLIKELY             | SQL_BUILTIN | FALSE        | none      |
| UPPER                | SQL_BUILTIN | FALSE        | none      |
| VERSION              | SQL_BUILTIN | FALSE        | none      |
| ZEROBLOB             | SQL_BUILTIN | FALSE        | none      |
| _sql_stat_get        | SQL_BUILTIN | FALSE        | none      |
| _sql_stat_init       | SQL_BUILTIN | FALSE        | none      |
| _sql_stat_push       | SQL_BUILTIN | FALSE        | none      |
| box.schema.user.info | LUA         | FALSE        | none      |
+----------------------+-------------+--------------+-----------+

Well, this doesn't look entirely right, and that I guess means
that you are right -- at this moment.
I do not understand why is_sandboxed is always false.
I do not understand why aggregate is none for aggregate functions.
I have to acknowledge that many of these functions are undocumented
deliberately and we might not want users to seem them, think that
we will always support them, and add them to applications.
(Include: CEIL/CEILING CURRENT_DATE CURRENT_TIME CURRENT_TIMESTAMP
DATE DATETIME EVERY EXP EXTRACT FLOOR GREATER JULIANDAY LESSER LN
MOD POWER SOME STRFTIME TIME _sql_stat_get sql_stat_init
sql_stat_push box.schema.user.info.)
However, that only means that at this moment the information is bad.
It does not mean that the information should not exist.

I admit that you made points that I hadn't realized,
but stubbornly still stick with a belief in _func.

Peter Gulutzan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-discussions] SQL built-in functions position
  2020-09-28 20:07 ` Vladislav Shpilevoy
@ 2020-10-01 14:46   ` Kirill Yukhin
  2020-10-01 21:15     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 10+ messages in thread
From: Kirill Yukhin @ 2020-10-01 14:46 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-discussions

Hello,

My comments/answers inlined.

On 28 сен 22:07, Vladislav Shpilevoy wrote:
> Hi!
> 
> On 27.09.2020 17:18, Mergen Imeev wrote:
> > Hi all. I have a question that I would like to discuss.
> > 
> > The question is about SQL built-in functions. At the moment these functions are
> > partially described in _func and partially in src/box/sql/func.c. I received two
> > completely different suggestions from my reviewers on what to do with these
> > functions:
> > 1) Move definitions completely to _func. Remove definitions from func.c.
> > 2) Move definitions completely to func.c. Remove definitions from _func.
> > 
> > In the first case, users will be able to see the function definitions. Also, in
> > the future, we may allow these functions to be called from Lua (although not
> > sure if this is necessary). The main idea is 'all functions have the same
> > interface'.
> > 
> > In the second case, the implementation is simpler, and we can more easily
> > implement some features, such as "virtual" functions. For users, the definition
> > can only be seen in the documentation. The main idea is 'SQL built-in functions
> > are part of SQL'.
> > 
> > Which of these approaches do you think is more beneficial to us?
> 
> I am the one who proposed the second option. I proposed it because of
> several reasons.
> 
> ==================================================
> ## Reason 1
> 
> The implementation is ugly by design. Since SQL functions are strictly typed,
> and some of them may take more than one type, we were forced to implement some
> kind of function overload by mangling function names. We are not trying to
> implement C++ here, it goes against the _func basic schema.

We are implementing SQL built-ins here and that is what syntax dictates. Originally,
wheren we were removing hard code it was intentded to make both languages
interoperable. I see no reason, why _func should exclusively belong to Lua.

We are free to extend _func schema.

> To workaround that there were added a new ugly option: is_overloaded. Overloaded
> means that there is still one function but declared twice, just to check types.

This option is useless since we might implement stable mangler, which won't change
a name if there're no types of args and returns are specified.

> For example, LENGTH and LENGTH_VARBINARY, TRIM, TRIM_VARBINARY, and so on. That
> leads to an issue, that we have not existing functions in _func. For example:
> 
> 	tarantool> box.space._func.index.name:select({'LENGTH_VARBINARY'})
> 	---
> 	- - [68, 1, 'LENGTH_VARBINARY', 1, 'SQL_BUILTIN', '', 'function', ['varbinary'], 'integer',
> 	    'none', 'none', true, false, true, ['SQL'], {'is_overloaded': true}, '', '2020-08-14
> 	      16:27:52', '2020-08-14 16:27:52']
> 	...
> 
> 	tarantool> box.execute("SELECT LENGTH_VARBINARY('abc')")
> 	---
> 	- null
> 	- Function 'LENGTH_VARBINARY' does not exist
> 	...
> 
> Doesn't this look bad? That is -1 point to the argument about 'it is good to
> have function visible in _func'. They are visible, but it does not mean anything.

_func is a service space and it is unusual to use it that way. Just imagine, you're
doing `nm` on some C++ object and trying to invoke mangled names from there. We should
deal with it anyway, since user might create she's own function and there'll be no
way to hardcode it.

> ====================
> ## Reason 2
> 
> SQL has vararg functions - the ones which take unknown number of arguments. That
> led to addition of one another ugly option: has_vararg. When it is true, all
> arguments after last declared argument are forced to have the same type as the
> last argument. This looks wrong from all possible sides.

That is the way all compilers work. They add hidden attribute to function declaration
which manifests it has va_arg in it.

> Firstly, if a function has variable argument count, why the hell should they all
> be of the same type? It is a silly restriction motivated entirely by SQL
> specifics like SUM() taking all arguments of the same type, or LEAST(), or GREATEST()
> and so on. That is very-SQL thing. I can't imagine when such a restriction may
> be needed in other supported languages.

They won't. va_arg must be typeless. Type of actual values must be checked inside
given routine. Just look inside printf().

> Secondly, like with the option 'is_overloaded' it complicates output of _func and
> makes it harder for a user to create a function, when he sees so many irrelevant
> options, needed for SQL only. Just take a look at the _func output in the end of
> this email to see how bad is it [1].

Complication of the output of select() from service space is not an argument at all.

> Thirdly, it simply does not work for functions which are really vararg, and not
> some kind of aggregates, ironically. For example, PRINTF(). To workaround the
> workaround-option has_vararg such functions simply turn off their type checks by
> making the last argument SCALAR. Well, this looks really bad. It would become a
> bit simpler if we would split that option in two - has_vararg to make an
> equivalent of ... in C with values of any type, and is_aggregate to check all
> values have the same type. But still ugly.

See first comment in this chapter.

> ====================
> ## Reason 3
> 
> SQL built-in functions are a part of the language. When we store them separately,
> we risk to get SQL implementation different from the built-in SQL functions schema.
> For example, we will add a new built-in function, and a user will upgrade, but
> won't be able to use it until he upgrades the schema, and we will need to support
> that in SQL implementation - that some built-in functions actually may not exist.

I think this is a good idea to upgrade before use _new_ functionality.

> ====================
> ## Reason 4
> 
> Some of the functions are supposed to be used for aggregated values, and this makes
> their implementation not reusable in other languages. That in turn makes their presence
> in the common _func space, used for all functions, irrelevant. I am talking about
> SUM(), COUNT(), AVG(), probably there are more.

For now, error should be emitted, but in future nothing blocks us from enabling
such aggregates, e.g. for arrays.

> ================================================================================
> Now talking of the points I received about how good would it be to have these
> functions in _func.
> 
> ====================
> ## Built-in functions require privileges? - No.
> 
> Cite from Peter's email:
> 
> 	The current built-in functions are harmless, except perhaps for RANDOMBLOB with a huge value.
> 	However, in future maybe there will be built-in functions that should require privileges.
> 	In that case, I believe, they will have to be in _func (and someday in _vfunc)
> 	so that grant() will work for them. 
> 
> Cite from Nikita's email:
> 
> 	allows to grant and verify priveleges in the same way and so forth
> 
> What are examples of such built-in functions? Built-in functions are a part
> of the language. Restricting access to SQL built-ins is like restricting access
> to 'next()', 'string.upper()', 'os.time()' and other functions in Lua. Or
> restricting usage of 'operator+' in C++. They are a part of the language. If
> something needs restrictions, it does not seem to be a part of the language.
> If a function is sandboxed, so it does not affect other users, I don't see why
> would it need restrictions. All built-ins are sandboxed.
> 
> ====================
> ## Built-in functions prevent duplicates in _func? - No.
> 
> Cite from Nikita's email:
> 
> 	It makes name collisions check simple
> 
> This is a matter of a few code lines to add that check to _func triggers
> to check new function name in all languages, if necessary. Or we could just make
> the functions hash do the checking. So we won't even need to involve SQL-specific
> code.
> 
> That is by the way questionable. I would think more if we want to forbid
> definition of functions clashing with built-ins. That would be a 'legal' overload,
> if we would look into _func first, and then into built-ins, but I don't really
> know about that. Didn't think of it properly yet.
> 
> ====================
> ## Storage in _func does not change _func schema and documentation? - No.
> 
> Cite from Peter's email:
> 
> 	I did not document in the manual's SQL section that built-in functions will
> 	be in _func, so removing them is not a regression from documented behaviour. 
> 
> It is good that their presence in _func is not documented. Because even if
> it would, and we would go for _func extension, we would change _func schema
> and options anyway, because of the new options 'is_overloaded' and 'has_vararg',
> and new fake functions with _VARBINARY suffix.
> 
> Cite from Nikita's email:
> 
> 	Built-ins are already declaraed in _func, so reverting this thing would
> 	result in another one unnecessary schema change and upgrade (so I doubt that
> 	implementation would be somehow 'simpler')
> 
> This is also wrong. Both versions change _func. But the version with extending
> _func makes it bigger, adds new ugly SQL-specific options, and adds not existing
> functions with fake names. In the version about functions-as-part-of-language _func
> is cleared from these functions, and nothing more happens with the schema. So the
> patch will be simpler for sure, even though may get bigger due to more removals.
> 
> Besides, their removal from _func would also allow to get rid of the crutch with
> language 'SQL_BUILTIN' in _func. It will be present in C, but won't exist in _func.
> This is not a language. Languages are SQL and Lua. SQL_BUILTIN was a crutch from
> the beginning, to make the functions work anyhow.

I'd like to make call mechanism for SQL built-ins and use-defined routines the same.

> ====================
> ## Users benefit from seeing SQL-specific functions in _func? - No.
> 
> Look at [1]. The output format is hardly readable. Not only because of the
> new options (partially because of them), but also because 1) some functions
> here don't really exist - LENGTH_VARBINARY, POSITION_VARBINARY, TRIM_VARBINARY,
> SUBSTR_VARBINARY, 2) because of lots of other fields of _func, which make
> the output super hard to read and understand what is what, even now.
> 
> _func is a dead end if someone wants to see function definitions. For that it
> would be better to introduce a new pretty printer, somewhere in box.func maybe.

That is strange argument (and I guess it is duplicate of one ofprevious). _func
is a service space. It should be read with care.

> ====================
> ## Storage in _func unifies functions? - _func output is unified, yes. Other things - no.
> 
> That is the only unification. Code still is going to separate them. By
> introduction of the new _func options you just move the complexity from a
> private place related to SQL only to the public space, related to the whole
> _func and other languages.

We should end up with the same mechanisms.

> ====================
> ## Reuse SQL functions in Lua and other languages? - No.
> 
> Cite from Nikita's email:
> 
> 	Finally part of functions can turn out to be really
> 	usefull in Lua someday such as date()/time()
> 
> It is not a secret, that all the languages we support already have
> date/time functions. In Lua these are 'os.date', 'os.time', 'fiber.clock',
> 'fiber.time' and more. In C these are all the standard C functions such
> as 'time()', 'gettimeofdat()', 'clock_gettime()', 'clock()', and more. Also
> lots of functions for time formatting, but I don't remember their names.
> 
> So what exactly are the SQL built-in functions so much needed in Lua
> and C? Looking at the unreadable _func output below, I can't imagine
> why somebody need any of these functions out of SQL.

Third time about non-readable output? Well, ok. It is unreadable. But that
is not an argument.


That said. I see we have 3 voices against 1.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-discussions] SQL built-in functions position
  2020-10-01 14:46   ` Kirill Yukhin
@ 2020-10-01 21:15     ` Vladislav Shpilevoy
  2020-10-02 15:18       ` Mergen Imeev
  0 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-01 21:15 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: tarantool-discussions

>> ==================================================
>> ## Reason 1
>>
>> The implementation is ugly by design. Since SQL functions are strictly typed,
>> and some of them may take more than one type, we were forced to implement some
>> kind of function overload by mangling function names. We are not trying to
>> implement C++ here, it goes against the _func basic schema.
> 
> We are implementing SQL built-ins here and that is what syntax dictates.

I don't know what is the syntax you are talking about. We discuss whether to store
them in _func, that is all. No talks about syntax. Function invocation syntax is
not related to built-ins.

> Originally,
> wheren we were removing hard code it was intentded to make both languages
> interoperable. I see no reason, why _func should exclusively belong to Lua.

I never said it belongs to Lua. I said there are certain functions belonging to
a language. Lua has its functions like next(), select(), os.time() etc. SQL
has its own functions. _func is for common functions accessible from any language.

> We are free to extend _func schema.

I never said we are not allowed to do that. I said it is not a place for
language-specific functions.

>> To workaround that there were added a new ugly option: is_overloaded. Overloaded
>> means that there is still one function but declared twice, just to check types.
> 
> This option is useless since we might implement stable mangler, which won't change
> a name if there're no types of args and returns are specified.

Mangler should be a part of the language. What if a user registers function
'test()' with types int, uint? Will he be forced to use test_int_uint() name?
Who will do the mangling, unmangling? It does not look like a task to solve on
the _func or even schema level. _func is basically an interface to give
permissions to functions, to load C functions, and to persist their code. It
is not some sub-language nor a compiler.

>> For example, LENGTH and LENGTH_VARBINARY, TRIM, TRIM_VARBINARY, and so on. That
>> leads to an issue, that we have not existing functions in _func. For example:
>>
>> 	tarantool> box.space._func.index.name:select({'LENGTH_VARBINARY'})
>> 	---
>> 	- - [68, 1, 'LENGTH_VARBINARY', 1, 'SQL_BUILTIN', '', 'function', ['varbinary'], 'integer',
>> 	    'none', 'none', true, false, true, ['SQL'], {'is_overloaded': true}, '', '2020-08-14
>> 	      16:27:52', '2020-08-14 16:27:52']
>> 	...
>>
>> 	tarantool> box.execute("SELECT LENGTH_VARBINARY('abc')")
>> 	---
>> 	- null
>> 	- Function 'LENGTH_VARBINARY' does not exist
>> 	...
>>
>> Doesn't this look bad? That is -1 point to the argument about 'it is good to
>> have function visible in _func'. They are visible, but it does not mean anything.
> 
> _func is a service space and it is unusual to use it that way. Just imagine, you're
> doing `nm` on some C++ object and trying to invoke mangled names from there.

I can export a mangled name and even use it. It is at least accessible. See example
in [1]. But you are missing the point. The point was that I was said _func is useful
to be looked at, and here I prove it is not so.

Talking of the comparison with C++ - are you ok? This is a service space to store
permissions, function bodies, and load C functions. It is not a language or a compiler.

> We should
> deal with it anyway, since user might create she's own function and there'll be no
> way to hardcode it.

I couldn't parse this sentence, sorry. Please, rephrase.

>> ====================
>> ## Reason 2
>>
>> SQL has vararg functions - the ones which take unknown number of arguments. That
>> led to addition of one another ugly option: has_vararg. When it is true, all
>> arguments after last declared argument are forced to have the same type as the
>> last argument. This looks wrong from all possible sides.
> 
> That is the way all compilers work. They add hidden attribute to function declaration
> which manifests it has va_arg in it.

It is not a compiler. I will repeat here what I said above:

	_func is basically an interface to give permissions to functions,
	to load C functions, and to persist their code now. It is not some
	sub-language nor a compiler.

>> Secondly, like with the option 'is_overloaded' it complicates output of _func and
>> makes it harder for a user to create a function, when he sees so many irrelevant
>> options, needed for SQL only. Just take a look at the _func output in the end of
>> this email to see how bad is it [1].
> 
> Complication of the output of select() from service space is not an argument at all.

Well, did you read the other emails? This is an answer to the arguments
that _func is useful to be looked at. Here I prove it is not, and you say
the same, what just proves my point. It is strange that you tell that me
and not to those, who made that point.

>> ====================
>> ## Reason 3
>>
>> SQL built-in functions are a part of the language. When we store them separately,
>> we risk to get SQL implementation different from the built-in SQL functions schema.
>> For example, we will add a new built-in function, and a user will upgrade, but
>> won't be able to use it until he upgrades the schema, and we will need to support
>> that in SQL implementation - that some built-in functions actually may not exist.
> 
> I think this is a good idea to upgrade before use _new_ functionality.

You are missing the point again. You have no choice upgrade or not - when you
start a new tarantool binary, your SQL implementation is already upgraded - it is
a part of the binary. The parser, VDBE implementation. But some built-in functions
may not be upgraded yet, until you call box.schema.upgrade(). Your SQL implementation
and its built-in function declarations become desynchronized from the beginning.

>> ====================
>> ## Reason 4
>>
>> Some of the functions are supposed to be used for aggregated values, and this makes
>> their implementation not reusable in other languages. That in turn makes their presence
>> in the common _func space, used for all functions, irrelevant. I am talking about
>> SUM(), COUNT(), AVG(), probably there are more.
> 
> For now, error should be emitted, but in future nothing blocks us from enabling
> such aggregates, e.g. for arrays.

It makes no sense, Lua has own much better aggregation functions. Just look at luafun
library. It is the best way to do aggregation things in Lua, if you are a fan of
one-liners. Or it can be done simply in a cycle with yields and all. For all the
built-in functions of SQL Lua has its own alternatives, also language-specific,
and working best in this language.

>> ================================================================================
>> Now talking of the points I received about how good would it be to have these
>> functions in _func.
>>
>> ====================
>> ## Storage in _func does not change _func schema and documentation? - No.
>>
>> Cite from Peter's email:
>>
>> 	I did not document in the manual's SQL section that built-in functions will
>> 	be in _func, so removing them is not a regression from documented behaviour. 
>>
>> It is good that their presence in _func is not documented. Because even if
>> it would, and we would go for _func extension, we would change _func schema
>> and options anyway, because of the new options 'is_overloaded' and 'has_vararg',
>> and new fake functions with _VARBINARY suffix.
>>
>> Cite from Nikita's email:
>>
>> 	Built-ins are already declaraed in _func, so reverting this thing would
>> 	result in another one unnecessary schema change and upgrade (so I doubt that
>> 	implementation would be somehow 'simpler')
>>
>> This is also wrong. Both versions change _func. But the version with extending
>> _func makes it bigger, adds new ugly SQL-specific options, and adds not existing
>> functions with fake names. In the version about functions-as-part-of-language _func
>> is cleared from these functions, and nothing more happens with the schema. So the
>> patch will be simpler for sure, even though may get bigger due to more removals.
>>
>> Besides, their removal from _func would also allow to get rid of the crutch with
>> language 'SQL_BUILTIN' in _func. It will be present in C, but won't exist in _func.
>> This is not a language. Languages are SQL and Lua. SQL_BUILTIN was a crutch from
>> the beginning, to make the functions work anyhow.
> 
> I'd like to make call mechanism for SQL built-ins and use-defined routines the same.

It is the same, and not related to what we are discussing. Functions still are going
to be stored in an internal hash, but with much more freedom of what we store
together with each function, what metadata. And with freedom to separate them if
needed any time. Any options become possible - vararg, aggregates, any types (even
a new artificial type proposed by Mergen to declare functions both for strings and
varbinaries). With _func we are very limited, because can't do language-specific
things properly, and we shouldn't.

>> ====================
>> ## Users benefit from seeing SQL-specific functions in _func? - No.
>>
>> Look at [1]. The output format is hardly readable. Not only because of the
>> new options (partially because of them), but also because 1) some functions
>> here don't really exist - LENGTH_VARBINARY, POSITION_VARBINARY, TRIM_VARBINARY,
>> SUBSTR_VARBINARY, 2) because of lots of other fields of _func, which make
>> the output super hard to read and understand what is what, even now.
>>
>> _func is a dead end if someone wants to see function definitions. For that it
>> would be better to introduce a new pretty printer, somewhere in box.func maybe.
> 
> That is strange argument (and I guess it is duplicate of one ofprevious). _func
> is a service space. It should be read with care.

As I said above twice, and in the header of this paragraph, it is a direct
answer to Nikita's and Peter's *comments about _func being useful to be looked
at*. It is actually not, this is what I am saying, and you are saying the same
again.

It is repeated, because firstly I listed the reasons why _func can't be used
for built-ins. Then I listed my answers to all what Nikita and Peter said. I
tried to use ## to separate my text into paragraphs with headers for that. But
it looks like you didn't read carefully.

>> ====================
>> ## Reuse SQL functions in Lua and other languages? - No.
>>
>> Cite from Nikita's email:
>>
>> 	Finally part of functions can turn out to be really
>> 	usefull in Lua someday such as date()/time()
>>
>> It is not a secret, that all the languages we support already have
>> date/time functions. In Lua these are 'os.date', 'os.time', 'fiber.clock',
>> 'fiber.time' and more. In C these are all the standard C functions such
>> as 'time()', 'gettimeofdat()', 'clock_gettime()', 'clock()', and more. Also
>> lots of functions for time formatting, but I don't remember their names.
>>
>> So what exactly are the SQL built-in functions so much needed in Lua
>> and C? Looking at the unreadable _func output below, I can't imagine
>> why somebody need any of these functions out of SQL.
> 
> Third time about non-readable output? Well, ok. It is unreadable. But that
> is not an argument.

Are you kidding? Is this *the only word* you noticed in this paragraph?
'Non-readable'? If yes, I don't think you need my opinion. Looks like
you already decided everything, and didn't care much to read what I wrote
here.

> That said. I see we have 3 voices against 1.

I didn't see any analysis from you. You just referred to the compilers
and mangling, only proving that these functions are very language specific,
if their usage requires to invent a compiler. Also you said _func output is
internal like if I was talking not the same, but I was talking the same.

We can get back to that when you will actually read what I am trying to say.

## References

[1] Exporting a mangled name

#### File test.cpp

extern int
func(int a)
{
	exit(a);
}

extern int
func(float a)
{
	exit((int)a);
}

extern "C" void
print_symbol();

extern int
main(void)
{
	print_symbol();
	printf("So the point about mangled names being not accessible is "
	       "false\n");
	return 0;
}

#### File test.c

extern void _Z4funcf(void);

void
print_symbol()
{
	printf("%p\n", &_Z4funcf);
}

#### Build and run

$> gcc -c test.c -o obj1.o
$> g++ -c test.cpp -o obj2.o
$> g++ obj1.o obj2.o
$> ./a.out
0x10333af40
So the point about mangled names being not accessible is false

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-discussions] SQL built-in functions position
  2020-10-01 21:15     ` Vladislav Shpilevoy
@ 2020-10-02 15:18       ` Mergen Imeev
  0 siblings, 0 replies; 10+ messages in thread
From: Mergen Imeev @ 2020-10-02 15:18 UTC (permalink / raw)
  To: Vladislav Shpilevoy, Kirill Yukhin, Peter Gulutzan,
	Nikita Pettik, tsafin, Mons Anderson, sergos, Alexander Turenko
  Cc: tarantool-discussions

Hi all! After today's discussion, we decided that:
1) This discussion will be held again later, after the next release.
2) Main reasons against completely moving built-in SQL functions to _func:
         a) This can cause problems with schema.
         b) This can cause problems in cluster.
3) The main reasons against removing built-in SQL functions from _func are:
         a) Behavior change.
4) To solve the main problem "functions must accept arguments in accordance with
the rules of implicit casting", we decided to fix each function so that they
work according to the specified rules. There will be no general decision at this
point.


On Thu, Oct 01, 2020 at 11:15:36PM +0200, Vladislav Shpilevoy wrote:
> >> ==================================================
> >> ## Reason 1
> >>
> >> The implementation is ugly by design. Since SQL functions are strictly typed,
> >> and some of them may take more than one type, we were forced to implement some
> >> kind of function overload by mangling function names. We are not trying to
> >> implement C++ here, it goes against the _func basic schema.
> > 
> > We are implementing SQL built-ins here and that is what syntax dictates.
> 
> I don't know what is the syntax you are talking about. We discuss whether to store
> them in _func, that is all. No talks about syntax. Function invocation syntax is
> not related to built-ins.
> 
> > Originally,
> > wheren we were removing hard code it was intentded to make both languages
> > interoperable. I see no reason, why _func should exclusively belong to Lua.
> 
> I never said it belongs to Lua. I said there are certain functions belonging to
> a language. Lua has its functions like next(), select(), os.time() etc. SQL
> has its own functions. _func is for common functions accessible from any language.
> 
> > We are free to extend _func schema.
> 
> I never said we are not allowed to do that. I said it is not a place for
> language-specific functions.
> 
> >> To workaround that there were added a new ugly option: is_overloaded. Overloaded
> >> means that there is still one function but declared twice, just to check types.
> > 
> > This option is useless since we might implement stable mangler, which won't change
> > a name if there're no types of args and returns are specified.
> 
> Mangler should be a part of the language. What if a user registers function
> 'test()' with types int, uint? Will he be forced to use test_int_uint() name?
> Who will do the mangling, unmangling? It does not look like a task to solve on
> the _func or even schema level. _func is basically an interface to give
> permissions to functions, to load C functions, and to persist their code. It
> is not some sub-language nor a compiler.
> 
> >> For example, LENGTH and LENGTH_VARBINARY, TRIM, TRIM_VARBINARY, and so on. That
> >> leads to an issue, that we have not existing functions in _func. For example:
> >>
> >> 	tarantool> box.space._func.index.name:select({'LENGTH_VARBINARY'})
> >> 	---
> >> 	- - [68, 1, 'LENGTH_VARBINARY', 1, 'SQL_BUILTIN', '', 'function', ['varbinary'], 'integer',
> >> 	    'none', 'none', true, false, true, ['SQL'], {'is_overloaded': true}, '', '2020-08-14
> >> 	      16:27:52', '2020-08-14 16:27:52']
> >> 	...
> >>
> >> 	tarantool> box.execute("SELECT LENGTH_VARBINARY('abc')")
> >> 	---
> >> 	- null
> >> 	- Function 'LENGTH_VARBINARY' does not exist
> >> 	...
> >>
> >> Doesn't this look bad? That is -1 point to the argument about 'it is good to
> >> have function visible in _func'. They are visible, but it does not mean anything.
> > 
> > _func is a service space and it is unusual to use it that way. Just imagine, you're
> > doing `nm` on some C++ object and trying to invoke mangled names from there.
> 
> I can export a mangled name and even use it. It is at least accessible. See example
> in [1]. But you are missing the point. The point was that I was said _func is useful
> to be looked at, and here I prove it is not so.
> 
> Talking of the comparison with C++ - are you ok? This is a service space to store
> permissions, function bodies, and load C functions. It is not a language or a compiler.
> 
> > We should
> > deal with it anyway, since user might create she's own function and there'll be no
> > way to hardcode it.
> 
> I couldn't parse this sentence, sorry. Please, rephrase.
> 
> >> ====================
> >> ## Reason 2
> >>
> >> SQL has vararg functions - the ones which take unknown number of arguments. That
> >> led to addition of one another ugly option: has_vararg. When it is true, all
> >> arguments after last declared argument are forced to have the same type as the
> >> last argument. This looks wrong from all possible sides.
> > 
> > That is the way all compilers work. They add hidden attribute to function declaration
> > which manifests it has va_arg in it.
> 
> It is not a compiler. I will repeat here what I said above:
> 
> 	_func is basically an interface to give permissions to functions,
> 	to load C functions, and to persist their code now. It is not some
> 	sub-language nor a compiler.
> 
> >> Secondly, like with the option 'is_overloaded' it complicates output of _func and
> >> makes it harder for a user to create a function, when he sees so many irrelevant
> >> options, needed for SQL only. Just take a look at the _func output in the end of
> >> this email to see how bad is it [1].
> > 
> > Complication of the output of select() from service space is not an argument at all.
> 
> Well, did you read the other emails? This is an answer to the arguments
> that _func is useful to be looked at. Here I prove it is not, and you say
> the same, what just proves my point. It is strange that you tell that me
> and not to those, who made that point.
> 
> >> ====================
> >> ## Reason 3
> >>
> >> SQL built-in functions are a part of the language. When we store them separately,
> >> we risk to get SQL implementation different from the built-in SQL functions schema.
> >> For example, we will add a new built-in function, and a user will upgrade, but
> >> won't be able to use it until he upgrades the schema, and we will need to support
> >> that in SQL implementation - that some built-in functions actually may not exist.
> > 
> > I think this is a good idea to upgrade before use _new_ functionality.
> 
> You are missing the point again. You have no choice upgrade or not - when you
> start a new tarantool binary, your SQL implementation is already upgraded - it is
> a part of the binary. The parser, VDBE implementation. But some built-in functions
> may not be upgraded yet, until you call box.schema.upgrade(). Your SQL implementation
> and its built-in function declarations become desynchronized from the beginning.
> 
> >> ====================
> >> ## Reason 4
> >>
> >> Some of the functions are supposed to be used for aggregated values, and this makes
> >> their implementation not reusable in other languages. That in turn makes their presence
> >> in the common _func space, used for all functions, irrelevant. I am talking about
> >> SUM(), COUNT(), AVG(), probably there are more.
> > 
> > For now, error should be emitted, but in future nothing blocks us from enabling
> > such aggregates, e.g. for arrays.
> 
> It makes no sense, Lua has own much better aggregation functions. Just look at luafun
> library. It is the best way to do aggregation things in Lua, if you are a fan of
> one-liners. Or it can be done simply in a cycle with yields and all. For all the
> built-in functions of SQL Lua has its own alternatives, also language-specific,
> and working best in this language.
> 
> >> ================================================================================
> >> Now talking of the points I received about how good would it be to have these
> >> functions in _func.
> >>
> >> ====================
> >> ## Storage in _func does not change _func schema and documentation? - No.
> >>
> >> Cite from Peter's email:
> >>
> >> 	I did not document in the manual's SQL section that built-in functions will
> >> 	be in _func, so removing them is not a regression from documented behaviour. 
> >>
> >> It is good that their presence in _func is not documented. Because even if
> >> it would, and we would go for _func extension, we would change _func schema
> >> and options anyway, because of the new options 'is_overloaded' and 'has_vararg',
> >> and new fake functions with _VARBINARY suffix.
> >>
> >> Cite from Nikita's email:
> >>
> >> 	Built-ins are already declaraed in _func, so reverting this thing would
> >> 	result in another one unnecessary schema change and upgrade (so I doubt that
> >> 	implementation would be somehow 'simpler')
> >>
> >> This is also wrong. Both versions change _func. But the version with extending
> >> _func makes it bigger, adds new ugly SQL-specific options, and adds not existing
> >> functions with fake names. In the version about functions-as-part-of-language _func
> >> is cleared from these functions, and nothing more happens with the schema. So the
> >> patch will be simpler for sure, even though may get bigger due to more removals.
> >>
> >> Besides, their removal from _func would also allow to get rid of the crutch with
> >> language 'SQL_BUILTIN' in _func. It will be present in C, but won't exist in _func.
> >> This is not a language. Languages are SQL and Lua. SQL_BUILTIN was a crutch from
> >> the beginning, to make the functions work anyhow.
> > 
> > I'd like to make call mechanism for SQL built-ins and use-defined routines the same.
> 
> It is the same, and not related to what we are discussing. Functions still are going
> to be stored in an internal hash, but with much more freedom of what we store
> together with each function, what metadata. And with freedom to separate them if
> needed any time. Any options become possible - vararg, aggregates, any types (even
> a new artificial type proposed by Mergen to declare functions both for strings and
> varbinaries). With _func we are very limited, because can't do language-specific
> things properly, and we shouldn't.
> 
> >> ====================
> >> ## Users benefit from seeing SQL-specific functions in _func? - No.
> >>
> >> Look at [1]. The output format is hardly readable. Not only because of the
> >> new options (partially because of them), but also because 1) some functions
> >> here don't really exist - LENGTH_VARBINARY, POSITION_VARBINARY, TRIM_VARBINARY,
> >> SUBSTR_VARBINARY, 2) because of lots of other fields of _func, which make
> >> the output super hard to read and understand what is what, even now.
> >>
> >> _func is a dead end if someone wants to see function definitions. For that it
> >> would be better to introduce a new pretty printer, somewhere in box.func maybe.
> > 
> > That is strange argument (and I guess it is duplicate of one ofprevious). _func
> > is a service space. It should be read with care.
> 
> As I said above twice, and in the header of this paragraph, it is a direct
> answer to Nikita's and Peter's *comments about _func being useful to be looked
> at*. It is actually not, this is what I am saying, and you are saying the same
> again.
> 
> It is repeated, because firstly I listed the reasons why _func can't be used
> for built-ins. Then I listed my answers to all what Nikita and Peter said. I
> tried to use ## to separate my text into paragraphs with headers for that. But
> it looks like you didn't read carefully.
> 
> >> ====================
> >> ## Reuse SQL functions in Lua and other languages? - No.
> >>
> >> Cite from Nikita's email:
> >>
> >> 	Finally part of functions can turn out to be really
> >> 	usefull in Lua someday such as date()/time()
> >>
> >> It is not a secret, that all the languages we support already have
> >> date/time functions. In Lua these are 'os.date', 'os.time', 'fiber.clock',
> >> 'fiber.time' and more. In C these are all the standard C functions such
> >> as 'time()', 'gettimeofdat()', 'clock_gettime()', 'clock()', and more. Also
> >> lots of functions for time formatting, but I don't remember their names.
> >>
> >> So what exactly are the SQL built-in functions so much needed in Lua
> >> and C? Looking at the unreadable _func output below, I can't imagine
> >> why somebody need any of these functions out of SQL.
> > 
> > Third time about non-readable output? Well, ok. It is unreadable. But that
> > is not an argument.
> 
> Are you kidding? Is this *the only word* you noticed in this paragraph?
> 'Non-readable'? If yes, I don't think you need my opinion. Looks like
> you already decided everything, and didn't care much to read what I wrote
> here.
> 
> > That said. I see we have 3 voices against 1.
> 
> I didn't see any analysis from you. You just referred to the compilers
> and mangling, only proving that these functions are very language specific,
> if their usage requires to invent a compiler. Also you said _func output is
> internal like if I was talking not the same, but I was talking the same.
> 
> We can get back to that when you will actually read what I am trying to say.
> 
> ## References
> 
> [1] Exporting a mangled name
> 
> #### File test.cpp
> 
> extern int
> func(int a)
> {
> 	exit(a);
> }
> 
> extern int
> func(float a)
> {
> 	exit((int)a);
> }
> 
> extern "C" void
> print_symbol();
> 
> extern int
> main(void)
> {
> 	print_symbol();
> 	printf("So the point about mangled names being not accessible is "
> 	       "false\n");
> 	return 0;
> }
> 
> #### File test.c
> 
> extern void _Z4funcf(void);
> 
> void
> print_symbol()
> {
> 	printf("%p\n", &_Z4funcf);
> }
> 
> #### Build and run
> 
> $> gcc -c test.c -o obj1.o
> $> g++ -c test.cpp -o obj2.o
> $> g++ obj1.o obj2.o
> $> ./a.out
> 0x10333af40
> So the point about mangled names being not accessible is false

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-10-02 15:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-27 15:18 [Tarantool-discussions] SQL built-in functions position Mergen Imeev
2020-09-27 20:56 ` Peter Gulutzan
2020-09-28 20:07   ` Vladislav Shpilevoy
2020-09-29 19:22     ` Peter Gulutzan
2020-09-28 18:19 ` Nikita Pettik
2020-09-28 20:07   ` Vladislav Shpilevoy
2020-09-28 20:07 ` Vladislav Shpilevoy
2020-10-01 14:46   ` Kirill Yukhin
2020-10-01 21:15     ` Vladislav Shpilevoy
2020-10-02 15:18       ` Mergen Imeev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox