[Tarantool-discussions] SQL built-in functions position

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Oct 2 00:15:36 MSK 2020


>> ==================================================
>> ## Reason 1
>>
>> The implementation is ugly by design. Since SQL functions are strictly typed,
>> and some of them may take more than one type, we were forced to implement some
>> kind of function overload by mangling function names. We are not trying to
>> implement C++ here, it goes against the _func basic schema.
> 
> We are implementing SQL built-ins here and that is what syntax dictates.

I don't know what syntax you are talking about. We are discussing whether to store
them in _func, that is all. Nobody is talking about syntax. Function invocation
syntax is not related to built-ins.

> Originally,
> when we were removing hard code it was intended to make both languages
> interoperable. I see no reason, why _func should exclusively belong to Lua.

I never said it belongs to Lua. I said there are certain functions belonging to
a language. Lua has its functions like next(), select(), os.time() etc. SQL
has its own functions. _func is for common functions accessible from any language.

> We are free to extend _func schema.

I never said we are not allowed to do that. I said it is not a place for
language-specific functions.

>> To work around that, a new ugly option was added: is_overloaded. Overloaded
>> means that there is still one function but declared twice, just to check types.
> 
> This option is useless since we might implement a stable mangler, which won't change
> a name if no types of args and returns are specified.

Mangler should be a part of the language. What if a user registers function
'test()' with types int, uint? Will he be forced to use test_int_uint() name?
Who will do the mangling, unmangling? It does not look like a task to solve on
the _func or even schema level. _func is basically an interface to give
permissions to functions, to load C functions, and to persist their code. It
is not some sub-language nor a compiler.
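
To make the question concrete, here is a minimal sketch of what such a mangler
would have to do. Everything in it is an assumption made for illustration:
mangle_name() and the NAME_type1_type2 scheme are hypothetical and are not
taken from Tarantool.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical "stable" mangler: appends the declared argument types to the
// function name. With no declared types the name is returned unchanged.
std::string
mangle_name(const std::string &name, const std::vector<std::string> &arg_types)
{
	assert(!name.empty());
	std::string result = name;
	for (const std::string &type : arg_types) {
		result += '_';
		result += type;
	}
	return result;
}
```

With this scheme a function registered as 'test' with types int, uint is stored
under a different name than the one the user typed, so something has to
translate in both directions on every registration and every lookup.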

>> For example, LENGTH and LENGTH_VARBINARY, TRIM, TRIM_VARBINARY, and so on. That
>> leads to an issue: we have nonexistent functions in _func. For example:
>>
>> 	tarantool> box.space._func.index.name:select({'LENGTH_VARBINARY'})
>> 	---
>> 	- - [68, 1, 'LENGTH_VARBINARY', 1, 'SQL_BUILTIN', '', 'function', ['varbinary'], 'integer',
>> 	    'none', 'none', true, false, true, ['SQL'], {'is_overloaded': true}, '', '2020-08-14
>> 	      16:27:52', '2020-08-14 16:27:52']
>> 	...
>>
>> 	tarantool> box.execute("SELECT LENGTH_VARBINARY('abc')")
>> 	---
>> 	- null
>> 	- Function 'LENGTH_VARBINARY' does not exist
>> 	...
>>
>> Doesn't this look bad? That is -1 point to the argument about 'it is good to
>> have function visible in _func'. They are visible, but it does not mean anything.
> 
> _func is a service space and it is unusual to use it that way. Just imagine, you're
> doing `nm` on some C++ object and trying to invoke mangled names from there.

I can export a mangled name and even use it. It is at least accessible. See example
in [1]. But you are missing the point. The point was that I was told _func is useful
to be looked at, and here I prove it is not so.

Talking of the comparison with C++ - are you ok? This is a service space to store
permissions, function bodies, and load C functions. It is not a language or a compiler.

> We should
> deal with it anyway, since user might create she's own function and there'll be no
> way to hardcode it.

I couldn't parse this sentence, sorry. Please, rephrase.

>> ====================
>> ## Reason 2
>>
>> SQL has vararg functions - the ones which take an unknown number of arguments. That
>> led to the addition of yet another ugly option: has_vararg. When it is true, all
>> arguments after last declared argument are forced to have the same type as the
>> last argument. This looks wrong from all possible sides.
> 
> That is the way all compilers work. They add a hidden attribute to the function
> declaration which indicates it has va_arg in it.

It is not a compiler. I will repeat here what I said above:

	_func is basically an interface to give permissions to functions,
	to load C functions, and to persist their code now. It is not some
	sub-language nor a compiler.

>> Secondly, like with the option 'is_overloaded' it complicates output of _func and
>> makes it harder for a user to create a function, when he sees so many irrelevant
>> options, needed for SQL only. Just take a look at the _func output in the end of
>> this email to see how bad it is [1].
> 
> Complication of the output of select() from service space is not an argument at all.

Well, did you read the other emails? This is an answer to the arguments
that _func is useful to be looked at. Here I prove it is not, and you say
the same, which just proves my point. It is strange that you tell that to me
and not to those who made that point.

>> ====================
>> ## Reason 3
>>
>> SQL built-in functions are a part of the language. When we store them separately,
>> we risk to get SQL implementation different from the built-in SQL functions schema.
>> For example, we will add a new built-in function, and a user will upgrade, but
>> won't be able to use it until he upgrades the schema, and we will need to support
>> that in SQL implementation - that some built-in functions actually may not exist.
> 
> I think it is a good idea to upgrade before using _new_ functionality.

You are missing the point again. You have no choice whether to upgrade or not - when you
start a new tarantool binary, your SQL implementation is already upgraded - it is
a part of the binary. The parser, VDBE implementation. But some built-in functions
may not be upgraded yet, until you call box.schema.upgrade(). Your SQL implementation
and its built-in function declarations become desynchronized from the beginning.
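
A minimal sketch of that desynchronization, under loud assumptions: Instance,
is_callable() and schema_upgrade() are hypothetical stand-ins invented for this
example, not Tarantool code. The two sets model what the binary implements and
what is persisted in _func.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of one instance: binary_builtins is what the SQL code in
// the binary implements, func_rows is what the persisted _func declares.
struct Instance {
	std::set<std::string> binary_builtins;
	std::set<std::string> func_rows;
};

// If built-ins are resolved through _func, a function is usable only when
// both the binary and the persisted schema know about it.
bool
is_callable(const Instance &inst, const std::string &name)
{
	return inst.binary_builtins.count(name) != 0 &&
	       inst.func_rows.count(name) != 0;
}

// Stand-in for box.schema.upgrade(): brings _func in line with the binary.
void
schema_upgrade(Instance &inst)
{
	inst.func_rows = inst.binary_builtins;
	assert(inst.func_rows == inst.binary_builtins);
}
```

In this model a built-in added in a new binary stays uncallable until
schema_upgrade() runs, even though its implementation is already loaded.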

>> ====================
>> ## Reason 4
>>
>> Some of the functions are supposed to be used for aggregated values, and this makes
>> their implementation not reusable in other languages. That in turn makes their presence
>> in the common _func space, used for all functions, irrelevant. I am talking about
>> SUM(), COUNT(), AVG(), probably there are more.
> 
> For now, error should be emitted, but in future nothing blocks us from enabling
> such aggregates, e.g. for arrays.

It makes no sense. Lua has its own, much better aggregation functions. Just look at the luafun
library. It is the best way to do aggregation things in Lua, if you are a fan of
one-liners. Or it can be done simply in a loop with yields and all. For all the
built-in functions of SQL, Lua has its own alternatives, also language-specific,
and working best in this language.

>> ================================================================================
>> Now talking of the points I received about how good would it be to have these
>> functions in _func.
>>
>> ====================
>> ## Storage in _func does not change _func schema and documentation? - No.
>>
>> Cite from Peter's email:
>>
>> 	I did not document in the manual's SQL section that built-in functions will
>> 	be in _func, so removing them is not a regression from documented behaviour. 
>>
>> It is good that their presence in _func is not documented. Because even if
>> it were, and we went for the _func extension, we would change the _func schema
>> and options anyway, because of the new options 'is_overloaded' and 'has_vararg',
>> and new fake functions with _VARBINARY suffix.
>>
>> Cite from Nikita's email:
>>
>> 	Built-ins are already declared in _func, so reverting this thing would
>> 	result in yet another unnecessary schema change and upgrade (so I doubt that
>> 	implementation would be somehow 'simpler')
>>
>> This is also wrong. Both versions change _func. But the version with extending
>> _func makes it bigger, adds new ugly SQL-specific options, and adds not existing
>> functions with fake names. In the version about functions-as-part-of-language _func
>> is cleared from these functions, and nothing more happens with the schema. So the
>> patch will be simpler for sure, even though may get bigger due to more removals.
>>
>> Besides, their removal from _func would also allow us to get rid of the crutch with
>> language 'SQL_BUILTIN' in _func. It will be present in C, but won't exist in _func.
>> This is not a language. Languages are SQL and Lua. SQL_BUILTIN was a crutch from
>> the beginning, to make the functions work anyhow.
> 
> I'd like to make the call mechanism for SQL built-ins and user-defined routines the same.

It is the same, and not related to what we are discussing. Functions still are going
to be stored in an internal hash, but with much more freedom of what we store
together with each function, what metadata. And with freedom to separate them if
needed any time. Any options become possible - vararg, aggregates, any types (even
a new artificial type proposed by Mergen to declare functions both for strings and
varbinaries). With _func we are very limited, because we can't do language-specific
things properly, and we shouldn't.
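
A rough sketch of what such an internal hash could look like. All names here
(BuiltinDef, make_builtin_registry(), builtin_accepts()) and the listed
signatures are assumptions made for illustration; this is not the actual
implementation, only a demonstration that one name can carry several
signatures plus vararg and aggregate flags without touching _func.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using TypeList = std::vector<std::string>;

// Hypothetical per-function metadata kept inside the SQL engine.
struct BuiltinDef {
	std::vector<TypeList> signatures; // every accepted argument list
	bool is_vararg;                   // extra args repeat the last type
	bool is_aggregate;                // SUM(), COUNT(), AVG(), ...
};

using BuiltinRegistry = std::unordered_map<std::string, BuiltinDef>;

// Illustrative content only; the signatures are assumptions.
BuiltinRegistry
make_builtin_registry()
{
	BuiltinRegistry reg;
	// One name, two signatures: no LENGTH_VARBINARY entry is needed.
	reg["LENGTH"] = BuiltinDef{{TypeList{"string"}, TypeList{"varbinary"}},
				   false, false};
	reg["SUM"] = BuiltinDef{{TypeList{"number"}}, false, true};
	reg["COALESCE"] = BuiltinDef{{TypeList{"any"}}, true, false};
	return reg;
}

// Returns true if some signature of def matches the given argument types.
bool
builtin_accepts(const BuiltinDef &def, const TypeList &args)
{
	assert(!def.signatures.empty());
	for (const TypeList &sig : def.signatures) {
		if (sig.empty()) {
			if (args.empty())
				return true;
			continue;
		}
		if (args.size() < sig.size())
			continue;
		if (args.size() > sig.size() && !def.is_vararg)
			continue;
		bool match = true;
		for (std::size_t i = 0; i < args.size() && match; i++) {
			// Vararg tail is checked against the last declared type.
			const std::string &expected =
				sig[std::min(i, sig.size() - 1)];
			match = expected == "any" || expected == args[i];
		}
		if (match)
			return true;
	}
	return false;
}
```

Here a lookup of 'LENGTH' finds a single entry that accepts both a string and
a varbinary argument, and no fake 'LENGTH_VARBINARY' name exists anywhere.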

>> ====================
>> ## Users benefit from seeing SQL-specific functions in _func? - No.
>>
>> Look at [1]. The output format is hardly readable. Not only because of the
>> new options (partially because of them), but also because 1) some functions
>> here don't really exist - LENGTH_VARBINARY, POSITION_VARBINARY, TRIM_VARBINARY,
>> SUBSTR_VARBINARY, 2) because of lots of other fields of _func, which make
>> the output super hard to read and understand what is what, even now.
>>
>> _func is a dead end if someone wants to see function definitions. For that it
>> would be better to introduce a new pretty printer, somewhere in box.func maybe.
> 
> That is a strange argument (and I guess it is a duplicate of one of the previous). _func
> is a service space. It should be read with care.

As I said above twice, and in the header of this paragraph, it is a direct
answer to Nikita's and Peter's *comments about _func being useful to be looked
at*. It is actually not, this is what I am saying, and you are saying the same
again.

It is repeated, because firstly I listed the reasons why _func can't be used
for built-ins. Then I listed my answers to everything Nikita and Peter said. I
tried to use ## to separate my text into paragraphs with headers for that. But
it looks like you didn't read carefully.

>> ====================
>> ## Reuse SQL functions in Lua and other languages? - No.
>>
>> Cite from Nikita's email:
>>
>> 	Finally part of functions can turn out to be really
>> 	useful in Lua someday such as date()/time()
>>
>> It is not a secret, that all the languages we support already have
>> date/time functions. In Lua these are 'os.date', 'os.time', 'fiber.clock',
>> 'fiber.time' and more. In C these are all the standard C functions such
>> as 'time()', 'gettimeofday()', 'clock_gettime()', 'clock()', and more. Also
>> lots of functions for time formatting, but I don't remember their names.
>>
>> So what exactly are the SQL built-in functions so much needed in Lua
>> and C? Looking at the unreadable _func output below, I can't imagine
>> why somebody would need any of these functions outside of SQL.
> 
> Third time about non-readable output? Well, ok. It is unreadable. But that
> is not an argument.

Are you kidding? Is this *the only word* you noticed in this paragraph?
'Non-readable'? If yes, I don't think you need my opinion. Looks like
you already decided everything, and didn't care much to read what I wrote
here.

> That said. I see we have 3 voices against 1.

I didn't see any analysis from you. You just referred to the compilers
and mangling, only proving that these functions are very language specific,
if their usage requires inventing a compiler. Also you said _func output is
internal, as if I were saying something different, but I was saying the same thing.

We can get back to that when you will actually read what I am trying to say.

## References

[1] Exporting a mangled name

#### File test.cpp

#include <cstdio>
#include <cstdlib>

extern int
func(int a)
{
	exit(a);
}

extern int
func(float a)
{
	exit((int)a);
}

extern "C" void
print_symbol();

extern int
main(void)
{
	print_symbol();
	printf("So the point about mangled names being not accessible is "
	       "false\n");
	return 0;
}

#### File test.c

#include <stdio.h>

extern void _Z4funcf(void);

void
print_symbol()
{
	printf("%p\n", (void *)&_Z4funcf);
}

#### Build and run

$> gcc -c test.c -o obj1.o
$> g++ -c test.cpp -o obj2.o
$> g++ obj1.o obj2.o
$> ./a.out
0x10333af40
So the point about mangled names being not accessible is false

