[Tarantool-discussions] SQL built-in functions position

Mergen Imeev imeevma at tarantool.org
Fri Oct 2 18:18:03 MSK 2020


Hi all! After today's discussion, we decided that:
1) This discussion will be held again later, after the next release.
2) The main reasons against completely moving built-in SQL functions to _func are:
         a) This can cause problems with the schema.
         b) This can cause problems in a cluster.
3) The main reasons against removing built-in SQL functions from _func are:
         a) Behavior change.
4) To solve the main problem, "functions must accept arguments in accordance with
the rules of implicit casting", we decided to fix each function individually so
that it works according to the specified rules. There will be no general decision
at this point.


On Thu, Oct 01, 2020 at 11:15:36PM +0200, Vladislav Shpilevoy wrote:
> >> ==================================================
> >> ## Reason 1
> >>
> >> The implementation is ugly by design. Since SQL functions are strictly typed,
> >> and some of them may take more than one type, we were forced to implement some
> >> kind of function overload by mangling function names. We are not trying to
> >> implement C++ here, it goes against the _func basic schema.
> > 
> > We are implementing SQL built-ins here and that is what syntax dictates.
> 
> I don't know what syntax you are talking about. We are discussing whether to store
> them in _func, that is all. This is not about syntax. Function invocation syntax is
> not related to built-ins.
> 
> > Originally,
> > when we were removing hard code, it was intended to make both languages
> > interoperable. I see no reason why _func should exclusively belong to Lua.
> 
> I never said it belongs to Lua. I said there are certain functions belonging to
> a language. Lua has its functions like next(), select(), os.time() etc. SQL
> has its own functions. _func is for common functions accessible from any language.
> 
> > We are free to extend _func schema.
> 
> I never said we are not allowed to do that. I said it is not a place for
> language-specific functions.
> 
> >> To work around that, a new ugly option was added: is_overloaded. Overloaded
> >> means that there is still one function, but it is declared twice, just to check types.
> > 
> > This option is useless, since we might implement a stable mangler, which won't change
> > a name if no argument or return types are specified.
> 
> Mangler should be a part of the language. What if a user registers function
> 'test()' with types int, uint? Will he be forced to use test_int_uint() name?
> Who will do the mangling, unmangling? It does not look like a task to solve on
> the _func or even schema level. _func is basically an interface to give
> permissions to functions, to load C functions, and to persist their code. It
> is not some sub-language nor a compiler.
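
A minimal sketch, in C, of the kind of name mangling being debated above. The function name mangle_name and the NAME_TYPE_TYPE scheme are assumptions made purely for illustration; nothing here is taken from Tarantool or from the _func schema. It shows that a function 'test' with types int and uint would become 'test_int_uint', while a function with no declared types keeps its name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical mangler: append "_<type>" for every argument type.
 * With no argument types the name is copied unchanged ("stable").
 * Returns 0 on success, -1 if the result does not fit into out.
 */
static int
mangle_name(const char *name, const char *const *arg_types, int argc,
	    char *out, size_t out_size)
{
	assert(name != NULL && out != NULL && out_size > 0);
	int len = snprintf(out, out_size, "%s", name);
	if (len < 0 || (size_t)len >= out_size)
		return -1;
	for (int i = 0; i < argc; i++) {
		int rc = snprintf(out + len, out_size - len, "_%s",
				  arg_types[i]);
		if (rc < 0 || (size_t)rc >= out_size - len)
			return -1;
		len += rc;
	}
	return 0;
}
```

Under such a scheme somebody still has to know the mangled spelling in order to call the function, which is exactly the question of who does the mangling and unmangling.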
> 
> >> For example, LENGTH and LENGTH_VARBINARY, TRIM and TRIM_VARBINARY, and so on. That
> >> leads to an issue: we have functions in _func that do not exist. For example:
> >>
> >> 	tarantool> box.space._func.index.name:select({'LENGTH_VARBINARY'})
> >> 	---
> >> 	- - [68, 1, 'LENGTH_VARBINARY', 1, 'SQL_BUILTIN', '', 'function', ['varbinary'], 'integer',
> >> 	    'none', 'none', true, false, true, ['SQL'], {'is_overloaded': true}, '', '2020-08-14
> >> 	      16:27:52', '2020-08-14 16:27:52']
> >> 	...
> >>
> >> 	tarantool> box.execute("SELECT LENGTH_VARBINARY('abc')")
> >> 	---
> >> 	- null
> >> 	- Function 'LENGTH_VARBINARY' does not exist
> >> 	...
> >>
> >> Doesn't this look bad? That is -1 point to the argument about 'it is good to
> >> have function visible in _func'. They are visible, but it does not mean anything.
> > 
> > _func is a service space and it is unusual to use it that way. Just imagine, you're
> > doing `nm` on some C++ object and trying to invoke mangled names from there.
> 
> I can export a mangled name and even use it. It is at least accessible. See example
> in [1]. But you are missing the point. The point was that I was told _func is useful
> to be looked at, and here I prove it is not so.
> 
> Talking of the comparison with C++ - are you ok? This is a service space to store
> permissions, function bodies, and load C functions. It is not a language or a compiler.
> 
> > We should
> > deal with it anyway, since user might create she's own function and there'll be no
> > way to hardcode it.
> 
> I couldn't parse this sentence, sorry. Please, rephrase.
> 
> >> ====================
> >> ## Reason 2
> >>
> >> SQL has vararg functions - the ones which take an unknown number of arguments. That
> >> led to the addition of yet another ugly option: has_vararg. When it is true, all
> >> arguments after the last declared argument are forced to have the same type as the
> >> last argument. This looks wrong from all possible sides.
> > 
> > That is the way all compilers work. They add a hidden attribute to the function
> > declaration, which indicates that it has va_arg in it.
> 
> It is not a compiler. I will repeat here what I said above:
> 
> 	_func is basically an interface to give permissions to functions,
> 	to load C functions, and to persist their code now. It is not some
> 	sub-language nor a compiler.
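
For readers unfamiliar with the has_vararg rule quoted above, here is a minimal C sketch of how such a check could look. The struct func_decl and the function check_args are hypothetical names invented for this illustration; types are compared as plain strings, and implicit casting is deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified declaration as it might be read from _func. */
struct func_decl {
	const char *const *param_types;	/* declared parameter types */
	int param_count;		/* number of declared parameters */
	bool has_vararg;		/* extra arguments repeat the last type */
};

/* Return true if the actual argument types satisfy the declaration. */
static bool
check_args(const struct func_decl *decl, const char *const *arg_types,
	   int argc)
{
	assert(decl != NULL);
	if (argc < decl->param_count)
		return false;
	/* Extra arguments are allowed only for vararg functions. */
	if (argc > decl->param_count &&
	    (!decl->has_vararg || decl->param_count == 0))
		return false;
	for (int i = 0; i < argc; i++) {
		/* Arguments past the declared ones reuse the last type. */
		int j = i < decl->param_count ? i : decl->param_count - 1;
		if (strcmp(arg_types[i], decl->param_types[j]) != 0)
			return false;
	}
	return true;
}
```

With one declared 'string' parameter and has_vararg set, any number of additional 'string' arguments passes, while a trailing 'integer' is rejected.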
> 
> >> Secondly, like with the option 'is_overloaded' it complicates output of _func and
> >> makes it harder for a user to create a function, when he sees so many irrelevant
> >> options, needed for SQL only. Just take a look at the _func output at the end of
> >> this email to see how bad it is [1].
> > 
> > Complication of the output of select() from service space is not an argument at all.
> 
> Well, did you read the other emails? This is an answer to the arguments
> that _func is useful to be looked at. Here I prove it is not, and you say
> the same, which just proves my point. It is strange that you tell that to me
> and not to those who made that point.
> 
> >> ====================
> >> ## Reason 3
> >>
> >> SQL built-in functions are a part of the language. When we store them separately,
> >> we risk getting an SQL implementation that differs from the built-in SQL functions schema.
> >> For example, we will add a new built-in function, and a user will upgrade, but
> >> won't be able to use it until he upgrades the schema, and we will need to support
> >> that in SQL implementation - that some built-in functions actually may not exist.
> > 
> > I think it is a good idea to upgrade before using _new_ functionality.
> 
> You are missing the point again. You have no choice whether to upgrade or not - when you
> start a new tarantool binary, your SQL implementation is already upgraded - it is
> a part of the binary. The parser, VDBE implementation. But some built-in functions
> may not be upgraded yet, until you call box.schema.upgrade(). Your SQL implementation
> and its built-in function declarations become desynchronized from the beginning.
> 
> >> ====================
> >> ## Reason 4
> >>
> >> Some of the functions are supposed to be used for aggregated values, and this makes
> >> their implementation not reusable in other languages. That in turn makes their presence
> >> in the common _func space, used for all functions, irrelevant. I am talking about
> >> SUM(), COUNT(), AVG(), probably there are more.
> > 
> > For now, an error should be emitted, but in the future nothing blocks us from enabling
> > such aggregates, e.g. for arrays.
> 
> It makes no sense. Lua has its own, much better aggregation functions. Just look at the
> luafun library. It is the best way to do aggregation in Lua, if you are a fan of
> one-liners. Or it can be done simply in a loop with yields and all. For all the
> built-in functions of SQL, Lua has its own alternatives, also language-specific,
> and working best in this language.
> 
> >> ================================================================================
> >> Now talking of the points I received about how good it would be to have these
> >> functions in _func.
> >>
> >> ====================
> >> ## Storage in _func does not change _func schema and documentation? - No.
> >>
> >> Cite from Peter's email:
> >>
> >> 	I did not document in the manual's SQL section that built-in functions will
> >> 	be in _func, so removing them is not a regression from documented behaviour. 
> >>
> >> It is good that their presence in _func is not documented. Because even if
> >> it were, and we went for the _func extension, we would change the _func schema
> >> and options anyway, because of the new options 'is_overloaded' and 'has_vararg',
> >> and new fake functions with _VARBINARY suffix.
> >>
> >> Cite from Nikita's email:
> >>
> >> 	Built-ins are already declared in _func, so reverting this thing would
> >> 	result in another one unnecessary schema change and upgrade (so I doubt that
> >> 	implementation would be somehow 'simpler')
> >>
> >> This is also wrong. Both versions change _func. But the version with extending
> >> _func makes it bigger, adds new ugly SQL-specific options, and adds non-existent
> >> functions with fake names. In the version about functions-as-part-of-language _func
> >> is cleared from these functions, and nothing more happens with the schema. So the
> >> patch will be simpler for sure, even though may get bigger due to more removals.
> >>
> >> Besides, their removal from _func would also allow us to get rid of the crutch with
> >> language 'SQL_BUILTIN' in _func. It will be present in C, but won't exist in _func.
> >> This is not a language. Languages are SQL and Lua. SQL_BUILTIN was a crutch from
> >> the beginning, to make the functions work anyhow.
> > 
> > I'd like to make the call mechanism for SQL built-ins and user-defined routines the same.
> 
> It is the same, and not related to what we are discussing. Functions still are going
> to be stored in an internal hash, but with much more freedom of what we store
> together with each function, what metadata. And with freedom to separate them if
> needed any time. Any options become possible - vararg, aggregates, any types (even
> a new artificial type proposed by Mergen to declare functions both for strings and
> varbinaries). With _func we are very limited, because we can't do language-specific
> things properly, and we shouldn't.
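
As a rough illustration of the "internal storage with free-form metadata" idea above, here is a C sketch. All names (struct sql_builtin, sql_builtin_find, the flag fields) are hypothetical, the flag values are illustrative only, and a linear scan over a static table stands in for the internal hash mentioned in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-function metadata kept inside the SQL implementation. */
struct sql_builtin {
	const char *name;
	bool is_aggregate;	/* e.g. SUM(), COUNT() */
	bool accepts_varbinary;	/* one entry serves string and varbinary */
};

/* Illustrative table; a real implementation would use a hash. */
static const struct sql_builtin sql_builtins[] = {
	{"COUNT", true, false},
	{"SUM", true, false},
	{"LENGTH", false, true},
	{"TRIM", false, true},
};

/* Find a built-in by exact name, or return NULL if it is not registered. */
static const struct sql_builtin *
sql_builtin_find(const char *name)
{
	assert(name != NULL);
	size_t count = sizeof(sql_builtins) / sizeof(sql_builtins[0]);
	for (size_t i = 0; i < count; i++) {
		if (strcmp(sql_builtins[i].name, name) == 0)
			return &sql_builtins[i];
	}
	return NULL;
}
```

In this sketch a single LENGTH entry carries a flag for varbinary input, so no separate LENGTH_VARBINARY name has to exist anywhere.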
> 
> >> ====================
> >> ## Users benefit from seeing SQL-specific functions in _func? - No.
> >>
> >> Look at [1]. The output format is hardly readable. Not only because of the
> >> new options (partially because of them), but also because 1) some functions
> >> here don't really exist - LENGTH_VARBINARY, POSITION_VARBINARY, TRIM_VARBINARY,
> >> SUBSTR_VARBINARY, 2) because of lots of other fields of _func, which make
> >> the output super hard to read and understand what is what, even now.
> >>
> >> _func is a dead end if someone wants to see function definitions. For that it
> >> would be better to introduce a new pretty printer, somewhere in box.func maybe.
> > 
> > That is a strange argument (and I guess it duplicates one of the previous ones). _func
> > is a service space. It should be read with care.
> 
> As I said above twice, and in the header of this paragraph, it is a direct
> answer to Nikita's and Peter's *comments about _func being useful to be looked
> at*. It is actually not, this is what I am saying, and you are saying the same
> again.
> 
> It is repeated, because firstly I listed the reasons why _func can't be used
> for built-ins. Then I listed my answers to everything Nikita and Peter said. I
> tried to use ## to separate my text into paragraphs with headers for that. But
> it looks like you didn't read carefully.
> 
> >> ====================
> >> ## Reuse SQL functions in Lua and other languages? - No.
> >>
> >> Cite from Nikita's email:
> >>
> >> 	Finally part of functions can turn out to be really
> >> 	useful in Lua someday such as date()/time()
> >>
> >> It is not a secret, that all the languages we support already have
> >> date/time functions. In Lua these are 'os.date', 'os.time', 'fiber.clock',
> >> 'fiber.time' and more. In C these are all the standard C functions such
> >> as 'time()', 'gettimeofday()', 'clock_gettime()', 'clock()', and more. Also
> >> lots of functions for time formatting, but I don't remember their names.
> >>
> >> So what exactly are the SQL built-in functions so much needed in Lua
> >> and C? Looking at the unreadable _func output below, I can't imagine
> >> why somebody would need any of these functions outside of SQL.
> > 
> > Third time about non-readable output? Well, ok. It is unreadable. But that
> > is not an argument.
> 
> Are you kidding? Is this *the only word* you noticed in this paragraph?
> 'Non-readable'? If yes, I don't think you need my opinion. Looks like
> you already decided everything, and didn't care much to read what I wrote
> here.
> 
> > That said. I see we have 3 voices against 1.
> 
> I didn't see any analysis from you. You just referred to the compilers
> and mangling, only proving that these functions are very language specific,
> if their usage requires inventing a compiler. Also you said _func output is
> internal, as if I had been saying something different, but I was saying the same.
> 
> We can get back to that when you have actually read what I am trying to say.
> 
> ## References
> 
> [1] Exporting a mangled name
> 
> #### File test.cpp
> 
> #include <stdio.h>
> #include <stdlib.h>
> 
> extern int
> func(int a)
> {
> 	exit(a);
> }
> 
> extern int
> func(float a)
> {
> 	exit((int)a);
> }
> 
> extern "C" void
> print_symbol();
> 
> extern int
> main(void)
> {
> 	print_symbol();
> 	printf("So the point about mangled names being not accessible is "
> 	       "false\n");
> 	return 0;
> }
> 
> #### File test.c
> 
> #include <stdio.h>
> 
> /* Mangled name of func(float) from test.cpp. */
> extern void _Z4funcf(void);
> 
> void
> print_symbol()
> {
> 	printf("%p\n", (void *)&_Z4funcf);
> }
> 
> #### Build and run
> 
> $> gcc -c test.c -o obj1.o
> $> g++ -c test.cpp -o obj2.o
> $> g++ obj1.o obj2.o
> $> ./a.out
> 0x10333af40
> So the point about mangled names being not accessible is false

