From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp44.i.mail.ru (smtp44.i.mail.ru [94.100.177.104]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 945DC469719 for ; Thu, 1 Oct 2020 17:46:36 +0300 (MSK) Date: Thu, 1 Oct 2020 17:46:31 +0300 From: Kirill Yukhin Message-ID: <20201001144631.qtyocw74secpvcmt@tarantool.org> References: <66362762-8791-bea3-745f-afc1e3eaa199@tarantool.org> <208ccd32-fb16-c2dc-3af7-2a8cb437160e@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <208ccd32-fb16-c2dc-3af7-2a8cb437160e@tarantool.org> Subject: Re: [Tarantool-discussions] SQL built-in functions position List-Id: Tarantool development process List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-discussions@dev.tarantool.org Hello, My comments/answers inlined. On 28 сен 22:07, Vladislav Shpilevoy wrote: > Hi! > > On 27.09.2020 17:18, Mergen Imeev wrote: > > Hi all. I have a question that I would like to discuss. > > > > The question is about SQL built-in functions. At the moment these functions are > > partially described in _func and partially in src/box/sql/func.c. I received two > > completely different suggestions from my reviewers on what to do with these > > functions: > > 1) Move definitions completely to _func. Remove definitions from func.c. > > 2) Move definitions completely to func.c. Remove definitions from _func. > > > > In the first case, users will be able to see the function definitions. Also, in > > the future, we may allow these functions to be called from Lua (although not > > sure if this is necessary). The main idea is 'all functions have the same > > interface'. > > > > In the second case, the implementation is simpler, and we can more easily > > implement some features, such as "virtual" functions. For users, the definition > > can only be seen in the documentation. The main idea is 'SQL built-in functions > > are part of SQL'. > > > > Which of these approaches do you think is more beneficial to us? > > I am the one who proposed the second option. I proposed it because of > several reasons. > > ================================================== > ## Reason 1 > > The implementation is ugly by design. Since SQL functions are strictly typed, > and some of them may take more than one type, we were forced to implement some > kind of function overload by mangling function names. We are not trying to > implement C++ here, it goes against the _func basic schema. We are implementing SQL built-ins here and that is what syntax dictates. Originally, wheren we were removing hard code it was intentded to make both languages interoperable. I see no reason, why _func should exclusively belong to Lua. We are free to extend _func schema. > To workaround that there were added a new ugly option: is_overloaded. Overloaded > means that there is still one function but declared twice, just to check types. This option is useless since we might implement stable mangler, which won't change a name if there're no types of args and returns are specified. > For example, LENGTH and LENGTH_VARBINARY, TRIM, TRIM_VARBINARY, and so on. That > leads to an issue, that we have not existing functions in _func. For example: > > tarantool> box.space._func.index.name:select({'LENGTH_VARBINARY'}) > --- > - - [68, 1, 'LENGTH_VARBINARY', 1, 'SQL_BUILTIN', '', 'function', ['varbinary'], 'integer', > 'none', 'none', true, false, true, ['SQL'], {'is_overloaded': true}, '', '2020-08-14 > 16:27:52', '2020-08-14 16:27:52'] > ... > > tarantool> box.execute("SELECT LENGTH_VARBINARY('abc')") > --- > - null > - Function 'LENGTH_VARBINARY' does not exist > ... > > Doesn't this look bad? That is -1 point to the argument about 'it is good to > have function visible in _func'. They are visible, but it does not mean anything. _func is a service space and it is unusual to use it that way. Just imagine, you're doing `nm` on some C++ object and trying to invoke mangled names from there. We should deal with it anyway, since user might create she's own function and there'll be no way to hardcode it. > ==================== > ## Reason 2 > > SQL has vararg functions - the ones which take unknown number of arguments. That > led to addition of one another ugly option: has_vararg. When it is true, all > arguments after last declared argument are forced to have the same type as the > last argument. This looks wrong from all possible sides. That is the way all compilers work. They add hidden attribute to function declaration which manifests it has va_arg in it. > Firstly, if a function has variable argument count, why the hell should they all > be of the same type? It is a silly restriction motivated entirely by SQL > specifics like SUM() taking all arguments of the same type, or LEAST(), or GREATEST() > and so on. That is very-SQL thing. I can't imagine when such a restriction may > be needed in other supported languages. They won't. va_arg must be typeless. Type of actual values must be checked inside given routine. Just look inside printf(). > Secondly, like with the option 'is_overloaded' it complicates output of _func and > makes it harder for a user to create a function, when he sees so many irrelevant > options, needed for SQL only. Just take a look at the _func output in the end of > this email to see how bad is it [1]. Complication of the output of select() from service space is not an argument at all. > Thirdly, it simply does not work for functions which are really vararg, and not > some kind of aggregates, ironically. For example, PRINTF(). To workaround the > workaround-option has_vararg such functions simply turn off their type checks by > making the last argument SCALAR. Well, this looks really bad. It would become a > bit simpler if we would split that option in two - has_vararg to make an > equivalent of ... in C with values of any type, and is_aggregate to check all > values have the same type. But still ugly. See first comment in this chapter. > ==================== > ## Reason 3 > > SQL built-in functions are a part of the language. When we store them separately, > we risk to get SQL implementation different from the built-in SQL functions schema. > For example, we will add a new built-in function, and a user will upgrade, but > won't be able to use it until he upgrades the schema, and we will need to support > that in SQL implementation - that some built-in functions actually may not exist. I think this is a good idea to upgrade before use _new_ functionality. > ==================== > ## Reason 4 > > Some of the functions are supposed to be used for aggregated values, and this makes > their implementation not reusable in other languages. That in turn makes their presence > in the common _func space, used for all functions, irrelevant. I am talking about > SUM(), COUNT(), AVG(), probably there are more. For now, error should be emitted, but in future nothing blocks us from enabling such aggregates, e.g. for arrays. > ================================================================================ > Now talking of the points I received about how good would it be to have these > functions in _func. > > ==================== > ## Built-in functions require privileges? - No. > > Cite from Peter's email: > > The current built-in functions are harmless, except perhaps for RANDOMBLOB with a huge value. > However, in future maybe there will be built-in functions that should require privileges. > In that case, I believe, they will have to be in _func (and someday in _vfunc) > so that grant() will work for them. > > Cite from Nikita's email: > > allows to grant and verify priveleges in the same way and so forth > > What are examples of such built-in functions? Built-in functions are a part > of the language. Restricting access to SQL built-ins is like restricting access > to 'next()', 'string.upper()', 'os.time()' and other functions in Lua. Or > restricting usage of 'operator+' in C++. They are a part of the language. If > something needs restrictions, it does not seem to be a part of the language. > If a function is sandboxed, so it does not affect other users, I don't see why > would it need restrictions. All built-ins are sandboxed. > > ==================== > ## Built-in functions prevent duplicates in _func? - No. > > Cite from Nikita's email: > > It makes name collisions check simple > > This is a matter of a few code lines to add that check to _func triggers > to check new function name in all languages, if necessary. Or we could just make > the functions hash do the checking. So we won't even need to involve SQL-specific > code. > > That is by the way questionable. I would think more if we want to forbid > definition of functions clashing with built-ins. That would be a 'legal' overload, > if we would look into _func first, and then into built-ins, but I don't really > know about that. Didn't think of it properly yet. > > ==================== > ## Storage in _func does not change _func schema and documentation? - No. > > Cite from Peter's email: > > I did not document in the manual's SQL section that built-in functions will > be in _func, so removing them is not a regression from documented behaviour. > > It is good that their presence in _func is not documented. Because even if > it would, and we would go for _func extension, we would change _func schema > and options anyway, because of the new options 'is_overloaded' and 'has_vararg', > and new fake functions with _VARBINARY suffix. > > Cite from Nikita's email: > > Built-ins are already declaraed in _func, so reverting this thing would > result in another one unnecessary schema change and upgrade (so I doubt that > implementation would be somehow 'simpler') > > This is also wrong. Both versions change _func. But the version with extending > _func makes it bigger, adds new ugly SQL-specific options, and adds not existing > functions with fake names. In the version about functions-as-part-of-language _func > is cleared from these functions, and nothing more happens with the schema. So the > patch will be simpler for sure, even though may get bigger due to more removals. > > Besides, their removal from _func would also allow to get rid of the crutch with > language 'SQL_BUILTIN' in _func. It will be present in C, but won't exist in _func. > This is not a language. Languages are SQL and Lua. SQL_BUILTIN was a crutch from > the beginning, to make the functions work anyhow. I'd like to make call mechanism for SQL built-ins and use-defined routines the same. > ==================== > ## Users benefit from seeing SQL-specific functions in _func? - No. > > Look at [1]. The output format is hardly readable. Not only because of the > new options (partially because of them), but also because 1) some functions > here don't really exist - LENGTH_VARBINARY, POSITION_VARBINARY, TRIM_VARBINARY, > SUBSTR_VARBINARY, 2) because of lots of other fields of _func, which make > the output super hard to read and understand what is what, even now. > > _func is a dead end if someone wants to see function definitions. For that it > would be better to introduce a new pretty printer, somewhere in box.func maybe. That is strange argument (and I guess it is duplicate of one ofprevious). _func is a service space. It should be read with care. > ==================== > ## Storage in _func unifies functions? - _func output is unified, yes. Other things - no. > > That is the only unification. Code still is going to separate them. By > introduction of the new _func options you just move the complexity from a > private place related to SQL only to the public space, related to the whole > _func and other languages. We should end up with the same mechanisms. > ==================== > ## Reuse SQL functions in Lua and other languages? - No. > > Cite from Nikita's email: > > Finally part of functions can turn out to be really > usefull in Lua someday such as date()/time() > > It is not a secret, that all the languages we support already have > date/time functions. In Lua these are 'os.date', 'os.time', 'fiber.clock', > 'fiber.time' and more. In C these are all the standard C functions such > as 'time()', 'gettimeofdat()', 'clock_gettime()', 'clock()', and more. Also > lots of functions for time formatting, but I don't remember their names. > > So what exactly are the SQL built-in functions so much needed in Lua > and C? Looking at the unreadable _func output below, I can't imagine > why somebody need any of these functions out of SQL. Third time about non-readable output? Well, ok. It is unreadable. But that is not an argument. That said. I see we have 3 voices against 1. -- Regards, Kirill Yukhin