From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 2EEE1248C8 for ; Wed, 10 Jul 2019 07:01:19 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id IDh_k3SReKsI for ; Wed, 10 Jul 2019 07:01:19 -0400 (EDT) Received: from smtp60.i.mail.ru (smtp60.i.mail.ru [217.69.128.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id BA44D1FAA9 for ; Wed, 10 Jul 2019 07:01:18 -0400 (EDT) From: Kirill Shcherbatov Subject: [tarantool-patches] [PATCH v2 08/12] sql: rfc for SQL and Lua functions Date: Wed, 10 Jul 2019 14:01:07 +0300 Message-Id: <53e0ef51750696b33729dfcb38a269f1ee159112.1562756438.git.kshcherbatov@tarantool.org> In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org, korablev@tarantool.org Cc: kostja@tarantool.org, Kirill Shcherbatov Part of #4182 --- doc/rfc/4182-persistent-lua-functions.md | 214 +++++++++++++++++++++++ 1 file changed, 214 insertions(+) create mode 100644 doc/rfc/4182-persistent-lua-functions.md diff --git a/doc/rfc/4182-persistent-lua-functions.md b/doc/rfc/4182-persistent-lua-functions.md new file mode 100644 index 000000000..80e857f56 --- /dev/null +++ b/doc/rfc/4182-persistent-lua-functions.md @@ -0,0 +1,214 @@ +# Persistent functions in Tarantool + +* **Status**: In progress +* **Start date**: 10-05-2019 +* **Authors**: Kirill Shcherbatov @kshcherbatov kshcherbatov@tarantool.org, Vladimir Davydov @locker vdavydov.dev@gmail.com, Konstantin Osipov @kostja kostja@tarantool.org +* **Issues**: [#4182](https://github.com/tarantool/tarantool/issues/4182), [#1260](https://github.com/tarantool/tarantool/issues/1260) + +## Summary +Persistent Lua functions in Tarantool are such Lua functions that are the part of data schema and are also available after restart. + +## Background and motivation +Now Lua functions defined in Tarantool are a part of the runtime environment and must be defined again after program restart. +We need to introduce a machinery to persist them (make them a part of snapshot). This is a useful feature itself and moreover it is a dependency for functional indexes: +entire index definition must be a part of database schema. + +## Detailed design +Let's store function definition in _func and automatically load/define function on bootstrap (as well as replicate it). + +### Extend schema with some new fields +We need to extend _func:format with some new fields. +1. name: ``id``\ + type: ``unsigned``\ + Unique routine id +2. name: ``owner``\ + type: ``unsigned``\ + The object owner +3. name: ``name``\ + type: ``string``\ + collation: ``unicode_ci``\ + Routine name +4. name: ``stuid``\ + type: ``unsigned``\ + This makes Tarantool treat the function’s caller as the function’s creator. +5. name: ``language``\ + type: ``string``\ + default: ``LUA``\ + Procedure language - `LUA`, `C`, `SQL`(reserved) +6. name: ``body``\ + type: ``string``\ + default: ``''``\ + Function language-dependent body +7. name: ``routine_type``\ + type: ``string``\ + default: ``FUNCTION``\ + Type of registered object: `FUNCTION` or `PROCEDURE`; The procedure is a function that never returns result. +8. name: ``param_list``\ + type: ``map``\ + is_nullable: ``true``\ + default: ``{}``\ + An array of maps described one argument e.g. {name = `a`, type = `string`}. `{}`(undefined) by default. + This field would be usefull when static field types validation would be performed for function arguments. +9. name: ``returns``\ + type: ``string``\ + default: ``any``\ + A field_type-compatible string describing returned value type. +10. name: ``aggregate``\ + type: ``string``\ + default: ``NONE``\ + Whether this routine is SQL aggregate function: `NONE` or `GROUP`. +11. name: ``sql_data_access``\ + type: ``string``\ + default: ``NONE``\ + Returns one of the following values:\ + `NONE` = Function does not contain SQL.\ + `CONTAINS` = Function possibly contains SQL.\ + `READS` = Function possibly reads SQL data.\ + `MODIFIES` = Function possibly modifies SQL data. +12. name: ``is_deterministic``\ + type: ``boolean``\ + default: ``false``\ + Whether the routine is deterministic (can produce only one result for a given list of parameters) or not. +13. name: ``is_sandboxed``\ + type: ``boolean``\ + default: ``false``\ + Whether the Lua routine should be executed in isolated environment with limited number of available modules. This option is compatible only with Lua function. +14. name: ``is_null_call``\ + type: ``boolean``\ + default: ``true``\ + Indicates whether the routine will be called if any one of its arguments is NULL. This option is compatible only with SQL functions. +14. name: ``exports``\ + type: ``array``\ + default: ``{'Lua'}``\ + An array of `enum('LUA', 'SQL')` strings - the Tarantool's frontends where new function must be also available. +15. name: ``opts``\ + type: ``map``\ + default: ``{}``\ + Options map (reserved). +16. name: ``comment``\ + type: ``string``\ + default: ``''`` + Comment associated with the routine. +17. name: ``created``\ + type: ``string``\ + Date and time the routine was created (format `0000-00-00 00:00:00`). +18. name: ``last_altered``\ + type: ``string``\ + Date and time the routine was modified (format `0000-00-00 00:00:00`). Initially `last_altered == created`. + +The updated interface to create a functions in Tarantool is: +``` +body_string = [[function(a, b) return a + b end]] + +box.schema.func.create('funcname', + , + , + , + , + , + + , + ' +}) +``` + +### Persistent functions in Lua +Persistent Lua function object must be assembled just on insertion (in contrast with C functions are imported from .so that are loaded on demand) to export call pointer in 'persistent' table and to safely construct a sandbox, verify function body. Anyway the function internals may be invalid, indeterminate(important for functional indexes implementation); but there's nothing we can do about it. + +The function creation process is also could be unsafe. To assemble a new function object, we must evaluate an expression given by use, that could be `body = 'fiber.yield()'` instead of `body = 'function() return fiber.yield() end`. Therefore an assemble of a function object require own empty sandbox. + +#### Sandboxing +Persistent Lua functions mustn't refer to runtime environment objects like global variables or Lua modules. To protect function from doing harmful things, user may specify `is_sandboxed = true` option to initialize the function object in **unique** sandbox via `setfenv`. +The sandbox provides an access to this functions and modules: +``` +assert, error, type, unpack, pairs, ipairs, pcall, xpcall, +print, next, select, string, table, tonumber, math, utf8 +``` + +All Lua sandboxed functions are stateless. + +#### Privileges +The reworked persistent Lua functions use the same security model, that earlier. All access checks are performed on each function call, when ``stuid`` field is defined, the id if the function is a set-definer-uid one. + +### SQL Functions + +#### Background +Currently Tarantool has a ``box.internal.sql_create_function`` mechanism to make Lua functions callable from SQL statements. +``` +sql_create_function("func_name", "type", func_lua_object, + func_arg_num, is_deterministic); +``` +That internally calls +``` +int +sql_create_function_v2(sql * db, const char *zFunc, + enum field_type returns, int nArg, + int flags, + void *p, + void (*xSFunc) (sql_context *, int, + sql_value **), + void (*xStep) (sql_context *, int, + sql_value **), + void (*xFinal) (sql_context *), + void (*xDestroy) (void *)); +``` +With prepared context +``` +struct lua_sql_func_info { + int func_ref; +} func_info = {.func_ref = luaL_ref(L, LUA_REGISTRYINDEX);}; + +sql_create_function_v2(db, normalized_name, type, + func_arg_num, + is_deterministic ? SQL_DETERMINISTIC : 0, + func_info, lua_sql_call, + NULL, NULL, lua_sql_destroy); +``` + +The persistent Lua function has everything what ``sql_create_function_v2`` needs: +1. ``func_lua.base.def.name``, +2. ``func_lua.lua_ref``, +3. ``func_lua.base.def.is_deterministic`` +4. ``func_lua.base.def.param_count`` (it is the size of the ``param_list`` array) + +Therefore the internal ``box.internal.sql_create_function`` endpoint becomes redundant and **must be deleted**. + +**To control which Lua-functions are exported into SQL frontend, we would use `_func.exports` array. +Only a functions that have `'SQL'` string in that array are exported in SQL.** + +The SQL subsystem has own function FuncDef representation and corresponding hash. It is inconsistent and +it is not scallable, so it must be reworked. SQL subsystem must use Tarantools' function hash. + +#### SQL Built-ins + +SQL defines some names for builtins. They are: +``` + TRIM, TYPEOF, PRINTF, UNICODE, CHAR, HEX, VERSION, + QUOTE, REPLACE, SUBSTR, GROUP_CONCAT, JULIANDAY, DATE, + TIME, DATETIME, STRFTIME, CURRENT_TIME, CURRENT_TIMESTAMP, + CURRENT_DATE, LENGTH, POSITION, ROUND, UPPER, LOWER, + IFNULL, RANDOM, CEIL, CEILING, CHARACTER_LENGTH, + CHAR_LENGTH, FLOOR, MOD, OCTET_LENGTH, ROW_COUNT, COUNT, + LIKE, ABS, EXP, LN, POWER, SQRT, SUM, TOTAL, AVG, + RANDOMBLOB, NULLIF, ZEROBLOB, MIN, MAX, COALESCE, EVERY, + EXISTS, EXTRACT, SOME, GREATER, LESSER +``` +(the functions are currently not implemented in Tarantool but must be implemented in future) + +We must forbid such names for `box.schema.func.create` endpoint to prevent a mess in SQL code. +To solve this problem, we may set `collation = 'unicode_ci'` for ``_func.name`` field and put all built-ins in bootstrap image. + +We also reserve service SQL method names +``` + _sql_stat_get, _sql_stat_push, _sql_stat_init +``` +required to work with SQL analyze statistics. + +#### Functions access from SQL +SQL may call all functions defined with 'SQL' export in exports array. + +A new function `LUA` allows to evaluate a Lua string in SQL request. +``` +box.execute('SELECT lua(\'return box.cfg.memtx_memory\')') +``` -- 2.21.0