From: Kirill Shcherbatov <kshcherbatov@tarantool.org> To: tarantool-patches@freelists.org, korablev@tarantool.org Cc: kostja@tarantool.org, vdavydov.dev@gmail.com, Kirill Shcherbatov <kshcherbatov@tarantool.org> Subject: [PATCH v1 01/12] sql: rfc for SQL and Lua functions Date: Mon, 8 Jul 2019 14:26:15 +0300 [thread overview] Message-ID: <faf41b94e3bb86c7d0b65081a4276c8ca6adf89b.1562584567.git.kshcherbatov@tarantool.org> (raw) In-Reply-To: <cover.1562584567.git.kshcherbatov@tarantool.org> Part of #4182 --- doc/rfc/4182-persistent-lua-functions.md | 214 +++++++++++++++++++++++ 1 file changed, 214 insertions(+) create mode 100644 doc/rfc/4182-persistent-lua-functions.md diff --git a/doc/rfc/4182-persistent-lua-functions.md b/doc/rfc/4182-persistent-lua-functions.md new file mode 100644 index 000000000..aba837c03 --- /dev/null +++ b/doc/rfc/4182-persistent-lua-functions.md @@ -0,0 +1,214 @@ +# Persistent functions in Tarantool + +* **Status**: In progress +* **Start date**: 10-05-2019 +* **Authors**: Kirill Shcherbatov @kshcherbatov kshcherbatov@tarantool.org, Vladimir Davydov @locker vdavydov.dev@gmail.com, Konstantin Osipov @kostja kostja@tarantool.org +* **Issues**: [#4182](https://github.com/tarantool/tarantool/issues/4182), [#1260](https://github.com/tarantool/tarantool/issues/1260) + +## Summary +Persistent Lua functions in Tarantool are such Lua functions that are the part of data schema and are also available after restart. + +## Background and motivation +Now Lua functions defined in Tarantool are a part of the runtime environment and must be defined again after program restart. +We need to introduce a machinery to persist them (make them a part of snapshot). This is a useful feature itself and moreover it is a dependency for functional indexes: +entire index definition must be a part of database schema. + +## Detailed design +Let's store function definition in _func and automatically load/define function on bootstrap (as well as replicate it). + +### Extend schema with some new fields +We need to extend _func:format with some new fields. +1. name: ``id``\ + type: ``unsigned``\ + Unique routine id +2. name: ``owner``\ + type: ``unsigned``\ + The object owner +3. name: ``name``\ + type: ``string``\ + collation: ``unicode_ci``\ + Routine name +4. name: ``stuid``\ + type: ``unsigned``\ + This makes Tarantool treat the function’s caller as the function’s creator. +5. name: ``language``\ + type: ``string``\ + default: ``LUA``\ + Procedure language - `LUA`, `C`, `SQL`(reserved) +6. name: ``body``\ + type: ``string``\ + default: ``''``\ + Function language-dependent body +7. name: ``routine_type``\ + type: ``string``\ + default: ``FUNCTION``\ + Type of registered object: `FUNCTION` or `PROCEDURE`; The procedure is a function that never returns result. +8. name: ``param_list``\ + type: ``map``\ + is_nullable: ``true``\ + default: ``{}``\ + An array of maps described one argument e.g. {name = `a`, type = `string`}. `{}`(undefined) by default. + This field would be usefull when static field types validation would be performed for function arguments. +9. name: ``returns``\ + type: ``string``\ + default: ``any``\ + A field_type-compatible string describing returned value type. +10. name: ``aggregate``\ + type: ``string``\ + default: ``NONE``\ + Whether this routine is SQL aggregate function: `NONE` or `GROUP`. +11. name: ``sql_data_access``\ + type: ``string``\ + default: ``NONE``\ + Returns one of the following values:\ + `NONE` = Function does not contain SQL.\ + `CONTAINS` = Function possibly contains SQL.\ + `READS` = Function possibly reads SQL data.\ + `MODIFIES` = Function possibly modifies SQL data. +12. name: ``is_deterministic``\ + type: ``boolean``\ + default: ``false``\ + Whether the routine is deterministic (can produce only one result for a given list of parameters) or not. +13. name: ``is_sandboxed``\ + type: ``boolean``\ + default: ``false``\ + Whether the Lua routine should be executed in isolated environment with limited number of available modules. This option is compatible only with Lua function. +14. name: ``is_null_call``\ + type: ``boolean``\ + default: ``true``\ + Indicates whether the routine will be called if any one of its arguments is NULL. This option is compatible only with SQL functions. +14. name: ``exports``\ + type: ``array``\ + default: ``{'Lua'}``\ + An array of `enum('LUA', 'SQL')` strings - the Tarantool's frontends where new function must be also available. +15. name: ``opts``\ + type: ``map``\ + default: ``{}``\ + Options map (reserved). +16. name: ``comment``\ + type: ``string``\ + default: ``''`` + Comment associated with the routine. +17. name: ``created``\ + type: ``string``\ + Date and time the routine was created (format `0000-00-00 00:00:00`). +18. name: ``last_altered``\ + type: ``string``\ + Date and time the routine was modified (format `0000-00-00 00:00:00`). Initially `last_altered == created`. + +The updated interface to create a functions in Tarantool is: +``` +body_string = [[function(a, b) return a + b end]] + +box.schema.func.create('funcname', + <if_not_exists = boolean [false]>, + <setuid = boolean [false]>, + <language = enum('LUA', 'C') ['LUA']>, + <body = string [''], + <is_deterministic = boolean [false]>, + <is_sandboxed = boolean [false]>, + <returns = string ['any']> + <param_list = array [{}]>, + <comment = string ['']>' +}) +``` + +### Persistent functions in Lua +Persistent Lua function object must be assembled just on insertion (in contrast with C functions are imported from .so that are loaded on demand) to export call pointer in 'persistent' table and to safely construct a sandbox, verify function body. Anyway the function internals may be invalid, indeterminate(important for functional indexes implementation); but there's nothing we can do about it. + +The function creation process is also could be unsafe. To assemble a new function object, we must evaluate an expression given by use, that could be `body = 'fiber.yield()'` instead of `body = 'function() return fiber.yield() end`. Therefore an assemble of a function object require own empty sandbox. + +#### Sandboxing +Persistent Lua functions mustn't refer to runtime environment objects like global variables or Lua modules. To protect function from doing harmful things, user may specify `is_sandboxed = true` option to initialize the function object in **unique** sandbox via `setfenv`. +The sandbox provides an access to this functions and modules: +``` +assert, error, type, unpack, pairs, ipairs, pcall, xpcall, +print, next, select, string, table, tonumber, math, utf8 +``` + +All Lua sandboxed functions are stateless. + +#### Privileges +The reworked persistent Lua functions use the same security model, that earlier. All access checks are performed on each function call, when ``stuid`` field is defined, the id if the function is a set-definer-uid one. + +### SQL Functions + +#### Background +Currently Tarantool has a ``box.internal.sql_create_function`` mechanism to make Lua functions callable from SQL statements. +``` +sql_create_function("func_name", "type", func_lua_object, + func_arg_num, is_deterministic); +``` +That internally calls +``` +int +sql_create_function_v2(sql * db, const char *zFunc, + enum field_type returns, int nArg, + int flags, + void *p, + void (*xSFunc) (sql_context *, int, + sql_value **), + void (*xStep) (sql_context *, int, + sql_value **), + void (*xFinal) (sql_context *), + void (*xDestroy) (void *)); +``` +With prepared context +``` +struct lua_sql_func_info { + int func_ref; +} func_info = {.func_ref = luaL_ref(L, LUA_REGISTRYINDEX);}; + +sql_create_function_v2(db, normalized_name, type, + func_arg_num, + is_deterministic ? SQL_DETERMINISTIC : 0, + func_info, lua_sql_call, + NULL, NULL, lua_sql_destroy); +``` + +The persistent Lua function has everything what ``sql_create_function_v2`` needs: +1. ``func_lua.base.def.name``, +2. ``func_lua.lua_ref``, +3. ``func_lua.base.def.is_deterministic`` +4. ``func_lua.base.def.param_count`` (it is the size of the ``param_list`` array) + +Therefore the internal ``box.internal.sql_create_function`` endpoint becomes redundant and **must be deleted**. + +**To control which Lua-functions are exported into SQL frontend, we would use `_func.exports` array. +Only a functions that have `'SQL'` string in that array are exported in SQL.** + +The SQL subsystem has own function FuncDef representation and corresponding hash. It is inconsistent and +it is not scallable, so it must be reworked. SQL subsystem must use Tarantools' function hash. + +#### SQL Built-ins + +SQL defines some names for builtins. They are: +``` + TRIM, TYPEOF, PRINTF, UNICODE, CHAR, HEX, VERSION, + QUOTE, REPLACE, SUBSTR, GROUP_CONCAT, JULIANDAY, DATE, + TIME, DATETIME, STRFTIME, CURRENT_TIME, CURRENT_TIMESTAMP, + CURRENT_DATE, LENGTH, POSITION, ROUND, UPPER, LOWER, + IFNULL, RANDOM, CEIL, CEILING, CHARACTER_LENGTH, + CHAR_LENGTH, FLOOR, MOD, OCTET_LENGTH, ROW_COUNT, COUNT, + LIKE, ABS, EXP, LN, POWER, SQRT, SUM, TOTAL, AVG, + RANDOMBLOB, NULLIF, ZEROBLOB, MIN, MAX, COALESCE, EVERY, + EXISTS, EXTRACT, SOME +``` +(the functions are currently not implemented in Tarantool but must be implemented in future) + +We must forbid such names for `box.schema.func.create` endpoint to prevent a mess in SQL code. +To solve this problem, we may set `collation = 'unicode_ci'` for ``_func.name`` field and put all built-ins in bootstrap image. + +We also reserve service SQL method names +``` + _sql_record, _sql_stat_get, _sql_stat_push, _sql_stat_init +``` +required to work with SQL analyze statistics. + +#### Functions access from SQL +SQL may call all functions defined with 'SQL' export in exports array. + +A new function `LUA` allows to evaluate a Lua string in SQL request. +``` +box.execute('SELECT lua(\'return box.cfg.memtx_memory\')') +``` -- 2.21.0
next prev parent reply other threads:[~2019-07-08 11:26 UTC|newest] Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-07-08 11:26 [PATCH v1 00/12] sql: uniform SQL and Lua functions subsystem Kirill Shcherbatov 2019-07-08 11:26 ` Kirill Shcherbatov [this message] 2019-07-08 11:26 ` [PATCH v1 10/12] sql: remove SQL_PreferBuiltin flag Kirill Shcherbatov 2019-07-08 11:26 ` [PATCH v1 11/12] sql: move LIKE UConverter object to collation library Kirill Shcherbatov 2019-07-08 11:26 ` [PATCH v1 12/12] sql: use schema's func hash instead of FuncDef hash Kirill Shcherbatov 2019-07-08 11:26 ` [PATCH v1 02/12] sql: get rid of SOUNDEX, MATCH Kirill Shcherbatov 2019-07-08 11:26 ` [PATCH v1 03/12] sql: get rid of LIKELY, UNLIKELY and LIKEHOOD Kirill Shcherbatov 2019-07-08 11:26 ` [PATCH v1 04/12] box: introduce Lua persistent functions Kirill Shcherbatov 2019-07-08 11:26 ` [PATCH v1 05/12] sql: put analyze helpers to FuncDef cache Kirill Shcherbatov 2019-07-09 15:56 ` [tarantool-patches] " n.pettik 2019-07-08 11:26 ` [PATCH v1 06/12] sql: rework LIKE case-insensitive mode Kirill Shcherbatov 2019-07-08 11:26 ` [PATCH v1 07/12] sql: replace bool is_derived_coll marker with flag Kirill Shcherbatov 2019-07-09 16:13 ` [tarantool-patches] " n.pettik 2019-07-08 11:26 ` [PATCH v1 08/12] sql: refactor builtins signatures with port Kirill Shcherbatov 2019-07-08 11:26 ` [PATCH v1 09/12] box: use own vtab per each function object Kirill Shcherbatov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=faf41b94e3bb86c7d0b65081a4276c8ca6adf89b.1562584567.git.kshcherbatov@tarantool.org \ --to=kshcherbatov@tarantool.org \ --cc=korablev@tarantool.org \ --cc=kostja@tarantool.org \ --cc=tarantool-patches@freelists.org \ --cc=vdavydov.dev@gmail.com \ --subject='Re: [PATCH v1 01/12] sql: rfc for SQL and Lua functions' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox