[PATCH v1 01/12] sql: rfc for SQL and Lua functions

Kirill Shcherbatov kshcherbatov at tarantool.org
Mon Jul 8 14:26:15 MSK 2019


Part of #4182
---
 doc/rfc/4182-persistent-lua-functions.md | 214 +++++++++++++++++++++++
 1 file changed, 214 insertions(+)
 create mode 100644 doc/rfc/4182-persistent-lua-functions.md

diff --git a/doc/rfc/4182-persistent-lua-functions.md b/doc/rfc/4182-persistent-lua-functions.md
new file mode 100644
index 000000000..aba837c03
--- /dev/null
+++ b/doc/rfc/4182-persistent-lua-functions.md
@@ -0,0 +1,214 @@
+# Persistent functions in Tarantool
+
+* **Status**: In progress
+* **Start date**: 10-05-2019
+* **Authors**: Kirill Shcherbatov @kshcherbatov kshcherbatov at tarantool.org, Vladimir Davydov @locker vdavydov.dev at gmail.com, Konstantin Osipov @kostja kostja at tarantool.org
+* **Issues**: [#4182](https://github.com/tarantool/tarantool/issues/4182), [#1260](https://github.com/tarantool/tarantool/issues/1260)
+
+## Summary
+Persistent Lua functions in Tarantool are such Lua functions that are the part of data schema and are also available after restart.
+
+## Background and motivation
+Now Lua functions defined in Tarantool are a part of the runtime environment and must be defined again after program restart.
+We need to introduce a machinery to persist them (make them a part of snapshot). This is a useful feature itself and moreover it is a dependency for functional indexes:
+entire index definition must be a part of database schema.
+
+## Detailed design
+Let's store function definition in _func and automatically load/define function on bootstrap (as well as replicate it).
+
+### Extend schema with some new fields
+We need to extend _func:format with some new fields.
+1. name: ``id``\
+   type: ``unsigned``\
+   Unique routine id
+2. name: ``owner``\
+   type: ``unsigned``\
+   The object owner
+3. name: ``name``\
+   type: ``string``\
+   collation: ``unicode_ci``\
+   Routine name
+4. name: ``stuid``\
+   type: ``unsigned``\
+   This makes Tarantool treat the function’s caller as the function’s creator.
+5. name: ``language``\
+   type: ``string``\
+   default: ``LUA``\
+   Procedure language -  `LUA`, `C`, `SQL`(reserved)
+6. name: ``body``\
+   type: ``string``\
+   default: ``''``\
+   Function language-dependent body
+7. name: ``routine_type``\
+   type: ``string``\
+   default: ``FUNCTION``\
+   Type of registered object: `FUNCTION` or `PROCEDURE`; The procedure is a function that never returns result.
+8. name: ``param_list``\
+   type: ``map``\
+   is_nullable: ``true``\
+   default: ``{}``\
+   An array of maps described one argument e.g. {name = `a`, type = `string`}. `{}`(undefined) by default.
+   This field would be usefull when static field types validation would be performed for function arguments.
+9. name: ``returns``\
+   type: ``string``\
+   default: ``any``\
+   A field_type-compatible string describing returned value type.
+10. name: ``aggregate``\
+    type: ``string``\
+    default: ``NONE``\
+    Whether this routine is SQL aggregate function: `NONE` or `GROUP`.
+11. name: ``sql_data_access``\
+    type: ``string``\
+    default: ``NONE``\
+    Returns one of the following values:\
+    `NONE` = Function does not contain SQL.\
+    `CONTAINS` = Function possibly contains SQL.\
+    `READS` = Function possibly reads SQL data.\
+    `MODIFIES` = Function possibly modifies SQL data.
+12. name: ``is_deterministic``\
+    type: ``boolean``\
+    default: ``false``\
+    Whether the routine is deterministic (can produce only one result for a given list of parameters) or not.
+13. name: ``is_sandboxed``\
+    type: ``boolean``\
+    default: ``false``\
+    Whether the Lua routine should be executed in isolated environment with limited number of available modules. This option is compatible only with Lua function.
+14. name: ``is_null_call``\
+    type: ``boolean``\
+    default: ``true``\
+    Indicates whether the routine will be called if any one of its arguments is NULL. This option is compatible only with SQL functions.
+14. name: ``exports``\
+    type: ``array``\
+    default: ``{'Lua'}``\
+    An array of `enum('LUA', 'SQL')` strings - the Tarantool's frontends where new function must be also available.
+15. name: ``opts``\
+    type: ``map``\
+    default: ``{}``\
+    Options map (reserved).
+16. name: ``comment``\
+    type: ``string``\
+    default: ``''``
+    Comment associated with the routine.
+17. name: ``created``\
+    type: ``string``\
+    Date and time the routine was created (format `0000-00-00 00:00:00`).
+18. name: ``last_altered``\
+    type: ``string``\
+    Date and time the routine was modified (format `0000-00-00 00:00:00`). Initially `last_altered == created`.
+
+The updated interface to create a functions in Tarantool is:
+```
+body_string = [[function(a, b) return a + b end]]
+
+box.schema.func.create('funcname',
+	<if_not_exists = boolean [false]>,
+        <setuid = boolean [false]>,
+        <language = enum('LUA', 'C') ['LUA']>,
+	<body = string [''],
+        <is_deterministic = boolean [false]>,
+	<is_sandboxed = boolean [false]>,
+        <returns = string ['any']>
+        <param_list = array [{}]>,
+        <comment = string ['']>'
+})
+```
+
+### Persistent functions in Lua
+Persistent Lua function object must be assembled just on insertion (in contrast with C functions are imported from .so that are loaded on demand) to export call pointer in 'persistent' table and to safely construct a sandbox, verify function body. Anyway the function internals may be invalid, indeterminate(important for functional indexes implementation); but there's nothing we can do about it.
+
+The function creation process is also could be unsafe. To assemble a new function object, we must evaluate an expression given by use, that could be `body = 'fiber.yield()'` instead of `body = 'function() return fiber.yield() end`. Therefore an assemble of a function object require own empty sandbox.
+
+#### Sandboxing
+Persistent Lua functions mustn't refer to runtime environment objects like global variables or Lua modules. To protect function from doing harmful things, user may specify `is_sandboxed = true` option to initialize the function object in **unique** sandbox via `setfenv`.
+The sandbox provides an access to this functions and modules:
+```
+assert, error, type, unpack, pairs, ipairs, pcall, xpcall,
+print, next, select, string, table, tonumber, math, utf8
+```
+
+All Lua sandboxed functions are stateless.
+
+#### Privileges
+The reworked persistent Lua functions use the same security model, that earlier. All access checks are performed on each function call, when ``stuid`` field is defined, the id if the function is a set-definer-uid one.
+
+### SQL Functions
+
+#### Background
+Currently Tarantool has a ``box.internal.sql_create_function`` mechanism  to make Lua functions callable from SQL statements.
+```
+sql_create_function("func_name", "type", func_lua_object,
+                    func_arg_num, is_deterministic);
+```
+That internally calls
+```
+int
+sql_create_function_v2(sql * db, const char *zFunc,
+			enum field_type returns, int nArg,
+			int flags,
+			void *p,
+			void (*xSFunc) (sql_context *, int,
+					sql_value **),
+			void (*xStep) (sql_context *, int,
+					  sql_value **),
+			void (*xFinal) (sql_context *),
+			void (*xDestroy) (void *));
+```
+With prepared context
+```
+struct lua_sql_func_info {
+	int func_ref;
+} func_info = {.func_ref = luaL_ref(L, LUA_REGISTRYINDEX);};
+
+sql_create_function_v2(db, normalized_name, type,
+                       func_arg_num,
+                       is_deterministic ? SQL_DETERMINISTIC : 0,
+                       func_info, lua_sql_call,
+                       NULL, NULL, lua_sql_destroy);
+```
+
+The persistent Lua function has everything what ``sql_create_function_v2`` needs:
+1. ``func_lua.base.def.name``,
+2. ``func_lua.lua_ref``,
+3. ``func_lua.base.def.is_deterministic``
+4. ``func_lua.base.def.param_count`` (it is the size of the ``param_list`` array)
+
+Therefore the internal ``box.internal.sql_create_function`` endpoint becomes redundant and **must be deleted**.
+
+**To control which Lua-functions are exported into SQL frontend, we would use `_func.exports` array.
+Only a functions that have `'SQL'` string in that array are exported in SQL.**
+
+The SQL subsystem has own function FuncDef representation and corresponding hash. It is inconsistent and
+it is not scallable, so it must be reworked. SQL subsystem must use Tarantools' function hash.
+
+#### SQL Built-ins
+
+SQL defines some names for builtins. They are:
+```
+   TRIM, TYPEOF, PRINTF, UNICODE, CHAR, HEX, VERSION,
+   QUOTE, REPLACE, SUBSTR, GROUP_CONCAT, JULIANDAY, DATE,
+   TIME, DATETIME, STRFTIME, CURRENT_TIME, CURRENT_TIMESTAMP,
+   CURRENT_DATE, LENGTH, POSITION, ROUND, UPPER, LOWER,
+   IFNULL, RANDOM, CEIL, CEILING, CHARACTER_LENGTH,
+   CHAR_LENGTH, FLOOR, MOD, OCTET_LENGTH, ROW_COUNT, COUNT,
+   LIKE, ABS, EXP, LN, POWER, SQRT, SUM, TOTAL, AVG,
+   RANDOMBLOB, NULLIF, ZEROBLOB, MIN, MAX, COALESCE, EVERY,
+   EXISTS, EXTRACT, SOME
+```
+(the functions are currently not implemented in Tarantool but must be implemented in future)
+
+We must forbid such names for `box.schema.func.create` endpoint to prevent a mess in SQL code.
+To solve this problem, we may set `collation = 'unicode_ci'` for ``_func.name`` field and put all built-ins in bootstrap image.
+
+We also reserve service SQL method names
+```
+ _sql_record, _sql_stat_get, _sql_stat_push, _sql_stat_init
+```
+required to work with SQL analyze statistics.
+
+#### Functions access from SQL
+SQL may call all functions defined with 'SQL' export in exports array.
+
+A new function `LUA` allows to evaluate a Lua string in SQL request.
+```
+box.execute('SELECT lua(\'return box.cfg.memtx_memory\')')
+```
-- 
2.21.0




More information about the Tarantool-patches mailing list