Tarantool development patches archive
 help / color / mirror / Atom feed
* [tarantool-patches] [PATCH v1 0/2] box: functional and multikey indexes
@ 2018-11-17 13:16 Kirill Shcherbatov
  2018-11-17 13:16 ` [tarantool-patches] [PATCH v1 1/2] box: refactor index termination on space drop Kirill Shcherbatov
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Kirill Shcherbatov @ 2018-11-17 13:16 UTC (permalink / raw)
  To: tarantool-patches; +Cc: georgy, Kirill Shcherbatov

http://github.com/tarantool/tarantool/tree/kshch/gh-1260-functional-index
https://github.com/tarantool/tarantool/issues/1260

Tarantool functional and multikey indexes allows you to create
an index based on user-specified function.
Feature use hidden space _i_{index_name}_{space_name} containing
tuples of structure [{extractor_format}{pk_format}] and redefines
on_replace trigger for target space to fill this space. Index
object metamethods are monkeypatched to make indirect lookups in
ispace to retrieve required tuple primary key.

Same tuples indexed by different keys may be returned multiple
(matches key math count) times. To prevent such behavior for
select and count operations builtin filters were implemented.
The :pairs operator can't work such way, so it may return
same tuples.

Kirill Shcherbatov (2):
  box: refactor index termination on space drop
  box: functional and multikey indexes

 src/box/CMakeLists.txt              |   1 +
 src/box/alter.cc                    |  61 ++++++-
 src/box/index.cc                    |   9 +
 src/box/index.h                     |   4 +
 src/box/index_def.c                 |  64 +++++++-
 src/box/index_def.h                 |  14 ++
 src/box/lua/call.c                  |  43 +++++
 src/box/lua/call.h                  |  18 ++
 src/box/lua/func_idx.lua            | 319 ++++++++++++++++++++++++++++++++++++
 src/box/lua/schema.lua              |  52 ++++--
 src/box/lua/space.cc                |  21 +++
 src/box/memtx_engine.c              |   6 +
 src/box/vinyl.c                     |   6 +
 src/lua/init.c                      |   2 +
 test/engine/functional_idx.result   | 120 ++++++++++++++
 test/engine/functional_idx.test.lua |  35 ++++
 16 files changed, 753 insertions(+), 22 deletions(-)
 create mode 100644 src/box/lua/func_idx.lua
 create mode 100644 test/engine/functional_idx.result
 create mode 100644 test/engine/functional_idx.test.lua

-- 
2.7.4

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [tarantool-patches] [PATCH v1 1/2] box: refactor index termination on space drop
  2018-11-17 13:16 [tarantool-patches] [PATCH v1 0/2] box: functional and multikey indexes Kirill Shcherbatov
@ 2018-11-17 13:16 ` Kirill Shcherbatov
  2018-11-19 12:22 ` [tarantool-patches] Re: [PATCH] functional and multikey indexes RFC Kirill Shcherbatov
       [not found] ` <45d55e70c60c5c4b5806c8e7ce3330957a8dc0ce.1542460247.git.kshcherbatov@tarantool.org>
  2 siblings, 0 replies; 5+ messages in thread
From: Kirill Shcherbatov @ 2018-11-17 13:16 UTC (permalink / raw)
  To: tarantool-patches; +Cc: georgy, Kirill Shcherbatov

Drop indexes via index object metamethod call instead of direct
_index space tuple deletion. With functional indexes patch
drop metamethod would be redefined to make an extra service
operations during index deletion.

Need for #1260
---
 src/box/lua/schema.lua | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua
index 8a804f0..b6693b1 100644
--- a/src/box/lua/schema.lua
+++ b/src/box/lua/schema.lua
@@ -524,7 +524,8 @@ box.schema.space.drop = function(space_id, space_name, opts)
     local keys = _vindex:select(space_id)
     for i = #keys, 1, -1 do
         local v = keys[i]
-        _index:delete{v[1], v[2]}
+        local index = box.space[space_id].index[v[2]]
+        index:drop()
     end
     for _, t in _fk_constraint.index.child_id:pairs({space_id}) do
         _fk_constraint:delete({t.name, space_id})
-- 
2.7.4

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [tarantool-patches] Re: [PATCH] functional and multikey indexes RFC
  2018-11-17 13:16 [tarantool-patches] [PATCH v1 0/2] box: functional and multikey indexes Kirill Shcherbatov
  2018-11-17 13:16 ` [tarantool-patches] [PATCH v1 1/2] box: refactor index termination on space drop Kirill Shcherbatov
@ 2018-11-19 12:22 ` Kirill Shcherbatov
  2018-11-20 12:18   ` Kirill Shcherbatov
       [not found] ` <45d55e70c60c5c4b5806c8e7ce3330957a8dc0ce.1542460247.git.kshcherbatov@tarantool.org>
  2 siblings, 1 reply; 5+ messages in thread
From: Kirill Shcherbatov @ 2018-11-19 12:22 UTC (permalink / raw)
  To: tarantool-patches, Georgy Kirichenko; +Cc: Vladislav Shpilevoy, Kostya Osipov

# Tarantool functional index

* **Status**: In progress
* **Start date**: 30-10-2018
* **Authors**: Georgy Kirichenko @GeorgyKirichenko georgy@tarantool.org, Kirill Shcherbatov @kshcherbatov kshcherbatov@tarantool.org
* **Issues**: [#1260](https://github.com/tarantool/tarantool/issues/1260)

## Summary
Tarantool functional and multikey indexes allows you to create an index based on user-specified function. This interface is the most flexible tool for defining an index over a data tuple.

## Background and motivation
To create a **functional index** you need to describe an `extractor_f(tuple)` function string, that returns a key(or a table of extracted keys if `is_multikey` flag is set) having same format that are used to insert a tuple in index.
The function will be used by the kernel after creating an LUA representation via loadstring() call.
The format of returned keys should be described in the form `extractor_f_format = {{type = $TYPE, is_nullable = $IS_NULLABLE, collation = $COLLATION}, ..}`.

The semantics of options are similar to part's options. The `parts` table must not be specified.

### 1.0 Index creation
Create an index. It is mandatory to create an index for a space before trying to insert tuples into it, or select tuples from it. **Functional index could not be a primary key.**
```
space_object:create_index(index-name[, options])
```


| Name          	| Effect                                                                                  	| Type                                                                                                                                                                                                                                                                                                  	| Default                                      	|
|---------------	|-----------------------------------------------------------------------------------------	|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------	|----------------------------------------------	|
| func_code     	| Functional index extractor routine                                                      	| string with body:<br /> return function(tuple)<br />&nbsp;&nbsp;&nbsp;&nbsp;-- Get key by tuple<br />&nbsp;&nbsp;&nbsp;&nbsp;return key<br /> end<br />  or<br />  return function(tuple)<br />&nbsp;&nbsp;&nbsp;&nbsp;-- Get keys by tuple<br />&nbsp;&nbsp;&nbsp;&nbsp;return {key1, ..keyN}<br /> end<br /> when is_multikey flag is set and each key has func_format 	| None, required for functional index creation 	|
| func_format   	| Format of functional index extractor routine keys                                       	| dictionary {type = $TYPE, collation = .., is_nullable = ..} where type is a scalar type                                                                                                                                                                                                               	| None, required for functional index creation 	|
| is_multikey   	| functional index is multikey, i.e. func_code extractor routine returnes a table of keys 	| boolean                                                                                                                                                                                                                                                                                               	| false                                        	|
| type          	| type of index                                                                           	| string (‘HASH’ or ‘TREE’ or ‘BITSET’ or ‘RTREE’) Note: storage engine: vinyl only  supports ‘TREE’                                                                                                                                                                                                    	| ‘TREE’                                       	|
| id            	| unique identifier                                                                       	| number                                                                                                                                                                                                                                                                                                	| last index’s id, +1                          	|
| unique        	| index is unique                                                                         	| boolean                                                                                                                                                                                                                                                                                               	| true                                         	|
| if_not_exists 	| no error if duplicate name                                                              	| boolean                                                                                                                                                                                                                                                                                               	| false                                        	|

#### 1.1 Example:

```
extractor_f = [[
        return function(tuple)
                if not type(tuple[3]) == 'string' then
                        return nil, 'Invalid field type'
                end
                local t = tuple[3]:lower():split()
                local filter = {}
                for i, v in pairs(t) do
                        filter[v] = true
                end
                t = {}
                for k, _ in pairs(filter) do
                        table.insert(t, {k})
                end
                return t
        end
]]

extractor_format = {{type='str', collation='unicode_ci'}}

addr_low = s:create_index(‘func_idx ', {func_code = extractor_f,
                                        func_format = extractor_f_format,
                                        unique = true, is_multikey = true})

```

### 2.0 Index methods
| Name                        	| Use                                               	|
|-----------------------------	|---------------------------------------------------	|
| index_object.unique         	| Flag, true if an index is unique                  	|
| index_object.type           	| Index type                                        	|
| index_object.functional     	| Table with functional_index-specific options      	|
|     .func_code              	| Functional index extractor routine code string    	|
|     .func                   	| Pointer to functional index extractor LUA routine 	|
| index_object:pairs()        	| Prepare for iterating                             	|
| index_object:select()       	| Select one or more tuples via index               	|
| index_object:get()          	| Select a tuple via index                          	|
| index_object:min()          	| Find the minimum value in index                   	|
| index_object:max()          	| Find the maximum value in index                   	|
| index_object:random()       	| Find a random value in index                      	|
| index_object:count()        	| Count tuples matching key value                   	|
| index_object:update()       	| Update a tuple                                    	|
| index_object:delete()       	| Delete a tuple by key                             	|
| index_object:drop()         	| Drop an index                                     	|
| index_object:rename()       	| Rename an index                                   	|
| index_object:bsize()        	| Get count of bytes for an index                   	|
| index_object:stat()         	| Get statistics for an index                       	|
| index_object:compact()      	| Remove unused index space                         	|
| index_object:user_defined() 	| Any function / method that any user wants to add  	|
| index_object:alter()        	| Alter an index                                    	|


There few functional index-specific things that should be described:
1. `alter()` functional index where `func_code`, `func_format` or `is_multikey` property has been changed always require rebuild and may be slow
2. When `is_multikey` flag is set in index options, `select()` and `count()`, `pairs()` methods will return tuples only once, even they satisfy the multikey index search key many times. This may consume much memory,

#### 2.1 Example
```
...
s:insert({2, 'James Bond', 'GB London Mi6'})
---
- [2, 'James Bond', 'GB London Mi6']
...
s:insert({3, 'Kirill Alekseevich', 'Russia Moscow Dolgoprudny RocketBuilders 1'})
---
- error: Duplicate key exists in unique index
...
s:insert({4, 'Jack London', 'GB'})
---
- error: Duplicate key exists in unique index
...
addr_low:count()
---
- 2
...
addr_low:count({'moscow'})
---
- 1
...
addr_low:count({'gb'})
---
- 1
...
addr_low:select()
---
- - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
  - [2, 'James Bond', 'GB London Mi6']
...
addr_low:select({}, {limit = 1})
---
- - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
...
addr_low:select({}, {limit = 1, offset=1})
---
- - [2, 'James Bond', 'GB London Mi6']
...
addr_low:select({'Moscow'})
---
- - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
...
_ = addr_low:delete({'moscow'})
---
...
addr_low:select({'MOSCOW'})
---
- []
...
```

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [tarantool-patches] Re: [PATCH v1 2/2] box: functional and multikey indexes
       [not found] ` <45d55e70c60c5c4b5806c8e7ce3330957a8dc0ce.1542460247.git.kshcherbatov@tarantool.org>
@ 2018-11-20 12:18   ` Kirill Shcherbatov
  0 siblings, 0 replies; 5+ messages in thread
From: Kirill Shcherbatov @ 2018-11-20 12:18 UTC (permalink / raw)
  To: tarantool-patches, Georgy Kirichenko

53,6 +55,22 @@ box_lua_call(struct call_request *request, struct port *port);
 int
 box_lua_eval(struct call_request *request, struct port *port);
 
+/** Set functional index trap trigger on space. */
+void
+lua_func_idx_trigger_set(const char *space_name, const char *idx_name,
+			 int32_t *trigger_ref);
+
+/**
+ * Make and register a new LUA function object by func_code
+ * string.
+ */
+int
+lua_func_new(const char *func_code, int32_t *func_ref);
+
+/** Release registered LUA function object. */
+void
+lua_func_delete(uint32_t func_ref);
+
 #if defined(__cplusplus)
 } /* extern "C" */
 #endif /* defined(__cplusplus) */
diff --git a/src/box/lua/func_idx.lua b/src/box/lua/func_idx.lua
new file mode 100644
index 0000000..8ffc7a1
--- /dev/null
+++ b/src/box/lua/func_idx.lua
@@ -0,0 +1,347 @@
+-- func_idx.lua (internal file)
+--
+
+local function func_idx_space_name(space_name, idx_name)
+    return '_i_' .. idx_name .. '_' .. space_name
+end
+
+local function test_a_in_b(a, b)
+    local test = {}
+    for _, v in pairs(b) do
+            test[v] = true
+    end
+    for _, v in pairs(a) do
+            if test[v] == nil then return false end
+    end
+    return true
+end
+
+local function func_idx_keys_equal(a, b)
+    return test_a_in_b(a, b) and test_a_in_b(b, a)
+end
+
+local function func_idx_key_in_set(key, set)
+    for _, r in pairs(set) do
+        if func_idx_keys_equal(key, r) then return true end
+    end
+    return false
+end
+
+local function func_idx_select_do(index, key, iterator, func, arg)
+    local space = box.space[index.space_id]
+    local ispace_name = func_idx_space_name(space.name, index.name)
+    local ispace = box.space[ispace_name]
+    local pkey_offset = index.functional.pkey_offset
+    local unique_filter = {}
+    for _, k in ispace:pairs(key, {iterator = iterator}) do
+        local pkey = {}
+        for _, p in pairs(space.index[0].parts) do
+            table.insert(pkey, k[#pkey + pkey_offset + 1])
+        end
+        -- test tuple uniqueness for multikey index
+        if (index.is_multikey == false) or
+           (func_idx_key_in_set(pkey, unique_filter) == false) then
+            table.insert(unique_filter, pkey)
+            if func(arg, space, pkey) ~= 0 then break end
+        end
+    end
+end
+
+local function func_idx_select(index, key, opts)
+    box.internal.check_index_arg(index, 'select')
+    key = key or {}
+    local iterator, offset, limit =
+        box.internal.check_select_opts(opts, #key < index.functional.pkey_offset)
+    local res = {}
+    local curr_offset, done = 0, 0
+    func_idx_select_do(index, key, iterator,
+                       function(container, space, pkey)
+                            if done >= limit then return 1 end
+                            if curr_offset >= offset then
+                                local tuple = space:get(pkey)
+                                table.insert(container, tuple)
+                                done = done + 1
+                            end
+                            curr_offset = curr_offset + 1
+                            return 0
+                       end, res)
+    return res
+end
+
+local function func_idx_get(index, key)
+    box.internal.check_index_arg(index, 'get')
+    if not index.unique then
+        box.error(box.error.MORE_THAN_ONE_TUPLE, "")
+    end
+    ret = func_idx_select(index, key, {limit = 1})[1]
+    if ret ~= nil then
+        return ret
+    end
+end
+
+local function func_idx_bsize(index)
+    box.internal.check_index_arg(index, 'bsize')
+    local space = box.space[index.space_id]
+    local ispace = box.space[func_idx_space_name(space.name, index.name)]
+    local ispace_pk = ispace.index[0]
+    return ispace_pk:bsize()
+end
+
+local function func_idx_random(index, seed)
+    box.internal.check_index_arg(index, 'random')
+    local space = box.space[index.space_id]
+    return space.index[0]:random(seed)
+end
+
+local function func_idx_count(index, key, opts)
+    box.internal.check_index_arg(index, 'count')
+    key = key or {}
+    local iterator, offset, limit =
+        box.internal.check_select_opts(opts, #key < index.functional.pkey_offset)
+    local res = {count = 0}
+    func_idx_select_do(index, key, iterator,
+                       function(container, space, pkey)
+                            container.count = container.count + 1
+                            return 0
+                       end, res)
+    return res.count
+end
+
+local function func_idx_rename(index, name)
+    box.internal.check_index_arg(index, 'rename')
+    local space = box.space[index.space_id]
+    local ispace = box.space[func_idx_space_name(space.name, index.name)]
+    local new_ispace_name = func_idx_space_name(space.name, name)
+    ispace:rename(new_ispace_name)
+    return box.schema.index.rename(space.id, index.id, name)
+end
+
+local function func_idx_min(index, key)
+    box.internal.check_index_arg(index, 'min')
+    return func_idx_select(index, key, {limit = 1, iterator = 'EQ'})
+end
+
+local function func_idx_max(index, key)
+    box.internal.check_index_arg(index, 'max')
+    return func_idx_select(index, key, {limit = 1, iterator = 'REQ'})
+end
+
+local function func_idx_drop(index, key)
+    box.internal.check_index_arg(index, 'drop')
+    local space = box.space[index.space_id]
+    space:on_replace(nil, index.functional.trigger)
+    local ispace = box.space[func_idx_space_name(space.name, index.name)]
+    ispace:drop()
+    return box.schema.index.drop(space.id, index.id)
+end
+
+local function func_idx_delete(index, key)
+    box.internal.check_index_arg(index, 'delete')
+    if not index.unique then
+        box.error(box.error.MORE_THAN_ONE_TUPLE, "")
+    end
+    key = key or {}
+    local iterator, offset, limit = box.internal.check_select_opts({}, false)
+    local ret = {}
+    func_idx_select_do(index, key, iterator,
+                       function(container, space, pkey)
+                            local tuple = space:delete(pkey)
+                            if tuple ~= nil then
+                                table.insert(container, tuple)
+                            end
+                            return 1
+                       end, ret)
+    if ret[1] ~= nil then
+        return ret[1]
+    end
+end
+
+local function func_idx_update(index, key, ops)
+    box.internal.check_index_arg(index, 'update')
+    if not index.unique then
+        box.error(box.error.MORE_THAN_ONE_TUPLE, "")
+    end
+    local iterator, offset, limit = box.internal.check_select_opts({}, false)
+    local ret = {}
+    func_idx_select_do(index, key, iterator,
+                       function(container, space, pkey)
+                            local tuple = space:update(pkey, ops)
+                            if tuple ~= nil then
+                                table.insert(container, tuple)
+                            end
+                            return 1
+                       end, ret)
+    if ret[1] ~= nil then
+        return ret[1]
+    end
+end
+
+local function func_idx_pairs_next(param, state)
+    local state, tuple = param.gen(param.gen_param, state)
+    if state == nil then
+        param.filter_set = {}
+        return nil
+    else
+        local pkey_offset = param.index.functional.pkey_offset
+        local pkey = {}
+        for _, p in pairs(param.space.index[0].parts) do
+            table.insert(pkey, tuple[#pkey + pkey_offset + 1])
+        end
+        return state, param.space:get(pkey)
+    end
+end
+
+local function func_idx_pairs(index, key, opts)
+    box.internal.check_index_arg(index, 'pairs')
+    local space = box.space[index.space_id]
+    local ispace_name = func_idx_space_name(space.name, index.name)
+    local ispace = box.space[ispace_name]
+    key = key or {}
+    local iterator, offset, limit =
+        box.internal.check_select_opts(opts, #key < index.functional.pkey_offset)
+    local obj, param, state = ispace:pairs(key, {iterator = iterator})
+    param = {filter_set = {}, gen_param = param, gen = obj.gen,
+             index = index, space = space}
+    obj.gen = function(param, state)
+        local state, tuple = func_idx_pairs_next(param, state)
+        while state ~= nil and func_idx_key_in_set(tuple, param.filter_set) == true do
+            state, tuple = func_idx_pairs_next(param, state)
+        end
+        if state == nil then
+            param.filter_set = {}
+            return nil
+        else
+            table.insert(param.filter_set, tuple)
+            return state, tuple
+        end
+    end
+    return obj, param, state
+end
+
+local function func_idx_unique_filter(keys)
+    local ret = {}
+    for _, k in pairs(keys) do
+        if func_idx_key_in_set(k, ret) then goto continue end
+        table.insert(ret, k)
+        ::continue::
+    end
+    return ret
+end
+
+local function func_idx_fkeys(func, tuple, is_multikey)
+    local fkeys = func(tuple)
+    if is_multikey then
+        fkeys = func_idx_unique_filter(fkeys)
+    else
+        fkeys = {{fkeys}}
+    end
+    return fkeys
+end
+
+local func_idx = {}
+
+func_idx.space_trigger_set = function (space_name, idx_name)
+    local space = box.space[space_name]
+    assert(space ~= nil)
+    local ispace = box.space[func_idx_space_name(space_name, idx_name)]
+    assert(ispace ~= nil)
+    local trigger = function (old, new)
+        if box.info.status ~= 'running' then
+            return
+        end
+        local findex = space.index[idx_name]
+        local func = findex.functional.func
+        local pkey = {}
+        local tuple = old or new
+        for _, p in pairs(space.index[0].parts) do
+            table.insert(pkey, tuple[p.fieldno])
+        end
+        if old ~= nil then
+            local fkeys = func_idx_fkeys(func, old, findex.is_multikey)
+            for _, key in pairs(fkeys) do
+                if findex.unique == false then
+                    for _, v in pairs(pkey) do
+                        table.insert(key, v)
+                    end
+                end
+                ispace:delete(key)
+            end
+        end
+        if new ~= nil then
+            local fkeys = func_idx_fkeys(func, new, findex.is_multikey)
+            for _, key in pairs(fkeys) do
+                for _, v in pairs(pkey) do
+                    table.insert(key, v)
+                end
+                ispace:insert(key)
+            end
+        end
+    end
+    space:on_replace(trigger)
+    return trigger
+end
+
+func_idx.ispace_create = function (space, name, func_code, func_format,
+                                   is_unique, is_multikey)
+    local func, err = loadstring(func_code)
+    if func == nil then
+        box.error(box.error.ILLEGAL_PARAMS,
+                  "functional index extractor routine is invalid: "..err)
+    end
+    func = func()
+    local iformat = {}
+    local iparts = {}
+    for _, f in pairs(func_format) do
+        table.insert(iformat, {name = 'i' .. tostring(#iformat), type = f.type})
+        table.insert(iparts, {#iparts + 1, f.type,
+                              is_nullable = f.is_nullable,
+                              collation = f.collation})
+    end
+    local pkey_offset = #func_format
+    local pkey = box.internal.check_primary_index(space)
+    for _, p in pairs(pkey.parts) do
+        table.insert(iformat, {name = 'i' .. tostring(#iformat), type = p.type})
+        if is_unique == false then
+            table.insert(iparts, {#iparts + 1, p.type})
+        end
+    end
+    local ispace = box.schema.space.create('_i_' .. name .. '_' .. space.name,
+                                           {engine = space.engine})
+    ispace:format(iformat)
+    ispace:create_index('pk', {parts = iparts})
+
+    for _, tuple in space:pairs() do
+        local pkey = {}
+        for _, p in pairs(space.index[0].parts) do
+            table.insert(pkey, tuple[p.fieldno])
+        end
+        local fkeys = func_idx_fkeys(func, tuple, is_multikey)
+        for _, key in pairs(fkeys) do
+            for _, v in pairs(pkey) do
+                table.insert(key, v)
+            end
+            ispace:insert(key)
+        end
+    end
+end
+
+func_idx.monkeypatch = function (index)
+    local meta = getmetatable(index)
+    meta.select = func_idx_select
+    meta.get = func_idx_get
+    meta.bsize = func_idx_bsize
+    meta.random = func_idx_random
+    meta.count = func_idx_count
+    meta.rename = func_idx_rename
+    meta.min = func_idx_min
+    meta.max = func_idx_max
+    meta.drop = func_idx_drop
+    meta.delete = func_idx_delete
+    meta.update = func_idx_update
+    meta.pairs = func_idx_pairs
+    meta.alter = nil
+    meta.compact = nil
+    meta.stat = nil
+end
+
+return func_idx
diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua
index b6693b1..93f7a6c 100644
--- a/src/box/lua/schema.lua
+++ b/src/box/lua/schema.lua
@@ -7,6 +7,7 @@ local fun = require('fun')
 local log = require('log')
 local fio = require('fio')
 local json = require('json')
+local func_idx = require('func_idx')
 local session = box.session
 local internal = require('box.internal')
 local function setmap(table)
@@ -281,6 +282,10 @@ local function check_param_table(table, template)
     end
 end
 
+function func_idx_trigger_set(space_name, idx_name)
+    return func_idx.space_trigger_set(space_name, idx_name)
+end
+
 --[[
  @brief Common function to check type parameter (of function)
  Calls box.error(box.error.ILLEGAL_PARAMS, ) on error
@@ -700,6 +705,9 @@ local index_options = {
     range_size = 'number',
     page_size = 'number',
     bloom_fpr = 'number',
+    is_multikey = 'boolean',
+    func_code = 'string',
+    func_format = 'table'
 }
 
 --
@@ -745,16 +753,31 @@ box.schema.index.create = function(space_id, name, options)
     }
     options_defaults = type_dependent_defaults[options.type]
             or type_dependent_defaults.other
-    if not options.parts then
-        local fieldno = options_defaults.parts[1]
-        if #format >= fieldno then
-            local t = format[fieldno].type
-            if t ~= 'any' then
-                options.parts = {{fieldno, format[fieldno].type}}
+    if not options.func_code then
+        if not options.parts then
+            local fieldno = options_defaults.parts[1]
+            if #format >= fieldno then
+                local t = format[fieldno].type
+                if t ~= 'any' then
+                    options.parts = {{fieldno, format[fieldno].type}}
+                end
             end
         end
+        options = update_param_table(options, options_defaults)
+    elseif options.parts ~= nil then
+        box.error(box.error.ILLEGAL_PARAMS,
+                  "options.parts: parts can't be set for functional index")
+    end
+    if (options.func_code == nil) ~= (options.func_format == nil) then
+         box.error(box.error.ILLEGAL_PARAMS,
+                  "options.func_: bouth of parameters func_code and "..
+                  "func_format should be specified for functional index")
+    end
+    if options.is_multikey and (options.func_code == nil) then
+        box.error(box.error.ILLEGAL_PARAMS,
+                  "options.is_multikey: can't specify is_multikey option for "..
+                  "non-functional index")
     end
-    options = update_param_table(options, options_defaults)
     if space.engine == 'vinyl' then
         options_defaults = {
             page_size = box.cfg.vinyl_page_size,
@@ -792,8 +815,11 @@ box.schema.index.create = function(space_id, name, options)
             end
         end
     end
-    local parts, parts_can_be_simplified =
-        update_index_parts(format, options.parts)
+    local parts, parts_can_be_simplified = {}, false
+    if not options.func_code then
+        parts, parts_can_be_simplified =
+            update_index_parts(format, options.parts)
+    end
     -- create_index() options contains type, parts, etc,
     -- stored separately. Remove these members from index_opts
     local index_opts = {
@@ -805,6 +831,9 @@ box.schema.index.create = function(space_id, name, options)
             run_count_per_level = options.run_count_per_level,
             run_size_ratio = options.run_size_ratio,
             bloom_fpr = options.bloom_fpr,
+            is_multikey = options.is_multikey,
+            func_code = options.func_code,
+            func_format = options.func_format,
     }
     local field_type_aliases = {
         num = 'unsigned'; -- Deprecated since 1.7.2
@@ -849,6 +878,11 @@ box.schema.index.create = function(space_id, name, options)
     if parts_can_be_simplified then
         parts = simplify_index_parts(parts)
     end
+    if options.func_code ~= nil then
+        func_idx.ispace_create(space, name, options.func_code,
+                               options.func_format, options.unique or false,
+                               options.is_multikey or false)
+    end
     _index:insert{space_id, iid, name, options.type, index_opts, parts}
     if sequence ~= nil then
         _space_sequence:insert{space_id, sequence, sequence_is_generated}
@@ -1319,6 +1353,7 @@ local function check_select_opts(opts, key_is_nil)
     end
     return iterator, offset, limit
 end
+box.internal.check_select_opts = check_select_opts
 
 base_index_mt.select_ffi = function(index, key, opts)
     check_index_arg(index, 'select')
@@ -1546,6 +1581,7 @@ local function wrap_schema_object_mt(name)
     return mt
 end
 
+
 function box.schema.space.bless(space)
     local index_mt_name
     if space.engine == 'vinyl' then
@@ -1560,6 +1596,9 @@ function box.schema.space.bless(space)
         for j, index in pairs(space.index) do
             if type(j) == 'number' then
                 setmetatable(index, wrap_schema_object_mt(index_mt_name))
+                if index.functional ~= nil then
+                    func_idx.monkeypatch(index)
+                end
             end
         end
     end
diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc
index 7cae436..227dd20 100644
--- a/src/box/lua/space.cc
+++ b/src/box/lua/space.cc
@@ -312,6 +312,30 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i)
 
 		lua_settable(L, -3); /* space.index[k].parts */
 
+		if (index_is_functional(index_def)) {
+			lua_pushstring(L, "functional");
+			lua_newtable(L);
+
+			lua_pushstring(L, index_def->opts.func_code);
+			lua_setfield(L, -2, "func_code");
+
+			lua_pushinteger(L, index_def->opts.pkey_offset);
+			lua_setfield(L, -2, "pkey_offset");
+
+			lua_rawgeti(L, LUA_REGISTRYINDEX, index->func_ref);
+			assert(lua_isfunction(L, -1));
+			lua_setfield(L, -2, "func");
+
+			lua_rawgeti(L, LUA_REGISTRYINDEX, index->func_trigger_ref);
+			assert(lua_isfunction(L, -1));
+			lua_setfield(L, -2, "trigger");
+
+			lua_settable(L, -3);
+		}
+
+		lua_pushboolean(L, index_opts->is_multikey);
+		lua_setfield(L, -2, "is_multikey");
+
 		lua_pushstring(L, "sequence_id");
 		if (k == 0 && space->sequence != NULL) {
 			lua_pushnumber(L, space->sequence->def->id);
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 6a91d0b..6369c02 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -1293,6 +1293,12 @@ memtx_index_def_change_requires_rebuild(struct index *index,
 	if (!old_def->opts.is_unique && new_def->opts.is_unique)
 		return true;
 
+	if (index_is_functional(old_def) != index_is_functional(new_def))
+		return true;
+	else if (index_is_functional(old_def) &&
+		 strcmp(old_def->opts.func_code, new_def->opts.func_code) != 0)
+		return true;
+
 	const struct key_def *old_cmp_def, *new_cmp_def;
 	if (index_depends_on_pk(index)) {
 		old_cmp_def = old_def->cmp_def;
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index b09b6ad..7aa370d 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -955,6 +955,12 @@ vinyl_index_def_change_requires_rebuild(struct index *index,
 	if (!old_def->opts.is_unique && new_def->opts.is_unique)
 		return true;
 
+	if (index_is_functional(old_def) != index_is_functional(new_def))
+		return true;
+	else if (index_is_functional(old_def) &&
+		 strcmp(old_def->opts.func_code, new_def->opts.func_code) != 0)
+		return true;
+
 	assert(index_depends_on_pk(index));
 	const struct key_def *old_cmp_def = old_def->cmp_def;
 	const struct key_def *new_cmp_def = new_def->cmp_def;
diff --git a/src/lua/init.c b/src/lua/init.c
index 3728a57..3d58876 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -81,6 +81,7 @@ bool start_loop = true;
 extern char strict_lua[],
 	uuid_lua[],
 	msgpackffi_lua[],
+	func_idx_lua[],
 	fun_lua[],
 	crypto_lua[],
 	digest_lua[],
@@ -120,6 +121,7 @@ extern char strict_lua[],
 static const char *lua_modules[] = {
 	/* Make it first to affect load of all other modules */
 	"strict", strict_lua,
+	"func_idx", func_idx_lua,
 	"fun", fun_lua,
 	"tarantool", init_lua,
 	"errno", errno_lua,
diff --git a/test/engine/functional_idx.result b/test/engine/functional_idx.result
new file mode 100644
index 0000000..9bd3164
--- /dev/null
+++ b/test/engine/functional_idx.result
@@ -0,0 +1,120 @@
+msgpack = require('msgpack')
+---
+...
+env = require('test_run')
+---
+...
+test_run = env.new()
+---
+...
+engine = test_run:get_cfg('engine')
+---
+...
+format = {{'id','unsigned'}, {'name','string'},{'address', 'string'}}
+---
+...
+s = box.schema.space.create('clients', {engine = engine, format=format})
+---
+...
+addr_extractor = [[return function(tuple) if type(tuple.address) ~= 'string' then return nil, 'Invalid field type' end local t = tuple.address:lower():split() for k,v in pairs(t) do t[k] = {v} end return t end]]
+---
+...
+addr_extractor_format = {{type='str', collation='unicode_ci'}}
+---
+...
+s:create_index('address', {func_code = addr_extractor, func_format = addr_extractor_format, unique=true})
+---
+- error: 'No index #0 is defined in space ''clients'''
+...
+pk = s:create_index('pk')
+---
+...
+s:insert({1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А'})
+---
+- [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
+...
+s:create_index('address', {func_code = addr_extractor, unique=true})
+---
+- error: 'Illegal parameters, options.func_: bouth of parameters func_code and func_format
+    should be specified for functional index'
+...
+s:create_index('address', {func_format = addr_extractor_format, unique=true})
+---
+- error: 'Illegal parameters, options.func_: bouth of parameters func_code and func_format
+    should be specified for functional index'
+...
+s:create_index('address', {func_code = addr_extractor..'err', func_format = addr_extractor_format, unique=true})
+---
+- error: 'Illegal parameters, functional index extractor routine is invalid: [string
+    "return function(tuple) if type(tuple.address)..."]:1: ''end'' expected near ''enderr'''
+...
+addr_low = s:create_index('address', {func_code = addr_extractor, func_format = addr_extractor_format, unique=true, is_multikey=true})
+---
+...
+assert(box.space.clients:on_replace()[1] == addr_low.functional.trigger)
+---
+- true
+...
+s:insert({2, 'James Bond', 'GB London Mi6'})
+---
+- [2, 'James Bond', 'GB London Mi6']
+...
+s:insert({3, 'Kirill Alekseevich', 'Russia Moscow Dolgoprudny RocketBuilders 1'})
+---
+- error: Duplicate key exists in unique index 'pk' in space '_i_address_clients'
+...
+s:insert({4, 'Jack London', 'GB'})
+---
+- error: Duplicate key exists in unique index 'pk' in space '_i_address_clients'
+...
+addr_low:count()
+---
+- 2
+...
+addr_low:count({'moscow'})
+---
+- 1
+...
+addr_low:count({'gb'})
+---
+- 1
+...
+addr_extractor = [[return function(tuple) if type(tuple.address) ~= 'string' then return nil, 'Invalid field type' end return tuple.address:upper():split()[1] end]]
+---
+...
+addr_high = s:create_index('address2', {func_code = addr_extractor, func_format = addr_extractor_format, unique=true, is_multikey=false})
+---
+...
+addr_high:select()
+---
+- - [2, 'James Bond', 'GB London Mi6']
+  - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
+...
+addr_high:select({}, {limit = 1})
+---
+- - [2, 'James Bond', 'GB London Mi6']
+...
+addr_high:select({}, {limit = 1, offset=1})
+---
+- - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
+...
+addr_high:select({'Moscow'})
+---
+- []
+...
+for _, v in addr_low:pairs() do print(v.name) end
+---
+...
+_ = addr_low:delete({'moscow'})
+---
+...
+addr_high:select({'MOSCOW'})
+---
+- []
+...
+addr_high:drop()
+---
+...
+s:drop()
+---
+...
diff --git a/test/engine/functional_idx.test.lua b/test/engine/functional_idx.test.lua
new file mode 100644
index 0000000..b4eb5b4
--- /dev/null
+++ b/test/engine/functional_idx.test.lua
@@ -0,0 +1,35 @@
+msgpack = require('msgpack')
+env = require('test_run')
+test_run = env.new()
+engine = test_run:get_cfg('engine')
+
+format = {{'id','unsigned'}, {'name','string'},{'address', 'string'}}
+s = box.schema.space.create('clients', {engine = engine, format=format})
+addr_extractor = [[return function(tuple) if type(tuple.address) ~= 'string' then return nil, 'Invalid field type' end local t = tuple.address:lower():split() for k,v in pairs(t) do t[k] = {v} end return t end]]
+addr_extractor_format = {{type='str', collation='unicode_ci'}}
+s:create_index('address', {func_code = addr_extractor, func_format = addr_extractor_format, unique=true})
+pk = s:create_index('pk')
+s:insert({1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А'})
+s:create_index('address', {func_code = addr_extractor, unique=true})
+s:create_index('address', {func_format = addr_extractor_format, unique=true})
+s:create_index('address', {func_code = addr_extractor..'err', func_format = addr_extractor_format, unique=true})
+addr_low = s:create_index('address', {func_code = addr_extractor, func_format = addr_extractor_format, unique=true, is_multikey=true})
+assert(box.space.clients:on_replace()[1] == addr_low.functional.trigger)
+s:insert({2, 'James Bond', 'GB London Mi6'})
+s:insert({3, 'Kirill Alekseevich', 'Russia Moscow Dolgoprudny RocketBuilders 1'})
+s:insert({4, 'Jack London', 'GB'})
+addr_low:count()
+addr_low:count({'moscow'})
+addr_low:count({'gb'})
+addr_extractor = [[return function(tuple) if type(tuple.address) ~= 'string' then return nil, 'Invalid field type' end return tuple.address:upper():split()[1] end]]
+addr_high = s:create_index('address2', {func_code = addr_extractor, func_format = addr_extractor_format, unique=true, is_multikey=false})
+addr_high:select()
+addr_high:select({}, {limit = 1})
+addr_high:select({}, {limit = 1, offset=1})
+addr_high:select({'Moscow'})
+for _, v in addr_high:pairs() do print(v.name) end
+_ = addr_low:delete({'moscow'})
+addr_high:select({'MOSCOW'})
+addr_high:drop()
+
+s:drop()
-- 
2.7.4

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [tarantool-patches] Re: [PATCH] functional and multikey indexes RFC
  2018-11-19 12:22 ` [tarantool-patches] Re: [PATCH] functional and multikey indexes RFC Kirill Shcherbatov
@ 2018-11-20 12:18   ` Kirill Shcherbatov
  0 siblings, 0 replies; 5+ messages in thread
From: Kirill Shcherbatov @ 2018-11-20 12:18 UTC (permalink / raw)
  To: tarantool-patches, Georgy Kirichenko

# Tarantool functional index

* **Status**: In progress
* **Start date**: 30-10-2018
* **Authors**: Georgy Kirichenko @GeorgyKirichenko georgy@tarantool.org, Kirill Shcherbatov @kshcherbatov kshcherbatov@tarantool.org
* **Issues**: [#1260](https://github.com/tarantool/tarantool/issues/1260)

## Summary
Tarantool functional and multikey indexes allows you to create an index based on user-specified function. This interface is the most flexible tool for defining an index over a data tuple.

## Background and motivation
A function-based index is an index that is created on the results of a function. The reason to create a function-based index on a column is to make complex queries to field using extractor function.

To create a **functional index** you need to describe an `extractor_f(tuple)` function string, that returns a key(or a table of extracted keys if `is_multikey` flag is set) having same format that are used to insert a tuple in index.
The function will be used by the kernel after creating an LUA representation via loadstring() call.
The format of returned keys should be described in the form `extractor_f_format = {{type = $TYPE, is_nullable = $IS_NULLABLE, collation = $COLLATION}, ..}`.

The semantics of options are similar to part's options. The `parts` table must not be specified.

### 1.0 Index creation
Create an index. It is mandatory to create an index for a space before trying to insert tuples into it, or select tuples from it. **Functional index cannot be a primary key.**
```
space_object:create_index(index-name[, options])
```


| Name          	| Effect                                                                                  	| Type                                                                                                                                                                                                                                                                                                  	| Default                                      	|
|---------------	|-----------------------------------------------------------------------------------------	|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------	|----------------------------------------------	|
| func_code     	| Functional index extractor routine                                                      	| string with body:<br /> return function(tuple)<br />&nbsp;&nbsp;&nbsp;&nbsp;-- Get key by tuple<br />&nbsp;&nbsp;&nbsp;&nbsp;return key<br /> end<br />  or<br />  return function(tuple)<br />&nbsp;&nbsp;&nbsp;&nbsp;-- Get keys by tuple<br />&nbsp;&nbsp;&nbsp;&nbsp;return {key1, ..keyN}<br /> end<br /> when is_multikey flag is set and each key has func_format 	| None, required for functional index creation 	|
| func_format   	| Format of functional index extractor routine keys                                       	| array of {type = $TYPE, collation = .., is_nullable = ..} where type is a scalar type                                                                                                                                                                                                               	| None, required for functional index creation 	|
| is_multikey   	| functional index is multikey, i.e. func_code extractor routine returnes a table of keys 	| boolean                                                                                                                                                                                                                                                                                               	| false                                        	|
| type          	| type of index                                                                           	| string (‘HASH’ or ‘TREE’ or ‘BITSET’ or ‘RTREE’) Note: storage engine: vinyl only  supports ‘TREE’                                                                                                                                                                                                    	| ‘TREE’                                       	|
| id            	| unique identifier                                                                       	| number                                                                                                                                                                                                                                                                                                	| last index’s id, +1                          	|
| unique        	| index is unique                                                                         	| boolean                                                                                                                                                                                                                                                                                               	| true                                         	|
| if_not_exists 	| no error if duplicate name                                                              	| boolean                                                                                                                                                                                                                                                                                               	| false                                        	|

#### 1.1 Example:

```
extractor_f = [[
        return function(tuple)
                if not type(tuple[3]) ~= 'string' then
                        return nil, 'Invalid field type'
                end
                local t = tuple[3]:lower():split()
                local filter = {}
                for i, v in pairs(t) do
                        filter[v] = true
                end
                t = {}
                for k, _ in pairs(filter) do
                        table.insert(t, {k})
                end
                return t
        end
]]

extractor_format = {{type='str', collation='unicode_ci'}}

addr_low = s:create_index(‘func_idx ', {func_code = extractor_f,
                                        func_format = extractor_f_format,
                                        unique = true, is_multikey = true})

```

### 2.0 Index methods
| Name                        	| Use                                               	|
|-----------------------------	|---------------------------------------------------	|
| index_object.unique         	| Flag, true if an index is unique                  	|
| index_object.type           	| Index type                                        	|
| index_object.functional     	| Table with functional_index-specific options      	|
|     .func_code              	| Functional index extractor routine code string    	|
|     .func                   	| Pointer to functional index extractor LUA routine 	|
| index_object:pairs()        	| Prepare for iterating                             	|
| index_object:select()       	| Select one or more tuples via index               	|
| index_object:get()          	| Select a tuple via index                          	|
| index_object:min()          	| Find the minimum value in index                   	|
| index_object:max()          	| Find the maximum value in index                   	|
| index_object:random()       	| Find a random value in index                      	|
| index_object:count()        	| Count tuples matching key value                   	|
| index_object:update()       	| Update a tuple                                    	|
| index_object:delete()       	| Delete a tuple by key                             	|
| index_object:drop()         	| Drop an index                                     	|
| index_object:rename()       	| Rename an index                                   	|
| index_object:bsize()        	| Get count of bytes for an index                   	|
| index_object:stat()         	| Get statistics for an index                       	|
| index_object:compact()      	| Remove unused index space                         	|
| index_object:user_defined() 	| Any function / method that any user wants to add  	|
| index_object:alter()        	| Alter an index                                    	|


There few functional index-specific things that should be described:
1. `alter()` functional index where `func_code`, `func_format` or `is_multikey` property has been changed always require rebuild and may be slow
2. When `is_multikey` flag is set in index options, `select()` and `count()`, `pairs()` methods will return tuples only once, even they satisfy the multikey index search key many times. Tuples are sorted in order of first extracted keys for `pairs()` operator, `select()` output is not sorted. This may consume much memory,

#### 2.1 Example
```
...
s:insert({2, 'James Bond', 'GB London Mi6'})
---
- [2, 'James Bond', 'GB London Mi6']
...
s:insert({3, 'Kirill Alekseevich', 'Russia Moscow Dolgoprudny RocketBuilders 1'})
---
- error: Duplicate key exists in unique index
...
s:insert({4, 'Jack London', 'GB'})
---
- error: Duplicate key exists in unique index
...
addr_low:count()
---
- 2
...
addr_low:count({'moscow'})
---
- 1
...
addr_low:count({'gb'})
---
- 1
...
addr_low:select()
---
- - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
  - [2, 'James Bond', 'GB London Mi6']
...
addr_low:select({}, {limit = 1})
---
- - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
...
addr_low:select({}, {limit = 1, offset=1})
---
- - [2, 'James Bond', 'GB London Mi6']
...
addr_low:select({'Moscow'})
---
- - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
...
_ = addr_low:delete({'moscow'})
---
...
addr_low:select({'MOSCOW'})
---
- []
...
```

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-11-20 12:18 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-17 13:16 [tarantool-patches] [PATCH v1 0/2] box: functional and multikey indexes Kirill Shcherbatov
2018-11-17 13:16 ` [tarantool-patches] [PATCH v1 1/2] box: refactor index termination on space drop Kirill Shcherbatov
2018-11-19 12:22 ` [tarantool-patches] Re: [PATCH] functional and multikey indexes RFC Kirill Shcherbatov
2018-11-20 12:18   ` Kirill Shcherbatov
     [not found] ` <45d55e70c60c5c4b5806c8e7ce3330957a8dc0ce.1542460247.git.kshcherbatov@tarantool.org>
2018-11-20 12:18   ` [tarantool-patches] Re: [PATCH v1 2/2] box: functional and multikey indexes Kirill Shcherbatov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox