From: Kirill Shcherbatov <kshcherbatov@tarantool.org>
To: tarantool-patches@freelists.org,
Georgy Kirichenko <georgy@tarantool.org>
Subject: [tarantool-patches] Re: [PATCH] functional and multikey indexes RFC
Date: Tue, 20 Nov 2018 15:18:43 +0300 [thread overview]
Message-ID: <69f2aca8-7d0d-0b2e-1fb2-03aeef474325@tarantool.org> (raw)
In-Reply-To: <2c0939bd-35f9-34c7-b4f3-d8d7685e3692@tarantool.org>
# Tarantool functional index
* **Status**: In progress
* **Start date**: 30-10-2018
* **Authors**: Georgy Kirichenko @GeorgyKirichenko georgy@tarantool.org, Kirill Shcherbatov @kshcherbatov kshcherbatov@tarantool.org
* **Issues**: [#1260](https://github.com/tarantool/tarantool/issues/1260)
## Summary
Tarantool functional and multikey indexes allows you to create an index based on user-specified function. This interface is the most flexible tool for defining an index over a data tuple.
## Background and motivation
A function-based index is an index that is created on the results of a function. The reason to create a function-based index on a column is to make complex queries to field using extractor function.
To create a **functional index** you need to describe an `extractor_f(tuple)` function string, that returns a key(or a table of extracted keys if `is_multikey` flag is set) having same format that are used to insert a tuple in index.
The function will be used by the kernel after creating an LUA representation via loadstring() call.
The format of returned keys should be described in the form `extractor_f_format = {{type = $TYPE, is_nullable = $IS_NULLABLE, collation = $COLLATION}, ..}`.
The semantics of options are similar to part's options. The `parts` table must not be specified.
### 1.0 Index creation
Create an index. It is mandatory to create an index for a space before trying to insert tuples into it, or select tuples from it. **Functional index cannot be a primary key.**
```
space_object:create_index(index-name[, options])
```
| Name | Effect | Type | Default |
|--------------- |----------------------------------------------------------------------------------------- |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |---------------------------------------------- |
| func_code | Functional index extractor routine | string with body:<br /> return function(tuple)<br /> -- Get key by tuple<br /> return key<br /> end<br /> or<br /> return function(tuple)<br /> -- Get keys by tuple<br /> return {key1, ..keyN}<br /> end<br /> when is_multikey flag is set and each key has func_format | None, required for functional index creation |
| func_format | Format of functional index extractor routine keys | array of {type = $TYPE, collation = .., is_nullable = ..} where type is a scalar type | None, required for functional index creation |
| is_multikey | functional index is multikey, i.e. func_code extractor routine returnes a table of keys | boolean | false |
| type | type of index | string (‘HASH’ or ‘TREE’ or ‘BITSET’ or ‘RTREE’) Note: storage engine: vinyl only supports ‘TREE’ | ‘TREE’ |
| id | unique identifier | number | last index’s id, +1 |
| unique | index is unique | boolean | true |
| if_not_exists | no error if duplicate name | boolean | false |
#### 1.1 Example:
```
extractor_f = [[
return function(tuple)
if not type(tuple[3]) ~= 'string' then
return nil, 'Invalid field type'
end
local t = tuple[3]:lower():split()
local filter = {}
for i, v in pairs(t) do
filter[v] = true
end
t = {}
for k, _ in pairs(filter) do
table.insert(t, {k})
end
return t
end
]]
extractor_format = {{type='str', collation='unicode_ci'}}
addr_low = s:create_index(‘func_idx ', {func_code = extractor_f,
func_format = extractor_f_format,
unique = true, is_multikey = true})
```
### 2.0 Index methods
| Name | Use |
|----------------------------- |--------------------------------------------------- |
| index_object.unique | Flag, true if an index is unique |
| index_object.type | Index type |
| index_object.functional | Table with functional_index-specific options |
| .func_code | Functional index extractor routine code string |
| .func | Pointer to functional index extractor LUA routine |
| index_object:pairs() | Prepare for iterating |
| index_object:select() | Select one or more tuples via index |
| index_object:get() | Select a tuple via index |
| index_object:min() | Find the minimum value in index |
| index_object:max() | Find the maximum value in index |
| index_object:random() | Find a random value in index |
| index_object:count() | Count tuples matching key value |
| index_object:update() | Update a tuple |
| index_object:delete() | Delete a tuple by key |
| index_object:drop() | Drop an index |
| index_object:rename() | Rename an index |
| index_object:bsize() | Get count of bytes for an index |
| index_object:stat() | Get statistics for an index |
| index_object:compact() | Remove unused index space |
| index_object:user_defined() | Any function / method that any user wants to add |
| index_object:alter() | Alter an index |
There few functional index-specific things that should be described:
1. `alter()` functional index where `func_code`, `func_format` or `is_multikey` property has been changed always require rebuild and may be slow
2. When `is_multikey` flag is set in index options, `select()` and `count()`, `pairs()` methods will return tuples only once, even they satisfy the multikey index search key many times. Tuples are sorted in order of first extracted keys for `pairs()` operator, `select()` output is not sorted. This may consume much memory,
#### 2.1 Example
```
...
s:insert({2, 'James Bond', 'GB London Mi6'})
---
- [2, 'James Bond', 'GB London Mi6']
...
s:insert({3, 'Kirill Alekseevich', 'Russia Moscow Dolgoprudny RocketBuilders 1'})
---
- error: Duplicate key exists in unique index
...
s:insert({4, 'Jack London', 'GB'})
---
- error: Duplicate key exists in unique index
...
addr_low:count()
---
- 2
...
addr_low:count({'moscow'})
---
- 1
...
addr_low:count({'gb'})
---
- 1
...
addr_low:select()
---
- - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
- [2, 'James Bond', 'GB London Mi6']
...
addr_low:select({}, {limit = 1})
---
- - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
...
addr_low:select({}, {limit = 1, offset=1})
---
- - [2, 'James Bond', 'GB London Mi6']
...
addr_low:select({'Moscow'})
---
- - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
...
_ = addr_low:delete({'moscow'})
---
...
addr_low:select({'MOSCOW'})
---
- []
...
```
next prev parent reply other threads:[~2018-11-20 12:18 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-11-17 13:16 [tarantool-patches] [PATCH v1 0/2] box: functional and multikey indexes Kirill Shcherbatov
2018-11-17 13:16 ` [tarantool-patches] [PATCH v1 1/2] box: refactor index termination on space drop Kirill Shcherbatov
2018-11-19 12:22 ` [tarantool-patches] Re: [PATCH] functional and multikey indexes RFC Kirill Shcherbatov
2018-11-20 12:18 ` Kirill Shcherbatov [this message]
[not found] ` <45d55e70c60c5c4b5806c8e7ce3330957a8dc0ce.1542460247.git.kshcherbatov@tarantool.org>
2018-11-20 12:18 ` [tarantool-patches] Re: [PATCH v1 2/2] box: functional and multikey indexes Kirill Shcherbatov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=69f2aca8-7d0d-0b2e-1fb2-03aeef474325@tarantool.org \
--to=kshcherbatov@tarantool.org \
--cc=georgy@tarantool.org \
--cc=tarantool-patches@freelists.org \
--subject='[tarantool-patches] Re: [PATCH] functional and multikey indexes RFC' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox