From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounce@freelists.org>
Received: from localhost (localhost [127.0.0.1])
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id C6E432EC3F
	for <tarantool-patches@freelists.org>; Tue, 20 Nov 2018 07:18:47 -0500 (EST)
Received: from turing.freelists.org ([127.0.0.1])
	by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id 34PjOuBgT_2m for <tarantool-patches@freelists.org>;
	Tue, 20 Nov 2018 07:18:47 -0500 (EST)
Received: from smtp34.i.mail.ru (smtp34.i.mail.ru [94.100.177.94])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 58F962EC3A
	for <tarantool-patches@freelists.org>; Tue, 20 Nov 2018 07:18:47 -0500 (EST)
Subject: [tarantool-patches] Re: [PATCH] functional and multikey indexes
 RFC
References: <cover.1542460247.git.kshcherbatov@tarantool.org>
 <2c0939bd-35f9-34c7-b4f3-d8d7685e3692@tarantool.org>
From: Kirill Shcherbatov <kshcherbatov@tarantool.org>
Message-ID: <69f2aca8-7d0d-0b2e-1fb2-03aeef474325@tarantool.org>
Date: Tue, 20 Nov 2018 15:18:43 +0300
MIME-Version: 1.0
In-Reply-To: <2c0939bd-35f9-34c7-b4f3-d8d7685e3692@tarantool.org>
Content-Type: text/plain; charset="utf-8"
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Sender: tarantool-patches-bounce@freelists.org
Errors-to: tarantool-patches-bounce@freelists.org
Reply-To: tarantool-patches@freelists.org
List-help: <mailto:ecartis@freelists.org?Subject=help>
List-unsubscribe: <tarantool-patches-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: tarantool-patches <tarantool-patches.freelists.org>
List-subscribe: <tarantool-patches-request@freelists.org?Subject=subscribe>
List-owner: <mailto:>
List-post: <mailto:tarantool-patches@freelists.org>
List-archive: <http://www.freelists.org/archives/tarantool-patches>
To: tarantool-patches@freelists.org, Georgy Kirichenko <georgy@tarantool.org>

# Tarantool functional index

* **Status**: In progress
* **Start date**: 30-10-2018
* **Authors**: Georgy Kirichenko @GeorgyKirichenko georgy@tarantool.org, Kirill Shcherbatov @kshcherbatov kshcherbatov@tarantool.org
* **Issues**: [#1260](https://github.com/tarantool/tarantool/issues/1260)

## Summary
Tarantool functional and multikey indexes allows you to create an index based on user-specified function. This interface is the most flexible tool for defining an index over a data tuple.

## Background and motivation
A function-based index is an index that is created on the results of a function. The reason to create a function-based index on a column is to make complex queries to field using extractor function.

To create a **functional index** you need to describe an `extractor_f(tuple)` function string, that returns a key(or a table of extracted keys if `is_multikey` flag is set) having same format that are used to insert a tuple in index.
The function will be used by the kernel after creating an LUA representation via loadstring() call.
The format of returned keys should be described in the form `extractor_f_format = {{type = $TYPE, is_nullable = $IS_NULLABLE, collation = $COLLATION}, ..}`.

The semantics of options are similar to part's options. The `parts` table must not be specified.

### 1.0 Index creation
Create an index. It is mandatory to create an index for a space before trying to insert tuples into it, or select tuples from it. **Functional index cannot be a primary key.**
```
space_object:create_index(index-name[, options])
```


| Name          	| Effect                                                                                  	| Type                                                                                                                                                                                                                                                                                                  	| Default                                      	|
|---------------	|-----------------------------------------------------------------------------------------	|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------	|----------------------------------------------	|
| func_code     	| Functional index extractor routine                                                      	| string with body:<br /> return function(tuple)<br />&nbsp;&nbsp;&nbsp;&nbsp;-- Get key by tuple<br />&nbsp;&nbsp;&nbsp;&nbsp;return key<br /> end<br />  or<br />  return function(tuple)<br />&nbsp;&nbsp;&nbsp;&nbsp;-- Get keys by tuple<br />&nbsp;&nbsp;&nbsp;&nbsp;return {key1, ..keyN}<br /> end<br /> when is_multikey flag is set and each key has func_format 	| None, required for functional index creation 	|
| func_format   	| Format of functional index extractor routine keys                                       	| array of {type = $TYPE, collation = .., is_nullable = ..} where type is a scalar type                                                                                                                                                                                                               	| None, required for functional index creation 	|
| is_multikey   	| functional index is multikey, i.e. func_code extractor routine returnes a table of keys 	| boolean                                                                                                                                                                                                                                                                                               	| false                                        	|
| type          	| type of index                                                                           	| string (‘HASH’ or ‘TREE’ or ‘BITSET’ or ‘RTREE’) Note: storage engine: vinyl only  supports ‘TREE’                                                                                                                                                                                                    	| ‘TREE’                                       	|
| id            	| unique identifier                                                                       	| number                                                                                                                                                                                                                                                                                                	| last index’s id, +1                          	|
| unique        	| index is unique                                                                         	| boolean                                                                                                                                                                                                                                                                                               	| true                                         	|
| if_not_exists 	| no error if duplicate name                                                              	| boolean                                                                                                                                                                                                                                                                                               	| false                                        	|

#### 1.1 Example:

```
extractor_f = [[
        return function(tuple)
                if not type(tuple[3]) ~= 'string' then
                        return nil, 'Invalid field type'
                end
                local t = tuple[3]:lower():split()
                local filter = {}
                for i, v in pairs(t) do
                        filter[v] = true
                end
                t = {}
                for k, _ in pairs(filter) do
                        table.insert(t, {k})
                end
                return t
        end
]]

extractor_format = {{type='str', collation='unicode_ci'}}

addr_low = s:create_index(‘func_idx ', {func_code = extractor_f,
                                        func_format = extractor_f_format,
                                        unique = true, is_multikey = true})

```

### 2.0 Index methods
| Name                        	| Use                                               	|
|-----------------------------	|---------------------------------------------------	|
| index_object.unique         	| Flag, true if an index is unique                  	|
| index_object.type           	| Index type                                        	|
| index_object.functional     	| Table with functional_index-specific options      	|
|     .func_code              	| Functional index extractor routine code string    	|
|     .func                   	| Pointer to functional index extractor LUA routine 	|
| index_object:pairs()        	| Prepare for iterating                             	|
| index_object:select()       	| Select one or more tuples via index               	|
| index_object:get()          	| Select a tuple via index                          	|
| index_object:min()          	| Find the minimum value in index                   	|
| index_object:max()          	| Find the maximum value in index                   	|
| index_object:random()       	| Find a random value in index                      	|
| index_object:count()        	| Count tuples matching key value                   	|
| index_object:update()       	| Update a tuple                                    	|
| index_object:delete()       	| Delete a tuple by key                             	|
| index_object:drop()         	| Drop an index                                     	|
| index_object:rename()       	| Rename an index                                   	|
| index_object:bsize()        	| Get count of bytes for an index                   	|
| index_object:stat()         	| Get statistics for an index                       	|
| index_object:compact()      	| Remove unused index space                         	|
| index_object:user_defined() 	| Any function / method that any user wants to add  	|
| index_object:alter()        	| Alter an index                                    	|


There few functional index-specific things that should be described:
1. `alter()` functional index where `func_code`, `func_format` or `is_multikey` property has been changed always require rebuild and may be slow
2. When `is_multikey` flag is set in index options, `select()` and `count()`, `pairs()` methods will return tuples only once, even they satisfy the multikey index search key many times. Tuples are sorted in order of first extracted keys for `pairs()` operator, `select()` output is not sorted. This may consume much memory,

#### 2.1 Example
```
...
s:insert({2, 'James Bond', 'GB London Mi6'})
---
- [2, 'James Bond', 'GB London Mi6']
...
s:insert({3, 'Kirill Alekseevich', 'Russia Moscow Dolgoprudny RocketBuilders 1'})
---
- error: Duplicate key exists in unique index
...
s:insert({4, 'Jack London', 'GB'})
---
- error: Duplicate key exists in unique index
...
addr_low:count()
---
- 2
...
addr_low:count({'moscow'})
---
- 1
...
addr_low:count({'gb'})
---
- 1
...
addr_low:select()
---
- - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
  - [2, 'James Bond', 'GB London Mi6']
...
addr_low:select({}, {limit = 1})
---
- - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
...
addr_low:select({}, {limit = 1, offset=1})
---
- - [2, 'James Bond', 'GB London Mi6']
...
addr_low:select({'Moscow'})
---
- - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А']
...
_ = addr_low:delete({'moscow'})
---
...
addr_low:select({'MOSCOW'})
---
- []
...
```