From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id C6E432EC3F for ; Tue, 20 Nov 2018 07:18:47 -0500 (EST) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 34PjOuBgT_2m for ; Tue, 20 Nov 2018 07:18:47 -0500 (EST) Received: from smtp34.i.mail.ru (smtp34.i.mail.ru [94.100.177.94]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 58F962EC3A for ; Tue, 20 Nov 2018 07:18:47 -0500 (EST) Subject: [tarantool-patches] Re: [PATCH] functional and multikey indexes RFC References: <2c0939bd-35f9-34c7-b4f3-d8d7685e3692@tarantool.org> From: Kirill Shcherbatov Message-ID: <69f2aca8-7d0d-0b2e-1fb2-03aeef474325@tarantool.org> Date: Tue, 20 Nov 2018 15:18:43 +0300 MIME-Version: 1.0 In-Reply-To: <2c0939bd-35f9-34c7-b4f3-d8d7685e3692@tarantool.org> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: tarantool-patches@freelists.org, Georgy Kirichenko # Tarantool functional index * **Status**: In progress * **Start date**: 30-10-2018 * **Authors**: Georgy Kirichenko @GeorgyKirichenko georgy@tarantool.org, Kirill Shcherbatov @kshcherbatov kshcherbatov@tarantool.org * **Issues**: [#1260](https://github.com/tarantool/tarantool/issues/1260) ## Summary Tarantool functional and multikey indexes allows you to create an index based on user-specified function. This interface is the most flexible tool for defining an index over a data tuple. ## Background and motivation A function-based index is an index that is created on the results of a function. The reason to create a function-based index on a column is to make complex queries to field using extractor function. To create a **functional index** you need to describe an `extractor_f(tuple)` function string, that returns a key(or a table of extracted keys if `is_multikey` flag is set) having same format that are used to insert a tuple in index. The function will be used by the kernel after creating an LUA representation via loadstring() call. The format of returned keys should be described in the form `extractor_f_format = {{type = $TYPE, is_nullable = $IS_NULLABLE, collation = $COLLATION}, ..}`. The semantics of options are similar to part's options. The `parts` table must not be specified. ### 1.0 Index creation Create an index. It is mandatory to create an index for a space before trying to insert tuples into it, or select tuples from it. **Functional index cannot be a primary key.** ``` space_object:create_index(index-name[, options]) ``` | Name | Effect | Type | Default | |--------------- |----------------------------------------------------------------------------------------- |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |---------------------------------------------- | | func_code | Functional index extractor routine | string with body:
return function(tuple)
    -- Get key by tuple
    return key
end
or
return function(tuple)
    -- Get keys by tuple
    return {key1, ..keyN}
end
when is_multikey flag is set and each key has func_format | None, required for functional index creation | | func_format | Format of functional index extractor routine keys | array of {type = $TYPE, collation = .., is_nullable = ..} where type is a scalar type | None, required for functional index creation | | is_multikey | functional index is multikey, i.e. func_code extractor routine returnes a table of keys | boolean | false | | type | type of index | string (‘HASH’ or ‘TREE’ or ‘BITSET’ or ‘RTREE’) Note: storage engine: vinyl only supports ‘TREE’ | ‘TREE’ | | id | unique identifier | number | last index’s id, +1 | | unique | index is unique | boolean | true | | if_not_exists | no error if duplicate name | boolean | false | #### 1.1 Example: ``` extractor_f = [[ return function(tuple) if not type(tuple[3]) ~= 'string' then return nil, 'Invalid field type' end local t = tuple[3]:lower():split() local filter = {} for i, v in pairs(t) do filter[v] = true end t = {} for k, _ in pairs(filter) do table.insert(t, {k}) end return t end ]] extractor_format = {{type='str', collation='unicode_ci'}} addr_low = s:create_index(‘func_idx ', {func_code = extractor_f, func_format = extractor_f_format, unique = true, is_multikey = true}) ``` ### 2.0 Index methods | Name | Use | |----------------------------- |--------------------------------------------------- | | index_object.unique | Flag, true if an index is unique | | index_object.type | Index type | | index_object.functional | Table with functional_index-specific options | | .func_code | Functional index extractor routine code string | | .func | Pointer to functional index extractor LUA routine | | index_object:pairs() | Prepare for iterating | | index_object:select() | Select one or more tuples via index | | index_object:get() | Select a tuple via index | | index_object:min() | Find the minimum value in index | | index_object:max() | Find the maximum value in index | | index_object:random() | Find a random value in index | | index_object:count() | Count tuples matching key value | | index_object:update() | Update a tuple | | index_object:delete() | Delete a tuple by key | | index_object:drop() | Drop an index | | index_object:rename() | Rename an index | | index_object:bsize() | Get count of bytes for an index | | index_object:stat() | Get statistics for an index | | index_object:compact() | Remove unused index space | | index_object:user_defined() | Any function / method that any user wants to add | | index_object:alter() | Alter an index | There few functional index-specific things that should be described: 1. `alter()` functional index where `func_code`, `func_format` or `is_multikey` property has been changed always require rebuild and may be slow 2. When `is_multikey` flag is set in index options, `select()` and `count()`, `pairs()` methods will return tuples only once, even they satisfy the multikey index search key many times. Tuples are sorted in order of first extracted keys for `pairs()` operator, `select()` output is not sorted. This may consume much memory, #### 2.1 Example ``` ... s:insert({2, 'James Bond', 'GB London Mi6'}) --- - [2, 'James Bond', 'GB London Mi6'] ... s:insert({3, 'Kirill Alekseevich', 'Russia Moscow Dolgoprudny RocketBuilders 1'}) --- - error: Duplicate key exists in unique index ... s:insert({4, 'Jack London', 'GB'}) --- - error: Duplicate key exists in unique index ... addr_low:count() --- - 2 ... addr_low:count({'moscow'}) --- - 1 ... addr_low:count({'gb'}) --- - 1 ... addr_low:select() --- - - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А'] - [2, 'James Bond', 'GB London Mi6'] ... addr_low:select({}, {limit = 1}) --- - - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А'] ... addr_low:select({}, {limit = 1, offset=1}) --- - - [2, 'James Bond', 'GB London Mi6'] ... addr_low:select({'Moscow'}) --- - - [1, 'Vasya Pupkin', 'Russia Moscow Dolgoprudny Moscow Street 9А'] ... _ = addr_low:delete({'moscow'}) --- ... addr_low:select({'MOSCOW'}) --- - [] ... ```