[tarantool-patches] [PATCH v1 1/5] rfc: describe a Tarantool JSON indexes

Mon Aug 6 15:26:58 MSK 2018

Part of #1012.
---
 doc/rfc/1012-json-indexes.md | 188 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 188 insertions(+)
 create mode 100644 doc/rfc/1012-json-indexes.md

diff --git a/doc/rfc/1012-json-indexes.md b/doc/rfc/1012-json-indexes.md
new file mode 100644
index 0000000..a603f9a
--- /dev/null
+++ b/doc/rfc/1012-json-indexes.md
@@ -0,0 +1,188 @@
+# Tarantool JSON Indexes
+
+* **Status**: In progress
+* **Start date**: 26-07-2018
+* **Authors**: Vladimir Davydov @locker vdavydov.dev at gmail.com, Vladislav Shpilevoy @Gerold103 v.shpilevoy at tarantool.org, Kirill Shcherbatov @kshcherbatov kshcherbatov at tarantool.org
+* **Issues**: [#1012](https://github.com/tarantool/tarantool/issues/1012)
+
+## Summary
+Tarantool *JSON Indexes* is a feature that allows to index space's tuples that have consistent document-type complex structure.
+
+
+## Background and motivation
+The box.index submodule provides read-only access for index definitions and index keys. Indexes are contained in box.space.space-name.index array within each space object. They provide an API for ordered iteration over tuples. This API is a direct binding to corresponding methods of index objects of type box.index in the storage engine.
+```
+s:create_index('primary', {parts = {1, 'unsigned'}, {2, 'str'}})
+```
+
+Sometimes tuple space data have complex but consistent document structure.
+```
+s:insert{1, {town = 'NY', name = {first = 'Richard', second = 'Feynman'}}}
+s:insert{2, {town = 'NY', name = {first = 'Ayn', second = 'Rand'}}}
+```
+Building index on document fields would be attractive feature that allows to search and iterate over such data.
+```
+s:create_index('name'', { parts = {{ "name.first", "str" }, {"name.last", "str"}}})
+```
+
+Fields having complex document structure should have 'map' or 'array' type in format if specified.
+
+
+## Detailed design
+All data in Tarantool stored in atomic database objects called *tuples*. They consist of payload(user data represented as *msgpack*) and extra information that describe how does database should operate with it.
+```
+  [[ Img 0.1 - Tarantool tuple structure ]]
+
+                           +----------------------+-----------------------+-------------+
+  tuple_begin, ..., raw =  |      extra_size      | offset N ... offset 1 | MessagePack |
+  |                        +----------------------+-----------------------+-------------+
+  |                                                                       ^
+  +-----------------------------------------------------------------------data_offset
+```
+Indexed information should be accessible really fast. In case of regular indexes on fields it comes possible with *field_map* - data offsets stored before user payload in each tuple. Metadata cells that are used to store those offsets *offset_slot* are not same for different *formats*(aggregated information of all indexes describing specific space). Without loss of generality this mean that tuples constructed at the moment 'A' when only PK index was registered have another *format* than the tuples constructed at the moment 'B' after new Index "name" was added.
+
+The concept idea with *offset_slot*s could be used for JSON Indexes too.
+
+JSON-path definitely would be a part of *key_part* structure. As we have already noted, *offset_slots* are different for formats of different generations, so we should find an universal and fast way to obtain it by JSON. As code that access tuple data by index is a really hot, the performance is our priority problem.
+We already have *tuple_field_raw_by_path* that follows JSON path in tuple and returns pointer to requested field. Sometimes fields that nested in JSON-structured document may have coinciding path part:
+e.g.:
+```
+s:create_index('name'', { parts = {{ "name.first", "str" }, {"name.last", "str"}}})
+s:create_index('name'', { parts = {{ "town", "str" }, {"name.last", "str"}}})
+```
+We don't like to parse coinciding suffixes multiple time for same tuples.
+
+### 1.1. JSON-path tree in tuple format
+Keeping in mind all challenges we suppose to introduce tree-based structure in *tuple_format* that contain JSON-path lexemes and has allocated tuple *offset_slots* as leaves nodes.
+```
+   [[ Img 1.1 - JSON-path tree in tuple format ]]
+
+   ----\
+       |-----> [name] \
+       |              |----> [first]   <slot1 = -3>
+       |              |
+       |              |----> [last]    <slot2 = -1>
+       |
+       \------> [town]    <slot3 = -1>
+```
+
+For example, when first index part "name.first" is built, we would already have name offset and would able to start there parsing only "last" lexeme.
+```
+s:create_index('name'', { parts = {{ "name.first", "str" },
+                                   { "name.last",  "str" }}})
+```
+The resulting information offset would be stored in slot obtained from tree leaf node.
+
+```
+    [[ Img 1.2 - Tuple data stored in database of same format ]]
+
+                                                              |  Description  |
+
+     {-3}    {-2}    {-1}                                     |   slot index  |
+   +=======+=======+=======+======+===================+
+   | slot1 | slot2 | slot3 | town |       name        |       |  tuple schema |
+   +=======+=======+=======+======+=====+======+======+
+   |   2   |   4   |   0   | NY   | Ayn | Rand |              |  tuple1 data  |
+   +-------+-------+-------+------+-----+---+--+------+
+   |   2   |   5   |   0   | NY   | Richard | Feynman |       |  tuple2 data  |
+   +-------+-------+-------+------+---------+---------+
+                           ^      ^     ^
+                           0      2     4                     | tuple1 offset |
+
+                           ^      ^         ^
+                           0      2         5                 | tuple2 offset |
+```
+
+### 1.2. Access data by JSON-path
+The central API to access information is a routine
+```
+static inline const char *
+tuple_field_raw(const struct tuple_format *format, const char *tuple,
+                const uint32_t *field_map, uint32_t field_no)
+```
+
+In most scenarios related to indexes it is called with *field_no* extracted from *key_part* - a descriptor of part representing index.
+```
+field_b = tuple_field_raw(format_b, tuple_b_raw, field_map_b, part->fieldno);
+```
+With introducing parts containing JSON path we need to rework such calls.
+
+To avoid parsing JSON path each time we suppose to introduce a cache for *offset_slot* in *key_part* structure:
+```
+struct key_part {
+  /** Tuple field index for this part */
+  uint32_t fieldno;
+  ...
+
+  /** Data JSON path. */
+  const char *data_path;
+  /** Epoch of tuple_format cached in offset_slot. */
+  uint32_t slot_epoch;
+  /** Cache of corresponding tuple_format offset_slot. */
+  int32_t slot_cache;
+};
+```
+And extend *tuple_format* where such epoch would be initialized with a new bigger(when source key_parts have changed) value on creation.
+```
+struct tuple_format {
+  ...
+
+  /** Epoch of tuple format. */
+  uint32_t epoch;
+  /** Formats of the fields. */
+  struct tuple_field fields[0];
+};
+```
+We would modify tuple_field to have a tree structure if necessary and represent all intimidate records.
+
+In all index-related scenarios with *tuple_field_raw* we would rework with a new function:
+```
+const char *
+tuple_field_by_part(const struct tuple *tuple, const struct key_def *def,
+                    uint32_t idx);
+
+```
+
+With first argument *tuple* we can obtain tuple_format that contain *epoch* field.
+```
+  [[ Img 2.1 - Block diagram for tuple_field_by_part routine ]]
+
+            [[def->epoch == tuple_format(tuple)->epoch]]
+                                 ||
+                  YES            ||             NO
+                 ________________/\________________
+                |                                  |
+
+     use offset_slot from             PARSE def->parts[idx]->data_path and
+  def->parts[idx].cached_slot     observe tuple_format(tuple)->fields structure,
+                                     UPDATE def->parts[idx].slot_epoch and
+                               def->parts[idx].slot_cache IF format epoch is bigger
+```
+
+PARSE is about parsing JSON path lexeme-by-lexeme and step down the fields tree until leaf with slot offset would reached.
+Then we need to UPDATE key_part cache IF epoch has increased. This *may* make epoch oscillations not so significant.
+
+To sped-up data access we suppose to use global format hashtable with all paths:
+```
+   HASHTABLE<const char *data_path, int32_t offset_slot>
+
+                            lookup
+   "[2]name.first" ----> hash1 ------> [hash1, "name.first", offset_slot1]
+   "[2]name.last"  ----> hash2 ------> [hash2, "name.last", offset_slot2]
+```
+
+## Rationale and alternatives
+### 2.1 Reallocatable slot_cache
+Epoch-based invalidation is looking complicated enough.
+It is possible to allocate slot_cache[] array for each key_part and store slots for all epoch (where epoch would be an index in that array).
+```
+                         format->epoch
+key_part {                     ||
+   slot_epoch[]:               \/
+        [1: s1, 2: s2, ... epoch: s3, ... max_epoch: sN]
+}
+
+I.e.:
+    tuple_map[key_part->slot_epoch[format->epoch]]
+```
+But as formats are created really often, this approach is really resource-consumable.
-- 
2.7.4