[tarantool-patches] [PATCH v2 1/5] rfc: describe a Tarantool JSON indexes

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Kirill Shcherbatov <kshcherbatov@tarantool.org>
To: tarantool-patches@freelists.org
Cc: v.shpilevoy@tarantool.org,
	Kirill Shcherbatov <kshcherbatov@tarantool.org>
Subject: [tarantool-patches] [PATCH v2 1/5] rfc: describe a Tarantool JSON indexes
Date: Wed, 15 Aug 2018 15:14:59 +0300	[thread overview]
Message-ID: <85adafa5ebd2dc307996a8c0cf488c0ddc6a7c3d.1534332920.git.kshcherbatov@tarantool.org> (raw)
In-Reply-To: <cover.1534332920.git.kshcherbatov@tarantool.org>
In-Reply-To: <cover.1534332920.git.kshcherbatov@tarantool.org>

Part of #1012.
---
 doc/rfc/1012-json-indexes.md | 188 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 188 insertions(+)
 create mode 100644 doc/rfc/1012-json-indexes.md

diff --git a/doc/rfc/1012-json-indexes.md b/doc/rfc/1012-json-indexes.md
new file mode 100644
index 0000000..a603f9a
--- /dev/null
+++ b/doc/rfc/1012-json-indexes.md
@@ -0,0 +1,188 @@
+# Tarantool JSON Indexes
+
+* **Status**: In progress
+* **Start date**: 26-07-2018
+* **Authors**: Vladimir Davydov @locker vdavydov.dev@gmail.com, Vladislav Shpilevoy @Gerold103 v.shpilevoy@tarantool.org, Kirill Shcherbatov @kshcherbatov kshcherbatov@tarantool.org
+* **Issues**: [#1012](https://github.com/tarantool/tarantool/issues/1012)
+
+## Summary
+Tarantool *JSON Indexes* is a feature that allows to index space's tuples that have consistent document-type complex structure.
+
+
+## Background and motivation
+The box.index submodule provides read-only access for index definitions and index keys. Indexes are contained in box.space.space-name.index array within each space object. They provide an API for ordered iteration over tuples. This API is a direct binding to corresponding methods of index objects of type box.index in the storage engine.
+```
+s:create_index('primary', {parts = {1, 'unsigned'}, {2, 'str'}})
+```
+
+Sometimes tuple space data have complex but consistent document structure.
+```
+s:insert{1, {town = 'NY', name = {first = 'Richard', second = 'Feynman'}}}
+s:insert{2, {town = 'NY', name = {first = 'Ayn', second = 'Rand'}}}
+```
+Building index on document fields would be attractive feature that allows to search and iterate over such data.
+```
+s:create_index('name'', { parts = {{ "name.first", "str" }, {"name.last", "str"}}})
+```
+
+Fields having complex document structure should have 'map' or 'array' type in format if specified.
+
+
+## Detailed design
+All data in Tarantool stored in atomic database objects called *tuples*. They consist of payload(user data represented as *msgpack*) and extra information that describe how does database should operate with it.
+```
+  [[ Img 0.1 - Tarantool tuple structure ]]
+
+                           +----------------------+-----------------------+-------------+
+  tuple_begin, ..., raw =  |      extra_size      | offset N ... offset 1 | MessagePack |
+  |                        +----------------------+-----------------------+-------------+
+  |                                                                       ^
+  +-----------------------------------------------------------------------data_offset
+```
+Indexed information should be accessible really fast. In case of regular indexes on fields it comes possible with *field_map* - data offsets stored before user payload in each tuple. Metadata cells that are used to store those offsets *offset_slot* are not same for different *formats*(aggregated information of all indexes describing specific space). Without loss of generality this mean that tuples constructed at the moment 'A' when only PK index was registered have another *format* than the tuples constructed at the moment 'B' after new Index "name" was added.
+
+The concept idea with *offset_slot*s could be used for JSON Indexes too.
+
+JSON-path definitely would be a part of *key_part* structure. As we have already noted, *offset_slots* are different for formats of different generations, so we should find an universal and fast way to obtain it by JSON. As code that access tuple data by index is a really hot, the performance is our priority problem.
+We already have *tuple_field_raw_by_path* that follows JSON path in tuple and returns pointer to requested field. Sometimes fields that nested in JSON-structured document may have coinciding path part:
+e.g.:
+```
+s:create_index('name'', { parts = {{ "name.first", "str" }, {"name.last", "str"}}})
+s:create_index('name'', { parts = {{ "town", "str" }, {"name.last", "str"}}})
+```
+We don't like to parse coinciding suffixes multiple time for same tuples.
+
+### 1.1. JSON-path tree in tuple format
+Keeping in mind all challenges we suppose to introduce tree-based structure in *tuple_format* that contain JSON-path lexemes and has allocated tuple *offset_slots* as leaves nodes.
+```
+   [[ Img 1.1 - JSON-path tree in tuple format ]]
+
+   ----\
+       |-----> [name] \
+       |              |----> [first]   <slot1 = -3>
+       |              |
+       |              |----> [last]    <slot2 = -1>
+       |
+       \------> [town]    <slot3 = -1>
+```
+
+For example, when first index part "name.first" is built, we would already have name offset and would able to start there parsing only "last" lexeme.
+```
+s:create_index('name'', { parts = {{ "name.first", "str" },
+                                   { "name.last",  "str" }}})
+```
+The resulting information offset would be stored in slot obtained from tree leaf node.
+
+```
+    [[ Img 1.2 - Tuple data stored in database of same format ]]
+
+                                                              |  Description  |
+
+     {-3}    {-2}    {-1}                                     |   slot index  |
+   +=======+=======+=======+======+===================+
+   | slot1 | slot2 | slot3 | town |       name        |       |  tuple schema |
+   +=======+=======+=======+======+=====+======+======+
+   |   2   |   4   |   0   | NY   | Ayn | Rand |              |  tuple1 data  |
+   +-------+-------+-------+------+-----+---+--+------+
+   |   2   |   5   |   0   | NY   | Richard | Feynman |       |  tuple2 data  |
+   +-------+-------+-------+------+---------+---------+
+                           ^      ^     ^
+                           0      2     4                     | tuple1 offset |
+
+                           ^      ^         ^
+                           0      2         5                 | tuple2 offset |
+```
+
+### 1.2. Access data by JSON-path
+The central API to access information is a routine
+```
+static inline const char *
+tuple_field_raw(const struct tuple_format *format, const char *tuple,
+                const uint32_t *field_map, uint32_t field_no)
+```
+
+In most scenarios related to indexes it is called with *field_no* extracted from *key_part* - a descriptor of part representing index.
+```
+field_b = tuple_field_raw(format_b, tuple_b_raw, field_map_b, part->fieldno);
+```
+With introducing parts containing JSON path we need to rework such calls.
+
+To avoid parsing JSON path each time we suppose to introduce a cache for *offset_slot* in *key_part* structure:
+```
+struct key_part {
+  /** Tuple field index for this part */
+  uint32_t fieldno;
+  ...
+
+  /** Data JSON path. */
+  const char *data_path;
+  /** Epoch of tuple_format cached in offset_slot. */
+  uint32_t slot_epoch;
+  /** Cache of corresponding tuple_format offset_slot. */
+  int32_t slot_cache;
+};
+```
+And extend *tuple_format* where such epoch would be initialized with a new bigger(when source key_parts have changed) value on creation.
+```
+struct tuple_format {
+  ...
+
+  /** Epoch of tuple format. */
+  uint32_t epoch;
+  /** Formats of the fields. */
+  struct tuple_field fields[0];
+};
+```
+We would modify tuple_field to have a tree structure if necessary and represent all intimidate records.
+
+In all index-related scenarios with *tuple_field_raw* we would rework with a new function:
+```
+const char *
+tuple_field_by_part(const struct tuple *tuple, const struct key_def *def,
+                    uint32_t idx);
+
+```
+
+With first argument *tuple* we can obtain tuple_format that contain *epoch* field.
+```
+  [[ Img 2.1 - Block diagram for tuple_field_by_part routine ]]
+
+            [[def->epoch == tuple_format(tuple)->epoch]]
+                                 ||
+                  YES            ||             NO
+                 ________________/\________________
+                |                                  |
+
+     use offset_slot from             PARSE def->parts[idx]->data_path and
+  def->parts[idx].cached_slot     observe tuple_format(tuple)->fields structure,
+                                     UPDATE def->parts[idx].slot_epoch and
+                               def->parts[idx].slot_cache IF format epoch is bigger
+```
+
+PARSE is about parsing JSON path lexeme-by-lexeme and step down the fields tree until leaf with slot offset would reached.
+Then we need to UPDATE key_part cache IF epoch has increased. This *may* make epoch oscillations not so significant.
+
+To sped-up data access we suppose to use global format hashtable with all paths:
+```
+   HASHTABLE<const char *data_path, int32_t offset_slot>
+
+                            lookup
+   "[2]name.first" ----> hash1 ------> [hash1, "name.first", offset_slot1]
+   "[2]name.last"  ----> hash2 ------> [hash2, "name.last", offset_slot2]
+```
+
+## Rationale and alternatives
+### 2.1 Reallocatable slot_cache
+Epoch-based invalidation is looking complicated enough.
+It is possible to allocate slot_cache[] array for each key_part and store slots for all epoch (where epoch would be an index in that array).
+```
+                         format->epoch
+key_part {                     ||
+   slot_epoch[]:               \/
+        [1: s1, 2: s2, ... epoch: s3, ... max_epoch: sN]
+}
+
+I.e.:
+    tuple_map[key_part->slot_epoch[format->epoch]]
+```
+But as formats are created really often, this approach is really resource-consumable.
-- 
2.7.4

next prev parent reply	other threads:[~2018-08-15 12:15 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-15 12:14 [tarantool-patches] [PATCH v2 0/5] box: indexes by JSON path Kirill Shcherbatov
2018-08-15 12:14 ` Kirill Shcherbatov [this message]
2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 2/5] box: introduce slot_cache in key_part Kirill Shcherbatov
2018-08-22  0:27   ` [tarantool-patches] " Vladislav Shpilevoy
2018-08-27  7:37     ` Kirill Shcherbatov
2018-09-03 10:32       ` Vladislav Shpilevoy
2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 3/5] box: introduce path field " Kirill Shcherbatov
2018-08-22  0:26   ` [tarantool-patches] " Vladislav Shpilevoy
2018-08-27  7:37     ` Kirill Shcherbatov
2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 4/5] box: introduce path_hash and tuple_field tree Kirill Shcherbatov
2018-08-22  0:26   ` [tarantool-patches] " Vladislav Shpilevoy
2018-08-27  7:37     ` Kirill Shcherbatov
2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 5/5] box: specify indexes in user-friendly form Kirill Shcherbatov
2018-08-22  0:26   ` [tarantool-patches] " Vladislav Shpilevoy
2018-08-27  7:37     ` Kirill Shcherbatov
2018-08-22  0:28   ` Vladislav Shpilevoy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=85adafa5ebd2dc307996a8c0cf488c0ddc6a7c3d.1534332920.git.kshcherbatov@tarantool.org \
    --to=kshcherbatov@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [tarantool-patches] [PATCH v2 1/5] rfc: describe a Tarantool JSON indexes' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox