[Tarantool-patches] [PATCH] rfc: multi-directional iterators

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Nikita Pettik <korablev@tarantool.org>
To: tarantool-patches@dev.tarantool.org
Cc: v.shpilevoy@tarantool.org
Subject: [Tarantool-patches] [PATCH] rfc: multi-directional iterators
Date: Sun, 26 Jan 2020 23:54:24 +0300	[thread overview]
Message-ID: <d50042fa51c71b79c88c4e3e8965fba836b2ae1c.1580071900.git.korablev@tarantool.org> (raw)

Part of #3243
---
rfc in human-readable format: https://github.com/tarantool/tarantool/blob/np/gh-3243-multi-directional-iter-rfc/doc/rfc/3309-multi-directional-iterators.md
Issue: https://github.com/tarantool/tarantool/issues/3243

 doc/rfc/3309-multi-directional-iterators.md | 186 ++++++++++++++++++++++++++++
 1 file changed, 186 insertions(+)
 create mode 100644 doc/rfc/3309-multi-directional-iterators.md

diff --git a/doc/rfc/3309-multi-directional-iterators.md b/doc/rfc/3309-multi-directional-iterators.md
new file mode 100644
index 000000000..4f8ca7c99
--- /dev/null
+++ b/doc/rfc/3309-multi-directional-iterators.md
@@ -0,0 +1,186 @@
+# Multi-directional iterators
+
+* **Status**: In progress
+* **Start date**: 22-01-2020
+* **Authors**: Nikita Pettik @korablev77 korablev@tarantool.org
+* **Issues**: [#3243](https://github.com/tarantool/tarantool/issues/3243)
+
+
+## Background and motivation
+
+This RFC touches only Memtx engine and TREE index type (as the only available
+in SQL and most common in user's practice). Multi-directional iterator is
+an iterator which allows iterating through different key parts orders.
+For instance: consider index `i` consisting of three key parts: `a`, `b` and `c`.
+Creating and using casual iterator looks like:
+```
+i = box.space.T.index.i
+i:select({1, 2, 3}, {iterator = 'EQ'}) -- returns all tuples which has
+                                       -- fields a == 1, b == 2, c == 3.
+```
+It is OK to omit one or more key parts declaring key value to be searched. In
+this case they are assumed to be nils:
+`i:select({1}, {iterator = 'EQ'})` is the same as
+`i:select({1, nil, nil}, {iterator = 'EQ'})`. So all tuples which has `a == 1`
+are getting to the result set. More formally matching rule is following:
+```
+if (search-key-part[i] is nil)
+{
+  if (iterator is LT or GT) return FALSE
+  return TRUE
+}
+```
+
+Another example:
+`i:select({1, 1, 1}, {iterator = 'GE'})`
+
+Here all tuples with `a >= 1`, `b >= 1` and `c >= 1` are returned. But some users
+may want to select tuples with `a >= 1`, `b >= 1` but `c < 1`. Or, alternatively,
+somebody may be willing to get tuples ordered by `a` and `b` in ascending order
+but by `c` in descending order: `i:select({}, {iterator = {'GE', 'GE', 'LE'})`.
+It is analogue of common SQL query `SELECT * FROM t ORDER BY a ASC, b ASC, c DESC`.
+These requests are obviously impossible to fulfill with current indexes and
+iterators implementations. This RFC suggests ways to resolve mentioned problem
+in particular for memtx TREE indexes.
+
+## Implementation details
+
+TREE indexes in memtx engine are implemented as BPS-trees (see
+`src/lib/salad/bps_tree.h` for details). Keys correspond to particular values
+of key parts; data - to pointers to tuples. Hence, all data
+are sorted by their key values due to tree structure. For this reason HASH
+indexes have only GT and EQ (and ergo GE) iterators - data stored in a hash is
+unordered. Tree interface itself provides several functions to operate on data. 
+Iteration process starts in `tree_iterator_start()` (which is called once as
+`iterator->next()`): depending on its type iterator is positioned to the lower
+or upper bound (via `memtx_tree_lower_bound()`) of range of values satisfying
+search condition. In case key is not specified (i.e. empty), iterator is simply
+set to the first or last element of tree. At this moment first element to be
+returned (if any) is ready. To continue iterating `next` method of iterator
+object is changed to one of `tree_iterator_next()`, `tree_iterator_prev()` or
+their analogues for GE and LE iterators. Actually these functions fetch next
+element from B-tree leaf block. If iterator points to the last element in the
+block, it is set to the first element of the next block (leaf blocks are linked
+into list); if there's no more blocks, iterator is invalidated and iteration
+process is finished.  
+Taking into account this information let's review several approaches how to
+implement multi-directional iterators.
+
+### Solution №1
+
+First solution doesn't involve any additional data structures so that it deals
+with multi-directional iterators only using existing B-tree structure.  
+It fact, first key part can be used to locate first element as a candidate
+in the range to be selected. To illustrate this point let's consider following
+example:
+
+```
+s:create_index('i', {parts = {{1, 'integer'}, {2, 'integer'}}})`
+s:insert({1, 0})
+s:insert({1, 0})
+s:insert({1, 1})
+s:insert({2, 0})
+s:insert({2, 1})
+i:select({}, {iterator = {'GE', 'LE'}})
+```
+
+Result should be:
+```
+[1, 1]
+[1, 0]
+[1, 0]
+[2, 1]
+[2, 0]
+```
+Note that in case of casual GE iterator (i.e. {GE, GE} in terms of
+multi-directional iterators) result is:
+```
+[1, 0]
+[1, 0]
+[1, 1]
+[2, 0]
+[2, 1]
+```
+As one can see, results are sorted in different orders by second key part,
+but in the same order by first key part (not surprisingly). Assume first
+element with first key part satisfying search condition is located: {1, 0}.
+Then let's find out the first key part with different iterating order (in our
+case it is second key part). Since order is different for that key part, it is
+required to locate the first tuple with next first key part value: {2, 0}.
+After that, auxiliary iterator is created and positioned to that tuple (see
+schema below). Since order for the second key part is different, auxiliary
+iterator moves towards main iterator.
+
+```
+[1, 0], [1, 0], [1, 1], [2, 0] ... // tuples are arranged as in B-tree
+^                      ^
+|               <----- |
+Main iterator         Aux. iterator
+```
+Note that auxiliary iterator is assumed to process all keys between its initial
+position and main iterator position (since those keys are ordered using other
+direction - sort of full scan). Auxiliary iterator is required for each key
+part starting from that which direction is different from one of first key part.
+So that iteration process is likely to be slow without any optimizations.
+For EQ iterator type it is possible to simply skip those tuples which doesn't
+satisfy equality condition. In turn, it results in necessity to extract part of
+key value for all 'EQ' iterator key parts and compare it with reference key
+value. This algorithm can be generalized for any number of key parts in index.  
+
+Pros (+):
+ - it allows to specify any sets of key part iteration orders;
+ - in contrast to the second implementation, the first resulting tuple is
+   returned way much faster (since there's no time overhead to built new tree);
+ - finally, there's almost no memory overhead.  
+
+Cons (-):
+ - obviously the main drawback of this approach is time complexity -
+   it doesn't seem to be way faster than simple full-scan (the more key parts
+   with different iterating order are declared, the slower process will be).
+
+### Solution №2
+
+Since BPS tree is built without acknowledge in which order keys should be
+placed, it is assumed that order is always ascending: keys in blocks are sorted
+from smaller to bigger (left-to-right); comparison between keys is made by
+tuple_compare_with_key() function. It makes given tree be unsuitable for
+efficient iteration in different orders. On the other hand, it is possible to
+build new *temporary in-memory* BPS-tree featuring correct key order. It seems
+to be easy to achieve since keys order depends on result of comparator function.
+Reverting result of comparison for key parts corresponding to opposite iteration
+direction gives appropriate keys ordering in the tree. Note that not all data in
+space is needed to be in tree (in case of non-empty search key); only sub-tree
+making up lower or upper bound of first key part is required.
+
+Pros (+):
+ - any sets of key part iteration orders are allowed.  
+
+Cons (-):
+ - first tuple to selected is probably returned with significant delay;
+ - tree memory construction overhead (only during iteration routine).
+
+### Solution №3
+
+It extends solution №2 in sense it allows specifying sorting direction for
+each part right in key def, that is during index creation. For instance:
+
+`s:create_index('i', {parts = {{1, 'integer', 'asc'}, {2, 'integer', 'desc'}}})`
+
+After that 'GT'/'LT' iterators for parts with declared 'desc' sorting order will
+return reversed results of comparison, so only comparators are affected.
+That's it (probably the simplest solution; what is more 'DESC' index is casual
+SQL feature in other DBs).  
+
+Pros (+):
+ - index search via 'desc' iterators is almost as fast as via casual
+   iterators;
+ - this approach seems to be easy in implementation and resolves
+   problem in SQL (since at the moment of ephemeral space creation it is allowed
+   to set its PK key parts orders).  
+
+Cons (-):
+ - 'desc' indexes are not versatile - user is unable to set different
+    orders in iterator;
+ - order of iteration itself is immutable. As a result, for each different
+   iteration order user has to create separate index which in turn consumes
+   additional memory and time as any other index.
-- 
2.15.1

next             reply	other threads:[~2020-01-26 20:54 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-26 20:54 Nikita Pettik [this message]
2020-01-27 19:34 ` Konstantin Osipov
2020-01-27 19:38   ` Konstantin Osipov
2020-04-25  9:00 ` Konstantin Osipov
2020-04-24 10:10 Aleksandr Lyapunov
2020-04-25  9:03 ` Konstantin Osipov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d50042fa51c71b79c88c4e3e8965fba836b2ae1c.1580071900.git.korablev@tarantool.org \
    --to=korablev@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH] rfc: multi-directional iterators' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox