<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>

      <blockquote type="cite">

        <pre>Part of #3243

---

rfc in human-readable format: <a href="https://github.com/tarantool/tarantool/blob/np/gh-3243-multi-directional-iter-rfc/doc/rfc/3309-multi-directional-iterators.md">https://github.com/tarantool/tarantool/blob/np/gh-3243-multi-directional-iter-rfc/doc/rfc/3309-multi-directional-iterators.md</a>

Issue: <a href="https://github.com/tarantool/tarantool/issues/3243">https://github.com/tarantool/tarantool/issues/3243</a>

 doc/rfc/3309-multi-directional-iterators.md | 186 ++++++++++++++++++++++++++++

 1 file changed, 186 insertions(+)

 create mode 100644 doc/rfc/3309-multi-directional-iterators.md

diff --git a/doc/rfc/3309-multi-directional-iterators.md b/doc/rfc/3309-multi-directional-iterators.md

new file mode 100644

index 000000000..4f8ca7c99

--- /dev/null

+++ b/doc/rfc/3309-multi-directional-iterators.md

@@ -0,0 +1,186 @@

+# Multi-directional iterators

+

+* **Status**: In progress

+* **Start date**: 22-01-2020

+* **Authors**: Nikita Pettik @korablev77 <a href="https://lists.tarantool.org/mailman/listinfo/tarantool-patches">korablev at tarantool.org</a>

+* **Issues**: [#3243](<a href="https://github.com/tarantool/tarantool/issues/3243">https://github.com/tarantool/tarantool/issues/3243</a>)

+

+

+## Background and motivation

+

+This RFC touches only Memtx engine and TREE index type (as the only available

+in SQL and most common in user's practice). Multi-directional iterator is

+an iterator which allows iterating through different key parts orders.

+For instance: consider index `i` consisting of three key parts: `a`, `b` and `c`.

+Creating and using casual iterator looks like:

+```

+i = box.space.T.index.i

+i:select({1, 2, 3}, {iterator = 'EQ'}) -- returns all tuples which has

+                                       -- fields a == 1, b == 2, c == 3.

+```

+It is OK to omit one or more key parts declaring key value to be searched. In

+this case they are assumed to be nils:

+`i:select({1}, {iterator = 'EQ'})` is the same as

+`i:select({1, nil, nil}, {iterator = 'EQ'})`. So all tuples which has `a == 1`

+are getting to the result set. More formally matching rule is following:

+```

+if (search-key-part[i] is nil)

+{

+  if (iterator is LT or GT) return FALSE

+  return TRUE</pre>

      </blockquote>

      What does that TRUE and FALSE mean?<br>

      BTW tree index works fine with partial keys and GT/LT iterators.<br>

      E.g. `i:select({1}, {iterator = 'GT'})` returns all tuples with `a

      > 1`.<br>

    </p>

    <p>

      <blockquote type="cite">

        <pre>

+}

+```

+

+Another example:

+`i:select({1, 1, 1}, {iterator = 'GE'})`

+

+Here all tuples with `a >= 1`, `b >= 1` and `c >= 1` are returned. But some users</pre>

      </blockquote>

      That is not correct. Tuples and keys are compared

      lexicographically. Thus all tuples with<br>

      (`a = 1` AND `b = 1` AND `c >= 1') OR (`a = 1` AND `b > 1`)

      OR (`a > 1`) are the result.<br>

    </p>

    <p>

      <blockquote type="cite">

        <pre>

+may want to select tuples with `a >= 1`, `b >= 1` but `c < 1`. Or, alternatively,

+somebody may be willing to get tuples ordered by `a` and `b` in ascending order

+but by `c` in descending order: `i:select({}, {iterator = {'GE', 'GE', 'LE'})`.</pre>

      </blockquote>

      Those are not alternatives but independent options. One may wish

      to select:<br>

      `SELECT * FROM t WHERE a > 1 AND b < 1 ORDER BY a DESC, b

      ASC`.<br>

      <blockquote type="cite">

        <pre>

+It is analogue of common SQL query `SELECT * FROM t ORDER BY a ASC, b ASC, c DESC`.

+These requests are obviously impossible to fulfill with current indexes and

+iterators implementations. This RFC suggests ways to resolve mentioned problem

+in particular for memtx TREE indexes.

+

+## Implementation details

+

+TREE indexes in memtx engine are implemented as BPS-trees (see

+`src/lib/salad/bps_tree.h` for details). Keys correspond to particular values

+of key parts; data - to pointers to tuples. Hence, all data

+are sorted by their key values due to tree structure. For this reason HASH

+indexes have only GT and EQ (and ergo GE) iterators - data stored in a hash is

+unordered. Tree interface itself provides several functions to operate on data. 

+Iteration process starts in `tree_iterator_start()` (which is called once as

+`iterator->next()`): depending on its type iterator is positioned to the lower

+or upper bound (via `memtx_tree_lower_bound()`) of range of values satisfying

+search condition. In case key is not specified (i.e. empty), iterator is simply

+set to the first or last element of tree. At this moment first element to be

+returned (if any) is ready. To continue iterating `next` method of iterator

+object is changed to one of `tree_iterator_next()`, `tree_iterator_prev()` or

+their analogues for GE and LE iterators. Actually these functions fetch next

+element from B-tree leaf block. If iterator points to the last element in the

+block, it is set to the first element of the next block (leaf blocks are linked

+into list); if there's no more blocks, iterator is invalidated and iteration

+process is finished.  

+Taking into account this information let's review several approaches how to

+implement multi-directional iterators.

+

+### Solution №1

+

+First solution doesn't involve any additional data structures so that it deals

+with multi-directional iterators only using existing B-tree structure.  

+It fact, first key part can be used to locate first element as a candidate

+in the range to be selected. To illustrate this point let's consider following

+example:

+

+```

+s:create_index('i', {parts = {{1, 'integer'}, {2, 'integer'}}})`

+s:insert({1, 0})

+s:insert({1, 0})

+s:insert({1, 1})

+s:insert({2, 0})

+s:insert({2, 1})

+i:select({}, {iterator = {'GE', 'LE'}})

+```

+

+Result should be:

+```

+[1, 1]

+[1, 0]

+[1, 0]

+[2, 1]

+[2, 0]

+```

+Note that in case of casual GE iterator (i.e. {GE, GE} in terms of

+multi-directional iterators) result is:

+```

+[1, 0]

+[1, 0]

+[1, 1]

+[2, 0]

+[2, 1]

+```

+As one can see, results are sorted in different orders by second key part,

+but in the same order by first key part (not surprisingly). Assume first

+element with first key part satisfying search condition is located: {1, 0}.

+Then let's find out the first key part with different iterating order (in our

+case it is second key part). Since order is different for that key part, it is

+required to locate the first tuple with next first key part value: {2, 0}.

+After that, auxiliary iterator is created and positioned to that tuple (see</pre>

      </blockquote>

      Generally a stack of iterators is required for that.<br>

      <blockquote type="cite">

        <pre>

+schema below). Since order for the second key part is different, auxiliary

+iterator moves towards main iterator.

+

+```

+[1, 0], [1, 0], [1, 1], [2, 0] ... // tuples are arranged as in B-tree

+^                      ^

+|               <----- |

+Main iterator         Aux. iterator

+```

+Note that auxiliary iterator is assumed to process all keys between its initial

+position and main iterator position (since those keys are ordered using other

+direction - sort of full scan). Auxiliary iterator is required for each key

+part starting from that which direction is different from one of first key part.

+So that iteration process is likely to be slow without any optimizations.

+For EQ iterator type it is possible to simply skip those tuples which doesn't

+satisfy equality condition. In turn, it results in necessity to extract part of

+key value for all 'EQ' iterator key parts and compare it with reference key

+value. This algorithm can be generalized for any number of key parts in index.  

+

+Pros (+):

+ - it allows to specify any sets of key part iteration orders;

+ - in contrast to the second implementation, the first resulting tuple is

+   returned way much faster (since there's no time overhead to built new tree);

+ - finally, there's almost no memory overhead.  

+

+Cons (-):

+ - obviously the main drawback of this approach is time complexity -

+   it doesn't seem to be way faster than simple full-scan (the more key parts

+   with different iterating order are declared, the slower process will be).</pre>

      </blockquote>

      Actually that's not so complex. As I can see the proposed iterator

      will scan<br>

      only those tuples that is about to output. There is also an

      optimization of<br>

      a tree traverse+search that has O(log(N)) complexity that would

      help<br>

      in case of SELECT with LIMIT.

      <blockquote type="cite">

        <pre>+

+### Solution №2

+

+Since BPS tree is built without acknowledge in which order keys should be

+placed, it is assumed that order is always ascending: keys in blocks are sorted

+from smaller to bigger (left-to-right); comparison between keys is made by

+tuple_compare_with_key() function. It makes given tree be unsuitable for

+efficient iteration in different orders. On the other hand, it is possible to

+build new *temporary in-memory* BPS-tree featuring correct key order. It seems

+to be easy to achieve since keys order depends on result of comparator function.

+Reverting result of comparison for key parts corresponding to opposite iteration

+direction gives appropriate keys ordering in the tree. Note that not all data in

+space is needed to be in tree (in case of non-empty search key); only sub-tree

+making up lower or upper bound of first key part is required.

+

+Pros (+):

+ - any sets of key part iteration orders are allowed.  

+

+Cons (-):

+ - first tuple to selected is probably returned with significant delay;</pre>

      </blockquote>

      In the worst case it'll cost N*log(N), where N is the index size.<br>

      Note that such a request will delay ALL the requests for

      unpredictable timespan.<br>

      BTW, usually it's better to build a temporary array and sort it

      rather than building<br>

      a temporary tree for just one output.<br>

      But still I fear that solution is not an option.<br>

      <blockquote type="cite">

        <pre>

+ - tree memory construction overhead (only during iteration routine).

+

+### Solution №3

+

+It extends solution №2 in sense it allows specifying sorting direction for

+each part right in key def, that is during index creation. For instance:

+

+`s:create_index('i', {parts = {{1, 'integer', 'asc'}, {2, 'integer', 'desc'}}})`</pre>

      </blockquote>

      Already implemented:<br>

      `i = s:create_index('test', {type='tree', parts={{1, 'uint',

      sort_order='desc'}}})`<br>

      <blockquote type="cite">

        <pre>

+

+After that 'GT'/'LT' iterators for parts with declared 'desc' sorting order will

+return reversed results of comparison, so only comparators are affected.

+That's it (probably the simplest solution; what is more 'DESC' index is casual

+SQL feature in other DBs).  

+

+Pros (+):

+ - index search via 'desc' iterators is almost as fast as via casual

+   iterators;

+ - this approach seems to be easy in implementation and resolves

+   problem in SQL (since at the moment of ephemeral space creation it is allowed

+   to set its PK key parts orders).  

+

+Cons (-):

+ - 'desc' indexes are not versatile - user is unable to set different

+    orders in iterator;

+ - order of iteration itself is immutable. As a result, for each different

+   iteration order user has to create separate index which in turn consumes

+   additional memory and time as any other index.

-- 

2.15.1</pre>

      </blockquote>

      My summary:</p>

    <ol>

      <li>`select({1, 1}, {iterator='GE'})` is not equivalent to `SELECT

        .. WHERE a>=1 AND b >= 1`.</li>

      <li>It is still possible to implement `iterator={'GE', 'GE'}` that

        will suitable for that SQL SELECT.</li>

      <li>It is also possible to implement `order={'ASC', 'DESC'}`

        option in select using the solution #1.<br>

      </li>

    </ol>

  </body>

</html>