Tarantool development patches archive
* [PATCH v1 0/4] box: introduce hint option for memtx tree index
@ 2019-02-05 11:58 Kirill Shcherbatov
  2019-02-05 11:58 ` [PATCH v1 1/4] lib: introduce BPS_TREE_EQUAL custom comparator Kirill Shcherbatov
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Kirill Shcherbatov @ 2019-02-05 11:58 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov

Reworked the memtx tree to use 'tuple hints'.
Introduced special functions to retrieve tuple hints for a particular key_def.
A hint is an integer that can be used to optimize tuple comparison:
if the hint of one tuple is less than the hint of another, the first
tuple is definitely less than the second; only if the hints are equal
must tuple_compare be called to get the comparison result.

Hints are only useful when:

* they are precalculated and stored along with the tuple;
  the calculation is not free (though cheaper than tuple_compare), but
  precalculated hints allow comparing tuples without even fetching
  the tuple data;
* the first part of key_def is 'string', 'unsigned' or 'integer';
* since a hint is calculated using only the first part of key_def
  (and only the first several characters if it is a string), this part
  must differ with high probability for every tuple pair.

Enabling the hint option improves performance by 15% on average; select
operations are accelerated significantly (there are scenarios in which
the difference reaches 200-250%).
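
For illustration, the hinted comparison boils down to the following
scheme (a minimal sketch, not the exact code from the series; the hint
type and the function name here are assumptions):

static int
tuple_compare_hinted(struct tuple *tuple_a, uint64_t hint_a,
		     struct tuple *tuple_b, uint64_t hint_b,
		     struct key_def *key_def)
{
	/* Hints order the tuples without touching tuple data. */
	if (hint_a != hint_b)
		return hint_a < hint_b ? -1 : 1;
	/* Equal hints: fall back to the full comparison. */
	return tuple_compare(tuple_a, tuple_b, key_def);
}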

http://github.com/tarantool/tarantool/tree/kshch/gh-3961-tuple-hints
https://github.com/tarantool/tarantool/issues/3961

Kirill Shcherbatov (3):
  lib: introduce BPS_TREE_EQUAL custom comparator
  box: rework memtx_tree class to be reusable
  box: introduce hint option for memtx tree index

Alexandr Lyapunov (1):
  box: introduce tuple compare hint

 src/box/index_def.c                 |   2 +
 src/box/index_def.h                 |   6 +
 src/box/key_def.c                   |   1 +
 src/box/key_def.h                   |  11 +
 src/box/lua/schema.lua              |  12 +
 src/box/lua/space.cc                |   5 +
 src/box/memtx_tree.c                | 851 ++++++------------------
 src/box/memtx_tree.h                |  68 +-
 src/box/memtx_tree_impl.h           | 960 ++++++++++++++++++++++++++++
 src/box/tuple_compare.cc            | 131 ++++
 src/box/tuple_compare.h             |  40 ++
 src/coll.c                          |  23 +
 src/coll.h                          |   3 +
 src/lib/salad/bps_tree.h            |  74 ++-
 test/box/tree_pk.result             | 220 +++++++
 test/box/tree_pk.test.lua           |  88 +++
 test/box/tree_pk_multipart.result   | 155 +++++
 test/box/tree_pk_multipart.test.lua |  65 ++
 test/unit/bps_tree_iterator.cc      |  15 +-
 19 files changed, 1989 insertions(+), 741 deletions(-)
 create mode 100644 src/box/memtx_tree_impl.h

-- 
2.20.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v1 1/4] lib: introduce BPS_TREE_EQUAL custom comparator
  2019-02-05 11:58 [PATCH v1 0/4] box: introduce hint option for memtx tree index Kirill Shcherbatov
@ 2019-02-05 11:58 ` Kirill Shcherbatov
  2019-02-11 15:46   ` Vladimir Davydov
  2019-02-05 11:58 ` [PATCH v1 2/4] box: rework memtx_tree class to be reusable Kirill Shcherbatov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Kirill Shcherbatov @ 2019-02-05 11:58 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov

Introduce a BPS_TREE_EQUAL macro for the BPS tree class. It makes
it possible to define custom comparators for structure-based leaves.
Previously, an overloaded C++ "!=" operator was used for this
purpose. Since we are not going to rewrite C-only components of
Tarantool like memtx_tree in C++, we need a way to compare complex
structures using the preprocessor.
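
For example, a tree with structure-based elements can now provide
equality roughly as follows (a sketch of the intended usage; the test
changes below follow the same pattern):

struct elem_t {
	long first;
	long second;
};

/* Field-wise equality instead of a C++ operator overload. */
static bool
elem_equal(const struct elem_t *a, const struct elem_t *b)
{
	return a->first == b->first && a->second == b->second;
}

#define BPS_TREE_EQUAL(a, b) elem_equal(&(a), &(b))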

Needed for #3961
---
 src/lib/salad/bps_tree.h       | 74 +++++++++++++++++++++-------------
 test/unit/bps_tree_iterator.cc | 15 ++++---
 2 files changed, 56 insertions(+), 33 deletions(-)

diff --git a/src/lib/salad/bps_tree.h b/src/lib/salad/bps_tree.h
index 8d1a62108..1419af3f7 100644
--- a/src/lib/salad/bps_tree.h
+++ b/src/lib/salad/bps_tree.h
@@ -1035,6 +1035,10 @@ struct bps_leaf_path_elem {
 	bps_tree_pos_t max_elem_pos;
 };
 
+#ifndef BPS_TREE_EQUAL
+#define BPS_TREE_EQUAL(a, b) (a == b)
+#endif
+
 /**
  * @brief Tree construction. Fills struct bps_tree members.
  * @param tree - pointer to a tree
@@ -4592,7 +4596,7 @@ bps_tree_debug_check_block(const struct bps_tree *tree, struct bps_block *block,
 						       inner->child_ids[i]);
 			bps_tree_elem_t calc_max_elem =
 				bps_tree_debug_find_max_elem(tree, tmp_block);
-			if (inner->elems[i] != calc_max_elem)
+			if (!BPS_TREE_EQUAL(inner->elems[i], calc_max_elem))
 				result |= 0x4000;
 		}
 		if (block->size > 1) {
@@ -4651,7 +4655,8 @@ bps_tree_debug_check(const struct bps_tree *tree)
 		return result;
 	}
 	struct bps_block *root = bps_tree_root(tree);
-	if (tree->max_elem != bps_tree_debug_find_max_elem(tree, root))
+	bps_tree_elem_t elem = bps_tree_debug_find_max_elem(tree, root);
+	if (!BPS_TREE_EQUAL(tree->max_elem, elem))
 		result |= 0x8;
 	size_t calc_count = 0;
 	bps_tree_block_id_t expected_prev_id = (bps_tree_block_id_t)(-1);
@@ -5016,16 +5021,22 @@ bps_tree_debug_check_move_to_right_leaf(struct bps_tree *tree, bool assertme)
 					assert(!assertme);
 				}
 
-				if (a.header.size)
-					if (ma != a.elems[a.header.size - 1]) {
+				if (a.header.size) {
+					bps_tree_elem_t elem =
+						a.elems[a.header.size - 1];
+					if (!BPS_TREE_EQUAL(ma, elem)) {
 						result |= (1 << 5);
 						assert(!assertme);
 					}
-				if (b.header.size)
-					if (mb != b.elems[b.header.size - 1]) {
+				}
+				if (b.header.size) {
+					bps_tree_elem_t elem =
+						b.elems[b.header.size - 1];
+					if (!BPS_TREE_EQUAL(mb, elem)) {
 						result |= (1 << 5);
 						assert(!assertme);
 					}
+				}
 
 				c = 0;
 				for (unsigned int u = 0;
@@ -5113,16 +5124,22 @@ bps_tree_debug_check_move_to_left_leaf(struct bps_tree *tree, bool assertme)
 					assert(!assertme);
 				}
 
-				if (a.header.size)
-					if (ma != a.elems[a.header.size - 1]) {
+				if (a.header.size) {
+					bps_tree_elem_t elem =
+						a.elems[a.header.size - 1];
+					if (!BPS_TREE_EQUAL(ma, elem)) {
 						result |= (1 << 7);
 						assert(!assertme);
 					}
-				if (b.header.size)
-					if (mb != b.elems[b.header.size - 1]) {
+				}
+				if (b.header.size) {
+					bps_tree_elem_t elem =
+						b.elems[b.header.size - 1];
+					if (!BPS_TREE_EQUAL(mb, elem)) {
 						result |= (1 << 7);
 						assert(!assertme);
 					}
+				}
 
 				c = 0;
 				for (unsigned int u = 0;
@@ -5224,21 +5241,22 @@ bps_tree_debug_check_insert_and_move_to_right_leaf(struct bps_tree *tree,
 						assert(!assertme);
 					}
 
-					if (i - u + 1)
-						if (ma
-							!= a.elems[a.header.size
-								- 1]) {
+					if (i - u + 1) {
+						bps_tree_elem_t elem =
+							a.elems[a.header.size - 1];
+						if (!BPS_TREE_EQUAL(ma, elem)) {
 							result |= (1 << 9);
 							assert(!assertme);
 						}
-					if (j + u)
-						if (mb
-							!= b.elems[b.header.size
-								- 1]) {
+					}
+					if (j + u) {
+						bps_tree_elem_t elem =
+							b.elems[b.header.size - 1];
+						if (!BPS_TREE_EQUAL(mb, elem)) {
 							result |= (1 << 9);
 							assert(!assertme);
 						}
-
+					}
 					c = 0;
 					for (unsigned int v = 0;
 						v < (unsigned int) a.header.size;
@@ -5340,20 +5358,22 @@ bps_tree_debug_check_insert_and_move_to_left_leaf(struct bps_tree *tree,
 						assert(!assertme);
 					}
 
-					if (i + u)
-						if (ma
-							!= a.elems[a.header.size
-								- 1]) {
+					if (i + u) {
+						bps_tree_elem_t elem =
+							a.elems[a.header.size - 1];
+						if (!BPS_TREE_EQUAL(ma, elem)) {
 							result |= (1 << 11);
 							assert(!assertme);
 						}
-					if (j - u + 1)
-						if (mb
-							!= b.elems[b.header.size
-								- 1]) {
+					}
+					if (j - u + 1) {
+						bps_tree_elem_t elem =
+							b.elems[b.header.size - 1];
+						if (!BPS_TREE_EQUAL(mb, elem)) {
 							result |= (1 << 11);
 							assert(!assertme);
 						}
+					}
 
 					c = 0;
 					for (unsigned int v = 0;
diff --git a/test/unit/bps_tree_iterator.cc b/test/unit/bps_tree_iterator.cc
index 3c2d7f7c4..318aac1ec 100644
--- a/test/unit/bps_tree_iterator.cc
+++ b/test/unit/bps_tree_iterator.cc
@@ -10,18 +10,16 @@
 struct elem_t {
 	long first;
 	long second;
-	bool operator!= (const struct elem_t& another) const
-	{
-		return first != another.first || second != another.second;
-	}
 };
 
+static bool equal(const elem_t &a, const elem_t &b);
 static int compare(const elem_t &a, const elem_t &b);
 static int compare_key(const elem_t &a, long b);
 
 #define BPS_TREE_NAME test
 #define BPS_TREE_BLOCK_SIZE 128 /* value is to low specially for tests */
 #define BPS_TREE_EXTENT_SIZE 1024 /* value is to low specially for tests */
+#define BPS_TREE_EQUAL(a, b) equal(a, b)
 #define BPS_TREE_COMPARE(a, b, arg) compare(a, b)
 #define BPS_TREE_COMPARE_KEY(a, b, arg) compare_key(a, b)
 #define bps_tree_elem_t struct elem_t
@@ -29,6 +27,11 @@ static int compare_key(const elem_t &a, long b);
 #define bps_tree_arg_t int
 #include "salad/bps_tree.h"
 
+static bool equal(const elem_t &a, const elem_t &b)
+{
+	return a.first == b.first && a.second == b.second;
+}
+
 static int compare(const elem_t &a, const elem_t &b)
 {
 	return a.first < b.first ? -1 : a.first > b.first ? 1 :
@@ -501,7 +504,7 @@ iterator_freeze_check()
 		}
 		int tested_count = 0;
 		while ((e = test_iterator_get_elem(&tree, &iterator1))) {
-			if (*e != comp_buf1[tested_count]) {
+			if (!equal(*e, comp_buf1[tested_count])) {
 				fail("version restore failed (1)", "true");
 			}
 			tested_count++;
@@ -523,7 +526,7 @@ iterator_freeze_check()
 
 		tested_count = 0;
 		while ((e = test_iterator_get_elem(&tree, &iterator2))) {
-			if (*e != comp_buf1[tested_count]) {
+			if (!equal(*e, comp_buf1[tested_count])) {
 				fail("version restore failed (1)", "true");
 			}
 			tested_count++;
-- 
2.20.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v1 2/4] box: rework memtx_tree class to be reusable
  2019-02-05 11:58 [PATCH v1 0/4] box: introduce hint option for memtx tree index Kirill Shcherbatov
  2019-02-05 11:58 ` [PATCH v1 1/4] lib: introduce BPS_TREE_EQUAL custom comparator Kirill Shcherbatov
@ 2019-02-05 11:58 ` Kirill Shcherbatov
  2019-02-11 15:49   ` Vladimir Davydov
  2019-02-05 11:58 ` [PATCH v1 3/4] box: introduce tuple compare hint Kirill Shcherbatov
  2019-02-05 11:58 ` [PATCH v1 4/4] box: introduce hint option for memtx tree index Kirill Shcherbatov
  3 siblings, 1 reply; 9+ messages in thread
From: Kirill Shcherbatov @ 2019-02-05 11:58 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov

The memtx tree class has been redesigned so that it can be used
to store arbitrary structures in the future. This makes it possible
to implement tuple hints in memtx in subsequent patches.
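
The generic implementation is parameterized with preprocessor macros
and then included, which generates a concrete index class. Roughly
(a shortened sketch; see the new memtx_tree.c below for the complete
set of macros):

#define MEMTX_TREE_NAME memtx_tuple_tree
#define memtx_tree_elem struct tuple *
#define memtx_tree_key struct memtx_tuple_tree_key_data
#define MEMTX_TREE_ELEM_CMP(elem_a_ptr, elem_b_ptr, key_def)	\
	tuple_compare(*elem_a_ptr, *elem_b_ptr, key_def)
/* ... the remaining MEMTX_TREE_* accessors ... */
#include "memtx_tree_impl.h"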

Needed for #3961
---
 src/box/memtx_tree.c      | 715 ++--------------------------
 src/box/memtx_tree.h      |  68 +--
 src/box/memtx_tree_impl.h | 960 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 1008 insertions(+), 735 deletions(-)
 create mode 100644 src/box/memtx_tree_impl.h

diff --git a/src/box/memtx_tree.c b/src/box/memtx_tree.c
index f851fb869..6b22883c2 100644
--- a/src/box/memtx_tree.c
+++ b/src/box/memtx_tree.c
@@ -28,684 +28,59 @@
  * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
+#include <stdbool.h>
+#include <stdint.h>
+#include "tuple_compare.h"
 #include "memtx_tree.h"
-#include "memtx_engine.h"
-#include "space.h"
-#include "schema.h" /* space_cache_find() */
-#include "errinj.h"
-#include "memory.h"
-#include "fiber.h"
-#include "tuple.h"
-#include <third_party/qsort_arg.h>
-#include <small/mempool.h>
 
-/* {{{ Utilities. *************************************************/
+/* {{{ Memtx tree of tuples class. ******************************/
 
-static int
-memtx_tree_qcompare(const void* a, const void *b, void *c)
-{
-	return tuple_compare(*(struct tuple **)a,
-		*(struct tuple **)b, (struct key_def *)c);
-}
-
-/* {{{ MemtxTree Iterators ****************************************/
-struct tree_iterator {
-	struct iterator base;
-	const struct memtx_tree *tree;
-	struct index_def *index_def;
-	struct memtx_tree_iterator tree_iterator;
-	enum iterator_type type;
-	struct memtx_tree_key_data key_data;
-	struct tuple *current_tuple;
-	/** Memory pool the iterator was allocated from. */
-	struct mempool *pool;
+/** Struct that is used as a key in BPS tree definition. */
+struct memtx_tuple_tree_key_data {
+	/** Sequence of msgpacked search fields. */
+	const char *key;
+	/** Number of msgpacked search fields. */
+	uint32_t part_count;
 };
 
-static void
-tree_iterator_free(struct iterator *iterator);
-
-static inline struct tree_iterator *
-tree_iterator(struct iterator *it)
-{
-	assert(it->free == tree_iterator_free);
-	return (struct tree_iterator *) it;
-}
-
-static void
-tree_iterator_free(struct iterator *iterator)
-{
-	struct tree_iterator *it = tree_iterator(iterator);
-	if (it->current_tuple != NULL)
-		tuple_unref(it->current_tuple);
-	mempool_free(it->pool, it);
-}
-
-static int
-tree_iterator_dummie(struct iterator *iterator, struct tuple **ret)
-{
-	(void)iterator;
-	*ret = NULL;
-	return 0;
-}
-
-static int
-tree_iterator_next(struct iterator *iterator, struct tuple **ret)
-{
-	struct tuple **res;
-	struct tree_iterator *it = tree_iterator(iterator);
-	assert(it->current_tuple != NULL);
-	struct tuple **check = memtx_tree_iterator_get_elem(it->tree, &it->tree_iterator);
-	if (check == NULL || *check != it->current_tuple)
-		it->tree_iterator =
-			memtx_tree_upper_bound_elem(it->tree, it->current_tuple,
-						    NULL);
-	else
-		memtx_tree_iterator_next(it->tree, &it->tree_iterator);
-	tuple_unref(it->current_tuple);
-	it->current_tuple = NULL;
-	res = memtx_tree_iterator_get_elem(it->tree, &it->tree_iterator);
-	if (res == NULL) {
-		iterator->next = tree_iterator_dummie;
-		*ret = NULL;
-	} else {
-		*ret = it->current_tuple = *res;
-		tuple_ref(it->current_tuple);
-	}
-	return 0;
-}
-
-static int
-tree_iterator_prev(struct iterator *iterator, struct tuple **ret)
-{
-	struct tree_iterator *it = tree_iterator(iterator);
-	assert(it->current_tuple != NULL);
-	struct tuple **check = memtx_tree_iterator_get_elem(it->tree, &it->tree_iterator);
-	if (check == NULL || *check != it->current_tuple)
-		it->tree_iterator =
-			memtx_tree_lower_bound_elem(it->tree, it->current_tuple,
-						    NULL);
-	memtx_tree_iterator_prev(it->tree, &it->tree_iterator);
-	tuple_unref(it->current_tuple);
-	it->current_tuple = NULL;
-	struct tuple **res = memtx_tree_iterator_get_elem(it->tree, &it->tree_iterator);
-	if (!res) {
-		iterator->next = tree_iterator_dummie;
-		*ret = NULL;
-	} else {
-		*ret = it->current_tuple = *res;
-		tuple_ref(it->current_tuple);
-	}
-	return 0;
-}
-
-static int
-tree_iterator_next_equal(struct iterator *iterator, struct tuple **ret)
-{
-	struct tree_iterator *it = tree_iterator(iterator);
-	assert(it->current_tuple != NULL);
-	struct tuple **check = memtx_tree_iterator_get_elem(it->tree,
-						&it->tree_iterator);
-	if (check == NULL || *check != it->current_tuple)
-		it->tree_iterator =
-			memtx_tree_upper_bound_elem(it->tree, it->current_tuple,
-						    NULL);
-	else
-		memtx_tree_iterator_next(it->tree, &it->tree_iterator);
-	tuple_unref(it->current_tuple);
-	it->current_tuple = NULL;
-	struct tuple **res = memtx_tree_iterator_get_elem(it->tree,
-						&it->tree_iterator);
-	/* Use user key def to save a few loops. */
-	if (!res || memtx_tree_compare_key(*res, &it->key_data,
-					   it->index_def->key_def) != 0) {
-		iterator->next = tree_iterator_dummie;
-		*ret = NULL;
-	} else {
-		*ret = it->current_tuple = *res;
-		tuple_ref(it->current_tuple);
-	}
-	return 0;
-}
-
-static int
-tree_iterator_prev_equal(struct iterator *iterator, struct tuple **ret)
-{
-	struct tree_iterator *it = tree_iterator(iterator);
-	assert(it->current_tuple != NULL);
-	struct tuple **check = memtx_tree_iterator_get_elem(it->tree,
-						&it->tree_iterator);
-	if (check == NULL || *check != it->current_tuple)
-		it->tree_iterator =
-			memtx_tree_lower_bound_elem(it->tree, it->current_tuple,
-						    NULL);
-	memtx_tree_iterator_prev(it->tree, &it->tree_iterator);
-	tuple_unref(it->current_tuple);
-	it->current_tuple = NULL;
-	struct tuple **res = memtx_tree_iterator_get_elem(it->tree,
-						&it->tree_iterator);
-	/* Use user key def to save a few loops. */
-	if (!res || memtx_tree_compare_key(*res, &it->key_data,
-					   it->index_def->key_def) != 0) {
-		iterator->next = tree_iterator_dummie;
-		*ret = NULL;
-	} else {
-		*ret = it->current_tuple = *res;
-		tuple_ref(it->current_tuple);
-	}
-	return 0;
-}
-
-static void
-tree_iterator_set_next_method(struct tree_iterator *it)
-{
-	assert(it->current_tuple != NULL);
-	switch (it->type) {
-	case ITER_EQ:
-		it->base.next = tree_iterator_next_equal;
-		break;
-	case ITER_REQ:
-		it->base.next = tree_iterator_prev_equal;
-		break;
-	case ITER_ALL:
-		it->base.next = tree_iterator_next;
-		break;
-	case ITER_LT:
-	case ITER_LE:
-		it->base.next = tree_iterator_prev;
-		break;
-	case ITER_GE:
-	case ITER_GT:
-		it->base.next = tree_iterator_next;
-		break;
-	default:
-		/* The type was checked in initIterator */
-		assert(false);
-	}
-}
-
-static int
-tree_iterator_start(struct iterator *iterator, struct tuple **ret)
-{
-	*ret = NULL;
-	struct tree_iterator *it = tree_iterator(iterator);
-	it->base.next = tree_iterator_dummie;
-	const struct memtx_tree *tree = it->tree;
-	enum iterator_type type = it->type;
-	bool exact = false;
-	assert(it->current_tuple == NULL);
-	if (it->key_data.key == 0) {
-		if (iterator_type_is_reverse(it->type))
-			it->tree_iterator = memtx_tree_iterator_last(tree);
-		else
-			it->tree_iterator = memtx_tree_iterator_first(tree);
-	} else {
-		if (type == ITER_ALL || type == ITER_EQ ||
-		    type == ITER_GE || type == ITER_LT) {
-			it->tree_iterator =
-				memtx_tree_lower_bound(tree, &it->key_data,
-						       &exact);
-			if (type == ITER_EQ && !exact)
-				return 0;
-		} else { // ITER_GT, ITER_REQ, ITER_LE
-			it->tree_iterator =
-				memtx_tree_upper_bound(tree, &it->key_data,
-						       &exact);
-			if (type == ITER_REQ && !exact)
-				return 0;
-		}
-		if (iterator_type_is_reverse(type)) {
-			/*
-			 * Because of limitations of tree search API we use use
-			 * lower_bound for LT search and upper_bound for LE
-			 * and REQ searches. Thus we found position to the
-			 * right of the target one. Let's make a step to the
-			 * left to reach target position.
-			 * If we found an invalid iterator all the elements in
-			 * the tree are less (less or equal) to the key, and
-			 * iterator_next call will convert the iterator to the
-			 * last position in the tree, that's what we need.
-			 */
-			memtx_tree_iterator_prev(it->tree, &it->tree_iterator);
-		}
-	}
-
-	struct tuple **res = memtx_tree_iterator_get_elem(it->tree,
-						&it->tree_iterator);
-	if (!res)
-		return 0;
-	*ret = it->current_tuple = *res;
-	tuple_ref(it->current_tuple);
-	tree_iterator_set_next_method(it);
-	return 0;
-}
+#define MEMTX_TREE_NAME metmx_tuple_tree
+#define memtx_tree_elem struct tuple *
+#define memtx_tree_key struct memtx_tuple_tree_key_data
+#define MEMTX_TREE_ELEM_EQUAL(elem_a_ptr, elem_b_ptr)				\
+	(*elem_a_ptr == *elem_b_ptr)
+#define MEMTX_TREE_ELEM_CMP(elem_a_ptr, elem_b_ptr, key_def)			\
+	tuple_compare(*elem_a_ptr, *elem_b_ptr, key_def)
+#define MEMTX_TREE_ELEM_WITH_KEY_CMP(elem_ptr, key_ptr, key_def)		\
+	tuple_compare_with_key(*elem_ptr, (key_ptr)->key,			\
+			       (key_ptr)->part_count, key_def)
+#define MEMTX_TREE_ELEM_SET(elem_ptr, tuple, key_def)				\
+	({(void)key_def; *elem_ptr = tuple;})
+#define MEMTX_TREE_KEY_SET(key_ptr, key_val, part_count_val, key_def)		\
+	({(void)key_def; (key_ptr)->key = key_val;				\
+	 (key_ptr)->part_count = part_count_val;})
+#define MEMTX_TREE_ELEM_GET(elem_ptr) (*(elem_ptr))
+#define MEMTX_TREE_KEY_GET(key_ptr, part_count_ptr)				\
+	({*part_count_ptr = (key_ptr)->part_count; (key_ptr)->key;})
+
+#include "memtx_tree_impl.h"
+
+#undef memtx_tree_key
+#undef memtx_tree_elem
+#undef MEMTX_TREE_KEY_GET
+#undef MEMTX_TREE_ELEM_GET
+#undef MEMTX_TREE_KEY_SET
+#undef MEMTX_TREE_ELEM_SET
+#undef MEMTX_TREE_ELEM_WITH_KEY_CMP
+#undef MEMTX_TREE_ELEM_CMP
+#undef MEMTX_TREE_ELEM_EQUAL
+#undef memtx_tree_key
+#undef memtx_tree_elem
+#undef MEMTX_TREE_NAME
 
 /* }}} */
 
-/* {{{ MemtxTree  **********************************************************/
-
-/**
- * Return the key def to use for comparing tuples stored
- * in the given tree index.
- *
- * We use extended key def for non-unique and nullable
- * indexes. Unique but nullable index can store multiple
- * NULLs. To correctly compare these NULLs extended key
- * def must be used. For details @sa tuple_compare.cc.
- */
-static struct key_def *
-memtx_tree_index_cmp_def(struct memtx_tree_index *index)
-{
-	struct index_def *def = index->base.def;
-	return def->opts.is_unique && !def->key_def->is_nullable ?
-		def->key_def : def->cmp_def;
-}
-
-static void
-memtx_tree_index_free(struct memtx_tree_index *index)
-{
-	memtx_tree_destroy(&index->tree);
-	free(index->build_array);
-	free(index);
-}
-
-static void
-memtx_tree_index_gc_run(struct memtx_gc_task *task, bool *done)
-{
-	/*
-	 * Yield every 1K tuples to keep latency < 0.1 ms.
-	 * Yield more often in debug mode.
-	 */
-#ifdef NDEBUG
-	enum { YIELD_LOOPS = 1000 };
-#else
-	enum { YIELD_LOOPS = 10 };
-#endif
-
-	struct memtx_tree_index *index = container_of(task,
-			struct memtx_tree_index, gc_task);
-	struct memtx_tree *tree = &index->tree;
-	struct memtx_tree_iterator *itr = &index->gc_iterator;
-
-	unsigned int loops = 0;
-	while (!memtx_tree_iterator_is_invalid(itr)) {
-		struct tuple *tuple = *memtx_tree_iterator_get_elem(tree, itr);
-		memtx_tree_iterator_next(tree, itr);
-		tuple_unref(tuple);
-		if (++loops >= YIELD_LOOPS) {
-			*done = false;
-			return;
-		}
-	}
-	*done = true;
-}
-
-static void
-memtx_tree_index_gc_free(struct memtx_gc_task *task)
-{
-	struct memtx_tree_index *index = container_of(task,
-			struct memtx_tree_index, gc_task);
-	memtx_tree_index_free(index);
-}
-
-static const struct memtx_gc_task_vtab memtx_tree_index_gc_vtab = {
-	.run = memtx_tree_index_gc_run,
-	.free = memtx_tree_index_gc_free,
-};
-
-static void
-memtx_tree_index_destroy(struct index *base)
-{
-	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
-	struct memtx_engine *memtx = (struct memtx_engine *)base->engine;
-	if (base->def->iid == 0) {
-		/*
-		 * Primary index. We need to free all tuples stored
-		 * in the index, which may take a while. Schedule a
-		 * background task in order not to block tx thread.
-		 */
-		index->gc_task.vtab = &memtx_tree_index_gc_vtab;
-		index->gc_iterator = memtx_tree_iterator_first(&index->tree);
-		memtx_engine_schedule_gc(memtx, &index->gc_task);
-	} else {
-		/*
-		 * Secondary index. Destruction is fast, no need to
-		 * hand over to background fiber.
-		 */
-		memtx_tree_index_free(index);
-	}
-}
-
-static void
-memtx_tree_index_update_def(struct index *base)
-{
-	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
-	index->tree.arg = memtx_tree_index_cmp_def(index);
-}
-
-static bool
-memtx_tree_index_depends_on_pk(struct index *base)
-{
-	struct index_def *def = base->def;
-	/* See comment to memtx_tree_index_cmp_def(). */
-	return !def->opts.is_unique || def->key_def->is_nullable;
-}
-
-static ssize_t
-memtx_tree_index_size(struct index *base)
-{
-	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
-	return memtx_tree_size(&index->tree);
-}
-
-static ssize_t
-memtx_tree_index_bsize(struct index *base)
-{
-	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
-	return memtx_tree_mem_used(&index->tree);
-}
-
-static int
-memtx_tree_index_random(struct index *base, uint32_t rnd, struct tuple **result)
-{
-	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
-	struct tuple **res = memtx_tree_random(&index->tree, rnd);
-	*result = res != NULL ? *res : NULL;
-	return 0;
-}
-
-static ssize_t
-memtx_tree_index_count(struct index *base, enum iterator_type type,
-		       const char *key, uint32_t part_count)
-{
-	if (type == ITER_ALL)
-		return memtx_tree_index_size(base); /* optimization */
-	return generic_index_count(base, type, key, part_count);
-}
-
-static int
-memtx_tree_index_get(struct index *base, const char *key,
-		     uint32_t part_count, struct tuple **result)
-{
-	assert(base->def->opts.is_unique &&
-	       part_count == base->def->key_def->part_count);
-	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
-	struct memtx_tree_key_data key_data;
-	key_data.key = key;
-	key_data.part_count = part_count;
-	struct tuple **res = memtx_tree_find(&index->tree, &key_data);
-	*result = res != NULL ? *res : NULL;
-	return 0;
-}
-
-static int
-memtx_tree_index_replace(struct index *base, struct tuple *old_tuple,
-			 struct tuple *new_tuple, enum dup_replace_mode mode,
-			 struct tuple **result)
-{
-	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
-	if (new_tuple) {
-		struct tuple *dup_tuple = NULL;
-
-		/* Try to optimistically replace the new_tuple. */
-		int tree_res = memtx_tree_insert(&index->tree,
-						 new_tuple, &dup_tuple);
-		if (tree_res) {
-			diag_set(OutOfMemory, MEMTX_EXTENT_SIZE,
-				 "memtx_tree_index", "replace");
-			return -1;
-		}
-
-		uint32_t errcode = replace_check_dup(old_tuple,
-						     dup_tuple, mode);
-		if (errcode) {
-			memtx_tree_delete(&index->tree, new_tuple);
-			if (dup_tuple)
-				memtx_tree_insert(&index->tree, dup_tuple, 0);
-			struct space *sp = space_cache_find(base->def->space_id);
-			if (sp != NULL)
-				diag_set(ClientError, errcode, base->def->name,
-					 space_name(sp));
-			return -1;
-		}
-		if (dup_tuple) {
-			*result = dup_tuple;
-			return 0;
-		}
-	}
-	if (old_tuple) {
-		memtx_tree_delete(&index->tree, old_tuple);
-	}
-	*result = old_tuple;
-	return 0;
-}
-
-static struct iterator *
-memtx_tree_index_create_iterator(struct index *base, enum iterator_type type,
-				 const char *key, uint32_t part_count)
-{
-	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
-	struct memtx_engine *memtx = (struct memtx_engine *)base->engine;
-
-	assert(part_count == 0 || key != NULL);
-	if (type > ITER_GT) {
-		diag_set(UnsupportedIndexFeature, base->def,
-			 "requested iterator type");
-		return NULL;
-	}
-
-	if (part_count == 0) {
-		/*
-		 * If no key is specified, downgrade equality
-		 * iterators to a full range.
-		 */
-		type = iterator_type_is_reverse(type) ? ITER_LE : ITER_GE;
-		key = NULL;
-	}
-
-	struct tree_iterator *it = mempool_alloc(&memtx->tree_iterator_pool);
-	if (it == NULL) {
-		diag_set(OutOfMemory, sizeof(struct tree_iterator),
-			 "memtx_tree_index", "iterator");
-		return NULL;
-	}
-	iterator_create(&it->base, base);
-	it->pool = &memtx->tree_iterator_pool;
-	it->base.next = tree_iterator_start;
-	it->base.free = tree_iterator_free;
-	it->type = type;
-	it->key_data.key = key;
-	it->key_data.part_count = part_count;
-	it->index_def = base->def;
-	it->tree = &index->tree;
-	it->tree_iterator = memtx_tree_invalid_iterator();
-	it->current_tuple = NULL;
-	return (struct iterator *)it;
-}
-
-static void
-memtx_tree_index_begin_build(struct index *base)
-{
-	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
-	assert(memtx_tree_size(&index->tree) == 0);
-	(void)index;
-}
-
-static int
-memtx_tree_index_reserve(struct index *base, uint32_t size_hint)
-{
-	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
-	if (size_hint < index->build_array_alloc_size)
-		return 0;
-	struct tuple **tmp = (struct tuple **)realloc(index->build_array,
-						      size_hint * sizeof(*tmp));
-	if (tmp == NULL) {
-		diag_set(OutOfMemory, size_hint * sizeof(*tmp),
-			 "memtx_tree_index", "reserve");
-		return -1;
-	}
-	index->build_array = tmp;
-	index->build_array_alloc_size = size_hint;
-	return 0;
-}
-
-static int
-memtx_tree_index_build_next(struct index *base, struct tuple *tuple)
-{
-	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
-	if (index->build_array == NULL) {
-		index->build_array = (struct tuple **)malloc(MEMTX_EXTENT_SIZE);
-		if (index->build_array == NULL) {
-			diag_set(OutOfMemory, MEMTX_EXTENT_SIZE,
-				 "memtx_tree_index", "build_next");
-			return -1;
-		}
-		index->build_array_alloc_size =
-			MEMTX_EXTENT_SIZE / sizeof(struct tuple*);
-	}
-	assert(index->build_array_size <= index->build_array_alloc_size);
-	if (index->build_array_size == index->build_array_alloc_size) {
-		index->build_array_alloc_size = index->build_array_alloc_size +
-					index->build_array_alloc_size / 2;
-		struct tuple **tmp = (struct tuple **)
-			realloc(index->build_array,
-				index->build_array_alloc_size * sizeof(*tmp));
-		if (tmp == NULL) {
-			diag_set(OutOfMemory, index->build_array_alloc_size *
-				 sizeof(*tmp), "memtx_tree_index", "build_next");
-			return -1;
-		}
-		index->build_array = tmp;
-	}
-	index->build_array[index->build_array_size++] = tuple;
-	return 0;
-}
-
-static void
-memtx_tree_index_end_build(struct index *base)
-{
-	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
-	struct key_def *cmp_def = memtx_tree_index_cmp_def(index);
-	qsort_arg(index->build_array, index->build_array_size,
-		  sizeof(struct tuple *),
-		  memtx_tree_qcompare, cmp_def);
-	memtx_tree_build(&index->tree, index->build_array,
-			 index->build_array_size);
-
-	free(index->build_array);
-	index->build_array = NULL;
-	index->build_array_size = 0;
-	index->build_array_alloc_size = 0;
-}
-
-struct tree_snapshot_iterator {
-	struct snapshot_iterator base;
-	struct memtx_tree *tree;
-	struct memtx_tree_iterator tree_iterator;
-};
-
-static void
-tree_snapshot_iterator_free(struct snapshot_iterator *iterator)
-{
-	assert(iterator->free == tree_snapshot_iterator_free);
-	struct tree_snapshot_iterator *it =
-		(struct tree_snapshot_iterator *)iterator;
-	struct memtx_tree *tree = (struct memtx_tree *)it->tree;
-	memtx_tree_iterator_destroy(tree, &it->tree_iterator);
-	free(iterator);
-}
-
-static const char *
-tree_snapshot_iterator_next(struct snapshot_iterator *iterator, uint32_t *size)
-{
-	assert(iterator->free == tree_snapshot_iterator_free);
-	struct tree_snapshot_iterator *it =
-		(struct tree_snapshot_iterator *)iterator;
-	struct tuple **res = memtx_tree_iterator_get_elem(it->tree,
-						&it->tree_iterator);
-	if (res == NULL)
-		return NULL;
-	memtx_tree_iterator_next(it->tree, &it->tree_iterator);
-	return tuple_data_range(*res, size);
-}
-
-/**
- * Create an ALL iterator with personal read view so further
- * index modifications will not affect the iteration results.
- * Must be destroyed by iterator->free after usage.
- */
-static struct snapshot_iterator *
-memtx_tree_index_create_snapshot_iterator(struct index *base)
-{
-	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
-	struct tree_snapshot_iterator *it = (struct tree_snapshot_iterator *)
-		calloc(1, sizeof(*it));
-	if (it == NULL) {
-		diag_set(OutOfMemory, sizeof(struct tree_snapshot_iterator),
-			 "memtx_tree_index", "create_snapshot_iterator");
-		return NULL;
-	}
-
-	it->base.free = tree_snapshot_iterator_free;
-	it->base.next = tree_snapshot_iterator_next;
-	it->tree = &index->tree;
-	it->tree_iterator = memtx_tree_iterator_first(&index->tree);
-	memtx_tree_iterator_freeze(&index->tree, &it->tree_iterator);
-	return (struct snapshot_iterator *) it;
-}
-
-static const struct index_vtab memtx_tree_index_vtab = {
-	/* .destroy = */ memtx_tree_index_destroy,
-	/* .commit_create = */ generic_index_commit_create,
-	/* .abort_create = */ generic_index_abort_create,
-	/* .commit_modify = */ generic_index_commit_modify,
-	/* .commit_drop = */ generic_index_commit_drop,
-	/* .update_def = */ memtx_tree_index_update_def,
-	/* .depends_on_pk = */ memtx_tree_index_depends_on_pk,
-	/* .def_change_requires_rebuild = */
-		memtx_index_def_change_requires_rebuild,
-	/* .size = */ memtx_tree_index_size,
-	/* .bsize = */ memtx_tree_index_bsize,
-	/* .min = */ generic_index_min,
-	/* .max = */ generic_index_max,
-	/* .random = */ memtx_tree_index_random,
-	/* .count = */ memtx_tree_index_count,
-	/* .get = */ memtx_tree_index_get,
-	/* .replace = */ memtx_tree_index_replace,
-	/* .create_iterator = */ memtx_tree_index_create_iterator,
-	/* .create_snapshot_iterator = */
-		memtx_tree_index_create_snapshot_iterator,
-	/* .stat = */ generic_index_stat,
-	/* .compact = */ generic_index_compact,
-	/* .reset_stat = */ generic_index_reset_stat,
-	/* .begin_build = */ memtx_tree_index_begin_build,
-	/* .reserve = */ memtx_tree_index_reserve,
-	/* .build_next = */ memtx_tree_index_build_next,
-	/* .end_build = */ memtx_tree_index_end_build,
-};
-
-struct memtx_tree_index *
+struct index *
 memtx_tree_index_new(struct memtx_engine *memtx, struct index_def *def)
 {
-	if (!mempool_is_initialized(&memtx->tree_iterator_pool)) {
-		mempool_create(&memtx->tree_iterator_pool, cord_slab_cache(),
-			       sizeof(struct tree_iterator));
-	}
-
-	struct memtx_tree_index *index =
-		(struct memtx_tree_index *)calloc(1, sizeof(*index));
-	if (index == NULL) {
-		diag_set(OutOfMemory, sizeof(*index),
-			 "malloc", "struct memtx_tree_index");
-		return NULL;
-	}
-	if (index_create(&index->base, (struct engine *)memtx,
-			 &memtx_tree_index_vtab, def) != 0) {
-		free(index);
-		return NULL;
-	}
-
-	struct key_def *cmp_def = memtx_tree_index_cmp_def(index);
-	memtx_tree_create(&index->tree, cmp_def, memtx_index_extent_alloc,
-			  memtx_index_extent_free, memtx);
-	return index;
+	return metmx_tuple_tree_index_new(memtx, def);
 }
diff --git a/src/box/memtx_tree.h b/src/box/memtx_tree.h
index bbc8b2419..edeaebab9 100644
--- a/src/box/memtx_tree.h
+++ b/src/box/memtx_tree.h
@@ -30,78 +30,16 @@
  * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
-#include <stddef.h>
-#include <stdint.h>
-
-#include "index.h"
-#include "memtx_engine.h"
 
 #if defined(__cplusplus)
 extern "C" {
 #endif /* defined(__cplusplus) */
 
+struct index;
+struct index_def;
 struct memtx_engine;
 
-/**
- * Struct that is used as a key in BPS tree definition.
- */
-struct memtx_tree_key_data
-{
-	/** Sequence of msgpacked search fields */
-	const char *key;
-	/** Number of msgpacked search fields */
-	uint32_t part_count;
-};
-
-/**
- * BPS tree element vs key comparator.
- * Defined in header in order to allow compiler to inline it.
- * @param tuple - tuple to compare.
- * @param key_data - key to compare with.
- * @param def - key definition.
- * @retval 0  if tuple == key in terms of def.
- * @retval <0 if tuple < key in terms of def.
- * @retval >0 if tuple > key in terms of def.
- */
-static inline int
-memtx_tree_compare_key(const struct tuple *tuple,
-		       const struct memtx_tree_key_data *key_data,
-		       struct key_def *def)
-{
-	return tuple_compare_with_key(tuple, key_data->key,
-				      key_data->part_count, def);
-}
-
-#define BPS_TREE_NAME memtx_tree
-#define BPS_TREE_BLOCK_SIZE (512)
-#define BPS_TREE_EXTENT_SIZE MEMTX_EXTENT_SIZE
-#define BPS_TREE_COMPARE(a, b, arg) tuple_compare(a, b, arg)
-#define BPS_TREE_COMPARE_KEY(a, b, arg) memtx_tree_compare_key(a, b, arg)
-#define bps_tree_elem_t struct tuple *
-#define bps_tree_key_t struct memtx_tree_key_data *
-#define bps_tree_arg_t struct key_def *
-
-#include "salad/bps_tree.h"
-
-#undef BPS_TREE_NAME
-#undef BPS_TREE_BLOCK_SIZE
-#undef BPS_TREE_EXTENT_SIZE
-#undef BPS_TREE_COMPARE
-#undef BPS_TREE_COMPARE_KEY
-#undef bps_tree_elem_t
-#undef bps_tree_key_t
-#undef bps_tree_arg_t
-
-struct memtx_tree_index {
-	struct index base;
-	struct memtx_tree tree;
-	struct tuple **build_array;
-	size_t build_array_size, build_array_alloc_size;
-	struct memtx_gc_task gc_task;
-	struct memtx_tree_iterator gc_iterator;
-};
-
-struct memtx_tree_index *
+struct index *
 memtx_tree_index_new(struct memtx_engine *memtx, struct index_def *def);
 
 #if defined(__cplusplus)
diff --git a/src/box/memtx_tree_impl.h b/src/box/memtx_tree_impl.h
new file mode 100644
index 000000000..20b88ad3b
--- /dev/null
+++ b/src/box/memtx_tree_impl.h
@@ -0,0 +1,960 @@
+/*
+ * Copyright 2010-2019 Tarantool AUTHORS: please see AUTHORS file.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the
+ *    following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#include <small/mempool.h>
+#include <third_party/qsort_arg.h>
+#include "memtx_engine.h"
+#include "schema.h"
+
+#ifndef MEMTX_TREE_NAME
+#error "MEMTX_TREE_NAME must be defined"
+#endif
+
+#ifndef memtx_tree_elem
+#error "memtx_tree_elem must be defined"
+#endif
+
+#ifndef memtx_tree_key
+#error "memtx_tree_key must be defined"
+#endif
+
+#ifndef MEMTX_TREE_ELEM_EQUAL
+#error "MEMTX_TREE_ELEM_EQUAL must be defined"
+#endif
+
+#ifndef MEMTX_TREE_ELEM_CMP
+#error "MEMTX_TREE_ELEM_CMP must be defined"
+#endif
+
+#ifndef MEMTX_TREE_ELEM_WITH_KEY_CMP
+#error "MEMTX_TREE_ELEM_WITH_KEY_CMP must be defined"
+#endif
+
+#ifndef MEMTX_TREE_ELEM_SET
+#error "MEMTX_TREE_ELEM_SET must be defined"
+#endif
+
+#ifndef MEMTX_TREE_KEY_SET
+#error "MEMTX_TREE_KEY_SET must be defined"
+#endif
+
+#ifndef MEMTX_TREE_ELEM_GET
+#error "MEMTX_TREE_ELEM_GET must be defined"
+#endif
+
+#ifndef MEMTX_TREE_KEY_GET
+#error "MEMTX_TREE_KEY_GET must be defined"
+#endif
+
+/**
+ * The size of the biggest memtx tree iterator. Used with
+ * mempool_create. This is the size of the block that will be
+ * allocated for each iterator.
+ */
+#define MEMTX_TREE_ITERATOR_SIZE (136)
+
+#define bps_tree_elem_t memtx_tree_elem
+#define bps_tree_key_t memtx_tree_key *
+#define bps_tree_arg_t struct key_def *
+#define BPS_TREE_NAME CONCAT3(MEMTX_TREE_NAME, _, tree)
+#define BPS_TREE_BLOCK_SIZE (512)
+#define BPS_TREE_EXTENT_SIZE MEMTX_EXTENT_SIZE
+#define BPS_TREE_EQUAL(a, b) MEMTX_TREE_ELEM_EQUAL(&a, &b)
+#define BPS_TREE_COMPARE(a, b, arg) MEMTX_TREE_ELEM_CMP(&a, &b, arg)
+#define BPS_TREE_COMPARE_KEY(a, b, arg) MEMTX_TREE_ELEM_WITH_KEY_CMP(&a, b, arg)
+
+#include "salad/bps_tree.h"
+
+#undef bps_tree_elem_t
+#undef bps_tree_key_t
+#undef bps_tree_arg_t
+#undef BPS_TREE_NAME
+#undef BPS_TREE_BLOCK_SIZE
+#undef BPS_TREE_EXTENT_SIZE
+#undef BPS_TREE_EQUAL
+#undef BPS_TREE_COMPARE
+#undef BPS_TREE_COMPARE_KEY
+
+#define bps_tree CONCAT3(MEMTX_TREE_NAME, _, tree)
+#define _tree_api_name(postfix) CONCAT5(MEMTX_TREE_NAME, _, tree, _, postfix)
+#define bps_tree_iterator _tree_api_name(iterator)
+#define bps_tree_iterator_is_invalid _tree_api_name(iterator_is_invalid)
+#define bps_tree_iterator_first _tree_api_name(iterator_first)
+#define bps_tree_iterator_last _tree_api_name(iterator_last)
+#define bps_tree_iterator_get_elem _tree_api_name(iterator_get_elem)
+#define bps_tree_iterator_next _tree_api_name(iterator_next)
+#define bps_tree_iterator_prev _tree_api_name(iterator_prev)
+#define bps_tree_iterator_freeze _tree_api_name(iterator_freeze)
+#define bps_tree_iterator_destroy _tree_api_name(iterator_destroy)
+#define bps_tree_create _tree_api_name(create)
+#define bps_tree_build _tree_api_name(build)
+#define bps_tree_destroy _tree_api_name(destroy)
+#define bps_tree_find _tree_api_name(find)
+#define bps_tree_insert _tree_api_name(insert)
+#define bps_tree_delete _tree_api_name(delete)
+#define bps_tree_size _tree_api_name(size)
+#define bps_tree_mem_used _tree_api_name(mem_used)
+#define bps_tree_random _tree_api_name(random)
+#define bps_tree_invalid_iterator _tree_api_name(invalid_iterator)
+#define bps_tree_lower_bound _tree_api_name(lower_bound)
+#define bps_tree_upper_bound _tree_api_name(upper_bound)
+#define bps_tree_lower_bound_elem _tree_api_name(lower_bound_elem)
+#define bps_tree_upper_bound_elem _tree_api_name(upper_bound_elem)
+
+#undef _api_name
+#define _api_name(postfix) CONCAT3(MEMTX_TREE_NAME, _, postfix)
+#define memtx_tree_index _api_name(index)
+#define memtx_tree_qcompare _api_name(qcompare)
+#define memtx_tree_iterator _api_name(iterator)
+#define memtx_tree_iterator_free _api_name(iterator_free)
+#define memtx_tree_iterator_dummie _api_name(iterator_dummie)
+#define memtx_tree_iterator_next _api_name(iterator_next)
+#define memtx_tree_iterator_prev _api_name(iterator_prev)
+#define memtx_tree_iterator_next_equal _api_name(iterator_next_equal)
+#define memtx_tree_iterator_prev_equal _api_name(iterator_prev_equal)
+#define memtx_tree_iterator_set_next_method _api_name(iterator_set_next_method)
+#define memtx_tree_iterator_start _api_name(iterator_start)
+#define memtx_tree_index_cmp_def _api_name(index_cmp_def)
+#define memtx_tree_index_free _api_name(index_free)
+#define memtx_tree_index_gc_run _api_name(index_gc_run)
+#define memtx_tree_index_gc_free _api_name(index_gc_free)
+#define memtx_tree_index_gc_vtab _api_name(index_gc_vtab)
+#define memtx_tree_index_destroy _api_name(index_destroy)
+#define memtx_tree_index_update_def _api_name(index_update_def)
+#define memtx_tree_index_depends_on_pk _api_name(index_depends_on_pk)
+#define memtx_tree_index_size _api_name(index_size)
+#define memtx_tree_index_bsize _api_name(index_bsize)
+#define memtx_tree_index_random _api_name(index_random)
+#define memtx_tree_index_count _api_name(index_count)
+#define memtx_tree_index_get _api_name(index_get)
+#define memtx_tree_index_insert_tuple _api_name(index_insert_tuple)
+#define memtx_tree_index_delete_tuple _api_name(index_delete_tuple)
+#define memtx_tree_index_replace _api_name(index_replace)
+#define memtx_tree_index_create_iterator _api_name(index_create_iterator)
+#define memtx_tree_index_begin_build _api_name(index_begin_build)
+#define memtx_tree_index_build_next _api_name(index_build_next)
+#define memtx_tree_index_reserve _api_name(index_reserve)
+#define memtx_tree_index_end_build _api_name(index_end_build)
+#define memtx_tree_snapshot_iterator _api_name(snapshot_iterator)
+#define memtx_tree_snapshot_iterator_free _api_name(snapshot_iterator_free)
+#define memtx_tree_snapshot_iterator_next _api_name(snapshot_iterator_next)
+#define memtx_tree_index_create_snapshot_iterator _api_name(index_create_snapshot_iterator)
+#define memtx_tree_index_vtab _api_name(index_vtab)
+#define memtx_tree_index_new _api_name(index_new)
+
+struct memtx_tree_index {
+	struct index base;
+	memtx_tree_elem *build_array;
+	size_t build_array_size, build_array_alloc_size;
+	struct memtx_gc_task gc_task;
+	struct bps_tree tree;
+	struct bps_tree_iterator gc_iterator;
+};
+
+/* {{{ Utilities. ***********************************************/
+
+static int
+memtx_tree_qcompare(const void* a, const void *b, void *c)
+{
+	return MEMTX_TREE_ELEM_CMP((memtx_tree_elem *)a,
+				   (memtx_tree_elem *)b, c);
+}
+
+/* {{{ MemtxTree Iterators **************************************/
+struct memtx_tree_iterator {
+	struct iterator base;
+	const struct bps_tree *tree;
+	struct index_def *index_def;
+	struct bps_tree_iterator tree_iterator;
+	enum iterator_type type;
+	/** Memory pool the iterator was allocated from. */
+	struct mempool *pool;
+	memtx_tree_key key_data;
+	memtx_tree_elem current;
+};
+
+static_assert(sizeof(struct memtx_tree_iterator) <= MEMTX_TREE_ITERATOR_SIZE,
+	      "memtx_tree_iterator must be less or equal than "
+	      "MEMTX_TREE_ITERATOR_SIZE");
+
+static void
+memtx_tree_iterator_free(struct iterator *iterator)
+{
+	struct memtx_tree_iterator *it = (struct memtx_tree_iterator *)iterator;
+	if (MEMTX_TREE_ELEM_GET(&it->current) != NULL)
+		tuple_unref(MEMTX_TREE_ELEM_GET(&it->current));
+	mempool_free(it->pool, it);
+}
+
+static int
+memtx_tree_iterator_dummie(struct iterator *iterator, struct tuple **ret)
+{
+	(void)iterator;
+	*ret = NULL;
+	return 0;
+}
+
+static int
+memtx_tree_iterator_next(struct iterator *iterator, struct tuple **ret)
+{
+	struct memtx_tree_iterator *it = (struct memtx_tree_iterator *)iterator;
+	assert(MEMTX_TREE_ELEM_GET(&it->current) != NULL);
+	memtx_tree_elem *check =
+		bps_tree_iterator_get_elem(it->tree, &it->tree_iterator);
+	if (check == NULL || !MEMTX_TREE_ELEM_EQUAL(check, &it->current)) {
+		it->tree_iterator =
+			bps_tree_upper_bound_elem(it->tree, it->current, NULL);
+	} else {
+		bps_tree_iterator_next(it->tree, &it->tree_iterator);
+	}
+	tuple_unref(MEMTX_TREE_ELEM_GET(&it->current));
+	MEMTX_TREE_ELEM_SET(&it->current, NULL, it->index_def->key_def);
+	memtx_tree_elem *res =
+		bps_tree_iterator_get_elem(it->tree, &it->tree_iterator);
+	if (res == NULL) {
+		iterator->next = memtx_tree_iterator_dummie;
+		*ret = NULL;
+	} else {
+		*ret = MEMTX_TREE_ELEM_GET(res);
+		MEMTX_TREE_ELEM_SET(&it->current, *ret, it->index_def->key_def);
+		tuple_ref(*ret);
+	}
+	return 0;
+}
+
+static int
+memtx_tree_iterator_prev(struct iterator *iterator, struct tuple **ret)
+{
+	struct memtx_tree_iterator *it = (struct memtx_tree_iterator *)iterator;
+	assert(MEMTX_TREE_ELEM_GET(&it->current) != NULL);
+	memtx_tree_elem *check =
+		bps_tree_iterator_get_elem(it->tree, &it->tree_iterator);
+	if (check == NULL || !MEMTX_TREE_ELEM_EQUAL(check, &it->current)) {
+		it->tree_iterator =
+			bps_tree_lower_bound_elem(it->tree, it->current, NULL);
+	}
+	bps_tree_iterator_prev(it->tree, &it->tree_iterator);
+	tuple_unref(MEMTX_TREE_ELEM_GET(&it->current));
+	MEMTX_TREE_ELEM_SET(&it->current, NULL, it->index_def->key_def);
+	memtx_tree_elem *res =
+		bps_tree_iterator_get_elem(it->tree, &it->tree_iterator);
+	if (!res) {
+		iterator->next = memtx_tree_iterator_dummie;
+		*ret = NULL;
+	} else {
+		*ret = MEMTX_TREE_ELEM_GET(res);
+		MEMTX_TREE_ELEM_SET(&it->current, *ret, it->index_def->key_def);
+		tuple_ref(*ret);
+	}
+	return 0;
+}
+
+static int
+memtx_tree_iterator_next_equal(struct iterator *iterator, struct tuple **ret)
+{
+	struct memtx_tree_iterator *it = (struct memtx_tree_iterator *)iterator;
+	assert(MEMTX_TREE_ELEM_GET(&it->current) != NULL);
+	memtx_tree_elem *check =
+		bps_tree_iterator_get_elem(it->tree, &it->tree_iterator);
+	if (check == NULL || !MEMTX_TREE_ELEM_EQUAL(check, &it->current)) {
+		it->tree_iterator =
+			bps_tree_upper_bound_elem(it->tree, it->current, NULL);
+	} else {
+		bps_tree_iterator_next(it->tree, &it->tree_iterator);
+	}
+	tuple_unref(MEMTX_TREE_ELEM_GET(&it->current));
+	MEMTX_TREE_ELEM_SET(&it->current, NULL, it->index_def->key_def);
+	memtx_tree_elem *res =
+		bps_tree_iterator_get_elem(it->tree, &it->tree_iterator);
+	/* Use user key def to save a few loops. */
+	if (!res ||
+	    MEMTX_TREE_ELEM_WITH_KEY_CMP(res, &it->key_data,
+					 it->index_def->key_def) != 0) {
+		iterator->next = memtx_tree_iterator_dummie;
+		*ret = NULL;
+	} else {
+		*ret = MEMTX_TREE_ELEM_GET(res);
+		MEMTX_TREE_ELEM_SET(&it->current, *ret, it->index_def->key_def);
+		tuple_ref(*ret);
+	}
+	return 0;
+}
+
+static int
+memtx_tree_iterator_prev_equal(struct iterator *iterator, struct tuple **ret)
+{
+	struct memtx_tree_iterator *it = (struct memtx_tree_iterator *)iterator;
+	assert(MEMTX_TREE_ELEM_GET(&it->current) != NULL);
+	memtx_tree_elem *check =
+		bps_tree_iterator_get_elem(it->tree, &it->tree_iterator);
+	if (check == NULL || !MEMTX_TREE_ELEM_EQUAL(check, &it->current)) {
+		it->tree_iterator =
+			bps_tree_lower_bound_elem(it->tree, it->current, NULL);
+	}
+	bps_tree_iterator_prev(it->tree, &it->tree_iterator);
+	tuple_unref(MEMTX_TREE_ELEM_GET(&it->current));
+	MEMTX_TREE_ELEM_SET(&it->current, NULL, it->index_def->key_def);
+	memtx_tree_elem *res =
+		bps_tree_iterator_get_elem(it->tree, &it->tree_iterator);
+	/* Use user key def to save a few loops. */
+	if (!res ||
+	    MEMTX_TREE_ELEM_WITH_KEY_CMP(res, &it->key_data,
+					 it->index_def->key_def) != 0) {
+		iterator->next = memtx_tree_iterator_dummie;
+		*ret = NULL;
+	} else {
+		*ret = MEMTX_TREE_ELEM_GET(res);
+		MEMTX_TREE_ELEM_SET(&it->current, *ret, it->index_def->key_def);
+		tuple_ref(*ret);
+	}
+	return 0;
+}
+
+static void
+memtx_tree_iterator_set_next_method(struct memtx_tree_iterator *it)
+{
+	assert(MEMTX_TREE_ELEM_GET(&it->current) != NULL);
+	switch (it->type) {
+	case ITER_EQ:
+		it->base.next = memtx_tree_iterator_next_equal;
+		break;
+	case ITER_REQ:
+		it->base.next = memtx_tree_iterator_prev_equal;
+		break;
+	case ITER_ALL:
+		it->base.next = memtx_tree_iterator_next;
+		break;
+	case ITER_LT:
+	case ITER_LE:
+		it->base.next = memtx_tree_iterator_prev;
+		break;
+	case ITER_GE:
+	case ITER_GT:
+		it->base.next = memtx_tree_iterator_next;
+		break;
+	default:
+		/* The type was checked in initIterator. */
+		assert(false);
+	}
+}
+
+static int
+memtx_tree_iterator_start(struct iterator *iterator, struct tuple **ret)
+{
+	*ret = NULL;
+	struct memtx_tree_iterator *it = (struct memtx_tree_iterator *)iterator;
+	it->base.next = memtx_tree_iterator_dummie;
+	const struct bps_tree *tree = it->tree;
+	enum iterator_type type = it->type;
+	bool exact = false;
+	assert(MEMTX_TREE_ELEM_GET(&it->current) == NULL);
+	uint32_t part_count;
+	const char *key = MEMTX_TREE_KEY_GET(&it->key_data, &part_count);
+	if (key == NULL) {
+		if (iterator_type_is_reverse(it->type))
+			it->tree_iterator = bps_tree_iterator_last(tree);
+		else
+			it->tree_iterator = bps_tree_iterator_first(tree);
+	} else {
+		if (type == ITER_ALL || type == ITER_EQ ||
+		    type == ITER_GE || type == ITER_LT) {
+			it->tree_iterator =
+				bps_tree_lower_bound(tree, &it->key_data,
+						     &exact);
+			if (type == ITER_EQ && !exact)
+				return 0;
+		} else {
+			/* ITER_GT, ITER_REQ, ITER_LE. */
+			it->tree_iterator =
+				bps_tree_upper_bound(tree, &it->key_data,
+						     &exact);
+			if (type == ITER_REQ && !exact)
+				return 0;
+		}
+		if (iterator_type_is_reverse(type)) {
+			/*
+			 * Because of limitations of tree search
+			 * API we use use lower_bound for LT
+			 * search and upper_bound for LE and REQ
+			 * searches. Thus we found position to the
+			 * right of the target one. Let's make a
+			 * step to the left to reach target
+			 * position.
+			 * If we found an invalid iterator all the
+			 * elements in the tree are less (less or
+			 * equal) to the key, and iterator_next
+			 * call will convert the iterator to the
+			 * last position in the tree, that's what
+			 * we need.
+			 */
+			bps_tree_iterator_prev(it->tree, &it->tree_iterator);
+		}
+	}
+
+	memtx_tree_elem *res =
+		bps_tree_iterator_get_elem(it->tree, &it->tree_iterator);
+	if (!res)
+		return 0;
+	*ret = MEMTX_TREE_ELEM_GET(res);
+	MEMTX_TREE_ELEM_SET(&it->current, *ret, it->index_def->key_def);
+	tuple_ref(*ret);
+	memtx_tree_iterator_set_next_method(it);
+	return 0;
+}
+
+/* }}} */
+
+/* {{{ MemtxTree  ***********************************************/
+
+/**
+ * Return the key def to use for comparing tuples stored
+ * in the given tree index.
+ *
+ * We use extended key def for non-unique and nullable
+ * indexes. Unique but nullable index can store multiple
+ * NULLs. To correctly compare these NULLs extended key
+ * def must be used. For details @sa tuple_compare.cc.
+ */
+static struct key_def *
+memtx_tree_index_cmp_def(struct memtx_tree_index *index)
+{
+	struct index_def *def = index->base.def;
+	return def->opts.is_unique && !def->key_def->is_nullable ?
+	       def->key_def : def->cmp_def;
+}
+
+static void
+memtx_tree_index_free(struct memtx_tree_index *index)
+{
+	bps_tree_destroy(&index->tree);
+	free(index->build_array);
+	free(index);
+}
+
+static void
+memtx_tree_index_gc_run(struct memtx_gc_task *task, bool *done)
+{
+	/*
+	 * Yield every 1K tuples to keep latency < 0.1 ms.
+	 * Yield more often in debug mode.
+	 */
+#ifdef NDEBUG
+	enum { YIELD_LOOPS = 1000 };
+#else
+	enum { YIELD_LOOPS = 10 };
+#endif
+
+	struct memtx_tree_index *index =
+		container_of(task, struct memtx_tree_index, gc_task);
+	struct bps_tree *tree = &index->tree;
+	struct bps_tree_iterator *itr = &index->gc_iterator;
+
+	unsigned int loops = 0;
+	while (!bps_tree_iterator_is_invalid(itr)) {
+		memtx_tree_elem *res =
+			bps_tree_iterator_get_elem(tree, itr);
+		struct tuple *tuple = MEMTX_TREE_ELEM_GET(res);
+		bps_tree_iterator_next(tree, itr);
+		tuple_unref(tuple);
+		if (++loops >= YIELD_LOOPS) {
+			*done = false;
+			return;
+		}
+	}
+	*done = true;
+}
+
+static void
+memtx_tree_index_gc_free(struct memtx_gc_task *task)
+{
+	struct memtx_tree_index *index =
+		container_of(task, struct memtx_tree_index, gc_task);
+	memtx_tree_index_free(index);
+}
+
+static const struct memtx_gc_task_vtab memtx_tree_index_gc_vtab = {
+	.run = memtx_tree_index_gc_run,
+	.free = memtx_tree_index_gc_free,
+};
+
+static void
+memtx_tree_index_destroy(struct index *base)
+{
+	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
+	struct memtx_engine *memtx = (struct memtx_engine *)base->engine;
+	if (base->def->iid == 0) {
+		/*
+		 * Primary index. We need to free all tuples
+		 * stored in the index, which may take a while.
+		 * Schedule a background task in order not to
+		 * block tx thread.
+		 */
+		index->gc_task.vtab = &memtx_tree_index_gc_vtab;
+		index->gc_iterator = bps_tree_iterator_first(&index->tree);
+		memtx_engine_schedule_gc(memtx, &index->gc_task);
+	} else {
+		/*
+		 * Secondary index. Destruction is fast, no need
+		 * to hand over to background fiber.
+		 */
+		memtx_tree_index_free(index);
+	}
+}
+
+static void
+memtx_tree_index_update_def(struct index *base)
+{
+	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
+	index->tree.arg = memtx_tree_index_cmp_def(index);
+}
+
+static bool
+memtx_tree_index_depends_on_pk(struct index *base)
+{
+	struct index_def *def = base->def;
+	/* See comment to memtx_tree_index_cmp_def(). */
+	return !def->opts.is_unique || def->key_def->is_nullable;
+}
+
+static ssize_t
+memtx_tree_index_size(struct index *base)
+{
+	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
+	return bps_tree_size(&index->tree);
+}
+
+static ssize_t
+memtx_tree_index_bsize(struct index *base)
+{
+	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
+	return bps_tree_mem_used(&index->tree);
+}
+
+static int
+memtx_tree_index_random(struct index *base, uint32_t rnd, struct tuple **result)
+{
+	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
+	memtx_tree_elem *res = bps_tree_random(&index->tree, rnd);
+	*result = res != NULL ? MEMTX_TREE_ELEM_GET(res) : NULL;
+	return 0;
+}
+
+static ssize_t
+memtx_tree_index_count(struct index *base, enum iterator_type type,
+		       const char *key, uint32_t part_count)
+{
+	if (type == ITER_ALL) {
+		/* Optimization. */
+		return memtx_tree_index_size(base);
+	}
+	return generic_index_count(base, type, key, part_count);
+}
+
+static int
+memtx_tree_index_get(struct index *base, const char *key,
+		     uint32_t part_count, struct tuple **result)
+{
+	assert(base->def->opts.is_unique &&
+	       part_count == base->def->key_def->part_count);
+	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
+	memtx_tree_key key_data;
+	struct key_def *cmp_def = memtx_tree_index_cmp_def(index);
+	MEMTX_TREE_KEY_SET(&key_data, key, part_count, cmp_def);
+	memtx_tree_elem *res = bps_tree_find(&index->tree, &key_data);
+	*result = res != NULL ? MEMTX_TREE_ELEM_GET(res) : NULL;
+	return 0;
+}
+
+static int
+memtx_tree_index_insert_tuple(struct index *base, struct tuple *tuple,
+			      struct tuple **replaced)
+{
+	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
+	memtx_tree_elem data;
+	MEMTX_TREE_ELEM_SET(&data, tuple, index->tree.arg);
+	memtx_tree_elem data_replaced;
+	memset(&data_replaced, 0, sizeof(data_replaced));
+	int rc = bps_tree_insert(&index->tree, data, &data_replaced);
+	if (replaced != NULL)
+		*replaced = MEMTX_TREE_ELEM_GET(&data_replaced);
+	return rc;
+}
+
+static void
+memtx_tree_index_delete_tuple(struct index *base, struct tuple *tuple)
+{
+	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
+	memtx_tree_elem data;
+	MEMTX_TREE_ELEM_SET(&data, tuple, index->tree.arg);
+	bps_tree_delete(&index->tree, data);
+}
+
+static int
+memtx_tree_index_replace(struct index *base, struct tuple *old_tuple,
+			 struct tuple *new_tuple, enum dup_replace_mode mode,
+			 struct tuple **result)
+{
+	if (new_tuple) {
+		struct tuple *dup_tuple = NULL;
+
+		/* Try to optimistically replace the new_tuple. */
+		int tree_res =
+			memtx_tree_index_insert_tuple(base, new_tuple,
+						      &dup_tuple);
+		if (tree_res) {
+			diag_set(OutOfMemory, MEMTX_EXTENT_SIZE,
+				 "memtx_tree_index", "replace");
+			return -1;
+		}
+
+		uint32_t errcode =
+			replace_check_dup(old_tuple, dup_tuple, mode);
+		if (errcode) {
+			memtx_tree_index_delete_tuple(base, new_tuple);
+			if (dup_tuple)
+				memtx_tree_index_insert_tuple(base, dup_tuple, 0);
+			struct space *sp = space_cache_find(base->def->space_id);
+			if (sp != NULL) {
+				diag_set(ClientError, errcode, base->def->name,
+					 space_name(sp));
+			}
+			return -1;
+		}
+		if (dup_tuple) {
+			*result = dup_tuple;
+			return 0;
+		}
+	}
+	if (old_tuple)
+		memtx_tree_index_delete_tuple(base, old_tuple);
+	*result = old_tuple;
+	return 0;
+}
+
+static struct iterator *
+memtx_tree_index_create_iterator(struct index *base, enum iterator_type type,
+				 const char *key, uint32_t part_count)
+{
+	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
+	struct memtx_engine *memtx = (struct memtx_engine *)base->engine;
+
+	assert(part_count == 0 || key != NULL);
+	if (type > ITER_GT) {
+		diag_set(UnsupportedIndexFeature, base->def,
+			 "requested iterator type");
+		return NULL;
+	}
+
+	if (part_count == 0) {
+		/*
+		 * If no key is specified, downgrade equality
+		 * iterators to a full range.
+		 */
+		type = iterator_type_is_reverse(type) ? ITER_LE : ITER_GE;
+		key = NULL;
+	}
+
+	struct memtx_tree_iterator *it =
+		mempool_alloc(&memtx->tree_iterator_pool);
+	if (it == NULL) {
+		diag_set(OutOfMemory, sizeof(struct memtx_tree_iterator),
+			 "memtx_tree_index", "iterator");
+		return NULL;
+	}
+	iterator_create(&it->base, base);
+	it->pool = &memtx->tree_iterator_pool;
+	it->base.next = memtx_tree_iterator_start;
+	it->base.free = memtx_tree_iterator_free;
+	it->type = type;
+	struct key_def *cmp_def = memtx_tree_index_cmp_def(index);
+	MEMTX_TREE_KEY_SET(&it->key_data, key, part_count, cmp_def);
+	it->index_def = base->def;
+	it->tree = &index->tree;
+	it->tree_iterator = bps_tree_invalid_iterator();
+	MEMTX_TREE_ELEM_SET(&it->current, NULL, it->index_def->key_def);
+	return (struct iterator *)it;
+}
+
+static void
+memtx_tree_index_begin_build(struct index *base)
+{
+	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
+	assert(bps_tree_size(&index->tree) == 0);
+	(void)index;
+}
+
+static int
+memtx_tree_index_reserve(struct index *base, uint32_t size_hint)
+{
+	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
+	if (size_hint < index->build_array_alloc_size)
+		return 0;
+	memtx_tree_elem *tmp =
+		realloc(index->build_array, size_hint * sizeof(*tmp));
+	if (tmp == NULL) {
+		diag_set(OutOfMemory, size_hint * sizeof(*tmp),
+			 "memtx_tree_index", "reserve");
+		return -1;
+	}
+	index->build_array = tmp;
+	index->build_array_alloc_size = size_hint;
+	return 0;
+}
+
+static int
+memtx_tree_index_build_next(struct index *base, struct tuple *tuple)
+{
+	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
+	if (index->build_array == NULL) {
+		index->build_array =
+			(memtx_tree_elem *)malloc(MEMTX_EXTENT_SIZE);
+		if (index->build_array == NULL) {
+			diag_set(OutOfMemory, MEMTX_EXTENT_SIZE,
+				 "memtx_tree_index", "build_next");
+			return -1;
+		}
+		index->build_array_alloc_size =
+			MEMTX_EXTENT_SIZE / sizeof(index->build_array[0]);
+	}
+	assert(index->build_array_size <= index->build_array_alloc_size);
+	if (index->build_array_size == index->build_array_alloc_size) {
+		index->build_array_alloc_size = index->build_array_alloc_size +
+					index->build_array_alloc_size / 2;
+		memtx_tree_elem *tmp = realloc(index->build_array,
+				index->build_array_alloc_size * sizeof(*tmp));
+		if (tmp == NULL) {
+			diag_set(OutOfMemory, index->build_array_alloc_size *
+				 sizeof(*tmp), "memtx_tree_index", "build_next");
+			return -1;
+		}
+		index->build_array = tmp;
+	}
+	MEMTX_TREE_ELEM_SET(&index->build_array[index->build_array_size++],
+			    tuple, memtx_tree_index_cmp_def(index));
+	return 0;
+}
+
+static void
+memtx_tree_index_end_build(struct index *base)
+{
+	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
+	struct key_def *cmp_def = memtx_tree_index_cmp_def(index);
+	qsort_arg(index->build_array, index->build_array_size,
+		  sizeof(index->build_array[0]), memtx_tree_qcompare, cmp_def);
+	bps_tree_build(&index->tree, index->build_array,
+		       index->build_array_size);
+
+	free(index->build_array);
+	index->build_array = NULL;
+	index->build_array_size = 0;
+	index->build_array_alloc_size = 0;
+}
+
+struct memtx_tree_snapshot_iterator {
+	struct snapshot_iterator base;
+	struct bps_tree *tree;
+	struct bps_tree_iterator tree_iterator;
+};
+
+static void
+memtx_tree_snapshot_iterator_free(struct snapshot_iterator *iterator)
+{
+	assert(iterator->free == memtx_tree_snapshot_iterator_free);
+	struct memtx_tree_snapshot_iterator *it =
+		(struct memtx_tree_snapshot_iterator *)iterator;
+	bps_tree_iterator_destroy(it->tree, &it->tree_iterator);
+	free(iterator);
+}
+
+static const char *
+memtx_tree_snapshot_iterator_next(struct snapshot_iterator *iterator,
+				  uint32_t *size)
+{
+	assert(iterator->free == memtx_tree_snapshot_iterator_free);
+	struct memtx_tree_snapshot_iterator *it =
+		(struct memtx_tree_snapshot_iterator *)iterator;
+	memtx_tree_elem *res =
+		bps_tree_iterator_get_elem(it->tree, &it->tree_iterator);
+	if (res == NULL)
+		return NULL;
+	bps_tree_iterator_next(it->tree, &it->tree_iterator);
+	return tuple_data_range(MEMTX_TREE_ELEM_GET(res), size);
+}
+
+/**
+ * Create an ALL iterator with personal read view so further
+ * index modifications will not affect the iteration results.
+ * Must be destroyed by iterator->free after usage.
+ */
+static struct snapshot_iterator *
+memtx_tree_index_create_snapshot_iterator(struct index *base)
+{
+	struct memtx_tree_index *index = (struct memtx_tree_index *)base;
+	struct memtx_tree_snapshot_iterator *it =
+		(struct memtx_tree_snapshot_iterator *)calloc(1, sizeof(*it));
+	if (it == NULL) {
+		diag_set(OutOfMemory, sizeof(struct memtx_tree_snapshot_iterator),
+			 "memtx_tree_index", "create_snapshot_iterator");
+		return NULL;
+	}
+
+	it->base.free = memtx_tree_snapshot_iterator_free;
+	it->base.next = memtx_tree_snapshot_iterator_next;
+	it->tree = &index->tree;
+	it->tree_iterator = bps_tree_iterator_first(&index->tree);
+	bps_tree_iterator_freeze(&index->tree, &it->tree_iterator);
+	return (struct snapshot_iterator *) it;
+}
+
+static const struct index_vtab memtx_tree_index_vtab = {
+	/* .destroy = */ memtx_tree_index_destroy,
+	/* .commit_create = */ generic_index_commit_create,
+	/* .abort_create = */ generic_index_abort_create,
+	/* .commit_modify = */ generic_index_commit_modify,
+	/* .commit_drop = */ generic_index_commit_drop,
+	/* .update_def = */ memtx_tree_index_update_def,
+	/* .depends_on_pk = */ memtx_tree_index_depends_on_pk,
+	/* .def_change_requires_rebuild = */
+		memtx_index_def_change_requires_rebuild,
+	/* .size = */ memtx_tree_index_size,
+	/* .bsize = */ memtx_tree_index_bsize,
+	/* .min = */ generic_index_min,
+	/* .max = */ generic_index_max,
+	/* .random = */ memtx_tree_index_random,
+	/* .count = */ memtx_tree_index_count,
+	/* .get = */ memtx_tree_index_get,
+	/* .replace = */ memtx_tree_index_replace,
+	/* .create_iterator = */ memtx_tree_index_create_iterator,
+	/* .create_snapshot_iterator = */
+		memtx_tree_index_create_snapshot_iterator,
+	/* .stat = */ generic_index_stat,
+	/* .compact = */ generic_index_compact,
+	/* .reset_stat = */ generic_index_reset_stat,
+	/* .begin_build = */ memtx_tree_index_begin_build,
+	/* .reserve = */ memtx_tree_index_reserve,
+	/* .build_next = */ memtx_tree_index_build_next,
+	/* .end_build = */ memtx_tree_index_end_build,
+};
+
+struct index *
+memtx_tree_index_new(struct memtx_engine *memtx, struct index_def *def)
+{
+	if (!mempool_is_initialized(&memtx->tree_iterator_pool)) {
+		mempool_create(&memtx->tree_iterator_pool, cord_slab_cache(),
+			       MEMTX_TREE_ITERATOR_SIZE);
+	}
+
+	struct memtx_tree_index *index =
+		(struct memtx_tree_index *)calloc(1, sizeof(*index));
+	if (index == NULL) {
+		diag_set(OutOfMemory, sizeof(*index),
+			 "malloc", "struct memtx_tree_index");
+		return NULL;
+	}
+	if (index_create(&index->base, (struct engine *)memtx,
+			 &memtx_tree_index_vtab, def) != 0) {
+		free(index);
+		return NULL;
+	}
+
+	struct key_def *cmp_def = memtx_tree_index_cmp_def(index);
+	bps_tree_create(&index->tree, cmp_def, memtx_index_extent_alloc,
+			memtx_index_extent_free, memtx);
+	return (struct index *)index;
+}
+
+/* }}} */
+
+#undef bps_tree
+#undef _tree_api_name
+#undef bps_tree_iterator
+#undef bps_tree_iterator_is_invalid
+#undef bps_tree_iterator_first
+#undef bps_tree_iterator_last
+#undef bps_tree_iterator_get_elem
+#undef bps_tree_iterator_next
+#undef bps_tree_iterator_prev
+#undef bps_tree_iterator_freeze
+#undef bps_tree_iterator_destroy
+#undef bps_tree_create
+#undef bps_tree_build
+#undef bps_tree_destroy
+#undef bps_tree_find
+#undef bps_tree_insert
+#undef bps_tree_delete
+#undef bps_tree_size
+#undef bps_tree_mem_used
+#undef bps_tree_random
+#undef bps_tree_invalid_iterator
+#undef bps_tree_lower_bound
+#undef bps_tree_upper_bound
+#undef bps_tree_lower_bound_elem
+#undef bps_tree_upper_bound_elem
+
+#undef _api_name
+#undef memtx_tree_index
+#undef memtx_tree_qcompare
+#undef memtx_tree_iterator
+#undef memtx_tree_iterator_free
+#undef memtx_tree_iterator_dummie
+#undef memtx_tree_iterator_next
+#undef memtx_tree_iterator_prev
+#undef memtx_tree_iterator_next_equal
+#undef memtx_tree_iterator_prev_equal
+#undef memtx_tree_iterator_set_next_method
+#undef memtx_tree_iterator_start
+#undef memtx_tree_index_cmp_def
+#undef memtx_tree_index_free
+#undef memtx_tree_index_gc_run
+#undef memtx_tree_index_gc_free
+#undef memtx_tree_index_gc_vtab
+#undef memtx_tree_index_destroy
+#undef memtx_tree_index_update_def
+#undef memtx_tree_index_depends_on_pk
+#undef memtx_tree_index_size
+#undef memtx_tree_index_bsize
+#undef memtx_tree_index_random
+#undef memtx_tree_index_count
+#undef memtx_tree_index_get
+#undef memtx_tree_index_insert_tuple
+#undef memtx_tree_index_delete_tuple
+#undef memtx_tree_index_replace
+#undef memtx_tree_index_create_iterator
+#undef memtx_tree_index_begin_build
+#undef memtx_tree_index_build_next
+#undef memtx_tree_index_reserve
+#undef memtx_tree_index_end_build
+#undef memtx_tree_snapshot_iterator
+#undef memtx_tree_snapshot_iterator_free
+#undef memtx_tree_snapshot_iterator_next
+#undef memtx_tree_index_create_snapshot_iterator
+#undef memtx_tree_index_vtab
+#undef memtx_tree_index_new
+
-- 
2.20.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v1 3/4] box: introduce tuple compare hint
  2019-02-05 11:58 [PATCH v1 0/4] box: introduce hint option for memtx tree index Kirill Shcherbatov
  2019-02-05 11:58 ` [PATCH v1 1/4] lib: introduce BPS_TREE_EQUAL custom comparator Kirill Shcherbatov
  2019-02-05 11:58 ` [PATCH v1 2/4] box: rework memtx_tree class to be reusable Kirill Shcherbatov
@ 2019-02-05 11:58 ` Kirill Shcherbatov
  2019-02-11 15:57   ` Vladimir Davydov
  2019-02-05 11:58 ` [PATCH v1 4/4] box: introduce hint option for memtx tree index Kirill Shcherbatov
  3 siblings, 1 reply; 9+ messages in thread
From: Kirill Shcherbatov @ 2019-02-05 11:58 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: Alexandr Lyapunov

From: Alexandr Lyapunov <aleks@tarantool.org>

Implement functions for retrieving tuple hints for a particular
key_def. A hint is an integer that can be used to optimize tuple
comparison: if the hint of one tuple is less than the hint of
another, the first tuple is definitely less than the second; only
if the hints are equal does tuple_compare have to be called to get
the comparison result. For example, with an 'unsigned' first key
part the hint can simply be the field value itself, so tuples whose
first fields differ can be ordered without fetching their data.

Hints are calculated using only the first part of the key_def.

Hints are only useful when:
 * they are precalculated and stored along with the tuple;
   the calculation is not cheap (though cheaper than tuple_compare),
   but precalculated hints allow comparing tuples without even
   fetching the tuple data;
 * the first part of the key_def is 'string', 'unsigned' or 'integer';
 * since the hint is calculated using only the first part of the
   key_def (and only the first several characters if it is a string),
   this part must differ with high probability for every tuple pair.
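
As an illustration (not part of this patch), the intended usage pattern
is roughly the sketch below. hinted_tuple_compare() is a made-up helper
name; tuple_hint(), tuple_compare() and struct key_def are the real
entities, and a_hint/b_hint are assumed to have been precalculated with
tuple_hint() for the corresponding tuples:

static int
hinted_tuple_compare(const struct tuple *a, uint64_t a_hint,
                     const struct tuple *b, uint64_t b_hint,
                     struct key_def *key_def)
{
	/* Hints order the tuples without touching their data. */
	if (a_hint != b_hint)
		return a_hint < b_hint ? -1 : 1;
	/* Equal hints: fall back to the full comparison. */
	return tuple_compare(a, b, key_def);
}

The hinted memtx tree comparators added later in this series follow
exactly this pattern.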

@kshcherbatov: rebase to 2.1, fix conflicts

Needed for #3961
---
 src/box/key_def.c        |   1 +
 src/box/key_def.h        |  11 ++++
 src/box/tuple_compare.cc | 131 +++++++++++++++++++++++++++++++++++++++
 src/box/tuple_compare.h  |  40 ++++++++++++
 src/coll.c               |  23 +++++++
 src/coll.h               |   3 +
 6 files changed, 209 insertions(+)

diff --git a/src/box/key_def.c b/src/box/key_def.c
index 9411ade39..ceca1ebee 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -135,6 +135,7 @@ key_def_set_cmp(struct key_def *def)
 	def->tuple_compare_with_key = tuple_compare_with_key_create(def);
 	tuple_hash_func_set(def);
 	tuple_extract_key_set(def);
+	tuple_hint_set(def);
 }
 
 static void
diff --git a/src/box/key_def.h b/src/box/key_def.h
index 85bed92bb..ff86de3ca 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -153,6 +153,13 @@ typedef uint32_t (*tuple_hash_t)(const struct tuple *tuple,
 typedef uint32_t (*key_hash_t)(const char *key,
 				struct key_def *key_def);
 
+/** @copydoc tuple_hint() */
+typedef uint64_t (*tuple_hint_t)(const struct tuple *tuple,
+				 struct key_def *key_def);
+
+/** @copydoc key_hint() */
+typedef uint64_t (*key_hint_t)(const char *key, struct key_def *key_def);
+
 /* Definition of a multipart key. */
 struct key_def {
 	/** @see tuple_compare() */
@@ -167,6 +174,10 @@ struct key_def {
 	tuple_hash_t tuple_hash;
 	/** @see key_hash() */
 	key_hash_t key_hash;
+	/** @see tuple_hint() */
+	tuple_hint_t tuple_hint;
+	/** @see key_hint() */
+	key_hint_t key_hint;
 	/**
 	 * Minimal part count which always is unique. For example,
 	 * if a secondary index is unique, then
diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
index 1d1ab8711..b8f91972c 100644
--- a/src/box/tuple_compare.cc
+++ b/src/box/tuple_compare.cc
@@ -1321,4 +1321,135 @@ tuple_compare_with_key_create(const struct key_def *def)
 	       compare_with_key_slowpath_funcs[cmp_func_idx];
 }
 
+static uint64_t
+key_hint_default(const char *key, struct key_def *key_def)
+{
+	(void)key;
+	(void)key_def;
+	return 0;
+}
+
+static uint64_t
+tuple_hint_default(const struct tuple *tuple, struct key_def *key_def)
+{
+	(void)tuple;
+	(void)key_def;
+	return 0;
+}
+
+static uint64_t
+key_hint_uint(const char *key, struct key_def *key_def)
+{
+	(void)key_def;
+	assert(key_def->parts->type == FIELD_TYPE_UNSIGNED);
+	assert(mp_typeof(*key) == MP_UINT);
+	return mp_decode_uint(&key);
+}
+
+static uint64_t
+tuple_hint_uint(const struct tuple *tuple, struct key_def *key_def)
+{
+	const char *field = tuple_field_by_part(tuple, key_def->parts);
+	return key_hint_uint(field, key_def);
+}
+
+static uint64_t
+key_hint_int(const char *key, struct key_def *key_def)
+{
+	(void)key_def;
+	assert(key_def->parts->type == FIELD_TYPE_INTEGER);
+	if (mp_typeof(*key) == MP_UINT) {
+		uint64_t val = mp_decode_uint(&key);
+		if (val > INT64_MAX)
+			return INT64_MAX;
+		return val - (uint64_t)INT64_MIN;
+	} else {
+		assert(mp_typeof(*key) == MP_INT);
+		int64_t val = mp_decode_int(&key);
+		return (uint64_t)val - (uint64_t)INT64_MIN;
+	}
+}
+
+static uint64_t
+tuple_hint_int(const struct tuple *tuple, struct key_def *key_def)
+{
+	const char *field = tuple_field_by_part(tuple, key_def->parts);
+	return key_hint_int(field, key_def);
+}
+
+static uint64_t
+key_hint_string(const char *key, struct key_def *key_def)
+{
+	(void)key_def;
+	assert(key_def->parts->type == FIELD_TYPE_STRING);
+	assert(mp_typeof(*key) == MP_STR);
+	uint32_t len;
+	const unsigned char *str =
+		(const unsigned char *)mp_decode_str(&key, &len);
+	uint64_t result = 0;
+	uint32_t process_len = MIN(len, 7);
+	for (uint32_t i = 0; i < process_len; i++) {
+		result <<= 9;
+		result |= 0x100;
+		result |= str[i];
+	}
+	result <<= 9 * (7 - process_len);
+	return result;
+}
+
+static uint64_t
+tuple_hint_string(const struct tuple *tuple, struct key_def *key_def)
+{
+	const char *field = tuple_field_by_part(tuple, key_def->parts);
+	return key_hint_string(field, key_def);
+}
+
+static uint64_t
+key_hint_string_coll(const char *key, struct key_def *key_def)
+{
+	assert(key_def->parts->type == FIELD_TYPE_STRING);
+	assert(mp_typeof(*key) == MP_STR);
+	assert(key_def->parts->coll != NULL);
+	uint32_t len;
+	const char *str = mp_decode_str(&key, &len);
+	return key_def->parts->coll->hint(str, len, key_def->parts->coll);
+}
+
+static uint64_t
+tuple_hint_string_coll(const struct tuple *tuple, struct key_def *key_def)
+{
+	const char *field = tuple_field_by_part(tuple, key_def->parts);
+	return key_hint_string_coll(field, key_def);
+}
+
+void
+tuple_hint_set(struct key_def *def)
+{
+	def->key_hint = key_hint_default;
+	def->tuple_hint = tuple_hint_default;
+	if (key_part_is_nullable(def->parts))
+		return;
+	if (def->parts->type == FIELD_TYPE_STRING && def->parts->coll != NULL) {
+		def->key_hint = key_hint_string_coll;
+		def->tuple_hint = tuple_hint_string_coll;
+		return;
+	}
+	switch (def->parts->type) {
+	case FIELD_TYPE_UNSIGNED:
+		def->key_hint = key_hint_uint;
+		def->tuple_hint = tuple_hint_uint;
+		break;
+	case FIELD_TYPE_INTEGER:
+		def->key_hint = key_hint_int;
+		def->tuple_hint = tuple_hint_int;
+		break;
+	case FIELD_TYPE_STRING:
+		def->key_hint = key_hint_string;
+		def->tuple_hint = tuple_hint_string;
+		break;
+	default:
+		break;
+	};
+}
+
 /* }}} tuple_compare_with_key */
diff --git a/src/box/tuple_compare.h b/src/box/tuple_compare.h
index e3a63204f..be96ba38c 100644
--- a/src/box/tuple_compare.h
+++ b/src/box/tuple_compare.h
@@ -67,6 +67,46 @@ tuple_compare_create(const struct key_def *key_def);
 tuple_compare_with_key_t
 tuple_compare_with_key_create(const struct key_def *key_def);
 
+/**
+ * Get a comparison hint of a tuple.
+ * Hint is such a function h(tuple) in terms of particular key_def that
+ * has follows rules:
+ * if h(t1) < h(t2) then t1 < t2;
+ * if h(t1) > h(t2) then t1 > t2;
+ * if t1 == t2 then h(t1) == h(t2);
+ * These rules means that instead of direct tuple vs tuple (or tuple vs key)
+ * comparison one may compare theirs hints first; and only if theirs hints
+ * are equal compare the tuples themselves.
+ * @param tuple - tuple to get hint of.
+ * @param key_def - key_def that defines which comparison is used.
+ * @return the hint.
+ */
+static inline uint64_t
+tuple_hint(const struct tuple *tuple, struct key_def *key_def)
+{
+	return key_def->tuple_hint(tuple, key_def);
+}
+
+/**
+ * Get a comparison hint of a key.
+ * @See tuple_hint for hint term definition.
+ * @param key - key to get hint of.
+ * @param key_def - key_def that defines which comparison is used.
+ * @return the hint.
+ */
+static inline uint64_t
+key_hint(const char *key, struct key_def *key_def)
+{
+	return key_def->key_hint(key, key_def);
+}
+
+/**
+ * Initialize tuple_hint() and key_hint() functions for the key_def.
+ * @param key_def key definition to set up.
+ */
+void
+tuple_hint_set(struct key_def *key_def);
+
 #if defined(__cplusplus)
 } /* extern "C" */
 #endif /* defined(__cplusplus) */
diff --git a/src/coll.c b/src/coll.c
index 6d9c44dbf..cff2899cd 100644
--- a/src/coll.c
+++ b/src/coll.c
@@ -133,6 +133,28 @@ coll_bin_hash(const char *s, size_t s_len, uint32_t *ph, uint32_t *pcarry,
 	return s_len;
 }
 
+/** Get a compare hint of a string using ICU collation. */
+static uint64_t
+coll_icu_hint(const char *s, size_t s_len, struct coll *coll)
+{
+	UCharIterator itr;
+	uiter_setUTF8(&itr, s, s_len);
+	uint8_t buf[7];
+	uint32_t state[2] = {0, 0};
+	UErrorCode status = U_ZERO_ERROR;
+	int32_t got = ucol_nextSortKeyPart(coll->collator, &itr, state, buf,
+					   sizeof(buf), &status);
+	assert(got >= 0 && got <= 7);
+	uint64_t result = 0;
+	for (int32_t i = 0; i < got; i++) {
+		result <<= 9;
+		result |= 0x100;
+		result |= buf[i];
+	}
+	result <<= 9 * (7 - got);
+	return result;
+}
+
 /**
  * Set up ICU collator and init cmp and hash members of collation.
  * @param coll Collation to set up.
@@ -242,6 +264,7 @@ coll_icu_init_cmp(struct coll *coll, const struct coll_def *def)
 	}
 	coll->cmp = coll_icu_cmp;
 	coll->hash = coll_icu_hash;
+	coll->hint = coll_icu_hint;
 	return 0;
 }
 
diff --git a/src/coll.h b/src/coll.h
index 9c725712a..4030d38de 100644
--- a/src/coll.h
+++ b/src/coll.h
@@ -47,6 +47,8 @@ typedef int (*coll_cmp_f)(const char *s, size_t s_len, const char *t,
 typedef uint32_t (*coll_hash_f)(const char *s, size_t s_len, uint32_t *ph,
 				uint32_t *pcarry, struct coll *coll);
 
+typedef uint64_t (*coll_hint_f)(const char *s, size_t s_len, struct coll *coll);
+
 struct UCollator;
 
 /**
@@ -61,6 +63,7 @@ struct coll {
 	/** String comparator. */
 	coll_cmp_f cmp;
 	coll_hash_f hash;
+	coll_hint_f hint;
 	/** Reference counter. */
 	int refs;
 	/**
-- 
2.20.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v1 4/4] box: introduce hint option for memtx tree index
  2019-02-05 11:58 [PATCH v1 0/4] box: introduce hint option for memtx tree index Kirill Shcherbatov
                   ` (2 preceding siblings ...)
  2019-02-05 11:58 ` [PATCH v1 3/4] box: introduce tuple compare hint Kirill Shcherbatov
@ 2019-02-05 11:58 ` Kirill Shcherbatov
  2019-02-11 16:12   ` Vladimir Davydov
  3 siblings, 1 reply; 9+ messages in thread
From: Kirill Shcherbatov @ 2019-02-05 11:58 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov

If an index has the hint option enabled, a compare hint is calculated
and stored in the bps_tree along with the struct tuple pointer. This
hint is used for tuple pre-comparison in order to avoid fetching tuple
memory and to significantly improve tree performance.

Such a hinted tree costs twice the amount of memory - about 20 bytes
per tuple on average.
The enabled hint option improves performance by about 15% on average;
select operations are significantly accelerated (there are scenarios
in which the difference reaches 200-250%).
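
For reference, a minimal Lua example of the new option (the space and
index names here are arbitrary; the added tests cover more cases):

s = box.schema.space.create('example')
s:create_index('pk', {type = 'tree', parts = {1, 'unsigned'}, hint = true})
s:replace{1} s:replace{2}
s.index.pk:select{} -- same result set as without hints, just faster
s:drop()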

Test set from original @alyapunov patch.

Needed for #3961
---
 src/box/index_def.c                 |   2 +
 src/box/index_def.h                 |   6 +
 src/box/lua/schema.lua              |  12 ++
 src/box/lua/space.cc                |   5 +
 src/box/memtx_tree.c                | 192 +++++++++++++++++++++++-
 src/box/memtx_tree_impl.h           |   2 +-
 test/box/tree_pk.result             | 220 ++++++++++++++++++++++++++++
 test/box/tree_pk.test.lua           |  88 +++++++++++
 test/box/tree_pk_multipart.result   | 155 ++++++++++++++++++++
 test/box/tree_pk_multipart.test.lua |  65 ++++++++
 10 files changed, 745 insertions(+), 2 deletions(-)

diff --git a/src/box/index_def.c b/src/box/index_def.c
index c52aa38d1..4d3fadc6e 100644
--- a/src/box/index_def.c
+++ b/src/box/index_def.c
@@ -49,6 +49,7 @@ const struct index_opts index_opts_default = {
 	/* .bloom_fpr           = */ 0.05,
 	/* .lsn                 = */ 0,
 	/* .stat                = */ NULL,
+	/* .hint                = */ false,
 };
 
 const struct opt_def index_opts_reg[] = {
@@ -62,6 +63,7 @@ const struct opt_def index_opts_reg[] = {
 	OPT_DEF("run_size_ratio", OPT_FLOAT, struct index_opts, run_size_ratio),
 	OPT_DEF("bloom_fpr", OPT_FLOAT, struct index_opts, bloom_fpr),
 	OPT_DEF("lsn", OPT_INT64, struct index_opts, lsn),
+	OPT_DEF("hint", OPT_BOOL, struct index_opts, hint),
 	OPT_END,
 };
 
diff --git a/src/box/index_def.h b/src/box/index_def.h
index 7717ecd64..74814bef5 100644
--- a/src/box/index_def.h
+++ b/src/box/index_def.h
@@ -157,6 +157,10 @@ struct index_opts {
 	 * LSN from the time of index creation.
 	 */
 	int64_t lsn;
+	/**
+	 * Use hint optimization for tree index.
+	 */
+	bool hint;
 	/**
 	 * SQL specific statistics concerning tuples
 	 * distribution for query planer. It is automatically
@@ -207,6 +211,8 @@ index_opts_cmp(const struct index_opts *o1, const struct index_opts *o2)
 		return o1->run_size_ratio < o2->run_size_ratio ? -1 : 1;
 	if (o1->bloom_fpr != o2->bloom_fpr)
 		return o1->bloom_fpr < o2->bloom_fpr ? -1 : 1;
+	if (o1->hint != o2->hint)
+		return o1->hint < o2->hint ? -1 : 1;
 	return 0;
 }
 
diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua
index 0daf484b8..f05d71875 100644
--- a/src/box/lua/schema.lua
+++ b/src/box/lua/schema.lua
@@ -761,6 +761,7 @@ local index_options = {
     range_size = 'number',
     page_size = 'number',
     bloom_fpr = 'number',
+    hint = 'boolean',
 }
 
 --
@@ -828,6 +829,11 @@ box.schema.index.create = function(space_id, name, options)
         options_defaults = {}
     end
     options = update_param_table(options, options_defaults)
+    if options.hint and
+       (options.type ~= 'tree' or box.space[space_id].engine ~= 'memtx') then
+        box.error(box.error.MODIFY_INDEX, name, box.space[space_id].name,
+            "hint is only reasonable with memtx tree index")
+    end
 
     local _index = box.space[box.schema.INDEX_ID]
     local _vindex = box.space[box.schema.VINDEX_ID]
@@ -866,6 +872,7 @@ box.schema.index.create = function(space_id, name, options)
             run_count_per_level = options.run_count_per_level,
             run_size_ratio = options.run_size_ratio,
             bloom_fpr = options.bloom_fpr,
+            hint = options.hint,
     }
     local field_type_aliases = {
         num = 'unsigned'; -- Deprecated since 1.7.2
@@ -1018,6 +1025,11 @@ box.schema.index.alter = function(space_id, index_id, options)
             index_opts[k] = options[k]
         end
     end
+    if options.hint and
+       (options.type ~= 'tree' or box.space[space_id].engine ~= 'memtx') then
+        box.error(box.error.MODIFY_INDEX, options.name, box.space[space_id].name,
+            "hint is only reasonable with memtx tree index")
+    end
     if options.parts then
         local parts_can_be_simplified
         parts, parts_can_be_simplified =
diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc
index 1f152917e..bf6199ced 100644
--- a/src/box/lua/space.cc
+++ b/src/box/lua/space.cc
@@ -269,6 +269,11 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i)
 			lua_setfield(L, -2, "dimension");
 		}
 
+		if (index_def->type == TREE && index_opts->hint) {
+			lua_pushboolean(L, index_opts->hint);
+			lua_setfield(L, -2, "hint");
+		}
+
 		lua_pushstring(L, index_type_strs[index_def->type]);
 		lua_setfield(L, -2, "type");
 
diff --git a/src/box/memtx_tree.c b/src/box/memtx_tree.c
index 6b22883c2..426a11262 100644
--- a/src/box/memtx_tree.c
+++ b/src/box/memtx_tree.c
@@ -79,8 +79,198 @@ struct memtx_tuple_tree_key_data {
 
 /* }}} */
 
+/* {{{ Memtx hinted and hint_only tree class. *******************/
+
+/**
+ * Struct that is used as a key in BPS tree definition in
+ * metmx_hint_only_tree and metmx_hinted_tree.
+*/
+struct memtx_hinted_tree_key_data {
+	/** Sequence of msgpacked search fields. */
+	const char *key;
+	/** Number of msgpacked search fields. */
+	uint32_t part_count;
+	/**
+	 * Compare hint. Is calculated automatically on 'set'
+	 * operation with memtx_hinted_tree_key_data_set().
+	 */
+	uint64_t hint;
+};
+
+/**
+ * Struct that is used as an element in BPS tree definition in
+ * metmx_hint_only_tree and metmx_hinted_tree.
+ */
+struct memtx_hinted_tree_data {
+	/** Tuple this node is representing. */
+	struct tuple *tuple;
+	/**
+	 * Compare hint. Is calculated automatically on 'set'
+	 * operation with memtx_hinted_tree_data_set().
+	 */
+	uint64_t hint;
+};
+
+/**
+ * Compare memtx_hinted_tree records.
+ */
+static int
+memtx_hinted_tree_data_cmp(struct memtx_hinted_tree_data *a,
+			   struct memtx_hinted_tree_data *b,
+			   struct key_def *key_def)
+{
+	if (a->hint != b->hint)
+		return a->hint < b->hint ? -1 : 1;
+	return tuple_compare(a->tuple, b->tuple, key_def);
+}
+
+/**
+ * Compare memtx_hinted_tree record with key.
+ */
+static int
+memtx_hinted_tree_data_cmp_with_key(struct memtx_hinted_tree_data *a,
+				    struct memtx_hinted_tree_key_data *key,
+				    struct key_def *key_def)
+{
+	if (a->hint != key->hint)
+		return a->hint < key->hint ? -1 : 1;
+	return tuple_compare_with_key(a->tuple, key->key, key->part_count,
+				      key_def);
+}
+
+/**
+ * Initialize memtx_hinted_tree or memtx_hint_only_tree record
+ * with tuple and recalculate internal hint field.
+ */
+static void
+memtx_hinted_tree_data_set(struct memtx_hinted_tree_data *data,
+			   struct tuple *tuple, struct key_def *key_def)
+{
+	data->tuple = tuple;
+	data->hint = tuple != NULL ? tuple_hint(tuple, key_def) : 0;
+}
+
+/**
+ * Initialize memtx_hinted_tree or memtx_hint_only_tree key with
+ * key raw and part count and recalculate internal hint field.
+ */
+static void
+memtx_hinted_tree_key_data_set(struct memtx_hinted_tree_key_data *key_data,
+			       const char *key, uint32_t part_count,
+			       struct key_def *key_def)
+{
+	key_data->key = key;
+	key_data->part_count = part_count;
+	key_data->hint = key != NULL && part_count > 0 ?
+			 key_hint(key, key_def) : 0;
+}
+
+#define MEMTX_TREE_NAME metmx_hinted_tree
+#define memtx_tree_elem struct memtx_hinted_tree_data
+#define memtx_tree_key struct memtx_hinted_tree_key_data
+#define MEMTX_TREE_ELEM_EQUAL(elem_a_ptr, elem_b_ptr)				\
+	((elem_a_ptr)->tuple == (elem_b_ptr)->tuple)
+#define MEMTX_TREE_ELEM_CMP(elem_a_ptr, elem_b_ptr, key_def)			\
+	memtx_hinted_tree_data_cmp(elem_a_ptr, elem_b_ptr, key_def)
+#define MEMTX_TREE_ELEM_WITH_KEY_CMP(elem_ptr, key_ptr, key_def)		\
+	memtx_hinted_tree_data_cmp_with_key(elem_ptr, key_ptr, key_def)
+#define MEMTX_TREE_ELEM_SET(elem_ptr, tuple, key_def)				\
+	memtx_hinted_tree_data_set(elem_ptr, tuple, key_def)
+#define MEMTX_TREE_KEY_SET(key_ptr, key_val, part_count_val, key_def)		\
+	memtx_hinted_tree_key_data_set(key_ptr, key_val, part_count_val,	\
+				       key_def)
+#define MEMTX_TREE_ELEM_GET(elem_ptr) ((elem_ptr)->tuple)
+#define MEMTX_TREE_KEY_GET(key_ptr, part_count_ptr)				\
+	({*part_count_ptr = (key_ptr)->part_count; (key_ptr)->key;})
+
+#include "memtx_tree_impl.h"
+
+#undef memtx_tree_key
+#undef memtx_tree_elem
+#undef MEMTX_TREE_KEY_GET
+#undef MEMTX_TREE_ELEM_GET
+#undef MEMTX_TREE_KEY_SET
+#undef MEMTX_TREE_ELEM_SET
+#undef MEMTX_TREE_ELEM_WITH_KEY_CMP
+#undef MEMTX_TREE_ELEM_CMP
+#undef MEMTX_TREE_ELEM_EQUAL
+#undef memtx_tree_key
+#undef memtx_tree_elem
+#undef MEMTX_TREE_NAME
+
+/**
+ * Compare memtx_hinted_tree records.
+ */
+static inline int
+memtx_hint_only_tree_data_cmp(struct memtx_hinted_tree_data *a,
+			      struct memtx_hinted_tree_data *b,
+			      struct key_def *key_def)
+{
+	(void)key_def;
+	assert(key_def->part_count == 1 &&
+	       (key_def->parts->type == FIELD_TYPE_UNSIGNED ||
+	        key_def->parts->type == FIELD_TYPE_INTEGER));
+	return a->hint < b->hint ? -1 : a->hint > b->hint ? 1 : 0;
+}
+
+/**
+ * Compare memtx_hinted_tree record with key.
+ */
+static inline int
+memtx_hint_only_tree_data_cmp_with_key(struct memtx_hinted_tree_data *a,
+				       struct memtx_hinted_tree_key_data *key,
+				       struct key_def *key_def)
+{
+	(void)key_def;
+	assert(key_def->part_count == 1 &&
+	       (key_def->parts->type == FIELD_TYPE_UNSIGNED ||
+	        key_def->parts->type == FIELD_TYPE_INTEGER));
+	return a->hint < key->hint ? -1 : a->hint > key->hint ? 1 : 0;
+}
+
+#define MEMTX_TREE_NAME metmx_hint_only_tree
+#define memtx_tree_elem struct memtx_hinted_tree_data
+#define memtx_tree_key struct memtx_hinted_tree_key_data
+#define MEMTX_TREE_ELEM_EQUAL(elem_a_ptr, elem_b_ptr)				\
+	((elem_a_ptr)->tuple == (elem_b_ptr)->tuple)
+#define MEMTX_TREE_ELEM_CMP(elem_a_ptr, elem_b_ptr, key_def)			\
+	memtx_hint_only_tree_data_cmp(elem_a_ptr, elem_b_ptr, key_def)
+#define MEMTX_TREE_ELEM_WITH_KEY_CMP(elem_ptr, key_ptr, key_def)		\
+	memtx_hint_only_tree_data_cmp_with_key(elem_ptr, key_ptr, key_def)
+#define MEMTX_TREE_ELEM_SET(elem_ptr, tuple, key_def)				\
+	memtx_hinted_tree_data_set(elem_ptr, tuple, key_def)
+#define MEMTX_TREE_KEY_SET(key_ptr, key_val, part_count_val, key_def)		\
+	memtx_hinted_tree_key_data_set(key_ptr, key_val, part_count_val,	\
+				       key_def)
+#define MEMTX_TREE_ELEM_GET(elem_ptr) ((elem_ptr)->tuple)
+#define MEMTX_TREE_KEY_GET(key_ptr, part_count_ptr)				\
+	({*part_count_ptr = (key_ptr)->part_count; (key_ptr)->key;})
+
+#include "memtx_tree_impl.h"
+
+#undef memtx_tree_key
+#undef memtx_tree_elem
+#undef MEMTX_TREE_KEY_GET
+#undef MEMTX_TREE_ELEM_GET
+#undef MEMTX_TREE_KEY_SET
+#undef MEMTX_TREE_ELEM_SET
+#undef MEMTX_TREE_ELEM_WITH_KEY_CMP
+#undef MEMTX_TREE_ELEM_CMP
+#undef MEMTX_TREE_ELEM_EQUAL
+#undef memtx_tree_key
+#undef memtx_tree_elem
+#undef MEMTX_TREE_NAME
+
+/* }}} */
+
 struct index *
 memtx_tree_index_new(struct memtx_engine *memtx, struct index_def *def)
 {
-	return metmx_tuple_tree_index_new(memtx, def);
+	if (!def->opts.hint)
+		return metmx_tuple_tree_index_new(memtx, def);
+	if (def->cmp_def->part_count == 1 &&
+	    (def->cmp_def->parts->type == FIELD_TYPE_UNSIGNED ||
+	     def->cmp_def->parts->type == FIELD_TYPE_INTEGER))
+		return metmx_hint_only_tree_index_new(memtx, def);
+	return metmx_hinted_tree_index_new(memtx, def);
 }
diff --git a/src/box/memtx_tree_impl.h b/src/box/memtx_tree_impl.h
index 20b88ad3b..9a5122bdd 100644
--- a/src/box/memtx_tree_impl.h
+++ b/src/box/memtx_tree_impl.h
@@ -78,7 +78,7 @@
  * mempool_create. This is the size of the block that will be
  * allocated for each iterator.
  */
-#define MEMTX_TREE_ITERATOR_SIZE (136)
+#define MEMTX_TREE_ITERATOR_SIZE (152)
 
 #define bps_tree_elem_t memtx_tree_elem
 #define bps_tree_key_t memtx_tree_key *
diff --git a/test/box/tree_pk.result b/test/box/tree_pk.result
index df3c78bed..7c7aa7f7e 100644
--- a/test/box/tree_pk.result
+++ b/test/box/tree_pk.result
@@ -852,3 +852,223 @@ box.internal.collation.drop('test')
 box.internal.collation.drop('test-ci')
 ---
 ...
+--
+-- gh-3961: Hint option for memtx tree index.
+--
+s = box.schema.space.create('test')
+---
+...
+s:create_index('test', {type = 'tree', hint = 'true'} )
+---
+- error: Illegal parameters, options parameter 'hint' should be of type boolean
+...
+s:create_index('test', {type = 'hash', hint = true} )
+---
+- error: 'Can''t create or modify index ''test'' in space ''test'': hint is only reasonable
+    with memtx tree index'
+...
+s:create_index('test', {type = 'hash'}):alter({hint = true})
+---
+- error: 'Can''t create or modify index ''test'' in space ''test'': hint is only reasonable
+    with memtx tree index'
+...
+s:drop()
+---
+...
+s = box.schema.space.create('test', {engine = 'vinyl'})
+---
+...
+s:create_index('test', {type = 'tree', hint = true} )
+---
+- error: 'Can''t create or modify index ''test'' in space ''test'': hint is only reasonable
+    with memtx tree index'
+...
+s:drop()
+---
+...
+-- numeric hints
+s1 = box.schema.space.create('test1')
+---
+...
+s1:create_index('test', {type = 'tree', hint = false}).hint
+---
+- null
+...
+s2 = box.schema.space.create('test2')
+---
+...
+s2:create_index('test', {type = 'tree', hint = true}).hint
+---
+- true
+...
+N = 2000
+---
+...
+for i = 1,N do local v = math.random(10 * N) s1:replace{v} s2:replace{v} end
+---
+...
+s1:count() == s2:count()
+---
+- true
+...
+good = true r1 = s1:select{} r2 = s2:select{}
+---
+...
+if #r1 ~= #r2 then good = false end
+---
+...
+for k,v in pairs(r1) do if r2[k][1] ~= v[1] then good = false end end
+---
+...
+good
+---
+- true
+...
+r1 = nil r2 = nil
+---
+...
+function diff(t1, t2) if t1 then return t1[1] ~= t2[1] else return t2 ~= nil end end
+---
+...
+for i = 1, N * 10 do if diff(s1:get{i}, s2:get{i}) then good = false end end
+---
+...
+good
+---
+- true
+...
+s1:drop()
+---
+...
+s2:drop()
+---
+...
+-- string hints
+s1 = box.schema.space.create('test1')
+---
+...
+s1:create_index('test', {type = 'tree', parts = {1, 'str'}, hint = false}).hint
+---
+- null
+...
+s2 = box.schema.space.create('test2')
+---
+...
+s2:create_index('test', {type = 'tree', parts = {1, 'str'}, hint = true}).hint
+---
+- true
+...
+N = 1000
+---
+...
+for i = 1,N do local v = ''..math.random(10 * N) s1:replace{v} s2:replace{v} end
+---
+...
+s1:count() == s2:count()
+---
+- true
+...
+good = true r1 = s1:select{} r2 = s2:select{}
+---
+...
+if #r1 ~= #r2 then good = false end
+---
+...
+for k,v in pairs(r1) do if r2[k][1] ~= v[1] then good = false end end
+---
+...
+good
+---
+- true
+...
+r1 = nil r2 = nil
+---
+...
+function diff(t1, t2) if t1 then return t1[1] ~= t2[1] else return t2 ~= nil end end
+---
+...
+for i = 1, N * 10 do v = ''..i if diff(s1:get{v}, s2:get{v}) then good = false end end
+---
+...
+good
+---
+- true
+...
+s1:drop()
+---
+...
+s2:drop()
+---
+...
+-- string with collation hints
+s1 = box.schema.space.create('test1')
+---
+...
+s1:create_index('test', {type = 'tree', parts = {{1, 'str', collation = 'unicode_ci'}}, hint = false}).hint
+---
+- null
+...
+s2 = box.schema.space.create('test2')
+---
+...
+s2:create_index('test', {type = 'tree', parts = {{1, 'str', collation = 'unicode_ci'}}, hint = true}).hint
+---
+- true
+...
+N = 500
+---
+...
+syms = {'a', 'B', 'c', 'D', 'ж', 'Ё', '~', '1', '%', '2'}
+---
+...
+syms_size = #syms
+---
+...
+len = 20
+---
+...
+vals = {}
+---
+...
+for i = 1,2*N do str = '' for j=1,len do str = str .. syms[math.random(syms_size)] end vals[i] = str end
+---
+...
+for i = 1,N do local v = vals[i] s1:replace{v} s2:replace{v} end
+---
+...
+s1:count() == s2:count()
+---
+- true
+...
+good = true r1 = s1:select{} r2 = s2:select{}
+---
+...
+if #r1 ~= #r2 then good = false end
+---
+...
+for k,v in pairs(r1) do if r2[k][1] ~= v[1] then good = false end end
+---
+...
+good
+---
+- true
+...
+r1 = nil r2 = nil
+---
+...
+function diff(t1, t2) if t1 then return t1[1] ~= t2[1] else return t2 ~= nil end end
+---
+...
+for i = 1, N * 2 do v = vals[i] if diff(s1:get{v}, s2:get{v}) then good = false end end
+---
+...
+good
+---
+- true
+...
+s1:drop()
+---
+...
+s2:drop()
+---
+...
diff --git a/test/box/tree_pk.test.lua b/test/box/tree_pk.test.lua
index 1190ab424..3edbdda57 100644
--- a/test/box/tree_pk.test.lua
+++ b/test/box/tree_pk.test.lua
@@ -301,3 +301,91 @@ s:drop()
 
 box.internal.collation.drop('test')
 box.internal.collation.drop('test-ci')
+
+--
+-- gh-3961: Hint option for memtx tree index.
+--
+s = box.schema.space.create('test')
+s:create_index('test', {type = 'tree', hint = 'true'} )
+s:create_index('test', {type = 'hash', hint = true} )
+s:create_index('test', {type = 'hash'}):alter({hint = true})
+s:drop()
+
+s = box.schema.space.create('test', {engine = 'vinyl'})
+s:create_index('test', {type = 'tree', hint = true} )
+s:drop()
+
+-- numeric hints
+s1 = box.schema.space.create('test1')
+s1:create_index('test', {type = 'tree', hint = false}).hint
+s2 = box.schema.space.create('test2')
+s2:create_index('test', {type = 'tree', hint = true}).hint
+
+N = 2000
+for i = 1,N do local v = math.random(10 * N) s1:replace{v} s2:replace{v} end
+s1:count() == s2:count()
+
+good = true r1 = s1:select{} r2 = s2:select{}
+if #r1 ~= #r2 then good = false end
+for k,v in pairs(r1) do if r2[k][1] ~= v[1] then good = false end end
+good
+r1 = nil r2 = nil
+
+function diff(t1, t2) if t1 then return t1[1] ~= t2[1] else return t2 ~= nil end end
+for i = 1, N * 10 do if diff(s1:get{i}, s2:get{i}) then good = false end end
+good
+
+s1:drop()
+s2:drop()
+
+-- string hints
+s1 = box.schema.space.create('test1')
+s1:create_index('test', {type = 'tree', parts = {1, 'str'}, hint = false}).hint
+s2 = box.schema.space.create('test2')
+s2:create_index('test', {type = 'tree', parts = {1, 'str'}, hint = true}).hint
+
+N = 1000
+for i = 1,N do local v = ''..math.random(10 * N) s1:replace{v} s2:replace{v} end
+s1:count() == s2:count()
+
+good = true r1 = s1:select{} r2 = s2:select{}
+if #r1 ~= #r2 then good = false end
+for k,v in pairs(r1) do if r2[k][1] ~= v[1] then good = false end end
+good
+r1 = nil r2 = nil
+
+function diff(t1, t2) if t1 then return t1[1] ~= t2[1] else return t2 ~= nil end end
+for i = 1, N * 10 do v = ''..i if diff(s1:get{v}, s2:get{v}) then good = false end end
+good
+
+s1:drop()
+s2:drop()
+
+-- string with collation hints
+s1 = box.schema.space.create('test1')
+s1:create_index('test', {type = 'tree', parts = {{1, 'str', collation = 'unicode_ci'}}, hint = false}).hint
+s2 = box.schema.space.create('test2')
+s2:create_index('test', {type = 'tree', parts = {{1, 'str', collation = 'unicode_ci'}}, hint = true}).hint
+
+N = 500
+syms = {'a', 'B', 'c', 'D', 'ж', 'Ё', '~', '1', '%', '2'}
+syms_size = #syms
+len = 20
+vals = {}
+for i = 1,2*N do str = '' for j=1,len do str = str .. syms[math.random(syms_size)] end vals[i] = str end
+
+for i = 1,N do local v = vals[i] s1:replace{v} s2:replace{v} end
+s1:count() == s2:count()
+
+good = true r1 = s1:select{} r2 = s2:select{}
+if #r1 ~= #r2 then good = false end
+for k,v in pairs(r1) do if r2[k][1] ~= v[1] then good = false end end
+good
+r1 = nil r2 = nil
+
+function diff(t1, t2) if t1 then return t1[1] ~= t2[1] else return t2 ~= nil end end
+for i = 1, N * 2 do v = vals[i] if diff(s1:get{v}, s2:get{v}) then good = false end end
+good
+
+s1:drop()
+s2:drop()
diff --git a/test/box/tree_pk_multipart.result b/test/box/tree_pk_multipart.result
index 93219f666..e95272feb 100644
--- a/test/box/tree_pk_multipart.result
+++ b/test/box/tree_pk_multipart.result
@@ -542,3 +542,158 @@ space:drop()
 space = nil
 ---
 ...
+--
+-- gh-3961: Hint option for memtx tree index.
+--
+test_run:cmd("setopt delimiter ';'")
+---
+- true
+...
+function equal(res1, res2)
+    if #res1 ~= #res2 then
+        return false
+    end
+    for k,v in pairs(res1) do
+        if res2[k][1] ~= v[1] or res2[k][2] ~= v[2] then
+            return false
+        end
+    end
+    return true
+end
+test_run:cmd("setopt delimiter ''");
+---
+...
+-- num num
+N1 = 100
+---
+...
+t1 = {}
+---
+...
+for i = 1,N1*2 do t1[i] = math.random(10000) * 10000 + i end
+---
+...
+N2 = 5
+---
+...
+t2 = {}
+---
+...
+for i = 1,N2*2 do t2[i] = math.random(1000000) end
+---
+...
+s1 = box.schema.space.create('test1')
+---
+...
+s1:create_index('test', {type = 'tree', parts = {{1, 'num'}, {2, 'num'}}, hint = false}).hint
+---
+- null
+...
+s2 = box.schema.space.create('test2')
+---
+...
+s2:create_index('test', {type = 'tree', parts = {{1, 'num'}, {2, 'num'}}, hint = true}).hint
+---
+- true
+...
+for j = 1,N2 do for i = 1,N1 do s1:replace{t1[i], t2[j]} s2:replace{t1[i], t2[j]} end end
+---
+...
+s1:count() == s2:count()
+---
+- true
+...
+equal(s1:select{}, s2:select{})
+---
+- true
+...
+good = true
+---
+...
+for i = 1,N1*2 do good = good and equal(s1:select{t1[i]}, s2:select{t1[i]}) end
+---
+...
+good
+---
+- true
+...
+for i = 1,N1*2 do for j=1,N2*2 do good = good and equal(s1:select{t1[i], t2[j]}, s2:select{t1[i], t2[j]}) end end
+---
+...
+good
+---
+- true
+...
+s1:drop()
+---
+...
+s2:drop()
+---
+...
+-- str num
+N1 = 100
+---
+...
+t1 = {}
+---
+...
+for i = 1,N1*2 do t1[i] = ''..(math.random(10000) * 10000 + i) end
+---
+...
+N2 = 5
+---
+...
+t2 = {}
+---
+...
+for i = 1,N2*2 do t2[i] = math.random(1000000) end
+---
+...
+s1 = box.schema.space.create('test1')
+---
+...
+s1:create_index('test', {type = 'tree', parts = {{1, 'str'}, {2, 'num'}}, hint = false}).hint
+---
+- null
+...
+s2 = box.schema.space.create('test2')
+---
+...
+s2:create_index('test', {type = 'tree', parts = {{1, 'str'}, {2, 'num'}}, hint = true}).hint
+---
+- true
+...
+for j = 1,N2 do for i = 1,N1 do s1:replace{t1[i], t2[j]} s2:replace{t1[i], t2[j]} end end
+---
+...
+s1:count() == s2:count()
+---
+- true
+...
+equal(s1:select{}, s2:select{})
+---
+- true
+...
+good = true
+---
+...
+for i = 1,N1*2 do good = good and equal(s1:select{t1[i]}, s2:select{t1[i]}) end
+---
+...
+good
+---
+- true
+...
+for i = 1,N1*2 do for j=1,N2*2 do good = good and equal(s1:select{t1[i], t2[j]}, s2:select{t1[i], t2[j]}) end end
+---
+...
+good
+---
+- true
+...
+s1:drop()
+---
+...
+s2:drop()
+---
+...
diff --git a/test/box/tree_pk_multipart.test.lua b/test/box/tree_pk_multipart.test.lua
index 7d3405ffe..bcb9dc541 100644
--- a/test/box/tree_pk_multipart.test.lua
+++ b/test/box/tree_pk_multipart.test.lua
@@ -194,3 +194,68 @@ test_run:cmd("setopt delimiter ''");
 space:drop()
 
 space = nil
+
+--
+-- gh-3961: Hint option for memtx tree index.
+--
+test_run:cmd("setopt delimiter ';'")
+function equal(res1, res2)
+    if #res1 ~= #res2 then
+        return false
+    end
+    for k,v in pairs(res1) do
+        if res2[k][1] ~= v[1] or res2[k][2] ~= v[2] then
+            return false
+        end
+    end
+    return true
+end
+test_run:cmd("setopt delimiter ''");
+
+-- num num
+N1 = 100
+t1 = {}
+for i = 1,N1*2 do t1[i] = math.random(10000) * 10000 + i end
+N2 = 5
+t2 = {}
+for i = 1,N2*2 do t2[i] = math.random(1000000) end
+
+s1 = box.schema.space.create('test1')
+s1:create_index('test', {type = 'tree', parts = {{1, 'num'}, {2, 'num'}}, hint = false}).hint
+s2 = box.schema.space.create('test2')
+s2:create_index('test', {type = 'tree', parts = {{1, 'num'}, {2, 'num'}}, hint = true}).hint
+for j = 1,N2 do for i = 1,N1 do s1:replace{t1[i], t2[j]} s2:replace{t1[i], t2[j]} end end
+s1:count() == s2:count()
+equal(s1:select{}, s2:select{})
+good = true
+for i = 1,N1*2 do good = good and equal(s1:select{t1[i]}, s2:select{t1[i]}) end
+good
+for i = 1,N1*2 do for j=1,N2*2 do good = good and equal(s1:select{t1[i], t2[j]}, s2:select{t1[i], t2[j]}) end end
+good
+
+s1:drop()
+s2:drop()
+
+-- str num
+N1 = 100
+t1 = {}
+for i = 1,N1*2 do t1[i] = ''..(math.random(10000) * 10000 + i) end
+N2 = 5
+t2 = {}
+for i = 1,N2*2 do t2[i] = math.random(1000000) end
+
+s1 = box.schema.space.create('test1')
+s1:create_index('test', {type = 'tree', parts = {{1, 'str'}, {2, 'num'}}, hint = false}).hint
+s2 = box.schema.space.create('test2')
+s2:create_index('test', {type = 'tree', parts = {{1, 'str'}, {2, 'num'}}, hint = true}).hint
+for j = 1,N2 do for i = 1,N1 do s1:replace{t1[i], t2[j]} s2:replace{t1[i], t2[j]} end end
+s1:count() == s2:count()
+equal(s1:select{}, s2:select{})
+good = true
+for i = 1,N1*2 do good = good and equal(s1:select{t1[i]}, s2:select{t1[i]}) end
+good
+for i = 1,N1*2 do for j=1,N2*2 do good = good and equal(s1:select{t1[i], t2[j]}, s2:select{t1[i], t2[j]}) end end
+good
+
+s1:drop()
+s2:drop()
-- 
2.20.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 1/4] lib: introduce BPS_TREE_EQUAL custom comparator
  2019-02-05 11:58 ` [PATCH v1 1/4] lib: introduce BPS_TREE_EQUAL custom comparator Kirill Shcherbatov
@ 2019-02-11 15:46   ` Vladimir Davydov
  0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Davydov @ 2019-02-11 15:46 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches

On Tue, Feb 05, 2019 at 02:58:36PM +0300, Kirill Shcherbatov wrote:
> Introduce a macro BPS_TREE_EQUAL for BPS TREE class. This makes
> possible to define custom comparators for stucture-based leafs.
> Previously, a C++ comparison operator "!=" override was used for
> this purpose. Due to the fact that we are not going to rework on
> C++ C-only components of Tarantool like memtx_tree, we needed a
> way to make complex structures comparisons using preprocessor.
> 
> Needed for #3961
> ---
>  src/lib/salad/bps_tree.h       | 74 +++++++++++++++++++++-------------
>  test/unit/bps_tree_iterator.cc | 15 ++++---
>  2 files changed, 56 insertions(+), 33 deletions(-)
> 
> diff --git a/src/lib/salad/bps_tree.h b/src/lib/salad/bps_tree.h
> index 8d1a62108..1419af3f7 100644
> --- a/src/lib/salad/bps_tree.h
> +++ b/src/lib/salad/bps_tree.h
> @@ -1035,6 +1035,10 @@ struct bps_leaf_path_elem {
>  	bps_tree_pos_t max_elem_pos;
>  };
>  
> +#ifndef BPS_TREE_EQUAL
> +#define BPS_TREE_EQUAL(a, b) (a == b)
> +#endif
> +

Funny thing, this is only needed for self checks and only used in unit
tests. The patch is unobtrusive so I guess it's OK.

Please move this macro to the place where other BPS_TREE macros are
defined and augment the definition with a comment.

Also, I'm not quite sure about the name, because if we have a 'compare'
operation, why do we need to define 'equal' separately? What about
BPS_TREE_SELFSAME or BPS_TREE_IDENTICAL?

Alternatively, we could define BPS_TREE_NO_DEBUG for each BPS tree
definition in the code. Then we wouldn't need this macro at all. I'm
inclined to think that would be the easiest way to deal with this
problem.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 2/4] box: rework memtx_tree class to be reusable
  2019-02-05 11:58 ` [PATCH v1 2/4] box: rework memtx_tree class to be reusable Kirill Shcherbatov
@ 2019-02-11 15:49   ` Vladimir Davydov
  0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Davydov @ 2019-02-11 15:49 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches

On Tue, Feb 05, 2019 at 02:58:37PM +0300, Kirill Shcherbatov wrote:
> The memtx tree class has been redesigned so that it can be used
> to store arbitrary structures in the future. This makes possible
> to implement type hints in memtx in subsequent patches.
> 
> Needed for #3961
> ---
>  src/box/memtx_tree.c      | 715 ++--------------------------
>  src/box/memtx_tree.h      |  68 +--
>  src/box/memtx_tree_impl.h | 960 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 1008 insertions(+), 735 deletions(-)
>  create mode 100644 src/box/memtx_tree_impl.h

You move a huge chunk of code to another file and modify it at the same
time, which makes the patch impossible to review. Please split it in
two. First, do the necessary modifications in memtx_tree.c (it's OK to
include *.c files in C), then rename memtx_tree.c to memtx_tree_impl.h.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 3/4] box: introduce tuple compare hint
  2019-02-05 11:58 ` [PATCH v1 3/4] box: introduce tuple compare hint Kirill Shcherbatov
@ 2019-02-11 15:57   ` Vladimir Davydov
  0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Davydov @ 2019-02-11 15:57 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, Alexandr Lyapunov

On Tue, Feb 05, 2019 at 02:58:38PM +0300, Kirill Shcherbatov wrote:
> +static uint64_t
> +key_hint_string(const char *key, struct key_def *key_def)
> +{
> +	(void)key_def;
> +	assert(key_def->parts->type == FIELD_TYPE_STRING);
> +	assert(mp_typeof(*key) == MP_STR);
> +	uint32_t len;
> +	const unsigned char *str =
> +		(const unsigned char *)mp_decode_str(&key, &len);
> +	uint64_t result = 0;
> +	uint32_t process_len = MIN(len, 7);
> +	for (uint32_t i = 0; i < process_len; i++) {
> +		result <<= 9;
> +		result |= 0x100;
> +		result |= str[i];
> +	}
> +	result <<= 9 * (7 - process_len);
> +	return result;

I don't understand this part. I'd expect you to compare the strings
byte-wise. Please add a comment explaining what's going on here.

> diff --git a/src/box/tuple_compare.h b/src/box/tuple_compare.h
> index e3a63204f..be96ba38c 100644
> --- a/src/box/tuple_compare.h
> +++ b/src/box/tuple_compare.h
> @@ -67,6 +67,46 @@ tuple_compare_create(const struct key_def *key_def);
>  tuple_compare_with_key_t
>  tuple_compare_with_key_create(const struct key_def *key_def);
>  
> +/**
> + * Get a comparison hint of a tuple.
> + * Hint is such a function h(tuple) in terms of particular key_def that
> + * has follows rules:

the following rules

> + * if h(t1) < h(t2) then t1 < t2;
> + * if h(t1) > h(t2) then t1 > t2;
> + * if t1 == t2 then h(t1) == h(t2);
> + * These rules means that instead of direct tuple vs tuple (or tuple vs key)

mean

> + * comparison one may compare theirs hints first; and only if theirs hints
> + * are equal compare the tuples themselves.
> + * @param tuple - tuple to get hint of.
> + * @param key_def - key_def that defines which comparison is used.
> + * @return the hint.
> + */
> +static inline uint64_t
> +tuple_hint(const struct tuple *tuple, struct key_def *key_def)
> +{
> +	return key_def->tuple_hint(tuple, key_def);
> +}
> +
> +/**
> + * Get a comparison hint of a key.
> + * @See tuple_hint for hint term definition.
> + * @param key - key to get hint of.
> + * @param key_def - key_def that defines which comparison is used.
> + * @return the hint.
> + */
> +static inline uint64_t
> +key_hint(const char *key, struct key_def *key_def)
> +{
> +	return key_def->key_hint(key, key_def);
> +}

All the above functions should be defined in key_def.h, where 'hash' and
'compare' reside.

> +
> +/**
> + * Initialize tuple_hint() and key_hint() functions for the key_def.
> + * @param key_def key definition to set up.
> + */
> +void
> +tuple_hint_set(struct key_def *key_def);
> +
>  #if defined(__cplusplus)
>  } /* extern "C" */
>  #endif /* defined(__cplusplus) */
> diff --git a/src/coll.h b/src/coll.h
> index 9c725712a..4030d38de 100644
> --- a/src/coll.h
> +++ b/src/coll.h
> @@ -47,6 +47,8 @@ typedef int (*coll_cmp_f)(const char *s, size_t s_len, const char *t,
>  typedef uint32_t (*coll_hash_f)(const char *s, size_t s_len, uint32_t *ph,
>  				uint32_t *pcarry, struct coll *coll);
>  
> +typedef uint64_t (*coll_hint_f)(const char *s, size_t s_len, struct coll *coll);
> +
>  struct UCollator;
>  
>  /**
> @@ -61,6 +63,7 @@ struct coll {
>  	/** String comparator. */
>  	coll_cmp_f cmp;
>  	coll_hash_f hash;
> +	coll_hint_f hint;

Please add a comment.

>  	/** Reference counter. */
>  	int refs;
>  	/**

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 4/4] box: introduce hint option for memtx tree index
  2019-02-05 11:58 ` [PATCH v1 4/4] box: introduce hint option for memtx tree index Kirill Shcherbatov
@ 2019-02-11 16:12   ` Vladimir Davydov
  0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Davydov @ 2019-02-11 16:12 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches

On Tue, Feb 05, 2019 at 02:58:39PM +0300, Kirill Shcherbatov wrote:
> If an index has the hint option enabled, a compare hint is calculated
> and stored in the bps_tree along with the struct tuple pointer. This
> hint is used for tuple pre-comparison in order to avoid fetching tuple
> memory and to significantly improve tree performance.
> 
> Such a hinted tree costs twice the amount of memory - about 20 bytes
> per tuple on average.
> The enabled hint option improves performance by about 15% on average;
> select operations are significantly accelerated (there are scenarios
> in which the difference reaches 200-250%).
> 
> Test set from original @alyapunov patch.
> 
> Needed for #3961
> ---
>  src/box/index_def.c                 |   2 +
>  src/box/index_def.h                 |   6 +
>  src/box/lua/schema.lua              |  12 ++
>  src/box/lua/space.cc                |   5 +
>  src/box/memtx_tree.c                | 192 +++++++++++++++++++++++-
>  src/box/memtx_tree_impl.h           |   2 +-
>  test/box/tree_pk.result             | 220 ++++++++++++++++++++++++++++
>  test/box/tree_pk.test.lua           |  88 +++++++++++
>  test/box/tree_pk_multipart.result   | 155 ++++++++++++++++++++
>  test/box/tree_pk_multipart.test.lua |  65 ++++++++
>  10 files changed, 745 insertions(+), 2 deletions(-)
> 
> diff --git a/src/box/index_def.c b/src/box/index_def.c
> index c52aa38d1..4d3fadc6e 100644
> --- a/src/box/index_def.c
> +++ b/src/box/index_def.c
> @@ -49,6 +49,7 @@ const struct index_opts index_opts_default = {
>  	/* .bloom_fpr           = */ 0.05,
>  	/* .lsn                 = */ 0,
>  	/* .stat                = */ NULL,
> +	/* .hint                = */ false,
>  };
>  
>  const struct opt_def index_opts_reg[] = {
> @@ -62,6 +63,7 @@ const struct opt_def index_opts_reg[] = {
>  	OPT_DEF("run_size_ratio", OPT_FLOAT, struct index_opts, run_size_ratio),
>  	OPT_DEF("bloom_fpr", OPT_FLOAT, struct index_opts, bloom_fpr),
>  	OPT_DEF("lsn", OPT_INT64, struct index_opts, lsn),
> +	OPT_DEF("hint", OPT_BOOL, struct index_opts, hint),
>  	OPT_END,
>  };
>  
> diff --git a/src/box/index_def.h b/src/box/index_def.h
> index 7717ecd64..74814bef5 100644
> --- a/src/box/index_def.h
> +++ b/src/box/index_def.h
> @@ -157,6 +157,10 @@ struct index_opts {
>  	 * LSN from the time of index creation.
>  	 */
>  	int64_t lsn;
> +	/**
> +	 * Use hint optimization for tree index.
> +	 */
> +	bool hint;

Hints should be enabled unconditionally if possible. Please don't
introduce a new index option.

The tests won't be necessary then, I guess.

> diff --git a/src/box/memtx_tree.c b/src/box/memtx_tree.c
> index 6b22883c2..426a11262 100644
> --- a/src/box/memtx_tree.c
> +++ b/src/box/memtx_tree.c
> @@ -79,8 +79,198 @@ struct memtx_tuple_tree_key_data {
>  
>  /* }}} */
>  
> +/* {{{ Memtx hinted and hint_only tree class. *******************/

Please add a comment somewhere explaining what the difference is.

I also have some concerns about struct/function names, but we'll talk
about them later, when you make patch 2 reviewable.

> +
> +/**
> + * Struct that is used as a key in BPS tree definition in
> + * metmx_hint_only_tree and metmx_hinted_tree.
> +*/
> +struct memtx_hinted_tree_key_data {
> +	/** Sequence of msgpacked search fields. */
> +	const char *key;
> +	/** Number of msgpacked search fields. */
> +	uint32_t part_count;
> +	/**
> +	 * Compare hint. Is calculated automatically on 'set'
> +	 * operation with memtx_hinted_tree_key_data_set().
> +	 */
> +	uint64_t hint;
> +};

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-02-11 16:12 UTC | newest]

Thread overview: 9+ messages
2019-02-05 11:58 [PATCH v1 0/4] box: introduce hint option for memtx tree index Kirill Shcherbatov
2019-02-05 11:58 ` [PATCH v1 1/4] lib: introduce BPS_TREE_EQUAL custom comparator Kirill Shcherbatov
2019-02-11 15:46   ` Vladimir Davydov
2019-02-05 11:58 ` [PATCH v1 2/4] box: rework memtx_tree class to be reusable Kirill Shcherbatov
2019-02-11 15:49   ` Vladimir Davydov
2019-02-05 11:58 ` [PATCH v1 3/4] box: introduce tuple compare hint Kirill Shcherbatov
2019-02-11 15:57   ` Vladimir Davydov
2019-02-05 11:58 ` [PATCH v1 4/4] box: introduce hint option for memtx tree index Kirill Shcherbatov
2019-02-11 16:12   ` Vladimir Davydov
