[commits] [tarantool] 02/04: vinyl: introduce bloom filters for partial key lookups

Wed Mar 28 00:08:17 MSK 2018

* Vladimir Davydov <vdavydov.dev at gmail.com> [18/03/21 16:36]:

> commit 5bfb2dc8c91bd9924fc6692006a69b45a532f6da
> Author: Vladimir Davydov <vdavydov.dev at gmail.com>
> AuthorDate: Fri Feb 23 23:01:24 2018 +0300
> 
>     vinyl: introduce bloom filters for partial key lookups
>     
>     Currently, we store and use bloom only for full-key lookups. However,
>     there are use cases when we can also benefit from maintaining bloom
>     filters for partial keys as well - see #3177 for example. So this patch
>     replaces the current full-key bloom filter with a multipart one, which
>     is basically a set of bloom filters, one per each sub-key.

What's a sub-key? I suggest using "partial key" or "prefix of a
full key", this is less ambiguous (remember my initial questions).

> Old bloom
>     filters stored on disk will be silently ignored. This will degrade
>     performance of existing deployments until major compaction is scheduled,
>     but this should be mitigated by #3139 (knob for forcing compaction).

Please fix this.

>     Besides, there are still no vinyl deployments in production as far as we
>     know.

Please remove this statement.

> Anyway, if there are complaints about the performance degradation,
>     we can add some code to load and use the legacy bloom filter if the new
>     one is unavailable.

No, we should not wait for complaints, and it's wrong discussing
this in a commit comment as if it was Kostja's official policy. I
use this weapon very sparingly I hope.
>     
>     When a key or tuple is checked against a multipart bloom filter, we
>     check all its sub-keys to reduce the false positive result. Nevertheless
>     there's no size optimization as per now. E.g. even if the cardinality of
>     a partial key is the same as of the full key, we will still store two
>     full-sized bloom filters although we could probably save some space in
>     this case by assuming that checking against the bloom corresponding to
>     a partial key would reduce the false positive rate of full key lookups.
>     This is addressed later in the series.
>     
>     Before this patch we used a bloom spectrum object to construct a bloom
>     filter. A bloom spectrum is basically a set of bloom filters ranging in
>     size. The point of using a spectrum is that we don't know what the run
>     size will be while we are writing it so we create 10 bloom filters and
>     choose the best of them after we are done. With the default bloom fpr of
>     0.05 it is 10 byte overhead per record, which seems to be OK. However,
>     if we try to optimize other parameters as well, e.g. the number of hash
>     functions, the cost of a spectrum will become prohibitive. Funny thing
>     is a tuple hash is only 4 bytes long, which means if we stored all
>     hashes in an array and built a bloom filter after we'd written a run, we
>     would reduce the memory footprint by more than half! And that would only
>     slightly increase the run write time as scanning a memory map of hashes
>     and constructing a bloom filter is cheap in comparison to mering runs.
>     Putting it all together, we stop using bloom spectrum in this patch,
>     instead we stash all hashes in a new bloom builder object and use them
>     to build a perfect bloom filer after the run has been written and we
>     know the cardinality of each sub key.
>     
>     Closes #3177
> ---
>  src/box/CMakeLists.txt      |   1 +
>  src/box/iproto_constants.c  |   3 +-
>  src/box/iproto_constants.h  |   4 +-
>  src/box/tuple_bloom.c       | 274 ++++++++++++++++++++++++++++++++++++++++++++
>  src/box/tuple_bloom.h       | 191 ++++++++++++++++++++++++++++++
>  src/box/tuple_compare.cc    |  24 ++++
>  src/box/tuple_compare.h     |  12 ++
>  src/box/tuple_hash.cc       |  13 ++-
>  src/box/tuple_hash.h        |  30 +++++
>  src/box/vy_run.c            | 243 +++++++++++++++------------------------
>  src/box/vy_run.h            |  23 ++--
>  src/box/vy_scheduler.c      |  20 +---
>  src/box/xlog.c              |  27 -----
>  src/box/xlog.h              |   9 --
>  test/unit/vy_point_lookup.c |   2 +-
>  test/vinyl/bloom.result     | 157 ++++++++++++++++++++++++-
>  test/vinyl/bloom.test.lua   |  68 ++++++++++-
>  test/vinyl/info.result      |  16 +--
>  18 files changed, 875 insertions(+), 242 deletions(-)
> 
> diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt
> index ad7f910..cb27db7 100644
> --- a/src/box/CMakeLists.txt
> +++ b/src/box/CMakeLists.txt
> @@ -37,6 +37,7 @@ add_library(tuple STATIC
>      tuple_compare.cc
>      tuple_extract_key.cc
>      tuple_hash.cc
> +    tuple_bloom.c
>      tuple_dictionary.c
>      key_def.cc
>      coll_def.c
> diff --git a/src/box/iproto_constants.c b/src/box/iproto_constants.c
> index cd7b1d0..4a8643c 100644
> --- a/src/box/iproto_constants.c
> +++ b/src/box/iproto_constants.c
> @@ -194,7 +194,8 @@ const char *vy_run_info_key_strs[VY_RUN_INFO_KEY_MAX] = {
>  	"min lsn",
>  	"max lsn",
>  	"page count",
> -	"bloom filter"
> +	"bloom filter legacy",
> +	"bloom filter",
>  };
>  
>  const char *vy_row_index_key_strs[VY_ROW_INDEX_KEY_MAX] = {
> diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
> index 9518424..92534ef 100644
> --- a/src/box/iproto_constants.h
> +++ b/src/box/iproto_constants.h
> @@ -295,8 +295,10 @@ enum vy_run_info_key {
>  	VY_RUN_INFO_MAX_LSN = 4,
>  	/** Number of pages in the run. */
>  	VY_RUN_INFO_PAGE_COUNT = 5,
> +	/** Legacy bloom filter implementation. */
> +	VY_RUN_INFO_BLOOM_LEGACY = 6,
>  	/** Bloom filter for keys. */
> -	VY_RUN_INFO_BLOOM = 6,
> +	VY_RUN_INFO_BLOOM = 7,
>  	/** The last key in this enum + 1 */
>  	VY_RUN_INFO_KEY_MAX
>  };
> diff --git a/src/box/tuple_bloom.c b/src/box/tuple_bloom.c
> new file mode 100644
> index 0000000..65e9827
> --- /dev/null
> +++ b/src/box/tuple_bloom.c
> @@ -0,0 +1,274 @@
> +/*
> + * Copyright 2010-2018, Tarantool AUTHORS, please see AUTHORS file.
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above
> + *    copyright notice, this list of conditions and the
> + *    following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above
> + *    copyright notice, this list of conditions and the following
> + *    disclaimer in the documentation and/or other materials
> + *    provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#include "tuple_bloom.h"
> +
> +#include <assert.h>
> +#include <stdbool.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <msgpuck.h>
> +#include "diag.h"
> +#include "errcode.h"
> +#include "memory.h"
> +#include "key_def.h"
> +#include "tuple.h"
> +#include "tuple_hash.h"
> +#include "salad/bloom.h"
> +#include "trivia/util.h"
> +#include "third_party/PMurHash.h"
> +
> +enum { HASH_SEED = 13U };
> +
> +struct tuple_bloom_builder *
> +tuple_bloom_builder_new(uint32_t part_count)
> +{
> +	size_t size = sizeof(struct tuple_bloom_builder) +
> +		part_count * sizeof(struct tuple_hash_array);
> +	struct tuple_bloom_builder *builder = malloc(size);
> +	if (builder == NULL) {
> +		diag_set(OutOfMemory, size, "malloc", "tuple bloom builder");
> +		return NULL;
> +	}
> +	memset(builder, 0, size);
> +	builder->part_count = part_count;
> +	return builder;
> +}
> +
> +void
> +tuple_bloom_builder_delete(struct tuple_bloom_builder *builder)
> +{
> +	for (uint32_t i = 0; i < builder->part_count; i++)
> +		free(builder->parts[i].values);
> +	free(builder);
> +}
> +
> +int
> +tuple_bloom_builder_add(struct tuple_bloom_builder *builder,
> +			const struct tuple *tuple,
> +			const struct key_def *key_def,
> +			uint32_t hashed_parts)
> +{
> +	assert(builder->part_count == key_def->part_count);
> +
> +	uint32_t h = HASH_SEED;
> +	uint32_t carry = 0;
> +	uint32_t total_size = 0;
> +
> +	for (uint32_t i = 0; i < key_def->part_count; i++) {
> +		total_size += tuple_hash_key_part(&h, &carry, tuple,
> +						  &key_def->parts[i]);

Is there a faster way to iterate over key parts?
tuple_hash_key_part() involves tuple_field(). Are you assuming
tuple_field() always has an offset map available?

> +		if (i < hashed_parts)
> +			continue;

Ugh, looks at first glance that we're invoking a hash function
and iterating over key parts only to find out the total size. 
Then I see that you also calculate the hash value and carry as
a side effect. If we can reuse hashed_parts, can't we reuse the
hash of the hashed parts and the total size? Perhaps I'm missing
something.

> +		struct tuple_hash_array *hash_arr = &builder->parts[i];
> +		if (hash_arr->count >= hash_arr->capacity) {
> +			uint32_t capacity = MAX(hash_arr->capacity * 2, 1024U);
> +			uint32_t *values = realloc(hash_arr->values,
> +						   capacity * sizeof(*values));
> +			if (values == NULL) {
> +				diag_set(OutOfMemory, capacity * sizeof(*values),
> +					 "malloc", "tuple hash array");
> +				return -1;
> +			}
> +			hash_arr->capacity = capacity;
> +			hash_arr->values = values;

Sounds like we need an array data structure after all...
Although I'd use a linked list of 4k blocks - I purposefully do not allow
introducing arrays into the source code to avoid inefficient
applications like this one. 

> +		}
> +		uint32_t hash = PMurHash32_Result(h, carry, total_size);
> +		hash_arr->values[hash_arr->count++] = hash;
> +	}
> +	return 0;
> +}
> +
> +struct tuple_bloom *
> +tuple_bloom_new(struct tuple_bloom_builder *builder, double fpr)
> +{
> +	uint32_t part_count = builder->part_count;
> +	size_t size = sizeof(struct tuple_bloom) +
> +		part_count * sizeof(struct bloom);
> +	struct tuple_bloom *bloom = malloc(size);
> +	if (bloom == NULL) {
> +		diag_set(OutOfMemory, size, "malloc", "tuple bloom");
> +		return NULL;
> +	}
> +	for (uint32_t i = 0; i < part_count; i++) {
> +		struct tuple_hash_array *hash_arr = &builder->parts[i];
> +		uint32_t count = hash_arr->count;
> +		if (bloom_create(&bloom->parts[i], count,
> +				 fpr, runtime.quota) != 0) {
> +			diag_set(OutOfMemory, 0, "bloom_create",
> +				 "tuple bloom part");
> +			for (uint32_t j = 0; j < i; j++)
> +				bloom_destroy(&bloom->parts[j], runtime.quota);
> +			free(bloom);

Why can't use use tuple_bloom_delete() here?
Looks like you need to introduce tuple_bloom_init() function for
that.

> +			return NULL;
> +		}
> +		for (uint32_t k = 0; k < count; k++)
> +			bloom_add(&bloom->parts[i], hash_arr->values[k]);
> +	}
> +	bloom->part_count = part_count;
> +	return bloom;
> +}
> +
> +void
> +tuple_bloom_delete(struct tuple_bloom *bloom)
> +{
> +	for (uint32_t i = 0; i < bloom->part_count; i++)
> +		bloom_destroy(&bloom->parts[i], runtime.quota);
> +	free(bloom);
> +}
> +
> +bool
> +tuple_bloom_possible_has(const struct tuple_bloom *bloom,
> +			 const struct tuple *tuple,
> +			 const struct key_def *key_def)
> +{
> +	assert(bloom->part_count == key_def->part_count);
> +
> +	uint32_t h = HASH_SEED;
> +	uint32_t carry = 0;
> +	uint32_t total_size = 0;
> +
> +	for (uint32_t i = 0; i < key_def->part_count; i++) {
> +		total_size += tuple_hash_key_part(&h, &carry, tuple,
> +						  &key_def->parts[i]);
> +		uint32_t hash = PMurHash32_Result(h, carry, total_size);

> +		if (!bloom_possible_has(&bloom->parts[i], hash))
> +			return false;
> +	}
> +	return true;
> +}
> +
> +bool
> +tuple_bloom_possible_has_key(const struct tuple_bloom *bloom,
> +			     const char *key, uint32_t part_count,
> +			     const struct key_def *key_def)
> +{
> +	assert(part_count <= key_def->part_count);
> +	assert(bloom->part_count == key_def->part_count);
> +
> +	uint32_t h = HASH_SEED;
> +	uint32_t carry = 0;
> +	uint32_t total_size = 0;
> +
> +	for (uint32_t i = 0; i < part_count; i++) {
> +		total_size += tuple_hash_field(&h, &carry, &key,
> +					       key_def->parts[i].coll);
> +		uint32_t hash = PMurHash32_Result(h, carry, total_size);
> +		if (!bloom_possible_has(&bloom->parts[i], hash))
> +			return false;
> +	}
> +	return true;
> +}
> +
> +static size_t
> +tuple_bloom_sizeof_part(const struct bloom *part)
> +{
> +	size_t size = 0;
> +	size += mp_sizeof_array(3);
> +	size += mp_sizeof_uint(part->table_size);
> +	size += mp_sizeof_uint(part->hash_count);
> +	size += mp_sizeof_bin(bloom_store_size(part));
> +	return size;
> +}
> +
> +static char *
> +tuple_bloom_encode_part(const struct bloom *part, char *buf)
> +{
> +	buf = mp_encode_array(buf, 3);
> +	buf = mp_encode_uint(buf, part->table_size);
> +	buf = mp_encode_uint(buf, part->hash_count);
> +	buf = mp_encode_binl(buf, bloom_store_size(part));
> +	buf = bloom_store(part, buf);
> +	return buf;
> +}
> +
> +static int
> +tuple_bloom_decode_part(struct bloom *part, const char **data)
> +{
> +	memset(part, 0, sizeof(*part));
> +	if (mp_decode_array(data) != 3)
> +		unreachable();
> +	part->table_size = mp_decode_uint(data);
> +	part->hash_count = mp_decode_uint(data);
> +	size_t store_size = mp_decode_binl(data);
> +	assert(store_size == bloom_store_size(part));
> +	if (bloom_load_table(part, *data, runtime.quota) != 0) {
> +		diag_set(OutOfMemory, store_size, "bloom_load_table",
> +			 "tuple bloom part");
> +		return -1;
> +	}
> +	*data += store_size;
> +	return 0;
> +}
> +
> +size_t
> +tuple_bloom_size(const struct tuple_bloom *bloom)
> +{
> +	size_t size = 0;
> +	size += mp_sizeof_array(bloom->part_count);
> +	for (uint32_t i = 0; i < bloom->part_count; i++)
> +		size += tuple_bloom_sizeof_part(&bloom->parts[i]);
> +	return size;
> +}
> +
> +char *
> +tuple_bloom_encode(const struct tuple_bloom *bloom, char *buf)
> +{
> +	buf = mp_encode_array(buf, bloom->part_count);
> +	for (uint32_t i = 0; i < bloom->part_count; i++)
> +		buf = tuple_bloom_encode_part(&bloom->parts[i], buf);
> +	return buf;
> +}
> +
> +struct tuple_bloom *
> +tuple_bloom_decode(const char **data)
> +{
> +	uint32_t part_count = mp_decode_array(data);
> +	struct tuple_bloom *bloom = malloc(sizeof(*bloom) +
> +			part_count * sizeof(*bloom->parts));
> +	if (bloom == NULL) {
> +		diag_set(OutOfMemory, sizeof(*bloom) +
> +			 part_count * sizeof(*bloom->parts),
> +			 "malloc", "tuple bloom");
> +		return NULL;
> +	}
> +	for (uint32_t i = 0; i < part_count; i++) {
> +		if (tuple_bloom_decode_part(&bloom->parts[i], data) != 0) {
> +			for (uint32_t j = 0; j < i; j++)
> +				bloom_destroy(&bloom->parts[j], runtime.quota);
> +			free(bloom);
> +			return NULL;
> +		}
> +	}
> +	bloom->part_count = part_count;
> +	return bloom;
> +}
> diff --git a/src/box/tuple_bloom.h b/src/box/tuple_bloom.h
> new file mode 100644
> index 0000000..24a7e1c
> --- /dev/null
> +++ b/src/box/tuple_bloom.h
> @@ -0,0 +1,191 @@
> +#ifndef TARANTOOL_BOX_TUPLE_BLOOM_H_INCLUDED
> +#define TARANTOOL_BOX_TUPLE_BLOOM_H_INCLUDED
> +/*
> + * Copyright 2010-2018, Tarantool AUTHORS, please see AUTHORS file.
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above
> + *    copyright notice, this list of conditions and the
> + *    following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above
> + *    copyright notice, this list of conditions and the following
> + *    disclaimer in the documentation and/or other materials
> + *    provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +
> +#include <stdbool.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include "salad/bloom.h"
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif /* defined(__cplusplus) */
> +
> +struct tuple;
> +struct key_def;
> +
> +/**
> + * Tuple bloom filter.
> + *
> + * Consists of a set of bloom filters, one per each sub key.

key prefix.

> + * When a key is checked to be hashed in the bloom, all its
> + * sub keys are checked as well, which lowers the probability
> + * of false positive results.
> + */
> +struct tuple_bloom {
> +	/** Number of key parts. */
> +	uint32_t part_count;
> +	/** Array of bloom filters, one per each sub key. */
> +	struct bloom parts[0];
> +};
> +
> +/**
> + * Array of tuple hashes.
> + */
> +struct tuple_hash_array {
> +	/** Number of hashes stored in the array. */
> +	uint32_t count;
> +	/** Capacity of the array of hashes. */
> +	uint32_t capacity;
> +	/** Array of hashes. */
> +	uint32_t *values;
> +};
> +
> +/**
> + * Tuple bloom filter builder.
> + *
> + * Construction of a bloom filter proceeds as follows.
> + * First all tuple hashes are stored in a builder object.
> + * Once all hashes have been stored, a bloom filter of
> + * the optimal size and all the hashes are added to it.
> + */
> +struct tuple_bloom_builder {
> +	/** Number of key parts. */
> +	uint32_t part_count;
> +	/** Hash arrays, one per each sub key. */
> +	struct tuple_hash_array parts[0];
> +};
> +
> +/**
> + * Create a bloom filter builder.
> + * @param part_count - number of key parts
> + * @return bloom builder on success or NULL on OOM
> + */
> +struct tuple_bloom_builder *
> +tuple_bloom_builder_new(uint32_t part_count);
> +
> +/**
> + * Destroy a bloom filter builder.
> + * @param builder - bloom filter builder
> + */
> +void
> +tuple_bloom_builder_delete(struct tuple_bloom_builder *builder);
> +
> +/**
> + * Add a tuple hash to a bloom filter builder.
> + * @param builder - bloom filter builder
> + * @param tuple - tuple to add
> + * @param key_def - key definition
> + * @param hashed_parts - number of key parts that have already
> + *  been added to the builder
> + * @return 0 on success, -1 on OOM
> + */
> +int
> +tuple_bloom_builder_add(struct tuple_bloom_builder *builder,
> +			const struct tuple *tuple,
> +			const struct key_def *key_def,
> +			uint32_t hashed_parts);
> +
> +/**
> + * Create a new bloom filter.

a new tuple bloom filter.

> + * @param builder - bloom filter builder
> + * @param fpr - desired false positive rate
> + * @return bloom filter on success or NULL on OOM
> + */
> +struct tuple_bloom *
> +tuple_bloom_new(struct tuple_bloom_builder *builder, double fpr);
> +
> +/**
> + * Delete a bloom filter.
> + * @param bloom - bloom filter to delete
> + */
> +void
> +tuple_bloom_delete(struct tuple_bloom *bloom);
> +
> +/**
> + * Check if a tuple was stored in a bloom filter.
> + * @param bloom - bloom filter
> + * @param tuple - tuple to check
> + * @param key_def - key definition
> + * @return true if the tuple may have been stored in the bloom,
> + *  false if the tuple is definitely not in the bloom
> + */
> +bool
> +tuple_bloom_possible_has(const struct tuple_bloom *bloom,
> +			 const struct tuple *tuple,
> +			 const struct key_def *key_def);
> +

suggestion: possible -> maybe

> +/**
> + * Check if a tuple matching a key was stored in a bloom filter.
> + * @param bloom - bloom filter
> + * @param key - key to check
> + * @param part_count - number of parts in the key
> + * @param key_def - key definition
> + * @return true if there may be a tuple matching the key stored in
> + *  the bloom, false if there is definitely no such tuple
> + */
> +bool
> +tuple_bloom_possible_has_key(const struct tuple_bloom *bloom,
> +			     const char *key, uint32_t part_count,
> +			     const struct key_def *key_def);
> +
> +/**
> + * Return the size of a bloom filter when encoded.
> + * @param bloom - bloom filter
> + * @return size of the bloom filter, in bytes
> + */
> +size_t
> +tuple_bloom_size(const struct tuple_bloom *bloom);
> +
> +/**
> + * Encode a bloom filter in MsgPack.
> + * @param bloom - bloom filter
> + * @param buf - buffer where to store the bloom filter
> + * @return pointer to the first byte following encoded data
> + */
> +char *
> +tuple_bloom_encode(const struct tuple_bloom *bloom, char *buf);
> +
> +/**
> + * Decode a bloom filter from MsgPack.
> + * @param data - pointer to buffer storing encoded bloom filter;
> + *  on success it is advanced by the number of decoded bytes
> + * @return the decoded bloom on success or NULL on OOM
> + */
> +struct tuple_bloom *
> +tuple_bloom_decode(const char **data);
> +
> +#if defined(__cplusplus)
> +} /* extern "C" */
> +#endif /* defined(__cplusplus) */
> +
> +#endif /* TARANTOOL_BOX_TUPLE_BLOOM_H_INCLUDED */
> diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
> index 13ebfc5..cfee004 100644
> --- a/src/box/tuple_compare.cc
> +++ b/src/box/tuple_compare.cc
> @@ -426,6 +426,30 @@ tuple_compare_field_with_hint(const char *field_a, enum mp_type a_type,
>  	}
>  }
>  
> +uint32_t
> +tuple_common_key_parts(const struct tuple *tuple_a,
> +		       const struct tuple *tuple_b,
> +		       const struct key_def *key_def)
> +{
> +	uint32_t i;
> +	for (i = 0; i < key_def->part_count; i++) {
> +		const struct key_part *part = &key_def->parts[i];
> +		const char *field_a = tuple_field(tuple_a, part->fieldno);
> +		const char *field_b = tuple_field(tuple_b, part->fieldno);
> +		enum mp_type a_type = field_a != NULL ?
> +				      mp_typeof(*field_a) : MP_NIL;
> +		enum mp_type b_type = field_b != NULL ?
> +				      mp_typeof(*field_b) : MP_NIL;
> +		if (a_type == MP_NIL && b_type == MP_NIL)
> +			continue;
> +		if (a_type == MP_NIL || b_type == MP_NIL ||
> +		    tuple_compare_field_with_hint(field_a, a_type,
> +				field_b, b_type, part->type, part->coll) != 0)
> +			break;
> +	}
> +	return i;
> +}
> +
>  template<bool is_nullable, bool has_optional_parts>
>  static inline int
>  tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
> diff --git a/src/box/tuple_compare.h b/src/box/tuple_compare.h
> index 3888bbb..d80e1e7 100644
> --- a/src/box/tuple_compare.h
> +++ b/src/box/tuple_compare.h
> @@ -42,6 +42,18 @@ extern "C" {
>  #endif /* defined(__cplusplus) */
>  
>  /**
> + * Return the length of the longest common prefix of two tuples.
> + * @param tuple_a first tuple
> + * @param tuple_b second tuple
> + * @param key_def key defintion
> + * @return number of parts the two tuples have in common

These are not parts of the tuple, these are parts of the key.
described in the key def.

> + */
> +uint32_t
> +tuple_common_key_parts(const struct tuple *tuple_a,
> +		       const struct tuple *tuple_b,
> +		       const struct key_def *key_def);
> +
> +/**
>   * Create a comparison function for the key_def
>   *
>   * @param key_def key_definition
> diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
> index 0f7ba91..0fa8ea5 100644
> --- a/src/box/tuple_hash.cc
> +++ b/src/box/tuple_hash.cc
> @@ -262,7 +262,7 @@ slowpath:
>  	key_def->key_hash = key_hash_slowpath;
>  }
>  
> -static uint32_t
> +uint32_t
>  tuple_hash_field(uint32_t *ph1, uint32_t *pcarry, const char **field,
>  		 struct coll *coll)
>  {
> @@ -308,6 +308,17 @@ tuple_hash_null(uint32_t *ph1, uint32_t *pcarry)
>  	return mp_sizeof_nil();
>  }
>  
> +uint32_t
> +tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry,
> +		    const struct tuple *tuple,
> +		    const struct key_part *part)
> +{
> +	const char *field = tuple_field(tuple, part->fieldno);
> +	if (field == NULL)
> +		return tuple_hash_null(ph1, pcarry);
> +	return tuple_hash_field(ph1, pcarry, &field, part->coll);
> +}

<snip>

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov