Re: [commits] [tarantool] 02/04: vinyl: introduce bloom filters for partial key lookups

Tarantool development patches archive
 help / color / mirror / Atom feed

* Re: [commits] [tarantool] 02/04: vinyl: introduce bloom filters for partial key lookups
@ 2018-03-27 21:08 Konstantin Osipov
       [not found] ` <20180327210938.GC11829@atlas>
  2018-03-28 11:58 ` [commits] [tarantool] 02/04: vinyl: introduce bloom filters for partial key lookups Vladimir Davydov
  0 siblings, 2 replies; 6+ messages in thread
From: Konstantin Osipov @ 2018-03-27 21:08 UTC (permalink / raw)
  To: Vladimir Davydov, commits; +Cc: tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [18/03/21 16:36]:

> commit 5bfb2dc8c91bd9924fc6692006a69b45a532f6da
> Author: Vladimir Davydov <vdavydov.dev@gmail.com>
> AuthorDate: Fri Feb 23 23:01:24 2018 +0300
> 
>     vinyl: introduce bloom filters for partial key lookups
>     
>     Currently, we store and use bloom only for full-key lookups. However,
>     there are use cases when we can also benefit from maintaining bloom
>     filters for partial keys as well - see #3177 for example. So this patch
>     replaces the current full-key bloom filter with a multipart one, which
>     is basically a set of bloom filters, one per each sub-key.

What's a sub-key? I suggest using "partial key" or "prefix of a
full key", this is less ambiguous (remember my initial questions).

> Old bloom
>     filters stored on disk will be silently ignored. This will degrade
>     performance of existing deployments until major compaction is scheduled,
>     but this should be mitigated by #3139 (knob for forcing compaction).

Please fix this.

>     Besides, there are still no vinyl deployments in production as far as we
>     know.

Please remove this statement.

> Anyway, if there are complaints about the performance degradation,
>     we can add some code to load and use the legacy bloom filter if the new
>     one is unavailable.

No, we should not wait for complaints, and it's wrong discussing
this in a commit comment as if it was Kostja's official policy. I
use this weapon very sparingly I hope.
>     
>     When a key or tuple is checked against a multipart bloom filter, we
>     check all its sub-keys to reduce the false positive result. Nevertheless
>     there's no size optimization as per now. E.g. even if the cardinality of
>     a partial key is the same as of the full key, we will still store two
>     full-sized bloom filters although we could probably save some space in
>     this case by assuming that checking against the bloom corresponding to
>     a partial key would reduce the false positive rate of full key lookups.
>     This is addressed later in the series.
>     
>     Before this patch we used a bloom spectrum object to construct a bloom
>     filter. A bloom spectrum is basically a set of bloom filters ranging in
>     size. The point of using a spectrum is that we don't know what the run
>     size will be while we are writing it so we create 10 bloom filters and
>     choose the best of them after we are done. With the default bloom fpr of
>     0.05 it is 10 byte overhead per record, which seems to be OK. However,
>     if we try to optimize other parameters as well, e.g. the number of hash
>     functions, the cost of a spectrum will become prohibitive. Funny thing
>     is a tuple hash is only 4 bytes long, which means if we stored all
>     hashes in an array and built a bloom filter after we'd written a run, we
>     would reduce the memory footprint by more than half! And that would only
>     slightly increase the run write time as scanning a memory map of hashes
>     and constructing a bloom filter is cheap in comparison to mering runs.
>     Putting it all together, we stop using bloom spectrum in this patch,
>     instead we stash all hashes in a new bloom builder object and use them
>     to build a perfect bloom filer after the run has been written and we
>     know the cardinality of each sub key.
>     
>     Closes #3177
> ---
>  src/box/CMakeLists.txt      |   1 +
>  src/box/iproto_constants.c  |   3 +-
>  src/box/iproto_constants.h  |   4 +-
>  src/box/tuple_bloom.c       | 274 ++++++++++++++++++++++++++++++++++++++++++++
>  src/box/tuple_bloom.h       | 191 ++++++++++++++++++++++++++++++
>  src/box/tuple_compare.cc    |  24 ++++
>  src/box/tuple_compare.h     |  12 ++
>  src/box/tuple_hash.cc       |  13 ++-
>  src/box/tuple_hash.h        |  30 +++++
>  src/box/vy_run.c            | 243 +++++++++++++++------------------------
>  src/box/vy_run.h            |  23 ++--
>  src/box/vy_scheduler.c      |  20 +---
>  src/box/xlog.c              |  27 -----
>  src/box/xlog.h              |   9 --
>  test/unit/vy_point_lookup.c |   2 +-
>  test/vinyl/bloom.result     | 157 ++++++++++++++++++++++++-
>  test/vinyl/bloom.test.lua   |  68 ++++++++++-
>  test/vinyl/info.result      |  16 +--
>  18 files changed, 875 insertions(+), 242 deletions(-)
> 
> diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt
> index ad7f910..cb27db7 100644
> --- a/src/box/CMakeLists.txt
> +++ b/src/box/CMakeLists.txt
> @@ -37,6 +37,7 @@ add_library(tuple STATIC
>      tuple_compare.cc
>      tuple_extract_key.cc
>      tuple_hash.cc
> +    tuple_bloom.c
>      tuple_dictionary.c
>      key_def.cc
>      coll_def.c
> diff --git a/src/box/iproto_constants.c b/src/box/iproto_constants.c
> index cd7b1d0..4a8643c 100644
> --- a/src/box/iproto_constants.c
> +++ b/src/box/iproto_constants.c
> @@ -194,7 +194,8 @@ const char *vy_run_info_key_strs[VY_RUN_INFO_KEY_MAX] = {
>  	"min lsn",
>  	"max lsn",
>  	"page count",
> -	"bloom filter"
> +	"bloom filter legacy",
> +	"bloom filter",
>  };
>  
>  const char *vy_row_index_key_strs[VY_ROW_INDEX_KEY_MAX] = {
> diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
> index 9518424..92534ef 100644
> --- a/src/box/iproto_constants.h
> +++ b/src/box/iproto_constants.h
> @@ -295,8 +295,10 @@ enum vy_run_info_key {
>  	VY_RUN_INFO_MAX_LSN = 4,
>  	/** Number of pages in the run. */
>  	VY_RUN_INFO_PAGE_COUNT = 5,
> +	/** Legacy bloom filter implementation. */
> +	VY_RUN_INFO_BLOOM_LEGACY = 6,
>  	/** Bloom filter for keys. */
> -	VY_RUN_INFO_BLOOM = 6,
> +	VY_RUN_INFO_BLOOM = 7,
>  	/** The last key in this enum + 1 */
>  	VY_RUN_INFO_KEY_MAX
>  };
> diff --git a/src/box/tuple_bloom.c b/src/box/tuple_bloom.c
> new file mode 100644
> index 0000000..65e9827
> --- /dev/null
> +++ b/src/box/tuple_bloom.c
> @@ -0,0 +1,274 @@
> +/*
> + * Copyright 2010-2018, Tarantool AUTHORS, please see AUTHORS file.
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above
> + *    copyright notice, this list of conditions and the
> + *    following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above
> + *    copyright notice, this list of conditions and the following
> + *    disclaimer in the documentation and/or other materials
> + *    provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#include "tuple_bloom.h"
> +
> +#include <assert.h>
> +#include <stdbool.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <msgpuck.h>
> +#include "diag.h"
> +#include "errcode.h"
> +#include "memory.h"
> +#include "key_def.h"
> +#include "tuple.h"
> +#include "tuple_hash.h"
> +#include "salad/bloom.h"
> +#include "trivia/util.h"
> +#include "third_party/PMurHash.h"
> +
> +enum { HASH_SEED = 13U };
> +
> +struct tuple_bloom_builder *
> +tuple_bloom_builder_new(uint32_t part_count)
> +{
> +	size_t size = sizeof(struct tuple_bloom_builder) +
> +		part_count * sizeof(struct tuple_hash_array);
> +	struct tuple_bloom_builder *builder = malloc(size);
> +	if (builder == NULL) {
> +		diag_set(OutOfMemory, size, "malloc", "tuple bloom builder");
> +		return NULL;
> +	}
> +	memset(builder, 0, size);
> +	builder->part_count = part_count;
> +	return builder;
> +}
> +
> +void
> +tuple_bloom_builder_delete(struct tuple_bloom_builder *builder)
> +{
> +	for (uint32_t i = 0; i < builder->part_count; i++)
> +		free(builder->parts[i].values);
> +	free(builder);
> +}
> +
> +int
> +tuple_bloom_builder_add(struct tuple_bloom_builder *builder,
> +			const struct tuple *tuple,
> +			const struct key_def *key_def,
> +			uint32_t hashed_parts)
> +{
> +	assert(builder->part_count == key_def->part_count);
> +
> +	uint32_t h = HASH_SEED;
> +	uint32_t carry = 0;
> +	uint32_t total_size = 0;
> +
> +	for (uint32_t i = 0; i < key_def->part_count; i++) {
> +		total_size += tuple_hash_key_part(&h, &carry, tuple,
> +						  &key_def->parts[i]);

Is there a faster way to iterate over key parts?
tuple_hash_key_part() involves tuple_field(). Are you assuming
tuple_field() always has an offset map available?

> +		if (i < hashed_parts)
> +			continue;

Ugh, looks at first glance that we're invoking a hash function
and iterating over key parts only to find out the total size. 
Then I see that you also calculate the hash value and carry as
a side effect. If we can reuse hashed_parts, can't we reuse the
hash of the hashed parts and the total size? Perhaps I'm missing
something.

> +		struct tuple_hash_array *hash_arr = &builder->parts[i];
> +		if (hash_arr->count >= hash_arr->capacity) {
> +			uint32_t capacity = MAX(hash_arr->capacity * 2, 1024U);
> +			uint32_t *values = realloc(hash_arr->values,
> +						   capacity * sizeof(*values));
> +			if (values == NULL) {
> +				diag_set(OutOfMemory, capacity * sizeof(*values),
> +					 "malloc", "tuple hash array");
> +				return -1;
> +			}
> +			hash_arr->capacity = capacity;
> +			hash_arr->values = values;

Sounds like we need an array data structure after all...
Although I'd use a linked list of 4k blocks - I purposefully do not allow
introducing arrays into the source code to avoid inefficient
applications like this one. 

> +		}
> +		uint32_t hash = PMurHash32_Result(h, carry, total_size);
> +		hash_arr->values[hash_arr->count++] = hash;
> +	}
> +	return 0;
> +}
> +
> +struct tuple_bloom *
> +tuple_bloom_new(struct tuple_bloom_builder *builder, double fpr)
> +{
> +	uint32_t part_count = builder->part_count;
> +	size_t size = sizeof(struct tuple_bloom) +
> +		part_count * sizeof(struct bloom);
> +	struct tuple_bloom *bloom = malloc(size);
> +	if (bloom == NULL) {
> +		diag_set(OutOfMemory, size, "malloc", "tuple bloom");
> +		return NULL;
> +	}
> +	for (uint32_t i = 0; i < part_count; i++) {
> +		struct tuple_hash_array *hash_arr = &builder->parts[i];
> +		uint32_t count = hash_arr->count;
> +		if (bloom_create(&bloom->parts[i], count,
> +				 fpr, runtime.quota) != 0) {
> +			diag_set(OutOfMemory, 0, "bloom_create",
> +				 "tuple bloom part");
> +			for (uint32_t j = 0; j < i; j++)
> +				bloom_destroy(&bloom->parts[j], runtime.quota);
> +			free(bloom);

Why can't use use tuple_bloom_delete() here?
Looks like you need to introduce tuple_bloom_init() function for
that.

> +			return NULL;
> +		}
> +		for (uint32_t k = 0; k < count; k++)
> +			bloom_add(&bloom->parts[i], hash_arr->values[k]);
> +	}
> +	bloom->part_count = part_count;
> +	return bloom;
> +}
> +
> +void
> +tuple_bloom_delete(struct tuple_bloom *bloom)
> +{
> +	for (uint32_t i = 0; i < bloom->part_count; i++)
> +		bloom_destroy(&bloom->parts[i], runtime.quota);
> +	free(bloom);
> +}
> +
> +bool
> +tuple_bloom_possible_has(const struct tuple_bloom *bloom,
> +			 const struct tuple *tuple,
> +			 const struct key_def *key_def)
> +{
> +	assert(bloom->part_count == key_def->part_count);
> +
> +	uint32_t h = HASH_SEED;
> +	uint32_t carry = 0;
> +	uint32_t total_size = 0;
> +
> +	for (uint32_t i = 0; i < key_def->part_count; i++) {
> +		total_size += tuple_hash_key_part(&h, &carry, tuple,
> +						  &key_def->parts[i]);
> +		uint32_t hash = PMurHash32_Result(h, carry, total_size);

> +		if (!bloom_possible_has(&bloom->parts[i], hash))
> +			return false;
> +	}
> +	return true;
> +}
> +
> +bool
> +tuple_bloom_possible_has_key(const struct tuple_bloom *bloom,
> +			     const char *key, uint32_t part_count,
> +			     const struct key_def *key_def)
> +{
> +	assert(part_count <= key_def->part_count);
> +	assert(bloom->part_count == key_def->part_count);
> +
> +	uint32_t h = HASH_SEED;
> +	uint32_t carry = 0;
> +	uint32_t total_size = 0;
> +
> +	for (uint32_t i = 0; i < part_count; i++) {
> +		total_size += tuple_hash_field(&h, &carry, &key,
> +					       key_def->parts[i].coll);
> +		uint32_t hash = PMurHash32_Result(h, carry, total_size);
> +		if (!bloom_possible_has(&bloom->parts[i], hash))
> +			return false;
> +	}
> +	return true;
> +}
> +
> +static size_t
> +tuple_bloom_sizeof_part(const struct bloom *part)
> +{
> +	size_t size = 0;
> +	size += mp_sizeof_array(3);
> +	size += mp_sizeof_uint(part->table_size);
> +	size += mp_sizeof_uint(part->hash_count);
> +	size += mp_sizeof_bin(bloom_store_size(part));
> +	return size;
> +}
> +
> +static char *
> +tuple_bloom_encode_part(const struct bloom *part, char *buf)
> +{
> +	buf = mp_encode_array(buf, 3);
> +	buf = mp_encode_uint(buf, part->table_size);
> +	buf = mp_encode_uint(buf, part->hash_count);
> +	buf = mp_encode_binl(buf, bloom_store_size(part));
> +	buf = bloom_store(part, buf);
> +	return buf;
> +}
> +
> +static int
> +tuple_bloom_decode_part(struct bloom *part, const char **data)
> +{
> +	memset(part, 0, sizeof(*part));
> +	if (mp_decode_array(data) != 3)
> +		unreachable();
> +	part->table_size = mp_decode_uint(data);
> +	part->hash_count = mp_decode_uint(data);
> +	size_t store_size = mp_decode_binl(data);
> +	assert(store_size == bloom_store_size(part));
> +	if (bloom_load_table(part, *data, runtime.quota) != 0) {
> +		diag_set(OutOfMemory, store_size, "bloom_load_table",
> +			 "tuple bloom part");
> +		return -1;
> +	}
> +	*data += store_size;
> +	return 0;
> +}
> +
> +size_t
> +tuple_bloom_size(const struct tuple_bloom *bloom)
> +{
> +	size_t size = 0;
> +	size += mp_sizeof_array(bloom->part_count);
> +	for (uint32_t i = 0; i < bloom->part_count; i++)
> +		size += tuple_bloom_sizeof_part(&bloom->parts[i]);
> +	return size;
> +}
> +
> +char *
> +tuple_bloom_encode(const struct tuple_bloom *bloom, char *buf)
> +{
> +	buf = mp_encode_array(buf, bloom->part_count);
> +	for (uint32_t i = 0; i < bloom->part_count; i++)
> +		buf = tuple_bloom_encode_part(&bloom->parts[i], buf);
> +	return buf;
> +}
> +
> +struct tuple_bloom *
> +tuple_bloom_decode(const char **data)
> +{
> +	uint32_t part_count = mp_decode_array(data);
> +	struct tuple_bloom *bloom = malloc(sizeof(*bloom) +
> +			part_count * sizeof(*bloom->parts));
> +	if (bloom == NULL) {
> +		diag_set(OutOfMemory, sizeof(*bloom) +
> +			 part_count * sizeof(*bloom->parts),
> +			 "malloc", "tuple bloom");
> +		return NULL;
> +	}
> +	for (uint32_t i = 0; i < part_count; i++) {
> +		if (tuple_bloom_decode_part(&bloom->parts[i], data) != 0) {
> +			for (uint32_t j = 0; j < i; j++)
> +				bloom_destroy(&bloom->parts[j], runtime.quota);
> +			free(bloom);
> +			return NULL;
> +		}
> +	}
> +	bloom->part_count = part_count;
> +	return bloom;
> +}
> diff --git a/src/box/tuple_bloom.h b/src/box/tuple_bloom.h
> new file mode 100644
> index 0000000..24a7e1c
> --- /dev/null
> +++ b/src/box/tuple_bloom.h
> @@ -0,0 +1,191 @@
> +#ifndef TARANTOOL_BOX_TUPLE_BLOOM_H_INCLUDED
> +#define TARANTOOL_BOX_TUPLE_BLOOM_H_INCLUDED
> +/*
> + * Copyright 2010-2018, Tarantool AUTHORS, please see AUTHORS file.
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above
> + *    copyright notice, this list of conditions and the
> + *    following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above
> + *    copyright notice, this list of conditions and the following
> + *    disclaimer in the documentation and/or other materials
> + *    provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +
> +#include <stdbool.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include "salad/bloom.h"
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif /* defined(__cplusplus) */
> +
> +struct tuple;
> +struct key_def;
> +
> +/**
> + * Tuple bloom filter.
> + *
> + * Consists of a set of bloom filters, one per each sub key.

key prefix.

> + * When a key is checked to be hashed in the bloom, all its
> + * sub keys are checked as well, which lowers the probability
> + * of false positive results.
> + */
> +struct tuple_bloom {
> +	/** Number of key parts. */
> +	uint32_t part_count;
> +	/** Array of bloom filters, one per each sub key. */
> +	struct bloom parts[0];
> +};
> +
> +/**
> + * Array of tuple hashes.
> + */
> +struct tuple_hash_array {
> +	/** Number of hashes stored in the array. */
> +	uint32_t count;
> +	/** Capacity of the array of hashes. */
> +	uint32_t capacity;
> +	/** Array of hashes. */
> +	uint32_t *values;
> +};
> +
> +/**
> + * Tuple bloom filter builder.
> + *
> + * Construction of a bloom filter proceeds as follows.
> + * First all tuple hashes are stored in a builder object.
> + * Once all hashes have been stored, a bloom filter of
> + * the optimal size and all the hashes are added to it.
> + */
> +struct tuple_bloom_builder {
> +	/** Number of key parts. */
> +	uint32_t part_count;
> +	/** Hash arrays, one per each sub key. */
> +	struct tuple_hash_array parts[0];
> +};
> +
> +/**
> + * Create a bloom filter builder.
> + * @param part_count - number of key parts
> + * @return bloom builder on success or NULL on OOM
> + */
> +struct tuple_bloom_builder *
> +tuple_bloom_builder_new(uint32_t part_count);
> +
> +/**
> + * Destroy a bloom filter builder.
> + * @param builder - bloom filter builder
> + */
> +void
> +tuple_bloom_builder_delete(struct tuple_bloom_builder *builder);
> +
> +/**
> + * Add a tuple hash to a bloom filter builder.
> + * @param builder - bloom filter builder
> + * @param tuple - tuple to add
> + * @param key_def - key definition
> + * @param hashed_parts - number of key parts that have already
> + *  been added to the builder
> + * @return 0 on success, -1 on OOM
> + */
> +int
> +tuple_bloom_builder_add(struct tuple_bloom_builder *builder,
> +			const struct tuple *tuple,
> +			const struct key_def *key_def,
> +			uint32_t hashed_parts);
> +
> +/**
> + * Create a new bloom filter.

a new tuple bloom filter.

> + * @param builder - bloom filter builder
> + * @param fpr - desired false positive rate
> + * @return bloom filter on success or NULL on OOM
> + */
> +struct tuple_bloom *
> +tuple_bloom_new(struct tuple_bloom_builder *builder, double fpr);
> +
> +/**
> + * Delete a bloom filter.
> + * @param bloom - bloom filter to delete
> + */
> +void
> +tuple_bloom_delete(struct tuple_bloom *bloom);
> +
> +/**
> + * Check if a tuple was stored in a bloom filter.
> + * @param bloom - bloom filter
> + * @param tuple - tuple to check
> + * @param key_def - key definition
> + * @return true if the tuple may have been stored in the bloom,
> + *  false if the tuple is definitely not in the bloom
> + */
> +bool
> +tuple_bloom_possible_has(const struct tuple_bloom *bloom,
> +			 const struct tuple *tuple,
> +			 const struct key_def *key_def);
> +

suggestion: possible -> maybe

> +/**
> + * Check if a tuple matching a key was stored in a bloom filter.
> + * @param bloom - bloom filter
> + * @param key - key to check
> + * @param part_count - number of parts in the key
> + * @param key_def - key definition
> + * @return true if there may be a tuple matching the key stored in
> + *  the bloom, false if there is definitely no such tuple
> + */
> +bool
> +tuple_bloom_possible_has_key(const struct tuple_bloom *bloom,
> +			     const char *key, uint32_t part_count,
> +			     const struct key_def *key_def);
> +
> +/**
> + * Return the size of a bloom filter when encoded.
> + * @param bloom - bloom filter
> + * @return size of the bloom filter, in bytes
> + */
> +size_t
> +tuple_bloom_size(const struct tuple_bloom *bloom);
> +
> +/**
> + * Encode a bloom filter in MsgPack.
> + * @param bloom - bloom filter
> + * @param buf - buffer where to store the bloom filter
> + * @return pointer to the first byte following encoded data
> + */
> +char *
> +tuple_bloom_encode(const struct tuple_bloom *bloom, char *buf);
> +
> +/**
> + * Decode a bloom filter from MsgPack.
> + * @param data - pointer to buffer storing encoded bloom filter;
> + *  on success it is advanced by the number of decoded bytes
> + * @return the decoded bloom on success or NULL on OOM
> + */
> +struct tuple_bloom *
> +tuple_bloom_decode(const char **data);
> +
> +#if defined(__cplusplus)
> +} /* extern "C" */
> +#endif /* defined(__cplusplus) */
> +
> +#endif /* TARANTOOL_BOX_TUPLE_BLOOM_H_INCLUDED */
> diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
> index 13ebfc5..cfee004 100644
> --- a/src/box/tuple_compare.cc
> +++ b/src/box/tuple_compare.cc
> @@ -426,6 +426,30 @@ tuple_compare_field_with_hint(const char *field_a, enum mp_type a_type,
>  	}
>  }
>  
> +uint32_t
> +tuple_common_key_parts(const struct tuple *tuple_a,
> +		       const struct tuple *tuple_b,
> +		       const struct key_def *key_def)
> +{
> +	uint32_t i;
> +	for (i = 0; i < key_def->part_count; i++) {
> +		const struct key_part *part = &key_def->parts[i];
> +		const char *field_a = tuple_field(tuple_a, part->fieldno);
> +		const char *field_b = tuple_field(tuple_b, part->fieldno);
> +		enum mp_type a_type = field_a != NULL ?
> +				      mp_typeof(*field_a) : MP_NIL;
> +		enum mp_type b_type = field_b != NULL ?
> +				      mp_typeof(*field_b) : MP_NIL;
> +		if (a_type == MP_NIL && b_type == MP_NIL)
> +			continue;
> +		if (a_type == MP_NIL || b_type == MP_NIL ||
> +		    tuple_compare_field_with_hint(field_a, a_type,
> +				field_b, b_type, part->type, part->coll) != 0)
> +			break;
> +	}
> +	return i;
> +}
> +
>  template<bool is_nullable, bool has_optional_parts>
>  static inline int
>  tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
> diff --git a/src/box/tuple_compare.h b/src/box/tuple_compare.h
> index 3888bbb..d80e1e7 100644
> --- a/src/box/tuple_compare.h
> +++ b/src/box/tuple_compare.h
> @@ -42,6 +42,18 @@ extern "C" {
>  #endif /* defined(__cplusplus) */
>  
>  /**
> + * Return the length of the longest common prefix of two tuples.
> + * @param tuple_a first tuple
> + * @param tuple_b second tuple
> + * @param key_def key defintion
> + * @return number of parts the two tuples have in common

These are not parts of the tuple, these are parts of the key.
described in the key def.

> + */
> +uint32_t
> +tuple_common_key_parts(const struct tuple *tuple_a,
> +		       const struct tuple *tuple_b,
> +		       const struct key_def *key_def);
> +
> +/**
>   * Create a comparison function for the key_def
>   *
>   * @param key_def key_definition
> diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
> index 0f7ba91..0fa8ea5 100644
> --- a/src/box/tuple_hash.cc
> +++ b/src/box/tuple_hash.cc
> @@ -262,7 +262,7 @@ slowpath:
>  	key_def->key_hash = key_hash_slowpath;
>  }
>  
> -static uint32_t
> +uint32_t
>  tuple_hash_field(uint32_t *ph1, uint32_t *pcarry, const char **field,
>  		 struct coll *coll)
>  {
> @@ -308,6 +308,17 @@ tuple_hash_null(uint32_t *ph1, uint32_t *pcarry)
>  	return mp_sizeof_nil();
>  }
>  
> +uint32_t
> +tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry,
> +		    const struct tuple *tuple,
> +		    const struct key_part *part)
> +{
> +	const char *field = tuple_field(tuple, part->fieldno);
> +	if (field == NULL)
> +		return tuple_hash_null(ph1, pcarry);
> +	return tuple_hash_field(ph1, pcarry, &field, part->coll);
> +}


<snip>

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [commits] [tarantool] 03/04: bloom: optimize tuple bloom filter size
       [not found] ` <20180327210938.GC11829@atlas>
@ 2018-03-28 11:16   ` Vladimir Davydov
  2018-03-28 17:03     ` Konstantin Osipov
  0 siblings, 1 reply; 6+ messages in thread
From: Vladimir Davydov @ 2018-03-28 11:16 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches, commits

On Wed, Mar 28, 2018 at 12:09:38AM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [18/03/21 16:36]:
> > 
> 
> Please add references to the papers you have been using to
> construct the hash functions and calculate the theoretical false
> positive rate. 

They are already in bloom.h, thanks to Alexander Lyapunov.

> 
> In future, It would be nice to actually have stats about actual false
> positive rates so that we know if the papers are BS or not.

We already have bloom.hit and bloom.miss stat counters in index.info()
so we can calculate the false positive rate as

  bloom.miss / (bloom.hit + bloom.miss)

> 
> Otherwise this part is OK.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [commits] [tarantool] 02/04: vinyl: introduce bloom filters for partial key lookups
  2018-03-27 21:08 [commits] [tarantool] 02/04: vinyl: introduce bloom filters for partial key lookups Konstantin Osipov
       [not found] ` <20180327210938.GC11829@atlas>
@ 2018-03-28 11:58 ` Vladimir Davydov
  2018-03-28 17:03   ` Konstantin Osipov
  1 sibling, 1 reply; 6+ messages in thread
From: Vladimir Davydov @ 2018-03-28 11:58 UTC (permalink / raw)
  To: Konstantin Osipov, tarantool-patches, commits

On Wed, Mar 28, 2018 at 12:08:17AM +0300, Konstantin Osipov wrote:
> > +int
> > +tuple_bloom_builder_add(struct tuple_bloom_builder *builder,
> > +			const struct tuple *tuple,
> > +			const struct key_def *key_def,
> > +			uint32_t hashed_parts)
> > +{
> > +	assert(builder->part_count == key_def->part_count);
> > +
> > +	uint32_t h = HASH_SEED;
> > +	uint32_t carry = 0;
> > +	uint32_t total_size = 0;
> > +
> > +	for (uint32_t i = 0; i < key_def->part_count; i++) {
> > +		total_size += tuple_hash_key_part(&h, &carry, tuple,
> > +						  &key_def->parts[i]);
> 
> Is there a faster way to iterate over key parts?
> tuple_hash_key_part() involves tuple_field(). Are you assuming
> tuple_field() always has an offset map available?

Yes, as this is a key part. I can, of course, implement the optimization
done by tuple_hash_slowpath(), but I don't think it's worth it in this
particular case.

> 
> > +		if (i < hashed_parts)
> > +			continue;
> 
> Ugh, looks at first glance that we're invoking a hash function
> and iterating over key parts only to find out the total size. 
> Then I see that you also calculate the hash value and carry as
> a side effect. If we can reuse hashed_parts, can't we reuse the
> hash of the hashed parts and the total size?

I guess we could, but I'm not sure it's worth the complexity it would
introduce. After all, this code is only called from a worker thread, no
point to over-optimize.

> 
> > +		struct tuple_hash_array *hash_arr = &builder->parts[i];
> > +		if (hash_arr->count >= hash_arr->capacity) {
> > +			uint32_t capacity = MAX(hash_arr->capacity * 2, 1024U);
> > +			uint32_t *values = realloc(hash_arr->values,
> > +						   capacity * sizeof(*values));
> > +			if (values == NULL) {
> > +				diag_set(OutOfMemory, capacity * sizeof(*values),
> > +					 "malloc", "tuple hash array");
> > +				return -1;
> > +			}
> > +			hash_arr->capacity = capacity;
> > +			hash_arr->values = values;
> 
> Sounds like we need an array data structure after all...

Yes, there are several places where we have to dynamically grow
an array of entries.

> Although I'd use a linked list of 4k blocks - I purposefully do not allow
> introducing arrays into the source code to avoid inefficient
> applications like this one. 

True, in this particular case it isn't a problem to use a list of blocks
instead of a plain array. I'll rework.

> 
> > +		}
> > +		uint32_t hash = PMurHash32_Result(h, carry, total_size);
> > +		hash_arr->values[hash_arr->count++] = hash;
> > +	}
> > +	return 0;
> > +}
> > +
> > +struct tuple_bloom *
> > +tuple_bloom_new(struct tuple_bloom_builder *builder, double fpr)
> > +{
> > +	uint32_t part_count = builder->part_count;
> > +	size_t size = sizeof(struct tuple_bloom) +
> > +		part_count * sizeof(struct bloom);
> > +	struct tuple_bloom *bloom = malloc(size);
> > +	if (bloom == NULL) {
> > +		diag_set(OutOfMemory, size, "malloc", "tuple bloom");
> > +		return NULL;
> > +	}
> > +	for (uint32_t i = 0; i < part_count; i++) {
> > +		struct tuple_hash_array *hash_arr = &builder->parts[i];
> > +		uint32_t count = hash_arr->count;
> > +		if (bloom_create(&bloom->parts[i], count,
> > +				 fpr, runtime.quota) != 0) {
> > +			diag_set(OutOfMemory, 0, "bloom_create",
> > +				 "tuple bloom part");
> > +			for (uint32_t j = 0; j < i; j++)
> > +				bloom_destroy(&bloom->parts[j], runtime.quota);
> > +			free(bloom);
> 
> Why can't use use tuple_bloom_delete() here?
> Looks like you need to introduce tuple_bloom_init() function for
> that.

If you insist.

> 
> > +			return NULL;
> > +		}
> > +		for (uint32_t k = 0; k < count; k++)
> > +			bloom_add(&bloom->parts[i], hash_arr->values[k]);
> > +	}
> > +	bloom->part_count = part_count;
> > +	return bloom;

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [commits] [tarantool] 02/04: vinyl: introduce bloom filters for partial key lookups
  2018-03-28 11:58 ` [commits] [tarantool] 02/04: vinyl: introduce bloom filters for partial key lookups Vladimir Davydov
@ 2018-03-28 17:03   ` Konstantin Osipov
  0 siblings, 0 replies; 6+ messages in thread
From: Konstantin Osipov @ 2018-03-28 17:03 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches, commits

* Vladimir Davydov <vdavydov.dev@gmail.com> [18/03/28 15:00]:
> > Why can't use use tuple_bloom_delete() here?
> > Looks like you need to introduce tuple_bloom_init() function for
> > that.
> 
> If you insist.

No, I just wanted to bring this up for consideration.

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [commits] [tarantool] 03/04: bloom: optimize tuple bloom filter size
  2018-03-28 11:16   ` [commits] [tarantool] 03/04: bloom: optimize tuple bloom filter size Vladimir Davydov
@ 2018-03-28 17:03     ` Konstantin Osipov
  2018-03-28 17:19       ` Vladimir Davydov
  0 siblings, 1 reply; 6+ messages in thread
From: Konstantin Osipov @ 2018-03-28 17:03 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches, commits

* Vladimir Davydov <vdavydov.dev@gmail.com> [18/03/28 14:50]:
> > 
> > In future, It would be nice to actually have stats about actual false
> > positive rates so that we know if the papers are BS or not.
> 
> We already have bloom.hit and bloom.miss stat counters in index.info()
> so we can calculate the false positive rate as
> 
>   bloom.miss / (bloom.hit + bloom.miss)

Do you have a test verifying that the formula works?

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [commits] [tarantool] 03/04: bloom: optimize tuple bloom filter size
  2018-03-28 17:03     ` Konstantin Osipov
@ 2018-03-28 17:19       ` Vladimir Davydov
  0 siblings, 0 replies; 6+ messages in thread
From: Vladimir Davydov @ 2018-03-28 17:19 UTC (permalink / raw)
  To: Konstantin Osipov, tarantool-patches, commits

On Wed, Mar 28, 2018 at 08:03:48PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [18/03/28 14:50]:
> > > 
> > > In future, It would be nice to actually have stats about actual false
> > > positive rates so that we know if the papers are BS or not.
> > 
> > We already have bloom.hit and bloom.miss stat counters in index.info()
> > so we can calculate the false positive rate as
> > 
> >   bloom.miss / (bloom.hit + bloom.miss)
> 
> Do you have a test verifying that the formula works?

vinyl/bloom.test.lua uses these stats to check that observed false
positive rate doesn't exceed the configured value.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-03-28 17:19 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-27 21:08 [commits] [tarantool] 02/04: vinyl: introduce bloom filters for partial key lookups Konstantin Osipov
     [not found] ` <20180327210938.GC11829@atlas>
2018-03-28 11:16   ` [commits] [tarantool] 03/04: bloom: optimize tuple bloom filter size Vladimir Davydov
2018-03-28 17:03     ` Konstantin Osipov
2018-03-28 17:19       ` Vladimir Davydov
2018-03-28 11:58 ` [commits] [tarantool] 02/04: vinyl: introduce bloom filters for partial key lookups Vladimir Davydov
2018-03-28 17:03   ` Konstantin Osipov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox