From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Wed, 28 Mar 2018 00:08:17 +0300 From: Konstantin Osipov Subject: Re: [commits] [tarantool] 02/04: vinyl: introduce bloom filters for partial key lookups Message-ID: <20180327210817.GB11829@atlas> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline To: Vladimir Davydov , commits@tarantool.org Cc: tarantool-patches@freelists.org List-ID: * Vladimir Davydov [18/03/21 16:36]: > commit 5bfb2dc8c91bd9924fc6692006a69b45a532f6da > Author: Vladimir Davydov > AuthorDate: Fri Feb 23 23:01:24 2018 +0300 > > vinyl: introduce bloom filters for partial key lookups > > Currently, we store and use bloom only for full-key lookups. However, > there are use cases when we can also benefit from maintaining bloom > filters for partial keys as well - see #3177 for example. So this patch > replaces the current full-key bloom filter with a multipart one, which > is basically a set of bloom filters, one per each sub-key. What's a sub-key? I suggest using "partial key" or "prefix of a full key", this is less ambiguous (remember my initial questions). > Old bloom > filters stored on disk will be silently ignored. This will degrade > performance of existing deployments until major compaction is scheduled, > but this should be mitigated by #3139 (knob for forcing compaction). Please fix this. > Besides, there are still no vinyl deployments in production as far as we > know. Please remove this statement. > Anyway, if there are complaints about the performance degradation, > we can add some code to load and use the legacy bloom filter if the new > one is unavailable. No, we should not wait for complaints, and it's wrong discussing this in a commit comment as if it was Kostja's official policy. I use this weapon very sparingly I hope. > > When a key or tuple is checked against a multipart bloom filter, we > check all its sub-keys to reduce the false positive result. Nevertheless > there's no size optimization as per now. E.g. even if the cardinality of > a partial key is the same as of the full key, we will still store two > full-sized bloom filters although we could probably save some space in > this case by assuming that checking against the bloom corresponding to > a partial key would reduce the false positive rate of full key lookups. > This is addressed later in the series. > > Before this patch we used a bloom spectrum object to construct a bloom > filter. A bloom spectrum is basically a set of bloom filters ranging in > size. The point of using a spectrum is that we don't know what the run > size will be while we are writing it so we create 10 bloom filters and > choose the best of them after we are done. With the default bloom fpr of > 0.05 it is 10 byte overhead per record, which seems to be OK. However, > if we try to optimize other parameters as well, e.g. the number of hash > functions, the cost of a spectrum will become prohibitive. Funny thing > is a tuple hash is only 4 bytes long, which means if we stored all > hashes in an array and built a bloom filter after we'd written a run, we > would reduce the memory footprint by more than half! And that would only > slightly increase the run write time as scanning a memory map of hashes > and constructing a bloom filter is cheap in comparison to mering runs. > Putting it all together, we stop using bloom spectrum in this patch, > instead we stash all hashes in a new bloom builder object and use them > to build a perfect bloom filer after the run has been written and we > know the cardinality of each sub key. > > Closes #3177 > --- > src/box/CMakeLists.txt | 1 + > src/box/iproto_constants.c | 3 +- > src/box/iproto_constants.h | 4 +- > src/box/tuple_bloom.c | 274 ++++++++++++++++++++++++++++++++++++++++++++ > src/box/tuple_bloom.h | 191 ++++++++++++++++++++++++++++++ > src/box/tuple_compare.cc | 24 ++++ > src/box/tuple_compare.h | 12 ++ > src/box/tuple_hash.cc | 13 ++- > src/box/tuple_hash.h | 30 +++++ > src/box/vy_run.c | 243 +++++++++++++++------------------------ > src/box/vy_run.h | 23 ++-- > src/box/vy_scheduler.c | 20 +--- > src/box/xlog.c | 27 ----- > src/box/xlog.h | 9 -- > test/unit/vy_point_lookup.c | 2 +- > test/vinyl/bloom.result | 157 ++++++++++++++++++++++++- > test/vinyl/bloom.test.lua | 68 ++++++++++- > test/vinyl/info.result | 16 +-- > 18 files changed, 875 insertions(+), 242 deletions(-) > > diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt > index ad7f910..cb27db7 100644 > --- a/src/box/CMakeLists.txt > +++ b/src/box/CMakeLists.txt > @@ -37,6 +37,7 @@ add_library(tuple STATIC > tuple_compare.cc > tuple_extract_key.cc > tuple_hash.cc > + tuple_bloom.c > tuple_dictionary.c > key_def.cc > coll_def.c > diff --git a/src/box/iproto_constants.c b/src/box/iproto_constants.c > index cd7b1d0..4a8643c 100644 > --- a/src/box/iproto_constants.c > +++ b/src/box/iproto_constants.c > @@ -194,7 +194,8 @@ const char *vy_run_info_key_strs[VY_RUN_INFO_KEY_MAX] = { > "min lsn", > "max lsn", > "page count", > - "bloom filter" > + "bloom filter legacy", > + "bloom filter", > }; > > const char *vy_row_index_key_strs[VY_ROW_INDEX_KEY_MAX] = { > diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h > index 9518424..92534ef 100644 > --- a/src/box/iproto_constants.h > +++ b/src/box/iproto_constants.h > @@ -295,8 +295,10 @@ enum vy_run_info_key { > VY_RUN_INFO_MAX_LSN = 4, > /** Number of pages in the run. */ > VY_RUN_INFO_PAGE_COUNT = 5, > + /** Legacy bloom filter implementation. */ > + VY_RUN_INFO_BLOOM_LEGACY = 6, > /** Bloom filter for keys. */ > - VY_RUN_INFO_BLOOM = 6, > + VY_RUN_INFO_BLOOM = 7, > /** The last key in this enum + 1 */ > VY_RUN_INFO_KEY_MAX > }; > diff --git a/src/box/tuple_bloom.c b/src/box/tuple_bloom.c > new file mode 100644 > index 0000000..65e9827 > --- /dev/null > +++ b/src/box/tuple_bloom.c > @@ -0,0 +1,274 @@ > +/* > + * Copyright 2010-2018, Tarantool AUTHORS, please see AUTHORS file. > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * 1. Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the > + * following disclaimer. > + * > + * 2. Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THIS SOFTWARE IS PROVIDED BY ``AS IS'' AND > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED > + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL > + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, > + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF > + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR > + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF > + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF > + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > + * SUCH DAMAGE. > + */ > +#include "tuple_bloom.h" > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include "diag.h" > +#include "errcode.h" > +#include "memory.h" > +#include "key_def.h" > +#include "tuple.h" > +#include "tuple_hash.h" > +#include "salad/bloom.h" > +#include "trivia/util.h" > +#include "third_party/PMurHash.h" > + > +enum { HASH_SEED = 13U }; > + > +struct tuple_bloom_builder * > +tuple_bloom_builder_new(uint32_t part_count) > +{ > + size_t size = sizeof(struct tuple_bloom_builder) + > + part_count * sizeof(struct tuple_hash_array); > + struct tuple_bloom_builder *builder = malloc(size); > + if (builder == NULL) { > + diag_set(OutOfMemory, size, "malloc", "tuple bloom builder"); > + return NULL; > + } > + memset(builder, 0, size); > + builder->part_count = part_count; > + return builder; > +} > + > +void > +tuple_bloom_builder_delete(struct tuple_bloom_builder *builder) > +{ > + for (uint32_t i = 0; i < builder->part_count; i++) > + free(builder->parts[i].values); > + free(builder); > +} > + > +int > +tuple_bloom_builder_add(struct tuple_bloom_builder *builder, > + const struct tuple *tuple, > + const struct key_def *key_def, > + uint32_t hashed_parts) > +{ > + assert(builder->part_count == key_def->part_count); > + > + uint32_t h = HASH_SEED; > + uint32_t carry = 0; > + uint32_t total_size = 0; > + > + for (uint32_t i = 0; i < key_def->part_count; i++) { > + total_size += tuple_hash_key_part(&h, &carry, tuple, > + &key_def->parts[i]); Is there a faster way to iterate over key parts? tuple_hash_key_part() involves tuple_field(). Are you assuming tuple_field() always has an offset map available? > + if (i < hashed_parts) > + continue; Ugh, looks at first glance that we're invoking a hash function and iterating over key parts only to find out the total size. Then I see that you also calculate the hash value and carry as a side effect. If we can reuse hashed_parts, can't we reuse the hash of the hashed parts and the total size? Perhaps I'm missing something. > + struct tuple_hash_array *hash_arr = &builder->parts[i]; > + if (hash_arr->count >= hash_arr->capacity) { > + uint32_t capacity = MAX(hash_arr->capacity * 2, 1024U); > + uint32_t *values = realloc(hash_arr->values, > + capacity * sizeof(*values)); > + if (values == NULL) { > + diag_set(OutOfMemory, capacity * sizeof(*values), > + "malloc", "tuple hash array"); > + return -1; > + } > + hash_arr->capacity = capacity; > + hash_arr->values = values; Sounds like we need an array data structure after all... Although I'd use a linked list of 4k blocks - I purposefully do not allow introducing arrays into the source code to avoid inefficient applications like this one. > + } > + uint32_t hash = PMurHash32_Result(h, carry, total_size); > + hash_arr->values[hash_arr->count++] = hash; > + } > + return 0; > +} > + > +struct tuple_bloom * > +tuple_bloom_new(struct tuple_bloom_builder *builder, double fpr) > +{ > + uint32_t part_count = builder->part_count; > + size_t size = sizeof(struct tuple_bloom) + > + part_count * sizeof(struct bloom); > + struct tuple_bloom *bloom = malloc(size); > + if (bloom == NULL) { > + diag_set(OutOfMemory, size, "malloc", "tuple bloom"); > + return NULL; > + } > + for (uint32_t i = 0; i < part_count; i++) { > + struct tuple_hash_array *hash_arr = &builder->parts[i]; > + uint32_t count = hash_arr->count; > + if (bloom_create(&bloom->parts[i], count, > + fpr, runtime.quota) != 0) { > + diag_set(OutOfMemory, 0, "bloom_create", > + "tuple bloom part"); > + for (uint32_t j = 0; j < i; j++) > + bloom_destroy(&bloom->parts[j], runtime.quota); > + free(bloom); Why can't use use tuple_bloom_delete() here? Looks like you need to introduce tuple_bloom_init() function for that. > + return NULL; > + } > + for (uint32_t k = 0; k < count; k++) > + bloom_add(&bloom->parts[i], hash_arr->values[k]); > + } > + bloom->part_count = part_count; > + return bloom; > +} > + > +void > +tuple_bloom_delete(struct tuple_bloom *bloom) > +{ > + for (uint32_t i = 0; i < bloom->part_count; i++) > + bloom_destroy(&bloom->parts[i], runtime.quota); > + free(bloom); > +} > + > +bool > +tuple_bloom_possible_has(const struct tuple_bloom *bloom, > + const struct tuple *tuple, > + const struct key_def *key_def) > +{ > + assert(bloom->part_count == key_def->part_count); > + > + uint32_t h = HASH_SEED; > + uint32_t carry = 0; > + uint32_t total_size = 0; > + > + for (uint32_t i = 0; i < key_def->part_count; i++) { > + total_size += tuple_hash_key_part(&h, &carry, tuple, > + &key_def->parts[i]); > + uint32_t hash = PMurHash32_Result(h, carry, total_size); > + if (!bloom_possible_has(&bloom->parts[i], hash)) > + return false; > + } > + return true; > +} > + > +bool > +tuple_bloom_possible_has_key(const struct tuple_bloom *bloom, > + const char *key, uint32_t part_count, > + const struct key_def *key_def) > +{ > + assert(part_count <= key_def->part_count); > + assert(bloom->part_count == key_def->part_count); > + > + uint32_t h = HASH_SEED; > + uint32_t carry = 0; > + uint32_t total_size = 0; > + > + for (uint32_t i = 0; i < part_count; i++) { > + total_size += tuple_hash_field(&h, &carry, &key, > + key_def->parts[i].coll); > + uint32_t hash = PMurHash32_Result(h, carry, total_size); > + if (!bloom_possible_has(&bloom->parts[i], hash)) > + return false; > + } > + return true; > +} > + > +static size_t > +tuple_bloom_sizeof_part(const struct bloom *part) > +{ > + size_t size = 0; > + size += mp_sizeof_array(3); > + size += mp_sizeof_uint(part->table_size); > + size += mp_sizeof_uint(part->hash_count); > + size += mp_sizeof_bin(bloom_store_size(part)); > + return size; > +} > + > +static char * > +tuple_bloom_encode_part(const struct bloom *part, char *buf) > +{ > + buf = mp_encode_array(buf, 3); > + buf = mp_encode_uint(buf, part->table_size); > + buf = mp_encode_uint(buf, part->hash_count); > + buf = mp_encode_binl(buf, bloom_store_size(part)); > + buf = bloom_store(part, buf); > + return buf; > +} > + > +static int > +tuple_bloom_decode_part(struct bloom *part, const char **data) > +{ > + memset(part, 0, sizeof(*part)); > + if (mp_decode_array(data) != 3) > + unreachable(); > + part->table_size = mp_decode_uint(data); > + part->hash_count = mp_decode_uint(data); > + size_t store_size = mp_decode_binl(data); > + assert(store_size == bloom_store_size(part)); > + if (bloom_load_table(part, *data, runtime.quota) != 0) { > + diag_set(OutOfMemory, store_size, "bloom_load_table", > + "tuple bloom part"); > + return -1; > + } > + *data += store_size; > + return 0; > +} > + > +size_t > +tuple_bloom_size(const struct tuple_bloom *bloom) > +{ > + size_t size = 0; > + size += mp_sizeof_array(bloom->part_count); > + for (uint32_t i = 0; i < bloom->part_count; i++) > + size += tuple_bloom_sizeof_part(&bloom->parts[i]); > + return size; > +} > + > +char * > +tuple_bloom_encode(const struct tuple_bloom *bloom, char *buf) > +{ > + buf = mp_encode_array(buf, bloom->part_count); > + for (uint32_t i = 0; i < bloom->part_count; i++) > + buf = tuple_bloom_encode_part(&bloom->parts[i], buf); > + return buf; > +} > + > +struct tuple_bloom * > +tuple_bloom_decode(const char **data) > +{ > + uint32_t part_count = mp_decode_array(data); > + struct tuple_bloom *bloom = malloc(sizeof(*bloom) + > + part_count * sizeof(*bloom->parts)); > + if (bloom == NULL) { > + diag_set(OutOfMemory, sizeof(*bloom) + > + part_count * sizeof(*bloom->parts), > + "malloc", "tuple bloom"); > + return NULL; > + } > + for (uint32_t i = 0; i < part_count; i++) { > + if (tuple_bloom_decode_part(&bloom->parts[i], data) != 0) { > + for (uint32_t j = 0; j < i; j++) > + bloom_destroy(&bloom->parts[j], runtime.quota); > + free(bloom); > + return NULL; > + } > + } > + bloom->part_count = part_count; > + return bloom; > +} > diff --git a/src/box/tuple_bloom.h b/src/box/tuple_bloom.h > new file mode 100644 > index 0000000..24a7e1c > --- /dev/null > +++ b/src/box/tuple_bloom.h > @@ -0,0 +1,191 @@ > +#ifndef TARANTOOL_BOX_TUPLE_BLOOM_H_INCLUDED > +#define TARANTOOL_BOX_TUPLE_BLOOM_H_INCLUDED > +/* > + * Copyright 2010-2018, Tarantool AUTHORS, please see AUTHORS file. > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * 1. Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the > + * following disclaimer. > + * > + * 2. Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THIS SOFTWARE IS PROVIDED BY ``AS IS'' AND > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED > + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL > + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, > + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF > + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR > + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF > + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF > + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > + * SUCH DAMAGE. > + */ > + > +#include > +#include > +#include > +#include "salad/bloom.h" > + > +#if defined(__cplusplus) > +extern "C" { > +#endif /* defined(__cplusplus) */ > + > +struct tuple; > +struct key_def; > + > +/** > + * Tuple bloom filter. > + * > + * Consists of a set of bloom filters, one per each sub key. key prefix. > + * When a key is checked to be hashed in the bloom, all its > + * sub keys are checked as well, which lowers the probability > + * of false positive results. > + */ > +struct tuple_bloom { > + /** Number of key parts. */ > + uint32_t part_count; > + /** Array of bloom filters, one per each sub key. */ > + struct bloom parts[0]; > +}; > + > +/** > + * Array of tuple hashes. > + */ > +struct tuple_hash_array { > + /** Number of hashes stored in the array. */ > + uint32_t count; > + /** Capacity of the array of hashes. */ > + uint32_t capacity; > + /** Array of hashes. */ > + uint32_t *values; > +}; > + > +/** > + * Tuple bloom filter builder. > + * > + * Construction of a bloom filter proceeds as follows. > + * First all tuple hashes are stored in a builder object. > + * Once all hashes have been stored, a bloom filter of > + * the optimal size and all the hashes are added to it. > + */ > +struct tuple_bloom_builder { > + /** Number of key parts. */ > + uint32_t part_count; > + /** Hash arrays, one per each sub key. */ > + struct tuple_hash_array parts[0]; > +}; > + > +/** > + * Create a bloom filter builder. > + * @param part_count - number of key parts > + * @return bloom builder on success or NULL on OOM > + */ > +struct tuple_bloom_builder * > +tuple_bloom_builder_new(uint32_t part_count); > + > +/** > + * Destroy a bloom filter builder. > + * @param builder - bloom filter builder > + */ > +void > +tuple_bloom_builder_delete(struct tuple_bloom_builder *builder); > + > +/** > + * Add a tuple hash to a bloom filter builder. > + * @param builder - bloom filter builder > + * @param tuple - tuple to add > + * @param key_def - key definition > + * @param hashed_parts - number of key parts that have already > + * been added to the builder > + * @return 0 on success, -1 on OOM > + */ > +int > +tuple_bloom_builder_add(struct tuple_bloom_builder *builder, > + const struct tuple *tuple, > + const struct key_def *key_def, > + uint32_t hashed_parts); > + > +/** > + * Create a new bloom filter. a new tuple bloom filter. > + * @param builder - bloom filter builder > + * @param fpr - desired false positive rate > + * @return bloom filter on success or NULL on OOM > + */ > +struct tuple_bloom * > +tuple_bloom_new(struct tuple_bloom_builder *builder, double fpr); > + > +/** > + * Delete a bloom filter. > + * @param bloom - bloom filter to delete > + */ > +void > +tuple_bloom_delete(struct tuple_bloom *bloom); > + > +/** > + * Check if a tuple was stored in a bloom filter. > + * @param bloom - bloom filter > + * @param tuple - tuple to check > + * @param key_def - key definition > + * @return true if the tuple may have been stored in the bloom, > + * false if the tuple is definitely not in the bloom > + */ > +bool > +tuple_bloom_possible_has(const struct tuple_bloom *bloom, > + const struct tuple *tuple, > + const struct key_def *key_def); > + suggestion: possible -> maybe > +/** > + * Check if a tuple matching a key was stored in a bloom filter. > + * @param bloom - bloom filter > + * @param key - key to check > + * @param part_count - number of parts in the key > + * @param key_def - key definition > + * @return true if there may be a tuple matching the key stored in > + * the bloom, false if there is definitely no such tuple > + */ > +bool > +tuple_bloom_possible_has_key(const struct tuple_bloom *bloom, > + const char *key, uint32_t part_count, > + const struct key_def *key_def); > + > +/** > + * Return the size of a bloom filter when encoded. > + * @param bloom - bloom filter > + * @return size of the bloom filter, in bytes > + */ > +size_t > +tuple_bloom_size(const struct tuple_bloom *bloom); > + > +/** > + * Encode a bloom filter in MsgPack. > + * @param bloom - bloom filter > + * @param buf - buffer where to store the bloom filter > + * @return pointer to the first byte following encoded data > + */ > +char * > +tuple_bloom_encode(const struct tuple_bloom *bloom, char *buf); > + > +/** > + * Decode a bloom filter from MsgPack. > + * @param data - pointer to buffer storing encoded bloom filter; > + * on success it is advanced by the number of decoded bytes > + * @return the decoded bloom on success or NULL on OOM > + */ > +struct tuple_bloom * > +tuple_bloom_decode(const char **data); > + > +#if defined(__cplusplus) > +} /* extern "C" */ > +#endif /* defined(__cplusplus) */ > + > +#endif /* TARANTOOL_BOX_TUPLE_BLOOM_H_INCLUDED */ > diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc > index 13ebfc5..cfee004 100644 > --- a/src/box/tuple_compare.cc > +++ b/src/box/tuple_compare.cc > @@ -426,6 +426,30 @@ tuple_compare_field_with_hint(const char *field_a, enum mp_type a_type, > } > } > > +uint32_t > +tuple_common_key_parts(const struct tuple *tuple_a, > + const struct tuple *tuple_b, > + const struct key_def *key_def) > +{ > + uint32_t i; > + for (i = 0; i < key_def->part_count; i++) { > + const struct key_part *part = &key_def->parts[i]; > + const char *field_a = tuple_field(tuple_a, part->fieldno); > + const char *field_b = tuple_field(tuple_b, part->fieldno); > + enum mp_type a_type = field_a != NULL ? > + mp_typeof(*field_a) : MP_NIL; > + enum mp_type b_type = field_b != NULL ? > + mp_typeof(*field_b) : MP_NIL; > + if (a_type == MP_NIL && b_type == MP_NIL) > + continue; > + if (a_type == MP_NIL || b_type == MP_NIL || > + tuple_compare_field_with_hint(field_a, a_type, > + field_b, b_type, part->type, part->coll) != 0) > + break; > + } > + return i; > +} > + > template > static inline int > tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, > diff --git a/src/box/tuple_compare.h b/src/box/tuple_compare.h > index 3888bbb..d80e1e7 100644 > --- a/src/box/tuple_compare.h > +++ b/src/box/tuple_compare.h > @@ -42,6 +42,18 @@ extern "C" { > #endif /* defined(__cplusplus) */ > > /** > + * Return the length of the longest common prefix of two tuples. > + * @param tuple_a first tuple > + * @param tuple_b second tuple > + * @param key_def key defintion > + * @return number of parts the two tuples have in common These are not parts of the tuple, these are parts of the key. described in the key def. > + */ > +uint32_t > +tuple_common_key_parts(const struct tuple *tuple_a, > + const struct tuple *tuple_b, > + const struct key_def *key_def); > + > +/** > * Create a comparison function for the key_def > * > * @param key_def key_definition > diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc > index 0f7ba91..0fa8ea5 100644 > --- a/src/box/tuple_hash.cc > +++ b/src/box/tuple_hash.cc > @@ -262,7 +262,7 @@ slowpath: > key_def->key_hash = key_hash_slowpath; > } > > -static uint32_t > +uint32_t > tuple_hash_field(uint32_t *ph1, uint32_t *pcarry, const char **field, > struct coll *coll) > { > @@ -308,6 +308,17 @@ tuple_hash_null(uint32_t *ph1, uint32_t *pcarry) > return mp_sizeof_nil(); > } > > +uint32_t > +tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry, > + const struct tuple *tuple, > + const struct key_part *part) > +{ > + const char *field = tuple_field(tuple, part->fieldno); > + if (field == NULL) > + return tuple_hash_null(ph1, pcarry); > + return tuple_hash_field(ph1, pcarry, &field, part->coll); > +} -- Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 http://tarantool.io - www.twitter.com/kostja_osipov