[PATCH v2 0/5] vinyl: multi-part bloom filter
Vladimir Davydov
vdavydov.dev at gmail.com
Wed Mar 28 22:04:56 MSK 2018
It turns out that having a bloom filter for full key lookups is not
always enough - there are workloads that issue a lot of EQ requests
for partial keys as well. Such workloads would benefit from having a
bloom filter per each partial key. So to improve their performance,
this patch set replaces the current implementation of bloom filter in
vinyl with a multi-part one. Basically, we now maintain one bloom
filter per each partial key and check them all when processing EQ
requests.
Patch 1 of the series replaces mmap() with malloc() in the basic bloom
filter implementation. This is needed to check that the bloom filter
size optimization introduced by patch 4 works, because currently bloom
filter size is rounded up to the system page size (4096 bytes), which
means we have to insert a lot of records to see the difference, which in
turn increases the test time. Nevertheless, it is a good change as is,
because there's no point in preferring mmap() over malloc() there.
Patch 2 renames bloom_possible_has to bloom_maybe_has, as it was
suggested by @kostja.
Patch 3 is the main patch of the series. It replaces the basic bloom
filter with a multi-part one in vinyl. It doesn't try to optimize the
bloom filter size though. The bloom filter size optimization is done in
patch 4. Patch 5 removes the bloom spectrum object, the whole point of
which turned out to be dubious and which is not used anywhere anymore.
More details in comments to individual patches.
https://github.com/tarantool/tarantool/issues/3177
https://github.com/tarantool/tarantool/tree/gh-3177-vy-multipart-bloom-filter
Changes in v2:
- Don't ignore legacy bloom filters. Restore and use them
for full key lookups.
- Replace malloc+memset with calloc in bloom_create().
- Use tuple_bloom_delete() on error paths.
- Rename bloom_possible_has to bloom_maybe_has.
- Minor comment fixes suggested by @kostja.
Vladimir Davydov (5):
bloom: use malloc for bitmap allocations
bloom: rename bloom_possible_has to bloom_maybe_has
vinyl: introduce bloom filters for partial key lookups
bloom: optimize tuple bloom filter size
bloom: drop spectrum
src/box/CMakeLists.txt | 1 +
src/box/iproto_constants.c | 3 +-
src/box/iproto_constants.h | 4 +-
src/box/tuple_bloom.c | 342 ++++++++++++++++++++++++++++++++++++++++++++
src/box/tuple_bloom.h | 210 +++++++++++++++++++++++++++
src/box/tuple_compare.cc | 24 ++++
src/box/tuple_compare.h | 12 ++
src/box/tuple_hash.cc | 13 +-
src/box/tuple_hash.h | 30 ++++
src/box/vy_run.c | 246 ++++++++++++-------------------
src/box/vy_run.h | 23 ++-
src/box/vy_scheduler.c | 18 +--
src/box/xlog.c | 27 ----
src/box/xlog.h | 9 --
src/lib/salad/bloom.c | 133 +++++------------
src/lib/salad/bloom.h | 76 ++--------
test/unit/bloom.cc | 74 +---------
test/unit/bloom.result | 9 --
test/unit/vy_point_lookup.c | 2 +-
test/vinyl/bloom.result | 175 ++++++++++++++++++++++-
test/vinyl/bloom.test.lua | 85 ++++++++++-
test/vinyl/info.result | 16 +--
22 files changed, 1053 insertions(+), 479 deletions(-)
create mode 100644 src/box/tuple_bloom.c
create mode 100644 src/box/tuple_bloom.h
--
2.11.0
More information about the Tarantool-patches
mailing list