[PATCH 0/4] vinyl: multi-part bloom filter

Vladimir Davydov vdavydov.dev at gmail.com
Sat Feb 24 11:32:36 MSK 2018


It turns out that having a bloom filter for full key lookups is not
always enough - there are workloads that issue a lot of EQ requests for
partial keys as well. Such workloads would benefit from having a bloom
filter per each sub key. So to improve their performance, this patch set
replaces the current implementation of bloom filter in vinyl with a
multi-part one. Basically, we now maintain one bloom filter per each sub
key and check them all when processing EQ requests.

Patch 1 of the series replaces mmap() with malloc() in the basic bloom
filter implementation. This is needed to check that the bloom filter
size optimization introduced by patch 3 works, because currently bloom
filter size is rounded up to the system page size (4096 bytes), which
means we have to insert a lot of records to see the difference, which in
turn increases the test time. Nevertheless, it is a good change as is,
because there's no point in preferring mmap() over malloc() there.

Patch 2 is the main patch of the series. It replaces the basic bloom
filter with a multi-part one in vinyl. It doesn't try to optimize the
bloom filter size though. The bloom filter size optimization is done in
patch 3. Patch 4 removes the bloom spectrum object, the whole point of
which turned out to be dubious and which is not used anywhere anymore.
More details in comments to individual patches.

https://github.com/tarantool/tarantool/issues/3177
https://github.com/tarantool/tarantool/tree/gh-3177-vy-multipart-bloom-filter

Vladimir Davydov (4):
  bloom: use malloc for bitmap allocations
  vinyl: introduce bloom filters for partial key lookups
  bloom: optimize tuple bloom filter size
  bloom: drop spectrum

 src/box/CMakeLists.txt      |   1 +
 src/box/iproto_constants.c  |   3 +-
 src/box/iproto_constants.h  |   4 +-
 src/box/tuple_bloom.c       | 286 ++++++++++++++++++++++++++++++++++++++++++++
 src/box/tuple_bloom.h       | 191 +++++++++++++++++++++++++++++
 src/box/tuple_compare.cc    |  24 ++++
 src/box/tuple_compare.h     |  12 ++
 src/box/tuple_hash.cc       |  13 +-
 src/box/tuple_hash.h        |  30 +++++
 src/box/vy_run.c            | 266 ++++++++++++++++------------------------
 src/box/vy_run.h            |  19 ++-
 src/box/vy_scheduler.c      |  25 +---
 src/box/xlog.c              |  27 -----
 src/box/xlog.h              |   9 --
 src/lib/salad/bloom.c       | 136 +++++++--------------
 src/lib/salad/bloom.h       |  72 ++---------
 test/unit/bloom.cc          |  70 +----------
 test/unit/bloom.result      |   9 --
 test/unit/vy_point_lookup.c |  10 +-
 test/vinyl/bloom.result     | 162 ++++++++++++++++++++++++-
 test/vinyl/bloom.test.lua   |  77 +++++++++++-
 test/vinyl/info.result      |  14 +--
 22 files changed, 968 insertions(+), 492 deletions(-)
 create mode 100644 src/box/tuple_bloom.c
 create mode 100644 src/box/tuple_bloom.h

-- 
2.11.0




More information about the Tarantool-patches mailing list