Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: kostja@tarantool.org
Cc: tarantool-patches@freelists.org
Subject: [PATCH v2 0/5] vinyl: multi-part bloom filter
Date: Wed, 28 Mar 2018 22:04:56 +0300	[thread overview]
Message-ID: <cover.1522262496.git.vdavydov.dev@gmail.com> (raw)

It turns out that having a bloom filter for full key lookups is not
always enough - there are workloads that issue a lot of EQ requests
for partial keys as well. Such workloads would benefit from having a
bloom filter per each partial key. So to improve their performance,
this patch set replaces the current implementation of bloom filter in
vinyl with a multi-part one. Basically, we now maintain one bloom
filter per each partial key and check them all when processing EQ
requests.

Patch 1 of the series replaces mmap() with malloc() in the basic bloom
filter implementation. This is needed to check that the bloom filter
size optimization introduced by patch 4 works, because currently bloom
filter size is rounded up to the system page size (4096 bytes), which
means we have to insert a lot of records to see the difference, which in
turn increases the test time. Nevertheless, it is a good change as is,
because there's no point in preferring mmap() over malloc() there.
Patch 2 renames bloom_possible_has to bloom_maybe_has, as it was
suggested by @kostja.

Patch 3 is the main patch of the series. It replaces the basic bloom
filter with a multi-part one in vinyl. It doesn't try to optimize the
bloom filter size though. The bloom filter size optimization is done in
patch 4. Patch 5 removes the bloom spectrum object, the whole point of
which turned out to be dubious and which is not used anywhere anymore.
More details in comments to individual patches.

https://github.com/tarantool/tarantool/issues/3177
https://github.com/tarantool/tarantool/tree/gh-3177-vy-multipart-bloom-filter

Changes in v2:
 - Don't ignore legacy bloom filters. Restore and use them
   for full key lookups.
 - Replace malloc+memset with calloc in bloom_create().
 - Use tuple_bloom_delete() on error paths.
 - Rename bloom_possible_has to bloom_maybe_has.
 - Minor comment fixes suggested by @kostja.

Vladimir Davydov (5):
  bloom: use malloc for bitmap allocations
  bloom: rename bloom_possible_has to bloom_maybe_has
  vinyl: introduce bloom filters for partial key lookups
  bloom: optimize tuple bloom filter size
  bloom: drop spectrum

 src/box/CMakeLists.txt      |   1 +
 src/box/iproto_constants.c  |   3 +-
 src/box/iproto_constants.h  |   4 +-
 src/box/tuple_bloom.c       | 342 ++++++++++++++++++++++++++++++++++++++++++++
 src/box/tuple_bloom.h       | 210 +++++++++++++++++++++++++++
 src/box/tuple_compare.cc    |  24 ++++
 src/box/tuple_compare.h     |  12 ++
 src/box/tuple_hash.cc       |  13 +-
 src/box/tuple_hash.h        |  30 ++++
 src/box/vy_run.c            | 246 ++++++++++++-------------------
 src/box/vy_run.h            |  23 ++-
 src/box/vy_scheduler.c      |  18 +--
 src/box/xlog.c              |  27 ----
 src/box/xlog.h              |   9 --
 src/lib/salad/bloom.c       | 133 +++++------------
 src/lib/salad/bloom.h       |  76 ++--------
 test/unit/bloom.cc          |  74 +---------
 test/unit/bloom.result      |   9 --
 test/unit/vy_point_lookup.c |   2 +-
 test/vinyl/bloom.result     | 175 ++++++++++++++++++++++-
 test/vinyl/bloom.test.lua   |  85 ++++++++++-
 test/vinyl/info.result      |  16 +--
 22 files changed, 1053 insertions(+), 479 deletions(-)
 create mode 100644 src/box/tuple_bloom.c
 create mode 100644 src/box/tuple_bloom.h

-- 
2.11.0

             reply	other threads:[~2018-03-28 19:04 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-28 19:04 Vladimir Davydov [this message]
2018-03-28 19:04 ` [PATCH v2 1/5] bloom: use malloc for bitmap allocations Vladimir Davydov
2018-03-28 19:04 ` [PATCH v2 2/5] bloom: rename bloom_possible_has to bloom_maybe_has Vladimir Davydov
2018-03-28 19:04 ` [PATCH v2 3/5] vinyl: introduce bloom filters for partial key lookups Vladimir Davydov
2018-03-28 19:05 ` [PATCH v2 4/5] bloom: optimize tuple bloom filter size Vladimir Davydov
2018-03-28 19:05 ` [PATCH v2 5/5] bloom: drop spectrum Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1522262496.git.vdavydov.dev@gmail.com \
    --to=vdavydov.dev@gmail.com \
    --cc=kostja@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [PATCH v2 0/5] vinyl: multi-part bloom filter' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox