From: Vladimir Davydov <vdavydov.dev@gmail.com> To: kostja@tarantool.org Cc: tarantool-patches@freelists.org Subject: [PATCH v2 0/5] vinyl: multi-part bloom filter Date: Wed, 28 Mar 2018 22:04:56 +0300 [thread overview] Message-ID: <cover.1522262496.git.vdavydov.dev@gmail.com> (raw) It turns out that having a bloom filter for full key lookups is not always enough - there are workloads that issue a lot of EQ requests for partial keys as well. Such workloads would benefit from having a bloom filter per each partial key. So to improve their performance, this patch set replaces the current implementation of bloom filter in vinyl with a multi-part one. Basically, we now maintain one bloom filter per each partial key and check them all when processing EQ requests. Patch 1 of the series replaces mmap() with malloc() in the basic bloom filter implementation. This is needed to check that the bloom filter size optimization introduced by patch 4 works, because currently bloom filter size is rounded up to the system page size (4096 bytes), which means we have to insert a lot of records to see the difference, which in turn increases the test time. Nevertheless, it is a good change as is, because there's no point in preferring mmap() over malloc() there. Patch 2 renames bloom_possible_has to bloom_maybe_has, as it was suggested by @kostja. Patch 3 is the main patch of the series. It replaces the basic bloom filter with a multi-part one in vinyl. It doesn't try to optimize the bloom filter size though. The bloom filter size optimization is done in patch 4. Patch 5 removes the bloom spectrum object, the whole point of which turned out to be dubious and which is not used anywhere anymore. More details in comments to individual patches. https://github.com/tarantool/tarantool/issues/3177 https://github.com/tarantool/tarantool/tree/gh-3177-vy-multipart-bloom-filter Changes in v2: - Don't ignore legacy bloom filters. Restore and use them for full key lookups. - Replace malloc+memset with calloc in bloom_create(). - Use tuple_bloom_delete() on error paths. - Rename bloom_possible_has to bloom_maybe_has. - Minor comment fixes suggested by @kostja. Vladimir Davydov (5): bloom: use malloc for bitmap allocations bloom: rename bloom_possible_has to bloom_maybe_has vinyl: introduce bloom filters for partial key lookups bloom: optimize tuple bloom filter size bloom: drop spectrum src/box/CMakeLists.txt | 1 + src/box/iproto_constants.c | 3 +- src/box/iproto_constants.h | 4 +- src/box/tuple_bloom.c | 342 ++++++++++++++++++++++++++++++++++++++++++++ src/box/tuple_bloom.h | 210 +++++++++++++++++++++++++++ src/box/tuple_compare.cc | 24 ++++ src/box/tuple_compare.h | 12 ++ src/box/tuple_hash.cc | 13 +- src/box/tuple_hash.h | 30 ++++ src/box/vy_run.c | 246 ++++++++++++------------------- src/box/vy_run.h | 23 ++- src/box/vy_scheduler.c | 18 +-- src/box/xlog.c | 27 ---- src/box/xlog.h | 9 -- src/lib/salad/bloom.c | 133 +++++------------ src/lib/salad/bloom.h | 76 ++-------- test/unit/bloom.cc | 74 +--------- test/unit/bloom.result | 9 -- test/unit/vy_point_lookup.c | 2 +- test/vinyl/bloom.result | 175 ++++++++++++++++++++++- test/vinyl/bloom.test.lua | 85 ++++++++++- test/vinyl/info.result | 16 +-- 22 files changed, 1053 insertions(+), 479 deletions(-) create mode 100644 src/box/tuple_bloom.c create mode 100644 src/box/tuple_bloom.h -- 2.11.0
next reply other threads:[~2018-03-28 19:04 UTC|newest] Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-03-28 19:04 Vladimir Davydov [this message] 2018-03-28 19:04 ` [PATCH v2 1/5] bloom: use malloc for bitmap allocations Vladimir Davydov 2018-03-28 19:04 ` [PATCH v2 2/5] bloom: rename bloom_possible_has to bloom_maybe_has Vladimir Davydov 2018-03-28 19:04 ` [PATCH v2 3/5] vinyl: introduce bloom filters for partial key lookups Vladimir Davydov 2018-03-28 19:05 ` [PATCH v2 4/5] bloom: optimize tuple bloom filter size Vladimir Davydov 2018-03-28 19:05 ` [PATCH v2 5/5] bloom: drop spectrum Vladimir Davydov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=cover.1522262496.git.vdavydov.dev@gmail.com \ --to=vdavydov.dev@gmail.com \ --cc=kostja@tarantool.org \ --cc=tarantool-patches@freelists.org \ --subject='Re: [PATCH v2 0/5] vinyl: multi-part bloom filter' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox