From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Wed, 13 Mar 2019 15:25:05 +0300 From: Vladimir Davydov Subject: Re: [PATCH v2 07/14] vinyl: add helpers to add/check statement with bloom Message-ID: <20190313122505.f3atkaws5hneulzr@esperanza> References: <2e56f4fe2f0a7f76556b122e3c4a357511144bb1.1552464666.git.vdavydov.dev@gmail.com> <20190313115934.GG25066@chai> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190313115934.GG25066@chai> To: Konstantin Osipov Cc: tarantool-patches@freelists.org List-ID: On Wed, Mar 13, 2019 at 02:59:34PM +0300, Konstantin Osipov wrote: > * Vladimir Davydov [19/03/13 11:58]: > > > A Vinyl statement may be either a key or a tuple. We must use different > > functions for the two kinds when working with a bloom filter. Let's > > introduce helpers incorporating that logic. > > While we are at it, tuple_bloom_builder is a cumbersome name. > Why not simply bloom_builder So as not to consue with generic bloom defined in salad. > or vy_bloom_builder? Just in case we want to reuse it. After all, tuple_bloom isn't really specific to vinyl. > > Especially the name is confusing since now we have > tuple_bloom_builder_add(_tuple) and tuple_bloom_builder_add_key. > > Besides, tuple_bloom_builder pretends to be generic, not specific > to vinyl engine. I don't think anyone cares (I don't). Well, we can move it to vy_bloom, I guess, but I don't think it's really necessary now. > > > +int > > +tuple_bloom_builder_add_key(struct tuple_bloom_builder *builder, > > + const char *key, uint32_t part_count, > > + struct key_def *key_def) > > +{ > > + (void)part_count; > > + assert(part_count >= key_def->part_count); > > + assert(builder->part_count == key_def->part_count); > > + > > + uint32_t h = HASH_SEED; > > + uint32_t carry = 0; > > + uint32_t total_size = 0; > > + > > Once again, since we are at it I would appreciate an explanation > about our strategy for building bloom filters for partial keys. > No other LSM I'm aware of is doing it this way, so it would be > nice to see a write down of how it works. Okay, I'll add a comment to tuple_bloom explaining what's going on in the builder. > > > > + for (uint32_t i = 0; i < key_def->part_count; i++) { > > + total_size += tuple_hash_field(&h, &carry, &key, > > + key_def->parts[i].coll); > > + uint32_t hash = PMurHash32_Result(h, carry, total_size); > > + if (tuple_hash_array_add(&builder->parts[i], hash) != 0) > > + return -1; > > + } > > + return 0; > > +}