[tarantool-patches] Re: [PATCH v4 04/12] [RAW] swim: introduce SWIM's anti-entropy component

Tue Feb 26 21:28:26 MSK 2019

Thanks for the review.

>> +static inline struct swim *
>> +swim_by_scheduler(struct swim_scheduler *scheduler)
> 
> By strange convention we call such functions simply after the name
> of the class, this was here even before I joined :)
> 
> In other words:
> 
> static inline struct swim *
> swim(struct swim_scheduler *scheduler);

I failed to find any examples. On the contrary, I found
that usually container_of is used without a wrapper in
similar cases:

- memtx_engine.cc:1183
- memtx_hash.c:167, 187
- memtx_tree.c:386, 407
- relay.cc:440, 688, 701
- vinyl.c:2433, 2440, 2459, 2945
- ... many other vinyl files ...
- cbus.c:443, 490, 505, 557, 575, 588
- say.c:314

So I do not think that such policy works for that case.

> 
>> + */
>> +static void
>> +swim_member_delete(struct swim *swim, struct swim_member *member)
> 
> It should be subject-verb-object, so swim_delete_member().

It is. 'swim_member' is subject, 'delete' is verb, 'object' is
omitted. Now I refactored all the code, that swim_member has
only two methods:

     swim_member_new, swim_member_delete

All other methods are owned by swim:

     swim_new_member, swim_delete_member, swim_wait_ack,
     swim_update_member_status, swim_update_member_addr,
     swim_update_member, swim_on_member_status_update,
     swim_on_member_payload_update etc ...

It really looks better now.

> 
>> +static struct swim_member *
>> +swim_member_new(struct swim *swim, const struct sockaddr_in *addr,
>> +		const struct tt_uuid *uuid, enum swim_member_status status)
> 
> swim_create_member() or swim_add_member().
> 

swim_add_member is a public API, which you approved by the way.
I will use swim_new_member. 'Create' does not fit here, because
this function does not take a member to initialize as a parameter.

> 
>> +{
>> +	struct swim_member *member =
>> +		(struct swim_member *) calloc(1, sizeof(*member));
>> +	if (member == NULL) {
>> +		diag_set(OutOfMemory, sizeof(*member), "calloc", "member");
>> +		return NULL;
>> +	}
>> +	member->status = status;
>> +	member->addr = *addr;
>> +	member->uuid = *uuid;
>> +	member->hash = swim_uuid_hash(uuid);
>> +	mh_int_t rc = mh_swim_table_put(swim->members,
>> +					(const struct swim_member **) &member,
>> +					NULL, NULL);
> 
> This part is swim_member_new(), if you want to have
> swim_member_new().> 
>> +	if (rc == mh_end(swim->members)) {
>> +		free(member);
> 
> free(member) is swim_member_delete().

Ok, done. I do not paste diff here because it is too big.

>> +static struct swim_member **
>> +swim_shuffle_members(struct swim *swim)
>> +{
>> +	struct mh_swim_table_t *members = swim->members;
>> +	struct swim_member **shuffled;
>> +	int bsize = sizeof(shuffled[0]) * mh_size(members);
>> +	shuffled = (struct swim_member **) malloc(bsize);
>> +	if (shuffled == NULL) {
>> +		diag_set(OutOfMemory, bsize, "malloc", "shuffled");
>> +		return NULL;
>> +	}
> 
> Shan't we ensure shuffle never fails? Why do you make a copy of
> the shuffled members each time, isn't it easier to allocate a
> shuffled array once?

If you are talking about why I do not cache the memory, then
because 1) a fail does not break swim, 2) it complicates code.

If you mean why do I shuffle the array after each round again -
it is because what randomness is supposed to do here by the
protocol definition. If I shuffle that only once, then some
members can stand in the same positions on different cluster
members and have not even network load during their whole
lifetime. This is why the original SWIM describes eternal
random selection instead of a single shuffle per lifetime.

> 
>> +/**
>> + * Encode anti-entropy header and random members data as many as
>> + * possible to the end of the packet.
>> + * @retval 0 Not error, but nothing is encoded.
>> + * @retval 1 Something is encoded.
>> + */
>> +static int
>> +swim_encode_anti_entropy(struct swim *swim, struct swim_packet *packet)
> 
> Please try to avoid flipping the meaning of the return value. 0 on
> success -1 on error.

It is not an error. And that function can not fail.

> 
> If you return the number of encoded messages,

I do not return the number of encoded messages. What I
return stated in the comment.

> and the function
> never fails, please use unsigned, not int.

We never use unsigned without strict necessity. We even have
this issue: https://github.com/tarantool/tarantool/issues/2161.
So I leave signed.

> 
>> + * Encode source UUID.
>> + * @retval 0 Not error, but nothing is encoded.
>> + * @retval 1 Something is encoded.
>> + */
>> +static inline int
>> +swim_encode_src_uuid(struct swim *swim, struct swim_packet *packet)
>> +{
>> +	struct swim_src_uuid_bin uuid_bin;
>> +	char *pos = swim_packet_alloc(packet, sizeof(uuid_bin));
>> +	if (pos == NULL)
>> +		return 0;
> 
> Looks like you ignore the error because swim algorithm does not
> depend on error. I believe the decision to abort the current round on
> error should be made in the algorithm body, not inside nested
> functions. The functions should faithfully report problems up.

It is not an error. Read the code more attentive. This code
fills the packet with data until it is full. If something
did not fit - it is not an error. The packet is sent anyway.

In all other places, where a real error occurs, it lifts up
to a user, or to a libev callback, where it is reported.

> 
>> +void
>> +swim_info(struct swim *swim, struct info_handler *info)
>> +{
>> +	info_begin(info);
>> +	for (mh_int_t node = mh_first(swim->members),
>> +	     end = mh_end(swim->members); node != end;
>> +	     node = mh_next(swim->members, node)) {
>> +		struct swim_member *m =
>> +			*mh_swim_table_node(swim->members, node);
>> +		info_table_begin(info,
>> +				 sio_strfaddr((struct sockaddr *) &m->addr,
>> +					      sizeof(m->addr)));
>> +		info_append_str(info, "status",
>> +				swim_member_status_strs[m->status]);
>> +		info_append_str(info, "uuid", swim_uuid_str(&m->uuid));
>> +		info_table_end(info);
>> +	}
>> +	info_end(info);
>> +}
> 
> Ouch, swim_info() ties swim together with info.*.

'Info' is moved into src/lib in a separate patchset.

> 
> All this together calls for leaving the core swim features in
> src/lib/swim and having a wrapper in src/ which adds diag* and
> info*
> 
>> +static void
>> +swim_scheduler_on_output(struct ev_loop *loop, struct ev_io *io, int events);
> 
> I once again encourage you to consider merging scheduler and IO
> components.  Let's look at your test harness to see how this is
> feasible.

As discussed verbally, it still makes no sense to merge them.

> 
> 
> -- 
> Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
> http://tarantool.io - www.twitter.com/kostja_osipov
> 

Despite the fixes, the commit is not ready. I still work on
tests.