[tarantool-patches] Re: [PATCH v3 1/6] [RAW] swim: introduce SWIM's anti-entropy component

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Tue Jan 15 17:42:13 MSK 2019



On 09/01/2019 14:45, Konstantin Osipov wrote:
> * Vladislav Shpilevoy <v.shpilevoy at tarantool.org> [18/12/29 15:07]:
>> +	swim_member_bin_create(&member_bin);
> 
>> +	for (; i < (int) mh_size(swim->members); ++i) {
>> +		char *pos = swim_packet_alloc(packet, sizeof(member_bin));
>> +		if (pos == NULL)
>> +			break;
>> +		struct swim_member *member = swim->shuffled_members[i];
>> +		swim_member_bin_reset(&member_bin, member);
> 
> Why do you need to create() the member if you then reset it?
> Perhaps encode() or fill() is a more suitable verb than reset?

Create initializes constant fields. Reset fills others. But as you
wish. Done here and with similar functions in the next commits.

@@ -269,8 +269,7 @@ struct PACKED swim_member_bin {
  };
  
  static inline void
-swim_member_bin_reset(struct swim_member_bin *header,
-		      struct swim_member *member)
+swim_member_bin_fill(struct swim_member_bin *header, struct swim_member *member)
  {
  	header->v_status = member->status;
  	header->v_addr = mp_bswap_u32(member->addr.sin_addr.s_addr);
@@ -439,7 +438,7 @@ swim_encode_anti_entropy(struct swim *swim, struct swim_msg *msg)
  		if (pos == NULL)
  			break;
  		struct swim_member *member = swim->shuffled_members[i];
-		swim_member_bin_reset(&member_bin, member);
+		swim_member_bin_fill(&member_bin, member);
  		memcpy(pos, &member_bin, sizeof(member_bin));
  	}
  	if (i == 0)

> 
>> +		memcpy(pos, &member_bin, sizeof(member_bin));
>> +	swim_anti_entropy_header_bin_create(&ae_header_bin, i);
>> +	memcpy(header, &ae_header_bin, sizeof(ae_header_bin));
>> +	swim_packet_flush(packet);
> 
> Why flush() and not simply send()?

Because it has nothing to do with network. swim_packet is an
allocator, where backend buffer is a char[UDP_PACKET_SIZE].
Flush() moves current position in that buffer. Please, read
carefully swim_packet and swim_msg API.

As a leading light I've used mpstream API:

     mpstream_flush(), reserve(), advance().

> 
>> +swim_encode_round_msg(struct swim *swim, struct swim_msg *msg)
> 
> Why not simply swim_encode_round()?

Because it takes a message as an argument and encodes it.

> 
>> +/** Once per specified timeout trigger a next broadcast step. */
>> +static void
>> +swim_round_step_begin(struct ev_loop *loop, struct ev_periodic *p, int events)
> 
> Once again I have a difficulty understanding the name. Is it swim
> step begin or swim round begin? What is swim round step? Sounds
> like each round has many steps and each step has a beginning and an end?

It is begin of a step. What is round step is described at the beginning
of the file. Cite:

"
  * Tarantool splits protocol operation into rounds. At the
  * beginning of a round all members are randomly reordered and
  * linked into a list. At each round step a member is popped from
  * the list head, a message is sent to him, and he waits for the
  * next round.
"

> 
> Then I'm missing swim_round_step_end(), swim_round_step_first(),
> or something like that.

There are no first nor second nor end. swim_round_step_begin schedules
sending of a round message to the next member. It does not send
immediately, so technically it is not round_step(), but round_step_begin().

Instead of round_step_end() I have round_step_complete(), just like
for vy_task.

> 
> Looking at the code, swim_round_step_begin() is simply
> swim_round().

It is not round. What is round is described at the beginning of
the file, see cite above.

> 
> 
>> +static void
>> +swim_process_member_update(struct swim *swim, struct swim_member_def *def)
>> +{
>> +	struct swim_member *member = swim_find_member(swim, &def->addr);
>> +	/*
>> +	 * Trivial processing of a new member - just add it to the
>> +	 * members table.
>> +	 */
>> +	if (member == NULL) {
>> +		member = swim_member_new(swim, &def->addr, def->status);
>> +		if (member == NULL)
>> +			diag_log();
>> +	}
>> +}
> 
> Why nothing is done for an existing member?  This needs a comment, no?

In this commit a member can only be added. Not updated nor deleted. These
actions are introduced in the next commits.

> 
>> +
>> +struct swim_transport swim_udp_transport = {
>> +	/* .send_round_msg = */ swim_udp_send_msg,
>> +	/* .recv_msg = */ swim_udp_recv_msg,
>> +};
> 
> Initializing/destroying an endpoint (like calling bind()) should also be
> part of transport api.> 
>> +int
>> +swim_scheduler_bind(struct swim_scheduler *scheduler, struct sockaddr_in *addr)
> 
> And not part of the scheduler api.

As you wish. Done on the branch. I do not pase the diff here
because it is too big.

> 
>> +	    evio_setsockopt_server(fd, AF_INET, SOCK_DGRAM) != 0) {
> 
> The file descriptor itself should also be part of the transport.

In such a case the transport will be not a vtab, but an object, with
its own attributes. As I remember, you asked to make the transport just
a simple vtab. But as you wish. Done on the branch.

> 
>> +static void
>> +swim_scheduler_on_input(struct ev_loop *loop, struct ev_io *io, int events)
>> +{
>> +	assert((events & EV_READ) != 0);
>> +	(void) events;
>> +	(void) loop;
>> +	struct swim_scheduler *scheduler = (struct swim_scheduler *) io->data;
>> +	struct sockaddr_in addr;
>> +	socklen_t len = sizeof(addr);
>> +	struct swim_packet packet;
>> +	struct swim_msg msg;
>> +	swim_msg_create(&msg);
>> +	swim_packet_create(&packet, &msg);
>> +	swim_transport_recv_f recv = scheduler->transport->recv_msg;
>> +	ssize_t size = recv(io->fd, packet.body, packet.end - packet.body,
>> +			    (struct sockaddr *) &addr, &len);
> 
> I don't understand why you do it here, if it's part of the
> transport api.

It is. recv here is a variable taken from scheduler->transport->recv_msg.
But never mind, this code is reworked already due to other comments.




More information about the Tarantool-patches mailing list