Tarantool development patches archive
 help / color / mirror / Atom feed
* [tarantool-patches] [PATCH 0/5] swim member 'left' status
@ 2019-04-09 18:12 Vladislav Shpilevoy
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 1/5] test: allow to remove swim nodes from the cluster Vladislav Shpilevoy
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-09 18:12 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

The patchset introduces and uses a new member status 'left', MEMBER_LEFT.
Originally the status appears in scope of swim_quit() API - a function
gracefully removing the instance from the cluster. Quit is used when a SWIM
instance does not want to be a part of the cluster anymore, and it is
undesirable to just shut it down and drop from other members via failure
detection - it would be much longer, at least.

But appeared that 'left' status also can be applied to smooth UUID update.
Before the patchset UUID update led to appearance of a 'ghost' of the old UUID
for a relatively long time. Now on UUID update the old UUID is marked as left,
and eventually dropped bypassing failure detection. Of course it is not
guaranteed, but it works in most of cases.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-3234-swim-member-left-status
Issue: https://github.com/tarantool/tarantool/issues/3234

Vladislav Shpilevoy (5):
  test: allow to remove swim nodes from the cluster
  test: on close of swim fake fd send its packets, not drop
  test: process IO swim test events before protocol's ones
  swim: introduce quit message
  swim: make UUID update smoother and faster

 src/lib/swim/swim.c             | 207 ++++++++++++++++++++------------
 src/lib/swim/swim.h             |  10 ++
 src/lib/swim/swim_constants.h   |   2 +
 src/lib/swim/swim_io.c          |   8 +-
 src/lib/swim/swim_io.h          |   4 +
 src/lib/swim/swim_proto.c       |  11 ++
 src/lib/swim/swim_proto.h       |  30 +++++
 test/unit/swim.c                | 100 +++++++++++++--
 test/unit/swim.result           |  27 ++++-
 test/unit/swim_test_ev.c        |  27 +----
 test/unit/swim_test_ev.h        |   2 +-
 test/unit/swim_test_transport.c |  47 ++++++--
 test/unit/swim_test_transport.h |   9 +-
 test/unit/swim_test_utils.c     | 152 +++++++++++++++++------
 test/unit/swim_test_utils.h     |  17 +++
 15 files changed, 486 insertions(+), 167 deletions(-)

-- 
2.17.2 (Apple Git-113)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] [PATCH 1/5] test: allow to remove swim nodes from the cluster
  2019-04-09 18:12 [tarantool-patches] [PATCH 0/5] swim member 'left' status Vladislav Shpilevoy
@ 2019-04-09 18:12 ` Vladislav Shpilevoy
  2019-04-10  7:15   ` [tarantool-patches] " Konstantin Osipov
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 2/5] test: on close of swim fake fd send its packets, not drop Vladislav Shpilevoy
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-09 18:12 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Until now it was impossible in swim tests to drop a SWIM instance
from the cluster. It should have been either restarted, or
blocked, but a real drop led to an assertion on any attemp to use
one of methods like swim_wait_timeout(). It was due to inability
to get instance's UUID without the instance itself. Even if it
was stored in membership tables of other instances.

This patch makes swim_cluster store swim instances and UUIDs
separately. This is going to be used to test swim_quit() API.
Also, some cfg parameters are saved as well, like ack timeout, gc
mode. They are used to restart a node with exactly same cfg as
it was before restart. Even if original struct swim * is not
valid already.

Part of #3234
---
 test/unit/swim.c            |   4 +-
 test/unit/swim_test_utils.c | 125 ++++++++++++++++++++++++++----------
 test/unit/swim_test_utils.h |   5 ++
 3 files changed, 99 insertions(+), 35 deletions(-)

diff --git a/test/unit/swim.c b/test/unit/swim.c
index e11f006ae..84ca01ac3 100644
--- a/test/unit/swim.c
+++ b/test/unit/swim.c
@@ -106,12 +106,12 @@ swim_test_uuid_update(void)
 	struct swim *s = swim_cluster_node(cluster, 0);
 	struct tt_uuid new_uuid = uuid_nil;
 	new_uuid.time_low = 1000;
-	is(swim_cfg(s, NULL, -1, -1, -1, &new_uuid), 0, "UUID update");
+	is(swim_cluster_update_uuid(cluster, 0, &new_uuid), 0, "UUID update");
 	is(swim_cluster_wait_fullmesh(cluster, 1), 0,
 	   "old UUID is returned back as a 'ghost' member");
 	is(swim_size(s), 3, "two members in each + ghost third member");
 	new_uuid.time_low = 2;
-	is(swim_cfg(s, NULL, -1, -1, -1, &new_uuid), -1,
+	is(swim_cluster_update_uuid(cluster, 0, &new_uuid), -1,
 	   "can not update to an existing UUID - swim_cfg fails");
 	ok(swim_error_check_match("exists"), "diag says 'exists'");
 	swim_cluster_delete(cluster);
diff --git a/test/unit/swim_test_utils.c b/test/unit/swim_test_utils.c
index bada8ef43..a42e50cae 100644
--- a/test/unit/swim_test_utils.c
+++ b/test/unit/swim_test_utils.c
@@ -36,33 +36,67 @@
 #include "trivia/util.h"
 #include "fiber.h"
 
+/**
+ * SWIM cluster node and its UUID. UUID is stored separately
+ * because sometimes a test wants to drop a SWIM instance, but
+ * still check how does it look in other membership instances.
+ * UUID is necessary since it is a key to lookup a view of that
+ * instance in the member tables.
+ */
+struct swim_node {
+	/** SWIM instance. Can be NULL. */
+	struct swim *swim;
+	/**
+	 * UUID. Is used when @a swim is NULL to lookup view of
+	 * that instance.
+	 */
+	struct tt_uuid uuid;
+};
+
 /**
  * Cluster is a simple array of SWIM instances assigned to
  * different URIs.
  */
 struct swim_cluster {
 	int size;
-	struct swim **node;
+	struct swim_node *node;
+	/**
+	 * Saved values to restart SWIM nodes with the most actual
+	 * configuration.
+	 */
+	double ack_timeout;
+	enum swim_gc_mode gc_mode;
 };
 
+/** Build URI of a SWIM instance for a given @a id. */
+static inline void
+swim_cluster_id_to_uri(char *buffer, int id)
+{
+	sprintf(buffer, "127.0.0.1:%d", id + 1);
+}
+
 struct swim_cluster *
 swim_cluster_new(int size)
 {
 	struct swim_cluster *res = (struct swim_cluster *) malloc(sizeof(*res));
 	assert(res != NULL);
 	int bsize = sizeof(res->node[0]) * size;
-	res->node = (struct swim **) malloc(bsize);
+	res->node = (struct swim_node *) malloc(bsize);
 	assert(res->node != NULL);
 	res->size = size;
+	res->ack_timeout = -1;
+	res->gc_mode = SWIM_GC_DEFAULT;
 	struct tt_uuid uuid;
 	memset(&uuid, 0, sizeof(uuid));
 	char *uri = tt_static_buf();
-	for (int i = 0; i < size; ++i) {
-		res->node[i] = swim_new();
-		assert(res->node[i] != NULL);
-		sprintf(uri, "127.0.0.1:%d", i + 1);
+	struct swim_node *n = res->node;
+	for (int i = 0; i < size; ++i, ++n) {
+		n->swim = swim_new();
+		assert(n->swim != NULL);
+		swim_cluster_id_to_uri(uri, i);
 		uuid.time_low = i + 1;
-		int rc = swim_cfg(res->node[i], uri, -1, -1, -1, &uuid);
+		n->uuid = uuid;
+		int rc = swim_cfg(n->swim, uri, -1, -1, -1, &uuid);
 		assert(rc == 0);
 		(void) rc;
 	}
@@ -71,7 +105,7 @@ swim_cluster_new(int size)
 
 #define swim_cluster_set_cfg(cluster, ...) ({				\
 	for (int i = 0; i < cluster->size; ++i) {			\
-		int rc = swim_cfg(cluster->node[i], __VA_ARGS__);	\
+		int rc = swim_cfg(cluster->node[i].swim, __VA_ARGS__);	\
 		assert(rc == 0);					\
 		(void) rc;						\
 	}								\
@@ -81,28 +115,44 @@ void
 swim_cluster_set_ack_timeout(struct swim_cluster *cluster, double ack_timeout)
 {
 	swim_cluster_set_cfg(cluster, NULL, -1, ack_timeout, -1, NULL);
+	cluster->ack_timeout = ack_timeout;
 }
 
 void
 swim_cluster_set_gc(struct swim_cluster *cluster, enum swim_gc_mode gc_mode)
 {
 	swim_cluster_set_cfg(cluster, NULL, -1, -1, gc_mode, NULL);
+	cluster->gc_mode = gc_mode;
 }
 
 void
 swim_cluster_delete(struct swim_cluster *cluster)
 {
-	for (int i = 0; i < cluster->size; ++i)
-		swim_delete(cluster->node[i]);
+	for (int i = 0; i < cluster->size; ++i) {
+		if (cluster->node[i].swim != NULL)
+			swim_delete(cluster->node[i].swim);
+	}
 	free(cluster->node);
 	free(cluster);
 }
 
+int
+swim_cluster_update_uuid(struct swim_cluster *cluster, int i,
+			 const struct tt_uuid *new_uuid)
+{
+	assert(i >= 0 && i < cluster->size);
+	struct swim_node *n = &cluster->node[i];
+	if (swim_cfg(n->swim, NULL, -1, -1, -1, new_uuid) != 0)
+		return -1;
+	n->uuid = *new_uuid;
+	return 0;
+}
+
 int
 swim_cluster_add_link(struct swim_cluster *cluster, int to_id, int from_id)
 {
-	const struct swim_member *from = swim_self(cluster->node[from_id]);
-	return swim_add_member(cluster->node[to_id], swim_member_uri(from),
+	const struct swim_member *from = swim_self(cluster->node[from_id].swim);
+	return swim_add_member(cluster->node[to_id].swim, swim_member_uri(from),
 			       swim_member_uuid(from));
 }
 
@@ -110,10 +160,12 @@ static const struct swim_member *
 swim_cluster_member_view(struct swim_cluster *cluster, int node_id,
 			 int member_id)
 {
-	struct swim *node = cluster->node[node_id];
-	const struct swim_member *member = swim_self(cluster->node[member_id]);
-	const struct tt_uuid *member_uuid = swim_member_uuid(member);
-	return swim_member_by_uuid(node, member_uuid);
+	/*
+	 * Do not use node[member_id].swim - it can be NULL
+	 * already, for example, in case of quit or deletion.
+	 */
+	return swim_member_by_uuid(cluster->node[node_id].swim,
+				   &cluster->node[member_id].uuid);
 }
 
 enum swim_member_status
@@ -142,44 +194,47 @@ struct swim *
 swim_cluster_node(struct swim_cluster *cluster, int i)
 {
 	assert(i >= 0 && i < cluster->size);
-	return cluster->node[i];
+	return cluster->node[i].swim;
 }
 
 void
 swim_cluster_restart_node(struct swim_cluster *cluster, int i)
 {
 	assert(i >= 0 && i < cluster->size);
-	struct swim *s = cluster->node[i];
-	const struct swim_member *self = swim_self(s);
-	struct tt_uuid uuid = *swim_member_uuid(self);
+	struct swim_node *n = &cluster->node[i];
+	struct swim *s = n->swim;
 	char uri[128];
-	strcpy(uri, swim_member_uri(self));
-	double ack_timeout = swim_ack_timeout(s);
-	swim_delete(s);
+	swim_cluster_id_to_uri(uri, i);
+	if (s != NULL) {
+		assert(tt_uuid_is_equal(swim_member_uuid(swim_self(s)),
+					&n->uuid));
+		swim_delete(s);
+	}
 	s = swim_new();
 	assert(s != NULL);
-	int rc = swim_cfg(s, uri, -1, ack_timeout, -1, &uuid);
+	int rc = swim_cfg(s, uri, -1, cluster->ack_timeout, cluster->gc_mode,
+			  &n->uuid);
 	assert(rc == 0);
 	(void) rc;
-	cluster->node[i] = s;
+	n->swim = s;
 }
 
 void
 swim_cluster_block_io(struct swim_cluster *cluster, int i)
 {
-	swim_test_transport_block_fd(swim_fd(cluster->node[i]));
+	swim_test_transport_block_fd(swim_fd(cluster->node[i].swim));
 }
 
 void
 swim_cluster_unblock_io(struct swim_cluster *cluster, int i)
 {
-	swim_test_transport_unblock_fd(swim_fd(cluster->node[i]));
+	swim_test_transport_unblock_fd(swim_fd(cluster->node[i].swim));
 }
 
 void
 swim_cluster_set_drop(struct swim_cluster *cluster, int i, double value)
 {
-	swim_test_transport_set_drop(swim_fd(cluster->node[i]), value);
+	swim_test_transport_set_drop(swim_fd(cluster->node[i].swim), value);
 }
 
 /** Check if @a s1 knows every member of @a s2's table. */
@@ -201,11 +256,15 @@ swim1_contains_swim2(struct swim *s1, struct swim *s2)
 bool
 swim_cluster_is_fullmesh(struct swim_cluster *cluster)
 {
-	struct swim **end = cluster->node + cluster->size;
-	for (struct swim **s1 = cluster->node; s1 < end; ++s1) {
-		for (struct swim **s2 = s1 + 1; s2 < end; ++s2) {
-			if (! swim1_contains_swim2(*s1, *s2) ||
-			    ! swim1_contains_swim2(*s2, *s1))
+	struct swim_node *end = cluster->node + cluster->size;
+	for (struct swim_node *s1 = cluster->node; s1 < end; ++s1) {
+		if (s1->swim == NULL)
+			continue;
+		for (struct swim_node *s2 = s1 + 1; s2 < end; ++s2) {
+			if (s2->swim == NULL)
+				continue;
+			if (! swim1_contains_swim2(s1->swim, s2->swim) ||
+			    ! swim1_contains_swim2(s2->swim, s1->swim))
 				return false;
 		}
 	}
diff --git a/test/unit/swim_test_utils.h b/test/unit/swim_test_utils.h
index f4397aff4..6e172fb85 100644
--- a/test/unit/swim_test_utils.h
+++ b/test/unit/swim_test_utils.h
@@ -59,6 +59,11 @@ swim_cluster_set_gc(struct swim_cluster *cluster, enum swim_gc_mode gc_mode);
 void
 swim_cluster_delete(struct swim_cluster *cluster);
 
+/** Update UUID of a SWIM instance with id @a i. */
+int
+swim_cluster_update_uuid(struct swim_cluster *cluster, int i,
+			 const struct tt_uuid *new_uuid);
+
 /** Check that an error in diag contains @a msg. */
 bool
 swim_error_check_match(const char *msg);
-- 
2.17.2 (Apple Git-113)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] [PATCH 2/5] test: on close of swim fake fd send its packets, not drop
  2019-04-09 18:12 [tarantool-patches] [PATCH 0/5] swim member 'left' status Vladislav Shpilevoy
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 1/5] test: allow to remove swim nodes from the cluster Vladislav Shpilevoy
@ 2019-04-09 18:12 ` Vladislav Shpilevoy
  2019-04-10  7:16   ` [tarantool-patches] " Konstantin Osipov
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 3/5] test: process IO swim test events before protocol's ones Vladislav Shpilevoy
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-09 18:12 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

The packets originator has already got an OK status and expects
these messages sent even if the originator is closed right after
that. This commit does the TCP-way and sends all the pending
messages before actually closing the fake fd.

Part of #3234
---
 test/unit/swim_test_transport.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/test/unit/swim_test_transport.c b/test/unit/swim_test_transport.c
index e563185ee..f30000135 100644
--- a/test/unit/swim_test_transport.c
+++ b/test/unit/swim_test_transport.c
@@ -135,6 +135,10 @@ swim_fd_open(struct swim_fd *fd)
 	return 0;
 }
 
+/** Send one packet to destination's recv queue. */
+static inline void
+swim_fd_send_packet(struct swim_fd *fd);
+
 /** Close a fake file descriptor. */
 static inline void
 swim_fd_close(struct swim_fd *fd)
@@ -144,8 +148,8 @@ swim_fd_close(struct swim_fd *fd)
 	struct swim_test_packet *i, *tmp;
 	rlist_foreach_entry_safe(i, &fd->recv_queue, in_queue, tmp)
 		swim_test_packet_delete(i);
-	rlist_foreach_entry_safe(i, &fd->send_queue, in_queue, tmp)
-		swim_test_packet_delete(i);
+	while (! rlist_empty(&fd->send_queue))
+		swim_fd_send_packet(fd);
 	rlist_del_entry(fd, in_active);
 	fd->is_opened = false;
 }
@@ -285,12 +289,10 @@ swim_test_is_drop(double rate)
 	return ((double) rand() / RAND_MAX) * 100 < rate;
 }
 
-/** Send one packet to destination's recv queue. */
 static inline void
 swim_fd_send_packet(struct swim_fd *fd)
 {
-	if (rlist_empty(&fd->send_queue))
-		return;
+	assert(! rlist_empty(&fd->send_queue));
 	struct swim_test_packet *p =
 		rlist_shift_entry(&fd->send_queue, struct swim_test_packet,
 				  in_queue);
@@ -311,7 +313,8 @@ swim_transport_do_loop_step(struct ev_loop *loop)
 	 * order. So this reverse + libev reverse = normal order.
 	 */
 	rlist_foreach_entry_reverse(fd, &swim_fd_active, in_active) {
-		swim_fd_send_packet(fd);
+		if (! rlist_empty(&fd->send_queue))
+			swim_fd_send_packet(fd);
 		ev_feed_fd_event(loop, fd->evfd, EV_WRITE);
 	}
 	rlist_foreach_entry_reverse(fd, &swim_fd_active, in_active) {
-- 
2.17.2 (Apple Git-113)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] [PATCH 3/5] test: process IO swim test events before protocol's ones
  2019-04-09 18:12 [tarantool-patches] [PATCH 0/5] swim member 'left' status Vladislav Shpilevoy
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 1/5] test: allow to remove swim nodes from the cluster Vladislav Shpilevoy
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 2/5] test: on close of swim fake fd send its packets, not drop Vladislav Shpilevoy
@ 2019-04-09 18:12 ` Vladislav Shpilevoy
  2019-04-10  7:17   ` [tarantool-patches] " Konstantin Osipov
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 4/5] swim: introduce quit message Vladislav Shpilevoy
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-09 18:12 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Before that patch the swim test event loop worked like this: pop
a new event, set the global watch to its deadline, process the
event, repeat until the deadlines are the same. These events
usually generate IO events, which are processed next. But after
swim_quit() will be introduced, it is possible to insert new IO
events before protocol's events like round steps and ack checks.

Because of that it would be impossible to process new IO events
only, with timeout = 0, or with timeout > 0, but without changing
the global clock.

For example, a typical test would try to call swim_quit() on a
swim instance, and expect that it has sent all the quit messages
without delays immediately. But before this patch it would be
necessary to run at least one swim round to get to the IO
processing.

The patch splits protocol's events and IO events processing logic
into two functions and calls them explicitly in
swim_wait_timeout() - the main function to check something in the
swim tests.

Part of #3234
---
 test/unit/swim_test_ev.c        | 27 +--------------------------
 test/unit/swim_test_ev.h        |  2 +-
 test/unit/swim_test_transport.c | 32 ++++++++++++++++++++++++++++++--
 test/unit/swim_test_transport.h |  9 +++++----
 test/unit/swim_test_utils.c     | 16 +++++++++++++++-
 5 files changed, 52 insertions(+), 34 deletions(-)

diff --git a/test/unit/swim_test_ev.c b/test/unit/swim_test_ev.c
index 1c4ba8612..135f20107 100644
--- a/test/unit/swim_test_ev.c
+++ b/test/unit/swim_test_ev.c
@@ -281,7 +281,7 @@ swim_ev_timer_stop(struct ev_loop *loop, struct ev_timer *base)
 
 /** Process all the events with the next nearest deadline. */
 void
-swim_do_loop_step(struct ev_loop *loop)
+swim_test_ev_do_loop_step(struct ev_loop *loop)
 {
 	struct swim_event *next_e, *e = event_heap_top(&event_heap);
 	if (e != NULL) {
@@ -296,31 +296,6 @@ swim_do_loop_step(struct ev_loop *loop)
 			e = next_e;
 		} while (e != NULL && e->deadline == watch);
 	}
-	/*
-	 * After events are processed, it is possible that some of
-	 * them generated IO events. Process them too.
-	 */
-	do {
-		swim_transport_do_loop_step(loop);
-		/*
-		 * Just a single loop + invoke is not enough. At
-		 * least two are necessary.
-		 *
-		 * First loop does nothing since send queues are
-		 * empty. First invoke fills send queues.
-		 *
-		 * Second loop moves messages from send to recv
-		 * queues. Second invoke processes messages in
-		 * recv queues.
-		 *
-		 * With indirect messages even 2 cycles is not
-		 * enough - processing of one received message can
-		 * add a new message into another send queue.
-		 */
-		if (ev_pending_count(loop) == 0)
-			break;
-		ev_invoke_pending(loop);
-	} while (true);
 }
 
 void
diff --git a/test/unit/swim_test_ev.h b/test/unit/swim_test_ev.h
index 6a0002c2f..bc925a859 100644
--- a/test/unit/swim_test_ev.h
+++ b/test/unit/swim_test_ev.h
@@ -98,7 +98,7 @@ swim_ev_set_brk(double delay);
 
 /** Play one step of event loop, process generated events. */
 void
-swim_do_loop_step(struct ev_loop *loop);
+swim_test_ev_do_loop_step(struct ev_loop *loop);
 
 /** Destroy pending events, reset global watch. */
 void
diff --git a/test/unit/swim_test_transport.c b/test/unit/swim_test_transport.c
index f30000135..78fda587a 100644
--- a/test/unit/swim_test_transport.c
+++ b/test/unit/swim_test_transport.c
@@ -304,8 +304,12 @@ swim_fd_send_packet(struct swim_fd *fd)
 		swim_test_packet_delete(p);
 }
 
-void
-swim_transport_do_loop_step(struct ev_loop *loop)
+/**
+ * Feed EV_WRITE/READ events to the descriptors having something
+ * to send/recv.
+ */
+static inline void
+swim_test_transport_feed_events(struct ev_loop *loop)
 {
 	struct swim_fd *fd;
 	/*
@@ -322,3 +326,27 @@ swim_transport_do_loop_step(struct ev_loop *loop)
 			ev_feed_fd_event(loop, fd->evfd, EV_READ);
 	}
 }
+
+void
+swim_test_transport_do_loop_step(struct ev_loop *loop)
+{
+	do {
+		ev_invoke_pending(loop);
+		swim_test_transport_feed_events(loop);
+		/*
+		 * Just a single loop + invoke is not enough. At
+		 * least two are necessary.
+		 *
+		 * First loop does nothing since send queues are
+		 * empty. First invoke fills send queues.
+		 *
+		 * Second loop moves messages from send to recv
+		 * queues. Second invoke processes messages in
+		 * recv queues.
+		 *
+		 * With indirect messages even 2 cycles is not
+		 * enough - processing of one received message can
+		 * add a new message into another send queue.
+		 */
+	} while (ev_pending_count(loop) > 0);
+}
diff --git a/test/unit/swim_test_transport.h b/test/unit/swim_test_transport.h
index d291abe91..9248f0e98 100644
--- a/test/unit/swim_test_transport.h
+++ b/test/unit/swim_test_transport.h
@@ -41,12 +41,13 @@ struct ev_loop;
  */
 
 /**
- * Feed EV_WRITE event to all opened descriptors, and EV_READ to
- * ones, who have not empty recv queue. Move packets from send to
- * recv queues. No callbacks is invoked. Only events are fed.
+ * Until there are no new IO events, feed EV_WRITE event to all
+ * opened descriptors; EV_READ to ones, who have not empty recv
+ * queue; invoke callbacks to process the events. Move packets
+ * from send to recv queues.
  */
 void
-swim_transport_do_loop_step(struct ev_loop *loop);
+swim_test_transport_do_loop_step(struct ev_loop *loop);
 
 /**
  * Block a file descriptor so as it can not receive nor send any
diff --git a/test/unit/swim_test_utils.c b/test/unit/swim_test_utils.c
index a42e50cae..8964e345c 100644
--- a/test/unit/swim_test_utils.c
+++ b/test/unit/swim_test_utils.c
@@ -293,10 +293,24 @@ swim_wait_timeout(double timeout, struct swim_cluster *cluster,
 {
 	swim_ev_set_brk(timeout);
 	double deadline = swim_time() + timeout;
+	struct ev_loop *loop = loop();
+	/*
+	 * There can be pending out of bound IO events, affecting
+	 * the result. For example, 'quit' messages, which are
+	 * send immediately without preliminary timeouts or
+	 * whatsoever.
+	 */
+	swim_test_transport_do_loop_step(loop);
 	while (! check(cluster, data)) {
 		if (swim_time() >= deadline)
 			return -1;
-		swim_do_loop_step(loop());
+		swim_test_ev_do_loop_step(loop);
+		/*
+		 * After events are processed, it is possible that
+		 * some of them generated IO events. Process them
+		 * too.
+		 */
+		swim_test_transport_do_loop_step(loop);
 	}
 	return 0;
 }
-- 
2.17.2 (Apple Git-113)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] [PATCH 4/5] swim: introduce quit message
  2019-04-09 18:12 [tarantool-patches] [PATCH 0/5] swim member 'left' status Vladislav Shpilevoy
                   ` (2 preceding siblings ...)
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 3/5] test: process IO swim test events before protocol's ones Vladislav Shpilevoy
@ 2019-04-09 18:12 ` Vladislav Shpilevoy
  2019-04-10  7:21   ` [tarantool-patches] " Konstantin Osipov
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 5/5] swim: make UUID update smoother and faster Vladislav Shpilevoy
  2019-04-10 10:29 ` [tarantool-patches] Re: [PATCH 0/5] swim member 'left' status Vladislav Shpilevoy
  5 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-09 18:12 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Quit allows to gracefully leave a cluster. Other members will not
consider the quited instance as dead, and will drop it much
earlier than it would happen via failure detection.

Quit works as follows: a special message is send to each member.
Members, got that message, will mark the source as 'left' and
will keep and disseminate that change for one round. In the best
case after one round the left member will be marked as such in
the whole cluster. 'Left' member will not be added back because,
it is prohibited explicitly to add new 'left' members.

Part of #3234
---
 src/lib/swim/swim.c           | 118 +++++++++++++++++++++++++++++++++-
 src/lib/swim/swim.h           |  10 +++
 src/lib/swim/swim_constants.h |   2 +
 src/lib/swim/swim_io.c        |   8 ++-
 src/lib/swim/swim_io.h        |   4 ++
 src/lib/swim/swim_proto.c     |  11 ++++
 src/lib/swim/swim_proto.h     |  30 +++++++++
 test/unit/swim.c              |  83 +++++++++++++++++++++++-
 test/unit/swim.result         |  15 ++++-
 test/unit/swim_test_utils.c   |  11 ++++
 test/unit/swim_test_utils.h   |  12 ++++
 11 files changed, 299 insertions(+), 5 deletions(-)

diff --git a/src/lib/swim/swim.c b/src/lib/swim/swim.c
index d5ecc6622..f0383554d 100644
--- a/src/lib/swim/swim.c
+++ b/src/lib/swim/swim.c
@@ -885,8 +885,11 @@ swim_decrease_event_ttl(struct swim *swim)
 	rlist_foreach_entry_safe(member, &swim->dissemination_queue,
 				 in_dissemination_queue,
 				 tmp) {
-		if (--member->status_ttl == 0)
+		if (--member->status_ttl == 0) {
 			rlist_del_entry(member, in_dissemination_queue);
+			if (member->status == MEMBER_LEFT)
+				swim_delete_member(swim, member);
+		}
 	}
 }
 
@@ -1026,6 +1029,8 @@ swim_check_acks(struct ev_loop *loop, struct ev_timer *t, int events)
 				continue;
 			}
 			break;
+		case MEMBER_LEFT:
+			continue;
 		default:
 			unreachable();
 		}
@@ -1114,7 +1119,9 @@ swim_upsert_member(struct swim *swim, const struct swim_member_def *def,
 {
 	struct swim_member *member = swim_find_member(swim, &def->uuid);
 	if (member == NULL) {
-		if (def->status == MEMBER_DEAD && swim->gc_mode == SWIM_GC_ON) {
+		if (def->status == MEMBER_LEFT ||
+		    (def->status == MEMBER_DEAD &&
+		     swim->gc_mode == SWIM_GC_ON)) {
 			/*
 			 * Do not 'resurrect' dead members to
 			 * prevent 'ghost' members. Ghost member
@@ -1282,6 +1289,47 @@ swim_process_dissemination(struct swim *swim, const char **pos, const char *end)
 	return swim_process_members(swim, prefix, pos, end);
 }
 
+/**
+ * Decode a quit message. Schedule dissemination, change status.
+ */
+static int
+swim_process_quit(struct swim *swim, const char **pos, const char *end,
+		  const struct tt_uuid *uuid)
+{
+	say_verbose("SWIM %d: process quit", swim_fd(swim));
+	const char *prefix = "invald quit message:";
+	uint32_t size;
+	if (swim_decode_map(pos, end, &size, prefix, "root") != 0)
+		return -1;
+	if (size != 1) {
+		diag_set(SwimError, "%s map of size 1 is expected", prefix);
+		return -1;
+	}
+	uint64_t tmp;
+	if (swim_decode_uint(pos, end, &tmp, prefix, "a key") != 0)
+		return -1;
+	if (tmp != SWIM_QUIT_INCARNATION) {
+		diag_set(SwimError, "%s a key should be incarnation", prefix);
+		return -1;
+	}
+	if (swim_decode_uint(pos, end, &tmp, prefix, "incarnation") != 0)
+		return -1;
+	struct swim_member *m = swim_find_member(swim, uuid);
+	if (m == NULL)
+		return 0;
+	/*
+	 * Check for 'self' in case this instance took UUID of a
+	 * quited instance.
+	 */
+	if (m != swim->self) {
+		swim_update_member_inc_status(swim, m, MEMBER_LEFT, tmp);
+	} else if (tmp >= m->incarnation) {
+		m->incarnation++;
+		swim_on_member_update(swim, m);
+	}
+	return 0;
+}
+
 /** Process a new message. */
 static void
 swim_on_input(struct swim_scheduler *scheduler, const char *pos,
@@ -1325,6 +1373,10 @@ swim_on_input(struct swim_scheduler *scheduler, const char *pos,
 			if (swim_process_dissemination(swim, &pos, end) != 0)
 				goto error;
 			break;
+		case SWIM_QUIT:
+			if (swim_process_quit(swim, &pos, end, &uuid) != 0)
+				goto error;
+			break;
 		default:
 			diag_set(SwimError, "%s unexpected key", prefix);
 			goto error;
@@ -1590,6 +1642,68 @@ swim_delete(struct swim *swim)
 	free(swim->shuffled);
 }
 
+/**
+ * Quit message is broadcasted in the same way as round messages,
+ * step by step, with the only difference that quit round steps
+ * follow each other without delays.
+ */
+static void
+swim_quit_step_complete(struct swim_task *task,
+			struct swim_scheduler *scheduler, int rc)
+{
+	(void) rc;
+	struct swim *swim = swim_by_scheduler(scheduler);
+	if (rlist_empty(&swim->round_queue)) {
+		swim_delete(swim);
+		return;
+	}
+	struct swim_member *m =
+		rlist_shift_entry(&swim->round_queue, struct swim_member,
+				  in_round_queue);
+	swim_task_send(task, &m->addr, scheduler);
+}
+
+/**
+ * Encode 'quit' command.
+ * @retval Number of key-values added to the packet's root map.
+ */
+static inline int
+swim_encode_quit(struct swim *swim, struct swim_packet *packet)
+{
+	struct swim_quit_bin bin;
+	char *pos = swim_packet_alloc(packet, sizeof(bin));
+	if (pos == NULL)
+		return 0;
+	swim_quit_bin_create(&bin, swim->self->incarnation);
+	memcpy(pos, &bin, sizeof(bin));
+	return 1;
+}
+
+void
+swim_quit(struct swim *swim)
+{
+	assert(swim_is_configured(swim));
+	swim_ev_timer_stop(loop(), &swim->round_tick);
+	swim_ev_timer_stop(loop(), &swim->wait_ack_tick);
+	swim_scheduler_stop_input(&swim->scheduler);
+	/* Start the last round - quiting. */
+	if (swim_new_round(swim) != 0) {
+		diag_log();
+		swim_delete(swim);
+		return;
+	}
+	struct swim_task *task = &swim->round_step_task;
+	swim_task_destroy(task);
+	swim_task_create(task, swim_quit_step_complete, swim_task_delete_cb,
+			 "quit");
+	char *header = swim_packet_alloc(&task->packet, 1);
+	int rc = swim_encode_src_uuid(swim, &task->packet) +
+		 swim_encode_quit(swim, &task->packet);
+	assert(rc == 2);
+	mp_encode_map(header, rc);
+	swim_quit_step_complete(task, &swim->scheduler, 0);
+}
+
 const struct swim_member *
 swim_self(struct swim *swim)
 {
diff --git a/src/lib/swim/swim.h b/src/lib/swim/swim.h
index ec924f36f..f8dfdde87 100644
--- a/src/lib/swim/swim.h
+++ b/src/lib/swim/swim.h
@@ -144,6 +144,16 @@ swim_info(struct swim *swim, struct info_handler *info);
 int
 swim_size(const struct swim *swim);
 
+/**
+ * Gracefully leave the cluster, broadcast a notification.
+ * Members, received it, will remove a record about this instance
+ * from their tables, and will not consider it dead. @a swim
+ * object can not be used after quit and should be treated as
+ * deleted.
+ */
+void
+swim_quit(struct swim *swim);
+
 /** Get a SWIM member, describing this instance. */
 const struct swim_member *
 swim_self(struct swim *swim);
diff --git a/src/lib/swim/swim_constants.h b/src/lib/swim/swim_constants.h
index 104f09f47..7869ddf3e 100644
--- a/src/lib/swim/swim_constants.h
+++ b/src/lib/swim/swim_constants.h
@@ -42,6 +42,8 @@ enum swim_member_status {
 	 * the membership after some unacknowledged pings.
 	 */
 	MEMBER_DEAD,
+	/** The member has voluntary left the cluster. */
+	MEMBER_LEFT,
 	swim_member_status_MAX,
 };
 
diff --git a/src/lib/swim/swim_io.c b/src/lib/swim/swim_io.c
index 504f64f32..56c2facc8 100644
--- a/src/lib/swim/swim_io.c
+++ b/src/lib/swim/swim_io.c
@@ -157,6 +157,12 @@ swim_scheduler_bind(struct swim_scheduler *scheduler,
 	return 0;
 }
 
+void
+swim_scheduler_stop_input(struct swim_scheduler *scheduler)
+{
+	swim_ev_io_stop(loop(), &scheduler->input);
+}
+
 void
 swim_scheduler_destroy(struct swim_scheduler *scheduler)
 {
@@ -172,7 +178,7 @@ swim_scheduler_destroy(struct swim_scheduler *scheduler)
 	}
 	swim_transport_destroy(&scheduler->transport);
 	swim_ev_io_stop(loop(), &scheduler->output);
-	swim_ev_io_stop(loop(), &scheduler->input);
+	swim_scheduler_stop_input(scheduler);
 }
 
 static void
diff --git a/src/lib/swim/swim_io.h b/src/lib/swim/swim_io.h
index 25c6ea2e9..a6032127d 100644
--- a/src/lib/swim/swim_io.h
+++ b/src/lib/swim/swim_io.h
@@ -171,6 +171,10 @@ int
 swim_scheduler_bind(struct swim_scheduler *scheduler,
 		    const struct sockaddr_in *addr);
 
+/** Stop accepting new packets from the network. */
+void
+swim_scheduler_stop_input(struct swim_scheduler *scheduler);
+
 /** Destroy scheduler, its queues, close the socket. */
 void
 swim_scheduler_destroy(struct swim_scheduler *scheduler);
diff --git a/src/lib/swim/swim_proto.c b/src/lib/swim/swim_proto.c
index 6b3197790..fa02b61c4 100644
--- a/src/lib/swim/swim_proto.c
+++ b/src/lib/swim/swim_proto.c
@@ -38,6 +38,7 @@
 const char *swim_member_status_strs[] = {
 	"alive",
 	"dead",
+	"left",
 };
 
 const char *swim_fd_msg_type_strs[] = {
@@ -451,3 +452,13 @@ swim_meta_def_decode(struct swim_meta_def *def, const char **pos,
 	}
 	return 0;
 }
+
+void
+swim_quit_bin_create(struct swim_quit_bin *header, uint64_t incarnation)
+{
+	header->k_quit = SWIM_QUIT;
+	header->m_quit = 0x81;
+	header->k_incarnation = SWIM_QUIT_INCARNATION;
+	header->m_incarnation = 0xcf;
+	header->v_incarnation = mp_bswap_u64(incarnation);
+}
diff --git a/src/lib/swim/swim_proto.h b/src/lib/swim/swim_proto.h
index 826443a3b..6ae4475c0 100644
--- a/src/lib/swim/swim_proto.h
+++ b/src/lib/swim/swim_proto.h
@@ -84,6 +84,12 @@
  * |         },                                                  |
  * |         ...                                                 |
  * |     ],                                                      |
+ * |                                                             |
+ * |               OR/AND                                        |
+ * |                                                             |
+ * |     SWIM_QUIT: {                                            |
+ * |         SWIM_QUIT_INCARNATION: uint                         |
+ * |     }                                                       |
  * | }                                                           |
  * +-------------------------------------------------------------+
  */
@@ -128,6 +134,7 @@ enum swim_body_key {
 	SWIM_ANTI_ENTROPY,
 	SWIM_FAILURE_DETECTION,
 	SWIM_DISSEMINATION,
+	SWIM_QUIT,
 };
 
 /**
@@ -450,6 +457,29 @@ swim_meta_def_decode(struct swim_meta_def *def, const char **pos,
 
 /** }}}                     Meta component                      */
 
+enum swim_quit_key {
+	/** Incarnation to ignore old quit messages. */
+	SWIM_QUIT_INCARNATION = 0,
+};
+
+/** Quit section. Describes voluntary quit from the cluster. */
+struct PACKED swim_quit_bin {
+	/** mp_encode_uint(SWIM_QUIT) */
+	uint8_t k_quit;
+	/** mp_encode_map(1) */
+	uint8_t m_quit;
+
+	/** mp_encode_uint(SWIM_QUIT_INCARNATION) */
+	uint8_t k_incarnation;
+	/** mp_encode_uint(64bit incarnation) */
+	uint8_t m_incarnation;
+	uint64_t v_incarnation;
+};
+
+/** Initialize quit section. */
+void
+swim_quit_bin_create(struct swim_quit_bin *header, uint64_t incarnation);
+
 /**
  * Helpers to decode some values - map, array, etc with
  * appropriate checks. All of them set diagnostics on an error
diff --git a/test/unit/swim.c b/test/unit/swim.c
index 84ca01ac3..c2d94fb57 100644
--- a/test/unit/swim.c
+++ b/test/unit/swim.c
@@ -467,10 +467,90 @@ swim_test_undead(void)
 	swim_finish_test();
 }
 
+static void
+swim_test_quit(void)
+{
+	swim_start_test(9);
+	int size = 3;
+	struct swim_cluster *cluster = swim_cluster_new(size);
+	for (int i = 0; i < size; ++i) {
+		for (int j = 0; j < size; ++j)
+			swim_cluster_add_link(cluster, i, j);
+	}
+	swim_cluster_quit_node(cluster, 0);
+	is(swim_cluster_wait_status_everywhere(cluster, 0, MEMBER_LEFT, 0),
+	   0, "'quit' is sent to all the members without delays between "\
+	   "dispatches")
+	/*
+	 * Return the instance back and check that it refutes the
+	 * old LEFT status.
+	 */
+	swim_cluster_restart_node(cluster, 0);
+	is(swim_cluster_wait_incarnation(cluster, 0, 0, 1, 2), 0,
+	   "quited member S1 has returned and refuted the old status");
+	fail_if(swim_cluster_wait_fullmesh(cluster, 2) != 0);
+	/*
+	 * Not trivial test. A member can receive its own 'quit'
+	 * message. It can be reproduced if a member has quited.
+	 * Then another member took the spare UUID, and then
+	 * received the 'quit' message with the same UUID. Of
+	 * course, it should be refuted.
+	 */
+	struct swim *s0 = swim_cluster_node(cluster, 0);
+	struct tt_uuid s0_uuid = *swim_member_uuid(swim_self(s0));
+	struct swim *s1 = swim_cluster_node(cluster, 1);
+	swim_remove_member(s1, &s0_uuid);
+	struct swim *s2 = swim_cluster_node(cluster, 2);
+	swim_remove_member(s2, &s0_uuid);
+	swim_cluster_quit_node(cluster, 0);
+
+	/* Steal UUID of the quited node. */
+	swim_cluster_block_io(cluster, 1);
+	is(swim_cluster_update_uuid(cluster, 1, &s0_uuid), 0, "another "\
+	   "member S2 has taken the quited UUID");
+
+	/* Ensure that S1 is not added back to S3 on quit. */
+	swim_run_for(1);
+	is(swim_cluster_member_status(cluster, 2, 0), swim_member_status_MAX,
+	   "S3 did not add S1 back when received its 'quit'");
+
+	/* Now allow S2 to get the 'self-quit' message. */
+	swim_cluster_unblock_io(cluster, 1);
+	is(swim_cluster_wait_incarnation(cluster, 1, 1, 1, 0), 0,
+	   "S2 finally got 'quit' message from S1, but with its 'own' UUID - "\
+	   "refute it")
+	swim_cluster_delete(cluster);
+
+	/**
+	 * Test that if a new member has arrived with LEFT status
+	 * via dissemination or anti-entropy - it is not added.
+	 * Even if GC is off.
+	 */
+	cluster = swim_cluster_new(3);
+	swim_cluster_set_gc(cluster, SWIM_GC_OFF);
+	swim_cluster_interconnect(cluster, 0, 2);
+	swim_cluster_interconnect(cluster, 1, 2);
+
+	swim_cluster_quit_node(cluster, 0);
+	swim_run_for(2);
+	is(swim_cluster_member_status(cluster, 2, 0), MEMBER_LEFT,
+	   "S3 sees S1 as left");
+	is(swim_cluster_member_status(cluster, 1, 0), swim_member_status_MAX,
+	   "S2 does not see S1 at all");
+	swim_run_for(2);
+	is(swim_cluster_member_status(cluster, 2, 0), swim_member_status_MAX,
+	   "after more time S1 is dropped from S3");
+	is(swim_cluster_member_status(cluster, 1, 0), swim_member_status_MAX,
+	   "and still is not added to S2 - left members can not be added");
+
+	swim_cluster_delete(cluster);
+	swim_finish_test();
+}
+
 static int
 main_f(va_list ap)
 {
-	swim_start_test(12);
+	swim_start_test(13);
 
 	(void) ap;
 	swim_test_ev_init();
@@ -488,6 +568,7 @@ main_f(va_list ap)
 	swim_test_too_big_packet();
 	swim_test_undead();
 	swim_test_packet_loss();
+	swim_test_quit();
 
 	swim_test_transport_free();
 	swim_test_ev_free();
diff --git a/test/unit/swim.result b/test/unit/swim.result
index 615327e27..bd157feff 100644
--- a/test/unit/swim.result
+++ b/test/unit/swim.result
@@ -1,5 +1,5 @@
 	*** main_f ***
-1..12
+1..13
 	*** swim_test_one_link ***
     1..6
     ok 1 - no rounds - no fullmesh
@@ -116,4 +116,17 @@ ok 11 - subtests
     ok 5 - drop rate = 90.00, but the failure is disseminated
 ok 12 - subtests
 	*** swim_test_packet_loss: done ***
+	*** swim_test_quit ***
+    1..9
+    ok 1 - 'quit' is sent to all the members without delays between dispatches
+    ok 2 - quited member S1 has returned and refuted the old status
+    ok 3 - another member S2 has taken the quited UUID
+    ok 4 - S3 did not add S1 back when received its 'quit'
+    ok 5 - S2 finally got 'quit' message from S1, but with its 'own' UUID - refute it
+    ok 6 - S3 sees S1 as left
+    ok 7 - S2 does not see S1 at all
+    ok 8 - after more time S1 is dropped from S3
+    ok 9 - and still is not added to S2 - left members can not be added
+ok 13 - subtests
+	*** swim_test_quit: done ***
 	*** main_f: done ***
diff --git a/test/unit/swim_test_utils.c b/test/unit/swim_test_utils.c
index 8964e345c..da8dd4386 100644
--- a/test/unit/swim_test_utils.c
+++ b/test/unit/swim_test_utils.c
@@ -197,6 +197,17 @@ swim_cluster_node(struct swim_cluster *cluster, int i)
 	return cluster->node[i].swim;
 }
 
+void
+swim_cluster_quit_node(struct swim_cluster *cluster, int i)
+{
+	assert(i >= 0 && i < cluster->size);
+	struct swim_node *n = &cluster->node[i];
+	assert(tt_uuid_is_equal(&n->uuid,
+				swim_member_uuid(swim_self(n->swim))));
+	swim_quit(n->swim);
+	n->swim = NULL;
+}
+
 void
 swim_cluster_restart_node(struct swim_cluster *cluster, int i)
 {
diff --git a/test/unit/swim_test_utils.h b/test/unit/swim_test_utils.h
index 6e172fb85..5e1c192a1 100644
--- a/test/unit/swim_test_utils.h
+++ b/test/unit/swim_test_utils.h
@@ -72,6 +72,10 @@ swim_error_check_match(const char *msg);
 struct swim *
 swim_cluster_node(struct swim_cluster *cluster, int i);
 
+/** Quit a member with id @a id. */
+void
+swim_cluster_quit_node(struct swim_cluster *cluster, int i);
+
 /** Drop and create again a SWIM instance with id @a i. */
 void
 swim_cluster_restart_node(struct swim_cluster *cluster, int i);
@@ -94,6 +98,14 @@ swim_cluster_set_drop(struct swim_cluster *cluster, int i, double value);
 int
 swim_cluster_add_link(struct swim_cluster *cluster, int to_id, int from_id);
 
+/** Add a bidirectional link between two SWIM instances. */
+static inline void
+swim_cluster_interconnect(struct swim_cluster *cluster, int to_id, int from_id)
+{
+	swim_cluster_add_link(cluster, to_id, from_id);
+	swim_cluster_add_link(cluster, from_id, to_id);
+}
+
 enum swim_member_status
 swim_cluster_member_status(struct swim_cluster *cluster, int node_id,
 			   int member_id);
-- 
2.17.2 (Apple Git-113)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] [PATCH 5/5] swim: make UUID update smoother and faster
  2019-04-09 18:12 [tarantool-patches] [PATCH 0/5] swim member 'left' status Vladislav Shpilevoy
                   ` (3 preceding siblings ...)
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 4/5] swim: introduce quit message Vladislav Shpilevoy
@ 2019-04-09 18:12 ` Vladislav Shpilevoy
  2019-04-10  7:22   ` [tarantool-patches] " Konstantin Osipov
  2019-04-10 10:29 ` [tarantool-patches] Re: [PATCH 0/5] swim member 'left' status Vladislav Shpilevoy
  5 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-09 18:12 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Before this patch UUID update was the same as introduction of a
new member and waiting until the 'old' is dropped as 'dead' by
the failure detection component. It could take 2.5 minutes with
the default ack timeout. What is more, with GC turned off it
would always result in never deleted old UUID.

The patch on a UUID update marks the old UUID as 'left' member.
In the best and most common case it guarantees that old UUID will
be dropped not later than after 2 complete rounds, and marked as
'left' everywhere for log(cluster_size) round steps. Even with GC
turned off.

Part of #3234
---
 src/lib/swim/swim.c   | 89 ++++++-------------------------------------
 test/unit/swim.c      | 13 +++++--
 test/unit/swim.result | 12 +++---
 3 files changed, 27 insertions(+), 87 deletions(-)

diff --git a/src/lib/swim/swim.c b/src/lib/swim/swim.c
index f0383554d..f19ae3640 100644
--- a/src/lib/swim/swim.c
+++ b/src/lib/swim/swim.c
@@ -545,21 +545,6 @@ swim_member_delete(struct swim_member *member)
 	free(member);
 }
 
-/**
- * Reserve space for one more member in the member table. Used to
- * execute a non-failing UUID update.
- */
-static inline int
-swim_reserve_one_member(struct swim *swim)
-{
-	if (mh_swim_table_reserve(swim->members, mh_size(swim->members) + 1,
-				  NULL) != 0) {
-		diag_set(OutOfMemory, sizeof(mh_int_t), "malloc", "node");
-		return -1;
-	}
-	return 0;
-}
-
 /** Create a new member. It is not registered anywhere here. */
 static struct swim_member *
 swim_member_new(const struct sockaddr_in *addr, const struct tt_uuid *uuid,
@@ -1038,56 +1023,6 @@ swim_check_acks(struct ev_loop *loop, struct ev_timer *t, int events)
 	}
 }
 
-/**
- * Update member's UUID if it is changed. On UUID change the
- * member is reinserted into the member table with a new UUID.
- * @retval 0 Success.
- * @retval -1 Error. Out of memory or the new UUID is already in
- *         use.
- */
-static int
-swim_update_member_uuid(struct swim *swim, struct swim_member *member,
-			const struct tt_uuid *new_uuid)
-{
-	if (tt_uuid_is_equal(new_uuid, &member->uuid))
-		return 0;
-	if (swim_find_member(swim, new_uuid) != NULL) {
-		diag_set(SwimError, "duplicate UUID '%s'",
-			 swim_uuid_str(new_uuid));
-		return -1;
-	}
-	/*
-	 * Reserve before put + delete, because put below can
-	 * call rehash, and a reference to the old place in the
-	 * hash will taint.
-	 */
-	if (swim_reserve_one_member(swim) != 0)
-		return -1;
-	struct mh_swim_table_t *t = swim->members;
-	struct tt_uuid old_uuid = member->uuid;
-	struct mh_swim_table_key key = {member->hash, &old_uuid};
-	mh_int_t old_rc = mh_swim_table_find(t, key, NULL);
-	assert(old_rc != mh_end(t));
-	member->uuid = *new_uuid;
-	member->hash = swim_uuid_hash(new_uuid);
-	mh_int_t new_rc =
-		mh_swim_table_put(t, (const struct swim_member **) &member,
-				  NULL, NULL);
-	/* Can not fail - reserved above. */
-	assert(new_rc != mh_end(t));
-	(void) new_rc;
-	/*
-	 * Old_rc is still valid, because a possible rehash
-	 * happened before put.
-	 */
-	mh_swim_table_del(t, old_rc, NULL);
-	say_verbose("SWIM %d: a member has changed its UUID from %s to %s",
-		    swim_fd(swim), swim_uuid_str(&old_uuid),
-		    swim_uuid_str(new_uuid));
-	swim_on_member_update(swim, member);
-	return 0;
-}
-
 /** Update member's address.*/
 static inline void
 swim_update_member_addr(struct swim *swim, struct swim_member *member,
@@ -1458,6 +1393,7 @@ swim_cfg(struct swim *swim, const char *uri, double heartbeat_rate,
 	if (uri != NULL && swim_uri_to_addr(uri, &addr, prefix) != 0)
 		return -1;
 	bool is_first_cfg = swim->self == NULL;
+	struct swim_member *new_self = NULL;
 	if (is_first_cfg) {
 		if (uuid == NULL || tt_uuid_is_nil(uuid) || uri == NULL) {
 			diag_set(SwimError, "%s UUID and URI are mandatory in "\
@@ -1476,15 +1412,9 @@ swim_cfg(struct swim *swim, const char *uri, double heartbeat_rate,
 				 "already exists", prefix);
 			return -1;
 		}
-		/*
-		 * Reserve one cell for reinsertion of self with a
-		 * new UUID. Reserve is necessary right here, not
-		 * later, for atomic reconfiguration. Without
-		 * reservation in that place it is possible that
-		 * the instance is bound to a new URI, but failed
-		 * to update UUID due to memory issues.
-		 */
-		if (swim_reserve_one_member(swim) != 0)
+		new_self = swim_new_member(swim, &swim->self->addr, uuid,
+					   MEMBER_ALIVE, 0);
+		if (new_self == NULL)
 			return -1;
 	}
 	if (uri != NULL) {
@@ -1496,6 +1426,8 @@ swim_cfg(struct swim *swim, const char *uri, double heartbeat_rate,
 			if (is_first_cfg) {
 				swim_delete_member(swim, swim->self);
 				swim->self = NULL;
+			} else if (new_self != NULL) {
+				swim_delete_member(swim, new_self);
 			}
 			return -1;
 		}
@@ -1514,11 +1446,12 @@ swim_cfg(struct swim *swim, const char *uri, double heartbeat_rate,
 	if (swim->wait_ack_tick.at != ack_timeout && ack_timeout > 0)
 		swim_ev_timer_set(&swim->wait_ack_tick, ack_timeout, 0);
 
+	if (new_self != NULL) {
+		swim->self->status = MEMBER_LEFT;
+		swim_on_member_update(swim, swim->self);
+		swim->self = new_self;
+	}
 	swim_update_member_addr(swim, swim->self, &addr);
-	int rc = swim_update_member_uuid(swim, swim->self, uuid);
-	/* Reserved above. */
-	assert(rc == 0);
-	(void) rc;
 	if (gc_mode != SWIM_GC_DEFAULT)
 		swim->gc_mode = gc_mode;
 	return 0;
diff --git a/test/unit/swim.c b/test/unit/swim.c
index c2d94fb57..94eb08ece 100644
--- a/test/unit/swim.c
+++ b/test/unit/swim.c
@@ -98,18 +98,23 @@ swim_test_sequence(void)
 static void
 swim_test_uuid_update(void)
 {
-	swim_start_test(5);
+	swim_start_test(7);
 
 	struct swim_cluster *cluster = swim_cluster_new(2);
 	swim_cluster_add_link(cluster, 0, 1);
 	fail_if(swim_cluster_wait_fullmesh(cluster, 1) != 0);
 	struct swim *s = swim_cluster_node(cluster, 0);
+	struct tt_uuid old_uuid = *swim_member_uuid(swim_self(s));
 	struct tt_uuid new_uuid = uuid_nil;
 	new_uuid.time_low = 1000;
 	is(swim_cluster_update_uuid(cluster, 0, &new_uuid), 0, "UUID update");
-	is(swim_cluster_wait_fullmesh(cluster, 1), 0,
-	   "old UUID is returned back as a 'ghost' member");
-	is(swim_size(s), 3, "two members in each + ghost third member");
+	is(swim_member_status(swim_member_by_uuid(s, &old_uuid)), MEMBER_LEFT,
+	   "old UUID is marked as 'left'");
+	swim_run_for(5);
+	is(swim_member_by_uuid(s, &old_uuid), NULL,
+	   "old UUID is dropped after a while");
+	ok(swim_cluster_is_fullmesh(cluster), "dropped everywhere");
+	is(swim_size(s), 2, "two members in each");
 	new_uuid.time_low = 2;
 	is(swim_cluster_update_uuid(cluster, 0, &new_uuid), -1,
 	   "can not update to an existing UUID - swim_cfg fails");
diff --git a/test/unit/swim.result b/test/unit/swim.result
index bd157feff..7277f2ee6 100644
--- a/test/unit/swim.result
+++ b/test/unit/swim.result
@@ -16,12 +16,14 @@ ok 1 - subtests
 ok 2 - subtests
 	*** swim_test_sequence: done ***
 	*** swim_test_uuid_update ***
-    1..5
+    1..7
     ok 1 - UUID update
-    ok 2 - old UUID is returned back as a 'ghost' member
-    ok 3 - two members in each + ghost third member
-    ok 4 - can not update to an existing UUID - swim_cfg fails
-    ok 5 - diag says 'exists'
+    ok 2 - old UUID is marked as 'left'
+    ok 3 - old UUID is dropped after a while
+    ok 4 - dropped everywhere
+    ok 5 - two members in each
+    ok 6 - can not update to an existing UUID - swim_cfg fails
+    ok 7 - diag says 'exists'
 ok 3 - subtests
 	*** swim_test_uuid_update: done ***
 	*** swim_test_cfg ***
-- 
2.17.2 (Apple Git-113)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH 1/5] test: allow to remove swim nodes from the cluster
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 1/5] test: allow to remove swim nodes from the cluster Vladislav Shpilevoy
@ 2019-04-10  7:15   ` Konstantin Osipov
  0 siblings, 0 replies; 12+ messages in thread
From: Konstantin Osipov @ 2019-04-10  7:15 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/04/09 21:14]:
> Until now it was impossible in swim tests to drop a SWIM instance
> from the cluster. It should have been either restarted, or
> blocked, but a real drop led to an assertion on any attemp to use
> one of methods like swim_wait_timeout(). It was due to inability
> to get instance's UUID without the instance itself. Even if it
> was stored in membership tables of other instances.
> 
> This patch makes swim_cluster store swim instances and UUIDs
> separately. This is going to be used to test swim_quit() API.
> Also, some cfg parameters are saved as well, like ack timeout, gc
> mode. They are used to restart a node with exactly same cfg as
> it was before restart. Even if original struct swim * is not
> valid already.
> 
> Part of #3234

OK to push.


-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH 2/5] test: on close of swim fake fd send its packets, not drop
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 2/5] test: on close of swim fake fd send its packets, not drop Vladislav Shpilevoy
@ 2019-04-10  7:16   ` Konstantin Osipov
  0 siblings, 0 replies; 12+ messages in thread
From: Konstantin Osipov @ 2019-04-10  7:16 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/04/09 21:14]:
> The packets originator has already got an OK status and expects
> these messages sent even if the originator is closed right after
> that. This commit does the TCP-way and sends all the pending
> messages before actually closing the fake fd.
> 
> Part of #3234

OK to push.


-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH 3/5] test: process IO swim test events before protocol's ones
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 3/5] test: process IO swim test events before protocol's ones Vladislav Shpilevoy
@ 2019-04-10  7:17   ` Konstantin Osipov
  0 siblings, 0 replies; 12+ messages in thread
From: Konstantin Osipov @ 2019-04-10  7:17 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/04/09 21:14]:
> Before that patch the swim test event loop worked like this: pop
> a new event, set the global watch to its deadline, process the
> event, repeat until the deadlines are the same. These events
> usually generate IO events, which are processed next. But after
> swim_quit() will be introduced, it is possible to insert new IO
> events before protocol's events like round steps and ack checks.
> 
> Because of that it would be impossible to process new IO events
> only, with timeout = 0, or with timeout > 0, but without changing
> the global clock.
> 
> For example, a typical test would try to call swim_quit() on a
> swim instance, and expect that it has sent all the quit messages
> without delays immediately. But before this patch it would be
> necessary to run at least one swim round to get to the IO
> processing.
> 
> The patch splits protocol's events and IO events processing logic
> into two functions and calls them explicitly in
> swim_wait_timeout() - the main function to check something in the
> swim tests.

OK to push.


-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH 4/5] swim: introduce quit message
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 4/5] swim: introduce quit message Vladislav Shpilevoy
@ 2019-04-10  7:21   ` Konstantin Osipov
  0 siblings, 0 replies; 12+ messages in thread
From: Konstantin Osipov @ 2019-04-10  7:21 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/04/09 21:14]:
> Quit allows to gracefully leave a cluster. Other members will not
> consider the quited instance as dead, and will drop it much
> earlier than it would happen via failure detection.
> 
> Quit works as follows: a special message is send to each member.
> Members, got that message, will mark the source as 'left' and
> will keep and disseminate that change for one round. In the best
> case after one round the left member will be marked as such in
> the whole cluster. 'Left' member will not be added back because,
> it is prohibited explicitly to add new 'left' members.

OK to push.


-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH 5/5] swim: make UUID update smoother and faster
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 5/5] swim: make UUID update smoother and faster Vladislav Shpilevoy
@ 2019-04-10  7:22   ` Konstantin Osipov
  0 siblings, 0 replies; 12+ messages in thread
From: Konstantin Osipov @ 2019-04-10  7:22 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/04/09 21:14]:
> Before this patch UUID update was the same as introduction of a
> new member and waiting until the 'old' is dropped as 'dead' by
> the failure detection component. It could take 2.5 minutes with
> the default ack timeout. What is more, with GC turned off it
> would always result in never deleted old UUID.
> 
> The patch on a UUID update marks the old UUID as 'left' member.
> In the best and most common case it guarantees that old UUID will
> be dropped not later than after 2 complete rounds, and marked as
> 'left' everywhere for log(cluster_size) round steps. Even with GC
> turned off.
> 
> Part of #3234

OK to push.

 

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH 0/5] swim member 'left' status
  2019-04-09 18:12 [tarantool-patches] [PATCH 0/5] swim member 'left' status Vladislav Shpilevoy
                   ` (4 preceding siblings ...)
  2019-04-09 18:12 ` [tarantool-patches] [PATCH 5/5] swim: make UUID update smoother and faster Vladislav Shpilevoy
@ 2019-04-10 10:29 ` Vladislav Shpilevoy
  5 siblings, 0 replies; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-10 10:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

The whole patchset is pushed to the master.

On 09/04/2019 21:12, Vladislav Shpilevoy wrote:
> The patchset introduces and uses a new member status 'left', MEMBER_LEFT.
> Originally the status appears in scope of swim_quit() API - a function
> gracefully removing the instance from the cluster. Quit is used when a SWIM
> instance does not want to be a part of the cluster anymore, and it is
> undesirable to just shut it down and drop from other members via failure
> detection - it would be much longer, at least.
> 
> But appeared that 'left' status also can be applied to smooth UUID update.
> Before the patchset UUID update led to appearance of a 'ghost' of the old UUID
> for a relatively long time. Now on UUID update the old UUID is marked as left,
> and eventually dropped bypassing failure detection. Of course it is not
> guaranteed, but it works in most of cases.
> 
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-3234-swim-member-left-status
> Issue: https://github.com/tarantool/tarantool/issues/3234
> 
> Vladislav Shpilevoy (5):
>   test: allow to remove swim nodes from the cluster
>   test: on close of swim fake fd send its packets, not drop
>   test: process IO swim test events before protocol's ones
>   swim: introduce quit message
>   swim: make UUID update smoother and faster
> 
>  src/lib/swim/swim.c             | 207 ++++++++++++++++++++------------
>  src/lib/swim/swim.h             |  10 ++
>  src/lib/swim/swim_constants.h   |   2 +
>  src/lib/swim/swim_io.c          |   8 +-
>  src/lib/swim/swim_io.h          |   4 +
>  src/lib/swim/swim_proto.c       |  11 ++
>  src/lib/swim/swim_proto.h       |  30 +++++
>  test/unit/swim.c                | 100 +++++++++++++--
>  test/unit/swim.result           |  27 ++++-
>  test/unit/swim_test_ev.c        |  27 +----
>  test/unit/swim_test_ev.h        |   2 +-
>  test/unit/swim_test_transport.c |  47 ++++++--
>  test/unit/swim_test_transport.h |   9 +-
>  test/unit/swim_test_utils.c     | 152 +++++++++++++++++------
>  test/unit/swim_test_utils.h     |  17 +++
>  15 files changed, 486 insertions(+), 167 deletions(-)
> 
> -- 
> 2.17.2 (Apple Git-113)
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-04-10 10:29 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-09 18:12 [tarantool-patches] [PATCH 0/5] swim member 'left' status Vladislav Shpilevoy
2019-04-09 18:12 ` [tarantool-patches] [PATCH 1/5] test: allow to remove swim nodes from the cluster Vladislav Shpilevoy
2019-04-10  7:15   ` [tarantool-patches] " Konstantin Osipov
2019-04-09 18:12 ` [tarantool-patches] [PATCH 2/5] test: on close of swim fake fd send its packets, not drop Vladislav Shpilevoy
2019-04-10  7:16   ` [tarantool-patches] " Konstantin Osipov
2019-04-09 18:12 ` [tarantool-patches] [PATCH 3/5] test: process IO swim test events before protocol's ones Vladislav Shpilevoy
2019-04-10  7:17   ` [tarantool-patches] " Konstantin Osipov
2019-04-09 18:12 ` [tarantool-patches] [PATCH 4/5] swim: introduce quit message Vladislav Shpilevoy
2019-04-10  7:21   ` [tarantool-patches] " Konstantin Osipov
2019-04-09 18:12 ` [tarantool-patches] [PATCH 5/5] swim: make UUID update smoother and faster Vladislav Shpilevoy
2019-04-10  7:22   ` [tarantool-patches] " Konstantin Osipov
2019-04-10 10:29 ` [tarantool-patches] Re: [PATCH 0/5] swim member 'left' status Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox