Tarantool development patches archive
 help / color / mirror / Atom feed
* [tarantool-patches] [PATCH 0/5] swim dissemination component
@ 2019-04-05 11:57 Vladislav Shpilevoy
  2019-04-05 11:57 ` [tarantool-patches] [PATCH 1/5] swim: encapsulate member bin info into a 'passport' Vladislav Shpilevoy
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-05 11:57 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

The patchset consists again of some follow-ups to the previous commits; of
preparatory patches for dissemination to make the code more reusable and the
tests faster; and finally the dissemination component is introduced.

I do not duplicate here the component description, it is quite comprehensive
in the commit message and comments. The only noticable thing is that the
component alone does not care about changed uuid steady dissemination. Here UUID
update looks just like any other update with some minor differencies. And it
leads to a 'bug', that even with the stable network and the fullmesh a ghost of
the old UUID will live in the cluster until it is dropped by failure detection
component. It takes long time, since ack_timeout is 30 seconds by default, and
it is necessary to unack 5 pings to drop a member - 2.5 minutes in total.

Next patches will fix that problem for the most popular case.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-3234-swim-dissemination
Issue: https://github.com/tarantool/tarantool/issues/3234

Vladislav Shpilevoy (5):
  swim: encapsulate member bin info into a 'passport'
  swim: make members array decoder be a separate function
  test: speed up swim big cluster failure detection
  test: set packet drop rate instead of flag in swim tests
  swim: introduce dissemination component

 src/lib/swim/swim.c             | 174 ++++++++++++++++++++++++++++++--
 src/lib/swim/swim.h             |   4 +
 src/lib/swim/swim_proto.c       |  72 ++++++++++---
 src/lib/swim/swim_proto.h       |  82 +++++++++++++--
 test/unit/swim.c                | 108 ++++++++++++++------
 test/unit/swim.result           |  27 +++--
 test/unit/swim_test_transport.c |  27 +++--
 test/unit/swim_test_transport.h |   9 +-
 test/unit/swim_test_utils.c     |  59 ++++++++++-
 test/unit/swim_test_utils.h     |  22 +++-
 10 files changed, 501 insertions(+), 83 deletions(-)

-- 
2.17.2 (Apple Git-113)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] [PATCH 1/5] swim: encapsulate member bin info into a 'passport'
  2019-04-05 11:57 [tarantool-patches] [PATCH 0/5] swim dissemination component Vladislav Shpilevoy
@ 2019-04-05 11:57 ` Vladislav Shpilevoy
  2019-04-05 11:57 ` [tarantool-patches] [PATCH 2/5] swim: make members array decoder be a separate function Vladislav Shpilevoy
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-05 11:57 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Each member stored in components dissemination and anti-entropy
should carry a unique identifier, a status, and an address. Those
attributes are UUID, IP, Port, enum swim_member_status,
incarnation.

Now they are sent only in scope of anti-entropy, but forthcoming
dissemination component also would like to use these attributes
for each event.

This commit makes the vital attributes and their code more
reusable by encapsulation of them into a binary passport
structure.

Part of #3234
---
 src/lib/swim/swim_proto.c | 46 ++++++++++++++++++++++++++-------------
 src/lib/swim/swim_proto.h | 26 +++++++++++++++++-----
 2 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/src/lib/swim/swim_proto.c b/src/lib/swim/swim_proto.c
index 4e607e796..416e1d99e 100644
--- a/src/lib/swim/swim_proto.c
+++ b/src/lib/swim/swim_proto.c
@@ -314,32 +314,48 @@ swim_anti_entropy_header_bin_create(struct swim_anti_entropy_header_bin *header,
 	header->v_anti_entropy = mp_bswap_u16(batch_size);
 }
 
+static inline void
+swim_passport_bin_create(struct swim_passport_bin *passport)
+{
+	passport->k_status = SWIM_MEMBER_STATUS;
+	passport->k_addr = SWIM_MEMBER_ADDRESS;
+	passport->m_addr = 0xce;
+	passport->k_port = SWIM_MEMBER_PORT;
+	passport->m_port = 0xcd;
+	passport->k_uuid = SWIM_MEMBER_UUID;
+	passport->m_uuid = 0xc4;
+	passport->m_uuid_len = UUID_LEN;
+	passport->k_incarnation = SWIM_MEMBER_INCARNATION;
+	passport->m_incarnation = 0xcf;
+}
+
+static inline void
+swim_passport_bin_fill(struct swim_passport_bin *passport,
+		       const struct sockaddr_in *addr,
+		       const struct tt_uuid *uuid,
+		       enum swim_member_status status, uint64_t incarnation)
+{
+	passport->v_status = status;
+	passport->v_addr = mp_bswap_u32(ntohl(addr->sin_addr.s_addr));
+	passport->v_port = mp_bswap_u16(ntohs(addr->sin_port));
+	memcpy(passport->v_uuid, uuid, UUID_LEN);
+	passport->v_incarnation = mp_bswap_u64(incarnation);
+}
+
 void
 swim_member_bin_fill(struct swim_member_bin *header,
 		     const struct sockaddr_in *addr, const struct tt_uuid *uuid,
 		     enum swim_member_status status, uint64_t incarnation)
 {
-	header->v_status = status;
-	header->v_addr = mp_bswap_u32(ntohl(addr->sin_addr.s_addr));
-	header->v_port = mp_bswap_u16(ntohs(addr->sin_port));
-	memcpy(header->v_uuid, uuid, UUID_LEN);
-	header->v_incarnation = mp_bswap_u64(incarnation);
+	swim_passport_bin_fill(&header->passport, addr, uuid, status,
+			       incarnation);
 }
 
 void
 swim_member_bin_create(struct swim_member_bin *header)
 {
 	header->m_header = 0x85;
-	header->k_status = SWIM_MEMBER_STATUS;
-	header->k_addr = SWIM_MEMBER_ADDRESS;
-	header->m_addr = 0xce;
-	header->k_port = SWIM_MEMBER_PORT;
-	header->m_port = 0xcd;
-	header->k_uuid = SWIM_MEMBER_UUID;
-	header->m_uuid = 0xc4;
-	header->m_uuid_len = UUID_LEN;
-	header->k_incarnation = SWIM_MEMBER_INCARNATION;
-	header->m_incarnation = 0xcf;
+	swim_passport_bin_create(&header->passport);
 }
 
 void
diff --git a/src/lib/swim/swim_proto.h b/src/lib/swim/swim_proto.h
index 07905df31..0e73f37fb 100644
--- a/src/lib/swim/swim_proto.h
+++ b/src/lib/swim/swim_proto.h
@@ -239,13 +239,16 @@ swim_anti_entropy_header_bin_create(struct swim_anti_entropy_header_bin *header,
 				    uint16_t batch_size);
 
 /**
- * SWIM member MessagePack template. Represents one record in
- * anti-entropy section.
+ * The structure represents a passport of a member. It consists of
+ * some vital necessary member attributes, allowing to detect its
+ * state, exact address. The whole passport is necessary for each
+ * info related to a member: for anti-entropy records, for
+ * dissemination events. The components can inherit that structure
+ * and add more attributes. For example, anti-entropy can add a
+ * mandatory payload; dissemination adds optional old UUID and
+ * payload.
  */
-struct PACKED swim_member_bin {
-	/** mp_encode_map(5) */
-	uint8_t m_header;
-
+struct PACKED swim_passport_bin {
 	/** mp_encode_uint(SWIM_MEMBER_STATUS) */
 	uint8_t k_status;
 	/** mp_encode_uint(enum member_status) */
@@ -277,6 +280,17 @@ struct PACKED swim_member_bin {
 	uint64_t v_incarnation;
 };
 
+/**
+ * SWIM member MessagePack template. Represents one record in
+ * anti-entropy section.
+ */
+struct PACKED swim_member_bin {
+	/** mp_encode_map(5) */
+	uint8_t m_header;
+	/** Basic member info like status, address. */
+	struct swim_passport_bin passport;
+};
+
 /** Initialize antri-entropy record. */
 void
 swim_member_bin_create(struct swim_member_bin *header);
-- 
2.17.2 (Apple Git-113)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] [PATCH 2/5] swim: make members array decoder be a separate function
  2019-04-05 11:57 [tarantool-patches] [PATCH 0/5] swim dissemination component Vladislav Shpilevoy
  2019-04-05 11:57 ` [tarantool-patches] [PATCH 1/5] swim: encapsulate member bin info into a 'passport' Vladislav Shpilevoy
@ 2019-04-05 11:57 ` Vladislav Shpilevoy
  2019-04-05 11:57 ` [tarantool-patches] [PATCH 3/5] test: speed up swim big cluster failure detection Vladislav Shpilevoy
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-05 11:57 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

At this moment SWIM protocol stores array of members only in one
place: inside the anti-entropy component. Its decoding is a
simple loop taking the member definitions one by one and
upserting them into the member table.

But the dissemination also has something kinda like members
array: an array of events. The trick is that an event is
basically the same as a member +/- a couple of optional fields.
Events are also decoded into the member definition structure. It
means that anti-entropy decoder can be easily reused.

Part of #3234
---
 src/lib/swim/swim.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/lib/swim/swim.c b/src/lib/swim/swim.c
index f65fb60a3..a30a83886 100644
--- a/src/lib/swim/swim.c
+++ b/src/lib/swim/swim.c
@@ -1019,12 +1019,14 @@ skip:
 	return 0;
 }
 
-/** Decode an anti-entropy message, update member table. */
+/**
+ * Decode a bunch of members encoded as a MessagePack array. Each
+ * correctly decoded member is upserted into the member table.
+ */
 static int
-swim_process_anti_entropy(struct swim *swim, const char **pos, const char *end)
+swim_process_members(struct swim *swim, const char *prefix,
+		     const char **pos, const char *end)
 {
-	say_verbose("SWIM %d: process anti-entropy", swim_fd(swim));
-	const char *prefix = "invalid anti-entropy message:";
 	uint32_t size;
 	if (swim_decode_array(pos, end, &size, prefix, "root") != 0)
 		return -1;
@@ -1044,6 +1046,15 @@ swim_process_anti_entropy(struct swim *swim, const char **pos, const char *end)
 	return 0;
 }
 
+/** Decode an anti-entropy message, update member table. */
+static int
+swim_process_anti_entropy(struct swim *swim, const char **pos, const char *end)
+{
+	say_verbose("SWIM %d: process anti-entropy", swim_fd(swim));
+	const char *prefix = "invalid anti-entropy message:";
+	return swim_process_members(swim, prefix, pos, end);
+}
+
 /**
  * Decode a failure detection message. Schedule acks, process
  * acks.
-- 
2.17.2 (Apple Git-113)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] [PATCH 3/5] test: speed up swim big cluster failure detection
  2019-04-05 11:57 [tarantool-patches] [PATCH 0/5] swim dissemination component Vladislav Shpilevoy
  2019-04-05 11:57 ` [tarantool-patches] [PATCH 1/5] swim: encapsulate member bin info into a 'passport' Vladislav Shpilevoy
  2019-04-05 11:57 ` [tarantool-patches] [PATCH 2/5] swim: make members array decoder be a separate function Vladislav Shpilevoy
@ 2019-04-05 11:57 ` Vladislav Shpilevoy
  2019-04-09  8:43   ` [tarantool-patches] " Konstantin Osipov
  2019-04-05 11:57 ` [tarantool-patches] [PATCH 4/5] test: set packet drop rate instead of flag in swim tests Vladislav Shpilevoy
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-05 11:57 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

The test checks that if a member has failed in a big cluster, it
is eventually deleted from all instances. But it takes too much
real time despite usage of virtual time.

This is because member total deletion takes
O(N + ack_timeout * 5) time. N so as to wait until every member
pinged the failed one at least once, + 3 * ack_timeout to learn
that it is dead, and + 2 * ack_timeout to drop it. Of course, it
is an upper border, and usually it is faster but not much. For
example, on the cluster of size 50 it takes easily 55 virtual
seconds.

On the contrary, to just learn that a member is dead on every
instance takes O(log(N)) according to the SWIM paper. On the
same test with 50 instances cluster it takes ~15 virtual seconds
to disseminate 'dead' status of the failed member on every
instance. And even without dissemination component, with
anti-entropy only.

Leaping ahead, for the subsequent patches it is tested that with
the dissemination component it takes already ~6 virtual seconds.

In the summary, without losing test coverage it is much faster to
turn off SWIM GC and wait until the failed member looks dead on
all instances.

Part of #3234
---
 test/unit/swim.c            | 52 ++++++++++++++++++++-------------
 test/unit/swim.result       |  5 ++--
 test/unit/swim_test_utils.c | 57 +++++++++++++++++++++++++++++++++++++
 test/unit/swim_test_utils.h | 20 +++++++++++++
 4 files changed, 112 insertions(+), 22 deletions(-)

diff --git a/test/unit/swim.c b/test/unit/swim.c
index 860d3211e..d77225f6c 100644
--- a/test/unit/swim.c
+++ b/test/unit/swim.c
@@ -374,33 +374,45 @@ swim_test_refute(void)
 static void
 swim_test_too_big_packet(void)
 {
-	swim_start_test(2);
+	swim_start_test(3);
 	int size = 50;
+	double ack_timeout = 1;
+	double first_dead_timeout = 20;
+	double everywhere_dead_timeout = size * 3;
+	int drop_id = size / 2;
+
 	struct swim_cluster *cluster = swim_cluster_new(size);
 	for (int i = 1; i < size; ++i)
 		swim_cluster_add_link(cluster, 0, i);
-	is(swim_cluster_wait_fullmesh(cluster, size), 0, "despite S1 can not "\
-	   "send all the %d members in a one packet, fullmesh is eventually "\
-	   "reached", size);
-	swim_cluster_set_ack_timeout(cluster, 1);
-	int drop_id = size / 2;
+
+	is(swim_cluster_wait_fullmesh(cluster, size * 2), 0, "despite S1 can "\
+	   "not send all the %d members in a one packet, fullmesh is "\
+	   "eventually reached", size);
+
+	swim_cluster_set_ack_timeout(cluster, ack_timeout);
 	swim_cluster_set_drop(cluster, drop_id, true);
+	is(swim_cluster_wait_status_anywhere(cluster, drop_id, MEMBER_DEAD,
+					     first_dead_timeout), 0,
+	   "a dead member is detected in time not depending on cluster size");
 	/*
-	 * Dissemination of a detected failure takes long time
-	 * without help of the component, intended for that.
+	 * GC is off to simplify and speed up checks. When no GC
+	 * the test is sure that it is safe to check for
+	 * MEMBER_DEAD everywhere, because it is impossible that a
+	 * member is considered dead in one place, but already
+	 * deleted on another. Also, total member deletion takes
+	 * linear time, because a member is deleted from an
+	 * instance only when *that* instance will not receive
+	 * some direct acks from the member. Deletion and
+	 * additional pings are not triggered if a member dead
+	 * status is received indirectly via dissemination or
+	 * anti-entropy. Otherwise it could produce linear network
+	 * load on the already weak member.
 	 */
-	double timeout = size * 10;
-	int i = 0;
-	for (; i < size; ++i) {
-		double start = swim_time();
-		if (i != drop_id &&
-		   swim_cluster_wait_status(cluster, i, drop_id,
-					    swim_member_status_MAX, timeout) != 0)
-			break;
-		timeout -= swim_time() - start;
-	}
-	is(i, size, "S%d drops all the packets - it should become dead",
-	   drop_id + 1);
+	swim_cluster_set_gc(cluster, SWIM_GC_OFF);
+	is(swim_cluster_wait_status_everywhere(cluster, drop_id, MEMBER_DEAD,
+					       everywhere_dead_timeout), 0,
+	   "S%d death is eventually learned by everyone", drop_id + 1);
+
 	swim_cluster_delete(cluster);
 	swim_finish_test();
 }
diff --git a/test/unit/swim.result b/test/unit/swim.result
index 904f061f6..3393870c2 100644
--- a/test/unit/swim.result
+++ b/test/unit/swim.result
@@ -94,9 +94,10 @@ ok 8 - subtests
 ok 9 - subtests
 	*** swim_test_basic_gossip: done ***
 	*** swim_test_too_big_packet ***
-    1..2
+    1..3
     ok 1 - despite S1 can not send all the 50 members in a one packet, fullmesh is eventually reached
-    ok 2 - S26 drops all the packets - it should become dead
+    ok 2 - a dead member is detected in time not depending on cluster size
+    ok 3 - S26 death is eventually learned by everyone
 ok 10 - subtests
 	*** swim_test_too_big_packet: done ***
 	*** swim_test_undead ***
diff --git a/test/unit/swim_test_utils.c b/test/unit/swim_test_utils.c
index bb413372c..277a73498 100644
--- a/test/unit/swim_test_utils.c
+++ b/test/unit/swim_test_utils.c
@@ -361,6 +361,39 @@ swim_loop_check_member(struct swim_cluster *cluster, void *data)
 	return true;
 }
 
+/**
+ * Callback to check that a member matches a template on any
+ * instance in the cluster.
+ */
+static bool
+swim_loop_check_member_anywhere(struct swim_cluster *cluster, void *data)
+{
+	struct swim_member_template *t = (struct swim_member_template *) data;
+	for (t->node_id = 0; t->node_id < cluster->size; ++t->node_id) {
+		if (t->node_id != t->member_id &&
+		    swim_loop_check_member(cluster, data))
+			return true;
+	}
+	return false;
+}
+
+/**
+ * Callback to check that a member matches a template on every
+ * instance in the cluster.
+ */
+static bool
+swim_loop_check_member_everywhere(struct swim_cluster *cluster, void *data)
+{
+	struct swim_member_template *t = (struct swim_member_template *) data;
+	for (t->node_id = 0; t->node_id < cluster->size; ++t->node_id) {
+		if (t->node_id != t->member_id &&
+		    !swim_loop_check_member(cluster, data))
+			return false;
+	}
+	return true;
+}
+
+
 int
 swim_cluster_wait_status(struct swim_cluster *cluster, int node_id,
 			 int member_id, enum swim_member_status status,
@@ -383,6 +416,30 @@ swim_cluster_wait_incarnation(struct swim_cluster *cluster, int node_id,
 	return swim_wait_timeout(timeout, cluster, swim_loop_check_member, &t);
 }
 
+int
+swim_cluster_wait_status_anywhere(struct swim_cluster *cluster, int member_id,
+				  enum swim_member_status status,
+				  double timeout)
+{
+	struct swim_member_template t;
+	swim_member_template_create(&t, -1, member_id);
+	swim_member_template_set_status(&t, status);
+	return swim_wait_timeout(timeout, cluster,
+				 swim_loop_check_member_anywhere, &t);
+}
+
+int
+swim_cluster_wait_status_everywhere(struct swim_cluster *cluster, int member_id,
+				    enum swim_member_status status,
+				    double timeout)
+{
+	struct swim_member_template t;
+	swim_member_template_create(&t, -1, member_id);
+	swim_member_template_set_status(&t, status);
+	return swim_wait_timeout(timeout, cluster,
+				 swim_loop_check_member_everywhere, &t);
+}
+
 bool
 swim_error_check_match(const char *msg)
 {
diff --git a/test/unit/swim_test_utils.h b/test/unit/swim_test_utils.h
index d2ef00817..6e99b4879 100644
--- a/test/unit/swim_test_utils.h
+++ b/test/unit/swim_test_utils.h
@@ -118,6 +118,26 @@ swim_cluster_wait_status(struct swim_cluster *cluster, int node_id,
 			 int member_id, enum swim_member_status status,
 			 double timeout);
 
+/**
+ * Wait until a member with id @a member_id is seen with @a status
+ * in the membership table of any instance in @a cluster. At most
+ * @a timeout seconds.
+ */
+int
+swim_cluster_wait_status_anywhere(struct swim_cluster *cluster, int member_id,
+				  enum swim_member_status status,
+				  double timeout);
+
+/**
+ * Wait until a member with id @a member_id is seen with @a status
+ * in the membership table of every instance in @a cluster. At
+ * most @a timeout seconds.
+ */
+int
+swim_cluster_wait_status_everywhere(struct swim_cluster *cluster, int member_id,
+				    enum swim_member_status status,
+				    double timeout);
+
 /**
  * Wait until a member with id @a member_id is seen with @a
  * incarnation in the membership table of a member with id @a
-- 
2.17.2 (Apple Git-113)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] [PATCH 4/5] test: set packet drop rate instead of flag in swim tests
  2019-04-05 11:57 [tarantool-patches] [PATCH 0/5] swim dissemination component Vladislav Shpilevoy
                   ` (2 preceding siblings ...)
  2019-04-05 11:57 ` [tarantool-patches] [PATCH 3/5] test: speed up swim big cluster failure detection Vladislav Shpilevoy
@ 2019-04-05 11:57 ` Vladislav Shpilevoy
  2019-04-05 11:57 ` [tarantool-patches] [PATCH 5/5] swim: introduce dissemination component Vladislav Shpilevoy
  2019-04-09 12:25 ` [tarantool-patches] Re: [PATCH 0/5] swim " Vladislav Shpilevoy
  5 siblings, 0 replies; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-05 11:57 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Before dissemination component it was enough in the tests to
either drop all packets to/from a certain member, or do not drop
at all. But after dissemination it will be time to test more
granulated packet loss table: not 0/100, but 5/10/20/50/.../100
packet loss rate.

Part of #3234
---
 test/unit/swim.c                | 10 +++++-----
 test/unit/swim_test_transport.c | 27 +++++++++++++++++++--------
 test/unit/swim_test_transport.h |  9 +++++----
 test/unit/swim_test_utils.c     |  2 +-
 test/unit/swim_test_utils.h     |  2 +-
 5 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/test/unit/swim.c b/test/unit/swim.c
index d77225f6c..358230e15 100644
--- a/test/unit/swim.c
+++ b/test/unit/swim.c
@@ -293,7 +293,7 @@ swim_test_basic_gossip(void)
 	 */
 	swim_cluster_add_link(cluster, 0, 1);
 	swim_cluster_add_link(cluster, 1, 0);
-	swim_cluster_set_drop(cluster, 1, true);
+	swim_cluster_set_drop(cluster, 1, 100);
 	/*
 	 * Wait two no-ACKs on S1 from S2. +1 sec to send a first
 	 * ping.
@@ -353,9 +353,9 @@ swim_test_refute(void)
 	swim_cluster_set_ack_timeout(cluster, 2);
 
 	swim_cluster_add_link(cluster, 0, 1);
-	swim_cluster_set_drop(cluster, 1, true);
+	swim_cluster_set_drop(cluster, 1, 100);
 	fail_if(swim_cluster_wait_status(cluster, 0, 1, MEMBER_DEAD, 7) != 0);
-	swim_cluster_set_drop(cluster, 1, false);
+	swim_cluster_set_drop(cluster, 1, 0);
 	is(swim_cluster_wait_incarnation(cluster, 1, 1, 1, 1), 0,
 	   "S2 increments its own incarnation to refute its death");
 	is(swim_cluster_wait_incarnation(cluster, 0, 1, 1, 1), 0,
@@ -390,7 +390,7 @@ swim_test_too_big_packet(void)
 	   "eventually reached", size);
 
 	swim_cluster_set_ack_timeout(cluster, ack_timeout);
-	swim_cluster_set_drop(cluster, drop_id, true);
+	swim_cluster_set_drop(cluster, drop_id, 100);
 	is(swim_cluster_wait_status_anywhere(cluster, drop_id, MEMBER_DEAD,
 					     first_dead_timeout), 0,
 	   "a dead member is detected in time not depending on cluster size");
@@ -426,7 +426,7 @@ swim_test_undead(void)
 	swim_cluster_set_ack_timeout(cluster, 1);
 	swim_cluster_add_link(cluster, 0, 1);
 	swim_cluster_add_link(cluster, 1, 0);
-	swim_cluster_set_drop(cluster, 1, true);
+	swim_cluster_set_drop(cluster, 1, 100);
 	is(swim_cluster_wait_status(cluster, 0, 1, MEMBER_DEAD, 4), 0,
 	   "member S2 is dead");
 	swim_run_for(5);
diff --git a/test/unit/swim_test_transport.c b/test/unit/swim_test_transport.c
index 3b59eaf4d..e563185ee 100644
--- a/test/unit/swim_test_transport.c
+++ b/test/unit/swim_test_transport.c
@@ -97,10 +97,10 @@ struct swim_fd {
 	 */
 	bool is_opened;
 	/**
-	 * True if any message sent to that fd should be just
-	 * dropped, not queued.
+	 * Probability of packet loss. For both sends and
+	 * receipts.
 	 */
-	bool is_dropping;
+	double drop_rate;
 	/**
 	 * Link in the list of opened and non-blocked descriptors.
 	 * Used to feed them all EV_WRITE.
@@ -130,7 +130,7 @@ swim_fd_open(struct swim_fd *fd)
 		return -1;
 	}
 	fd->is_opened = true;
-	fd->is_dropping = false;
+	fd->drop_rate = 0;
 	rlist_add_tail_entry(&swim_fd_active, fd, in_active);
 	return 0;
 }
@@ -156,7 +156,7 @@ swim_test_transport_init(void)
 	for (int i = 0, evfd = FAKE_FD_BASE; i < FAKE_FD_NUMBER; ++i, ++evfd) {
 		swim_fd[i].evfd = evfd;
 		swim_fd[i].is_opened = false;
-		swim_fd[i].is_dropping = false;
+		swim_fd[i].drop_rate = 0;
 		rlist_create(&swim_fd[i].in_active);
 		rlist_create(&swim_fd[i].recv_queue);
 		rlist_create(&swim_fd[i].send_queue);
@@ -268,11 +268,21 @@ swim_test_transport_unblock_fd(int fd)
 }
 
 void
-swim_test_transport_set_drop(int fd, bool value)
+swim_test_transport_set_drop(int fd, double value)
 {
 	struct swim_fd *sfd = &swim_fd[fd - FAKE_FD_BASE];
 	if (sfd->is_opened)
-		sfd->is_dropping = value;
+		sfd->drop_rate = value;
+}
+
+/**
+ * Returns true with probability @a rate, and is used to decided
+ * wether to drop a packet or not.
+ */
+static inline bool
+swim_test_is_drop(double rate)
+{
+	return ((double) rand() / RAND_MAX) * 100 < rate;
 }
 
 /** Send one packet to destination's recv queue. */
@@ -285,7 +295,8 @@ swim_fd_send_packet(struct swim_fd *fd)
 		rlist_shift_entry(&fd->send_queue, struct swim_test_packet,
 				  in_queue);
 	struct swim_fd *dst = &swim_fd[ntohs(p->dst.sin_port)];
-	if (dst->is_opened && ! dst->is_dropping && ! fd->is_dropping)
+	if (dst->is_opened && !swim_test_is_drop(dst->drop_rate) &&
+	    !swim_test_is_drop(fd->drop_rate))
 		rlist_add_tail_entry(&dst->recv_queue, p, in_queue);
 	else
 		swim_test_packet_delete(p);
diff --git a/test/unit/swim_test_transport.h b/test/unit/swim_test_transport.h
index 5a1a92271..d291abe91 100644
--- a/test/unit/swim_test_transport.h
+++ b/test/unit/swim_test_transport.h
@@ -60,12 +60,13 @@ void
 swim_test_transport_unblock_fd(int fd);
 
 /**
- * Set to true, if all incomming and outgoing packets should be
- * dropped. Note, that the node, owning @a fd, thinks, that its
- * packets are sent.
+ * Drop rate of incomming and outgoing packets. Note, that even if
+ * a packet is dropped on send, the node, owning @a fd, still
+ * thinks, that the packet is sent. It is not a sender-visible
+ * error.
  */
 void
-swim_test_transport_set_drop(int fd, bool value);
+swim_test_transport_set_drop(int fd, double value);
 
 /** Initialize test transport system. */
 void
diff --git a/test/unit/swim_test_utils.c b/test/unit/swim_test_utils.c
index 277a73498..bada8ef43 100644
--- a/test/unit/swim_test_utils.c
+++ b/test/unit/swim_test_utils.c
@@ -177,7 +177,7 @@ swim_cluster_unblock_io(struct swim_cluster *cluster, int i)
 }
 
 void
-swim_cluster_set_drop(struct swim_cluster *cluster, int i, bool value)
+swim_cluster_set_drop(struct swim_cluster *cluster, int i, double value)
 {
 	swim_test_transport_set_drop(swim_fd(cluster->node[i]), value);
 }
diff --git a/test/unit/swim_test_utils.h b/test/unit/swim_test_utils.h
index 6e99b4879..f4397aff4 100644
--- a/test/unit/swim_test_utils.h
+++ b/test/unit/swim_test_utils.h
@@ -80,7 +80,7 @@ void
 swim_cluster_unblock_io(struct swim_cluster *cluster, int i);
 
 void
-swim_cluster_set_drop(struct swim_cluster *cluster, int i, bool value);
+swim_cluster_set_drop(struct swim_cluster *cluster, int i, double value);
 
 /**
  * Explicitly add a member of id @a from_id to a member of id
-- 
2.17.2 (Apple Git-113)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] [PATCH 5/5] swim: introduce dissemination component
  2019-04-05 11:57 [tarantool-patches] [PATCH 0/5] swim dissemination component Vladislav Shpilevoy
                   ` (3 preceding siblings ...)
  2019-04-05 11:57 ` [tarantool-patches] [PATCH 4/5] test: set packet drop rate instead of flag in swim tests Vladislav Shpilevoy
@ 2019-04-05 11:57 ` Vladislav Shpilevoy
  2019-04-08 20:13   ` [tarantool-patches] " Vladislav Shpilevoy
  2019-04-09 12:25 ` [tarantool-patches] Re: [PATCH 0/5] swim " Vladislav Shpilevoy
  5 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-05 11:57 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Dissemination components broadcasts events about member status
updates. When any member attribute is updated (incarnation,
status, UUID, address), the member stands into an event queue.
Members from the queue are encoded into each round step message
with a higher priority and before anti-entropy section.

It means, then even if a cluster consists of hundreds of members
and one of them was updated on one of instances, this update will
be disseminated regardless of whether this memeber is encoded
into anti-entropy section or not. It drastically speeds events
dissemination up, according to the SWIM paper, and is noticed in
the tests.

Part of #3234
---
 src/lib/swim/swim.c       | 155 +++++++++++++++++++++++++++++++++++++-
 src/lib/swim/swim.h       |   4 +
 src/lib/swim/swim_proto.c |  26 +++++++
 src/lib/swim/swim_proto.h |  56 ++++++++++++++
 test/unit/swim.c          |  48 ++++++++++--
 test/unit/swim.result     |  22 ++++--
 6 files changed, 293 insertions(+), 18 deletions(-)

diff --git a/src/lib/swim/swim.c b/src/lib/swim/swim.c
index a30a83886..8eac102c8 100644
--- a/src/lib/swim/swim.c
+++ b/src/lib/swim/swim.c
@@ -269,6 +269,27 @@ struct swim_member {
 	 * message to it.
 	 */
 	struct heap_node in_wait_ack_heap;
+	/**
+	 *
+	 *                 Dissemination component
+	 *
+	 * Dissemination component sends events. Event is a
+	 * notification about member status update. So formally,
+	 * this structure already has all the needed attributes.
+	 * But also an event somehow should be sent to all members
+	 * at least once according to SWIM, so it requires
+	 * something like TTL for each type of event, which gets
+	 * decremented on each send. And a member can not be
+	 * removed from the global table until it gets dead and
+	 * its status TTLs is 0, so as to allow other members
+	 * learn its dead status.
+	 */
+	int status_ttl;
+	/**
+	 * Events are put into a queue sorted by event occurrence
+	 * time.
+	 */
+	struct rlist in_events_queue;
 };
 
 #define mh_name _swim_table
@@ -364,6 +385,12 @@ struct swim {
 	struct ev_timer wait_ack_tick;
 	/** GC state saying how to remove dead members. */
 	enum swim_gc_mode gc_mode;
+	/**
+	 *
+	 *                 Dissemination component
+	 */
+	/** Queue of events sorted by occurrence time. */
+	struct rlist events_queue;
 };
 
 /** Put the member into a list of ACK waiters. */
@@ -377,6 +404,25 @@ swim_wait_ack(struct swim *swim, struct swim_member *member)
 	}
 }
 
+/**
+ * On literally any update of a member it stands into a queue of
+ * events to disseminate the update. Note that status TTL is
+ * always set, even if UUID is updated, or any other attribute. It
+ * is because 1) it simplifies the code when status TTL is bigger
+ * than all other ones, 2) status occupies only 2 bytes in a
+ * packet, so it is never worse to send it on any update, but
+ * reduces entropy.
+ */
+static inline void
+swim_register_event(struct swim *swim, struct swim_member *member)
+{
+	if (rlist_empty(&member->in_events_queue)) {
+		rlist_add_tail_entry(&swim->events_queue, member,
+				     in_events_queue);
+	}
+	member->status_ttl = mh_size(swim->members);
+}
+
 /**
  * Make all needed actions to process a member's update like a
  * change of its status, or incarnation, or both.
@@ -384,8 +430,8 @@ swim_wait_ack(struct swim *swim, struct swim_member *member)
 static void
 swim_on_member_update(struct swim *swim, struct swim_member *member)
 {
-	(void) swim;
 	member->unacknowledged_pings = 0;
+	swim_register_event(swim, member);
 }
 
 /**
@@ -468,6 +514,9 @@ swim_member_delete(struct swim_member *member)
 	swim_task_destroy(&member->ack_task);
 	swim_task_destroy(&member->ping_task);
 
+	/* Dissemination component. */
+	assert(rlist_empty(&member->in_events_queue));
+
 	free(member);
 }
 
@@ -509,6 +558,10 @@ swim_member_new(const struct sockaddr_in *addr, const struct tt_uuid *uuid,
 	swim_task_create(&member->ack_task, NULL, NULL, "ack");
 	swim_task_create(&member->ping_task, swim_ping_task_complete, NULL,
 			 "ping");
+
+	/* Dissemination component. */
+	rlist_create(&member->in_events_queue);
+
 	return member;
 }
 
@@ -531,6 +584,9 @@ swim_delete_member(struct swim *swim, struct swim_member *member)
 	if (! heap_node_is_stray(&member->in_wait_ack_heap))
 		wait_ack_heap_delete(&swim->wait_ack_heap, member);
 
+	/* Dissemination component. */
+	rlist_del_entry(member, in_events_queue);
+
 	swim_member_delete(member);
 }
 
@@ -590,6 +646,10 @@ swim_new_member(struct swim *swim, const struct sockaddr_in *addr,
 	}
 	if (mh_size(swim->members) > 1)
 		swim_ev_timer_start(loop(), &swim->round_tick);
+
+	/* Dissemination component. */
+	swim_on_member_update(swim, member);
+
 	say_verbose("SWIM %d: member %s is added, total is %d", swim_fd(swim),
 		    swim_uuid_str(&member->uuid), mh_size(swim->members));
 	return member;
@@ -727,6 +787,42 @@ swim_encode_failure_detection(struct swim *swim, struct swim_packet *packet,
 	return 1;
 }
 
+/**
+ * Encode dissemination component.
+ * @retval Number of key-values added to the packet's root map.
+ */
+static int
+swim_encode_dissemination(struct swim *swim, struct swim_packet *packet)
+{
+	struct swim_diss_header_bin diss_header_bin;
+	int size = sizeof(diss_header_bin);
+	char *header = swim_packet_reserve(packet, size);
+	if (header == NULL)
+		return 0;
+	int i = 0;
+	char *pos = header + size;
+	struct swim_member *m;
+	struct swim_event_bin event_bin;
+	swim_event_bin_create(&event_bin);
+	rlist_foreach_entry(m, &swim->events_queue, in_events_queue) {
+		int new_size = size + sizeof(event_bin);
+		if (swim_packet_reserve(packet, new_size) == NULL)
+			break;
+		swim_event_bin_fill(&event_bin, m->status, &m->addr, &m->uuid,
+				    m->incarnation);
+		memcpy(pos, &event_bin, sizeof(event_bin));
+		pos += sizeof(event_bin);
+		size = new_size;
+		++i;
+	}
+	if (i == 0)
+		return 0;
+	swim_diss_header_bin_create(&diss_header_bin, i);
+	memcpy(header, &diss_header_bin, sizeof(diss_header_bin));
+	swim_packet_advance(packet, size);
+	return 1;
+}
+
 /** Encode SWIM components into a UDP packet. */
 static void
 swim_encode_round_msg(struct swim *swim, struct swim_packet *packet)
@@ -737,12 +833,34 @@ swim_encode_round_msg(struct swim *swim, struct swim_packet *packet)
 	map_size += swim_encode_src_uuid(swim, packet);
 	map_size += swim_encode_failure_detection(swim, packet,
 						  SWIM_FD_MSG_PING);
+	map_size += swim_encode_dissemination(swim, packet);
 	map_size += swim_encode_anti_entropy(swim, packet);
 
 	assert(mp_sizeof_map(map_size) == 1 && map_size >= 2);
 	mp_encode_map(header, map_size);
 }
 
+/**
+ * Decrement TTLs of all events. It is done after each round step.
+ * Note, that when there are too many events to fit into a packet,
+ * the tail of events list without being disseminated start
+ * reeking and rotting, and the most far events can be deleted
+ * without ever being sent. But hardly this situation is reachable
+ * since even 1000 bytes can fit 37 events of ~27 bytes each, that
+ * means in fact a failure of 37 instances. In such a case event
+ * taint is the most mild problem.
+ */
+static void
+swim_decrease_events_ttl(struct swim *swim)
+{
+	struct swim_member *member, *tmp;
+	rlist_foreach_entry_safe(member, &swim->events_queue, in_events_queue,
+				 tmp) {
+		if (--member->status_ttl == 0)
+			rlist_del_entry(member, in_events_queue);
+	}
+}
+
 /**
  * Once per specified timeout trigger a next round step. In round
  * step a next memeber is taken from the round queue and a round
@@ -799,10 +917,12 @@ swim_complete_step(struct swim_task *task,
 		rlist_shift(&swim->round_queue);
 		if (rc > 0) {
 			/*
-			 * Each round message contains failure
-			 * detection section with a ping.
+			 * Each round message contains
+			 * dissemination and failure detection
+			 * sections.
 			 */
 			swim_wait_ack(swim, m);
+			swim_decrease_events_ttl(swim);
 		}
 	}
 }
@@ -872,7 +992,7 @@ swim_check_acks(struct ev_loop *loop, struct ev_timer *t, int events)
 			break;
 		case MEMBER_DEAD:
 			if (m->unacknowledged_pings >= NO_ACKS_TO_GC &&
-			    swim->gc_mode == SWIM_GC_ON) {
+			    swim->gc_mode == SWIM_GC_ON && m->status_ttl == 0) {
 				swim_delete_member(swim, m);
 				continue;
 			}
@@ -1121,6 +1241,18 @@ swim_process_failure_detection(struct swim *swim, const char **pos,
 	return 0;
 }
 
+/**
+ * Decode a dissemination message. Schedule new events, update
+ * members.
+ */
+static int
+swim_process_dissemination(struct swim *swim, const char **pos, const char *end)
+{
+	say_verbose("SWIM %d: process dissemination", swim_fd(swim));
+	const char *prefix = "invald dissemination message:";
+	return swim_process_members(swim, prefix, pos, end);
+}
+
 /** Process a new message. */
 static void
 swim_on_input(struct swim_scheduler *scheduler, const char *pos,
@@ -1160,6 +1292,10 @@ swim_on_input(struct swim_scheduler *scheduler, const char *pos,
 							   src, &uuid) != 0)
 				goto error;
 			break;
+		case SWIM_DISSEMINATION:
+			if (swim_process_dissemination(swim, &pos, end) != 0)
+				goto error;
+			break;
 		default:
 			diag_set(SwimError, "%s unexpected key", prefix);
 			goto error;
@@ -1199,6 +1335,10 @@ swim_new(void)
 			   ACK_TIMEOUT_DEFAULT, 0);
 	swim->wait_ack_tick.data = (void *) swim;
 	swim->gc_mode = SWIM_GC_ON;
+
+	/* Dissemination component. */
+	rlist_create(&swim->events_queue);
+
 	return swim;
 }
 
@@ -1393,6 +1533,12 @@ swim_info(struct swim *swim, struct info_handler *info)
 	info_end(info);
 }
 
+int
+swim_size(const struct swim *swim)
+{
+	return mh_size(swim->members);
+}
+
 void
 swim_delete(struct swim *swim)
 {
@@ -1407,6 +1553,7 @@ swim_delete(struct swim *swim)
 		rlist_del_entry(m, in_round_queue);
 		if (! heap_node_is_stray(&m->in_wait_ack_heap))
 			wait_ack_heap_delete(&swim->wait_ack_heap, m);
+		rlist_del_entry(m, in_events_queue);
 		swim_member_delete(m);
 	}
 	wait_ack_heap_destroy(&swim->wait_ack_heap);
diff --git a/src/lib/swim/swim.h b/src/lib/swim/swim.h
index 94ddc5dfa..ec924f36f 100644
--- a/src/lib/swim/swim.h
+++ b/src/lib/swim/swim.h
@@ -140,6 +140,10 @@ swim_probe_member(struct swim *swim, const char *uri);
 void
 swim_info(struct swim *swim, struct info_handler *info);
 
+/** Get SWIM member table size. */
+int
+swim_size(const struct swim *swim);
+
 /** Get a SWIM member, describing this instance. */
 const struct swim_member *
 swim_self(struct swim *swim);
diff --git a/src/lib/swim/swim_proto.c b/src/lib/swim/swim_proto.c
index 416e1d99e..6b3197790 100644
--- a/src/lib/swim/swim_proto.c
+++ b/src/lib/swim/swim_proto.c
@@ -358,6 +358,32 @@ swim_member_bin_create(struct swim_member_bin *header)
 	swim_passport_bin_create(&header->passport);
 }
 
+void
+swim_diss_header_bin_create(struct swim_diss_header_bin *header,
+			    uint16_t batch_size)
+{
+	header->k_header = SWIM_DISSEMINATION;
+	header->m_header = 0xdc;
+	header->v_header = mp_bswap_u16(batch_size);
+}
+
+void
+swim_event_bin_create(struct swim_event_bin *header)
+{
+	swim_passport_bin_create(&header->passport);
+}
+
+void
+swim_event_bin_fill(struct swim_event_bin *header,
+		    enum swim_member_status status,
+		    const struct sockaddr_in *addr, const struct tt_uuid *uuid,
+		    uint64_t incarnation)
+{
+	header->m_header = 0x85;
+	swim_passport_bin_fill(&header->passport, addr, uuid, status,
+			       incarnation);
+}
+
 void
 swim_meta_header_bin_create(struct swim_meta_header_bin *header,
 			    const struct sockaddr_in *src)
diff --git a/src/lib/swim/swim_proto.h b/src/lib/swim/swim_proto.h
index 0e73f37fb..826443a3b 100644
--- a/src/lib/swim/swim_proto.h
+++ b/src/lib/swim/swim_proto.h
@@ -61,6 +61,19 @@
  * |                                                             |
  * |               OR/AND                                        |
  * |                                                             |
+ * |     SWIM_DISSEMINATION: [                                   |
+ * |         {                                                   |
+ * |             SWIM_MEMBER_STATUS: uint, enum member_status,   |
+ * |             SWIM_MEMBER_ADDRESS: uint, ip,                  |
+ * |             SWIM_MEMBER_PORT: uint, port,                   |
+ * |             SWIM_MEMBER_UUID: 16 byte UUID,                 |
+ * |             SWIM_MEMBER_INCARNATION: uint                   |
+ * |         },                                                  |
+ * |         ...                                                 |
+ * |     ],                                                      |
+ * |                                                             |
+ * |               OR/AND                                        |
+ * |                                                             |
  * |     SWIM_ANTI_ENTROPY: [                                    |
  * |         {                                                   |
  * |             SWIM_MEMBER_STATUS: uint, enum member_status,   |
@@ -114,6 +127,7 @@ enum swim_body_key {
 	SWIM_SRC_UUID = 0,
 	SWIM_ANTI_ENTROPY,
 	SWIM_FAILURE_DETECTION,
+	SWIM_DISSEMINATION,
 };
 
 /**
@@ -308,6 +322,48 @@ swim_member_bin_fill(struct swim_member_bin *header,
 
 /** }}}                  Anti-entropy component                 */
 
+/** {{{                 Dissemination component                 */
+
+/** SWIM dissemination MessagePack template. */
+struct PACKED swim_diss_header_bin {
+	/** mp_encode_uint(SWIM_DISSEMINATION) */
+	uint8_t k_header;
+	/** mp_encode_array() */
+	uint8_t m_header;
+	uint16_t v_header;
+};
+
+/** Initialize dissemination header. */
+void
+swim_diss_header_bin_create(struct swim_diss_header_bin *header,
+			    uint16_t batch_size);
+
+/** SWIM event MessagePack template. */
+struct PACKED swim_event_bin {
+	/** mp_encode_map(5 or 6) */
+	uint8_t m_header;
+	/** Basic member info like status, address. */
+	struct swim_passport_bin passport;
+};
+
+/** Initialize dissemination record. */
+void
+swim_event_bin_create(struct swim_event_bin *header);
+
+/**
+ * Since usually there are many evnets, it is faster to reset a
+ * few fields in an existing template, then each time create a
+ * new template. So the usage pattern is create(), fill(),
+ * fill() ... .
+ */
+void
+swim_event_bin_fill(struct swim_event_bin *header,
+		    enum swim_member_status status,
+		    const struct sockaddr_in *addr, const struct tt_uuid *uuid,
+		    uint64_t incarnation);
+
+/** }}}                 Dissemination component                 */
+
 /** {{{                     Meta component                      */
 
 /**
diff --git a/test/unit/swim.c b/test/unit/swim.c
index 358230e15..e11f006ae 100644
--- a/test/unit/swim.c
+++ b/test/unit/swim.c
@@ -98,7 +98,7 @@ swim_test_sequence(void)
 static void
 swim_test_uuid_update(void)
 {
-	swim_start_test(4);
+	swim_start_test(5);
 
 	struct swim_cluster *cluster = swim_cluster_new(2);
 	swim_cluster_add_link(cluster, 0, 1);
@@ -109,6 +109,7 @@ swim_test_uuid_update(void)
 	is(swim_cfg(s, NULL, -1, -1, -1, &new_uuid), 0, "UUID update");
 	is(swim_cluster_wait_fullmesh(cluster, 1), 0,
 	   "old UUID is returned back as a 'ghost' member");
+	is(swim_size(s), 3, "two members in each + ghost third member");
 	new_uuid.time_low = 2;
 	is(swim_cfg(s, NULL, -1, -1, -1, &new_uuid), -1,
 	   "can not update to an existing UUID - swim_cfg fails");
@@ -245,11 +246,12 @@ swim_test_basic_failure_detection(void)
 	is(swim_cluster_wait_status(cluster, 0, 1, MEMBER_DEAD, 0.1), 0,
 	   "but it is dead after one more");
 
-	is(swim_cluster_wait_status(cluster, 0, 1, swim_member_status_MAX,
-				    0.9), -1,
-	   "after 1 more unack the member still is not deleted");
-	is(swim_cluster_wait_status(cluster, 0, 1, swim_member_status_MAX,
-				    0.1), 0, "but it is dropped after 1 more");
+	swim_run_for(1);
+	is(swim_cluster_member_status(cluster, 0, 1), MEMBER_DEAD, "after 2 "\
+	   "more unacks the member still is not deleted - dissemination TTL "\
+	   "keeps it");
+	is(swim_cluster_wait_status(cluster, 0, 1, swim_member_status_MAX, 2),
+	   0, "but it is dropped after 2 rounds when TTL gets 0");
 
 	/*
 	 * After IO unblock pending messages will be processed all
@@ -378,7 +380,7 @@ swim_test_too_big_packet(void)
 	int size = 50;
 	double ack_timeout = 1;
 	double first_dead_timeout = 20;
-	double everywhere_dead_timeout = size * 3;
+	double everywhere_dead_timeout = size;
 	int drop_id = size / 2;
 
 	struct swim_cluster *cluster = swim_cluster_new(size);
@@ -417,6 +419,35 @@ swim_test_too_big_packet(void)
 	swim_finish_test();
 }
 
+static void
+swim_test_packet_loss(void)
+{
+	double network_drop_rate[] = {5, 10, 20, 50, 90};
+	swim_start_test(lengthof(network_drop_rate));
+	int size = 20;
+	int drop_id = 0;
+	double ack_timeout = 1;
+
+	for (int i = 0; i < (int) lengthof(network_drop_rate); ++i) {
+		double rate = network_drop_rate[i];
+		struct swim_cluster *cluster = swim_cluster_new(size);
+		for (int j = 0; j < size; ++j) {
+			swim_cluster_set_drop(cluster, j, rate);
+			for (int k = 0; k < size; ++k)
+				swim_cluster_add_link(cluster, j, k);
+		}
+		swim_cluster_set_ack_timeout(cluster, ack_timeout);
+		swim_cluster_set_drop(cluster, drop_id, 100);
+		swim_cluster_set_gc(cluster, SWIM_GC_OFF);
+		double timeout = size * 100.0 / (100 - rate);
+		is(swim_cluster_wait_status_everywhere(cluster, drop_id,
+						       MEMBER_DEAD, 1000), 0,
+		   "drop rate = %.2f, but the failure is disseminated", rate);
+		swim_cluster_delete(cluster);
+	}
+	swim_finish_test();
+}
+
 static void
 swim_test_undead(void)
 {
@@ -439,7 +470,7 @@ swim_test_undead(void)
 static int
 main_f(va_list ap)
 {
-	swim_start_test(11);
+	swim_start_test(12);
 
 	(void) ap;
 	swim_test_ev_init();
@@ -456,6 +487,7 @@ main_f(va_list ap)
 	swim_test_basic_gossip();
 	swim_test_too_big_packet();
 	swim_test_undead();
+	swim_test_packet_loss();
 
 	swim_test_transport_free();
 	swim_test_ev_free();
diff --git a/test/unit/swim.result b/test/unit/swim.result
index 3393870c2..615327e27 100644
--- a/test/unit/swim.result
+++ b/test/unit/swim.result
@@ -1,5 +1,5 @@
 	*** main_f ***
-1..11
+1..12
 	*** swim_test_one_link ***
     1..6
     ok 1 - no rounds - no fullmesh
@@ -16,11 +16,12 @@ ok 1 - subtests
 ok 2 - subtests
 	*** swim_test_sequence: done ***
 	*** swim_test_uuid_update ***
-    1..4
+    1..5
     ok 1 - UUID update
     ok 2 - old UUID is returned back as a 'ghost' member
-    ok 3 - can not update to an existing UUID - swim_cfg fails
-    ok 4 - diag says 'exists'
+    ok 3 - two members in each + ghost third member
+    ok 4 - can not update to an existing UUID - swim_cfg fails
+    ok 5 - diag says 'exists'
 ok 3 - subtests
 	*** swim_test_uuid_update: done ***
 	*** swim_test_cfg ***
@@ -65,8 +66,8 @@ ok 5 - subtests
     ok 1 - node is added as alive
     ok 2 - member still is not dead after 2 noacks
     ok 3 - but it is dead after one more
-    ok 4 - after 1 more unack the member still is not deleted
-    ok 5 - but it is dropped after 1 more
+    ok 4 - after 2 more unacks the member still is not deleted - dissemination TTL keeps it
+    ok 5 - but it is dropped after 2 rounds when TTL gets 0
     ok 6 - fullmesh is restored
     ok 7 - a member is added back on an ACK
 ok 6 - subtests
@@ -106,4 +107,13 @@ ok 10 - subtests
     ok 2 - but it is never deleted due to the cfg option
 ok 11 - subtests
 	*** swim_test_undead: done ***
+	*** swim_test_packet_loss ***
+    1..5
+    ok 1 - drop rate = 5.00, but the failure is disseminated
+    ok 2 - drop rate = 10.00, but the failure is disseminated
+    ok 3 - drop rate = 20.00, but the failure is disseminated
+    ok 4 - drop rate = 50.00, but the failure is disseminated
+    ok 5 - drop rate = 90.00, but the failure is disseminated
+ok 12 - subtests
+	*** swim_test_packet_loss: done ***
 	*** main_f: done ***
-- 
2.17.2 (Apple Git-113)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH 5/5] swim: introduce dissemination component
  2019-04-05 11:57 ` [tarantool-patches] [PATCH 5/5] swim: introduce dissemination component Vladislav Shpilevoy
@ 2019-04-08 20:13   ` Vladislav Shpilevoy
  2019-04-09  9:58     ` Konstantin Osipov
  0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-08 20:13 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Kostja pushed some comment updates. I copy-pasted
them below and fixed or reverted some of them with
an explanation why.

Please, do not forget to answer on this email and others
in the thread with an LGTM or new review fixes/comments.

> commit ce5b71857310b7c692d6d5eb8aae3f2bc2604320
> Author: Konstantin Osipov <kostja@tarantool.org>
> Date:   Mon Apr 8 17:25:13 2019 +0300
> 
>     swim: update comments, use singular in method names.
> 
> diff --git a/src/lib/swim/swim.c b/src/lib/swim/swim.c
> index 8eac102c8..215960911 100644
> --- a/src/lib/swim/swim.c
> +++ b/src/lib/swim/swim.c
> @@ -274,22 +274,47 @@ struct swim_member {
>  	 *                 Dissemination component
>  	 *
>  	 * Dissemination component sends events. Event is a
> -	 * notification about member status update. So formally,
> -	 * this structure already has all the needed attributes.
> -	 * But also an event somehow should be sent to all members
> -	 * at least once according to SWIM, so it requires
> -	 * something like TTL for each type of event, which gets
> -	 * decremented on each send. And a member can not be
> -	 * removed from the global table until it gets dead and
> -	 * its status TTLs is 0, so as to allow other members
> -	 * learn its dead status.
> +	 * notification about some member state update. We maintain
> +	 * a different event type for each significant member
> +	 * attribute - status, incarnation, etc to not send entire
> +	 * member state each time any member attribute changes.
> +	 * The state of an event is stored within

My update:

==================================================================
 	 * Dissemination component sends events. Event is a
-	 * notification about some member state update. We maintain
-	 * a different event type for each significant member
-	 * attribute - status, incarnation, etc to not send entire
+	 * notification about some member state update. The member
+	 * maintains a different event type for each significant
+	 * attribute - status, incarnation, etc not to send entire
 	 * member state each time any member attribute changes.
-	 * The state of an event is stored within
==================================================================

1) Note, that I deliberately avoid personalization in the comments.
IMO, too extensive usage of 'we', 'us' and other pronouns harms
readability. Sometimes it is not obvious what do you mean by
'we'/'us', and it is better to say explicitly 'instance', 'member'
etc. After all, it is not a chat or face-to-face conversation. It
is a document.

2) In English 'to not do ...' is wrong, it is rather Russian
construction. In English they use 'not to do ...'.

3) I dropped 'stored within' because obviously that sentence
is truncated, and makes no sense.

> +	 *
> +	 * According to SWIM, an event should be sent to
> +	 * all members at least once. It'd be silly to continue
> +	 * sending an old event if a member attribute has changed
> +	 * since the beginning of the round. To this end
> +	 * we also maintain TTL (time-to-live) counter
> +	 * associated with each event type.

My update:

==================================================================
-	 * According to SWIM, an event should be sent to
-	 * all members at least once. It'd be silly to continue
-	 * sending an old event if a member attribute has changed
-	 * since the beginning of the round. To this end
-	 * we also maintain TTL (time-to-live) counter
-	 * associated with each event type.
+	 * According to SWIM, an event should be sent to all
+	 * members at least once - for that a TTL (time-to-live)
+	 * counter is maintained for each independent event type.
==================================================================

I dropped "It'd be silly ..." because I do not see a link to that
sentence. Sending old outdated member attribute value and sending
an attribute only once has nothing to do in common.

> +	 *
> +	 * When a member state changes, the TTL is reset to the
> +	 * cluster size. It is then decremented after each send.
> +	 * This guarantees that each member state change is sent
> +	 * to each SWIM member at least once. If a new event of
> +	 * the same type is generated before a round is finished,
> +	 * we update the current event object and reset the
> +	 * relevant TTL.

My update:

==================================================================
-	 * we update the current event object and reset the
-	 * relevant TTL.
+	 * the current event object is updated in place with reset
+	 * of the TTL.
==================================================================

I dropped 'we'.

> +	 *
> +	 * To conclude, TTL works in two ways: to see which
> +	 * specific member attribute needs dissemination and to
> +	 * track how many cluster members still need to learn
> +	 * about the change from us.

My update:

==================================================================
-	 * about the change from us.
+	 * about the change from this instance.
==================================================================

I dropped 'us'.

> +	 */
> +
> +	/**
> +	 * status_ttl is reset whenever a member status changes.
> +	 * We also reset it whenever any other attribute changes,
> +	 * to be able to easily track whether we need to send
> +	 * any events at all.
> +	 * Besides, a dead member can not be removed from the
> +	 * member table until its status TTL is 0, so as to allow
> +	 * other members learn its dead status.

My update:

==================================================================
 	/**
-	 * status_ttl is reset whenever a member status changes.
-	 * We also reset it whenever any other attribute changes,
-	 * to be able to easily track whether we need to send
-	 * any events at all.
-	 * Besides, a dead member can not be removed from the
-	 * member table until its status TTL is 0, so as to allow
-	 * other members learn its dead status.
+	 * General TTL reset each time when any visible member
+	 * attribute is updated. It is always bigger or equal than
+	 * any other TTLs. In addition it helps to keep a dead
+	 * member not dropped until the TTL gets zero so as to
+	 * allow other members to learn the dead status.
==================================================================

I dropped 'status_ttl' name from all the comments, because it is an
erroneous practice to duplicate compilable code into comments. If
in future status_ttl name is changed, we 100% will not update some
of the comments and will have the same that we deal in SQLite code
with.

Also, I dropped 'we'.

>  	 */
>  	int status_ttl;
>  	/**
> -	 * Events are put into a queue sorted by event occurrence
> -	 * time.
> +	 * All created events are put into a queue sorted by event time.
>  	 */
> -	struct rlist in_events_queue;
> +	struct rlist in_event_queue;
>  };
> @@ -708,7 +739,7 @@ swim_new_round(struct swim *swim)
>  /**
>   * Encode anti-entropy header and random members data as many as
>   * possible to the end of the packet.
> - * @retval Number of key-values added to the packet's root map.
> + * @retval Number of key-value pairs added to the packet's root map.

I've reverted that change, because

 - it has nothing in common with the patch. It is an old anti-entropy
   comment, not a new dissemination code.

 - we have exactly the same @retval comment on other swim_encode_...()
   functions. I think, that they should be consistent.

 - it pollutes git history.

>   */
>  static int
>  swim_encode_anti_entropy(struct swim *swim, struct swim_packet *packet)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH 3/5] test: speed up swim big cluster failure detection
  2019-04-05 11:57 ` [tarantool-patches] [PATCH 3/5] test: speed up swim big cluster failure detection Vladislav Shpilevoy
@ 2019-04-09  8:43   ` Konstantin Osipov
  2019-04-09 11:47     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Osipov @ 2019-04-09  8:43 UTC (permalink / raw)
  To: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/04/05 16:12]:

The more I look at the tests the more confusing the test function
names become. 

The namespace of test functions is clearly clashing with the main
swim namespace, which makes the tests hard to read and follow:

it's unclear which function belongs to the tests harness and which
to the swim itself. Please come up with a harness api prefix.

Please feel free to do it in a subsequent patch.

> The test checks that if a member has failed in a big cluster, it
> is eventually deleted from all instances. But it takes too much
> real time despite usage of virtual time.
> 
> This is because member total deletion takes
> O(N + ack_timeout * 5) time. N so as to wait until every member
> pinged the failed one at least once, + 3 * ack_timeout to learn
> that it is dead, and + 2 * ack_timeout to drop it. Of course, it
> is an upper border, and usually it is faster but not much. For
> example, on the cluster of size 50 it takes easily 55 virtual
> seconds.
> 
> On the contrary, to just learn that a member is dead on every
> instance takes O(log(N)) according to the SWIM paper. On the
> same test with 50 instances cluster it takes ~15 virtual seconds
> to disseminate 'dead' status of the failed member on every
> instance. And even without dissemination component, with
> anti-entropy only.
> 
> Leaping ahead, for the subsequent patches it is tested that with
> the dissemination component it takes already ~6 virtual seconds.
> 
> In the summary, without losing test coverage it is much faster to
> turn off SWIM GC and wait until the failed member looks dead on
> all instances.
> 
> Part of #3234
> ---
>  test/unit/swim.c            | 52 ++++++++++++++++++++-------------
>  test/unit/swim.result       |  5 ++--
>  test/unit/swim_test_utils.c | 57 +++++++++++++++++++++++++++++++++++++
>  test/unit/swim_test_utils.h | 20 +++++++++++++
>  4 files changed, 112 insertions(+), 22 deletions(-)
> 
> diff --git a/test/unit/swim.c b/test/unit/swim.c
> index 860d3211e..d77225f6c 100644
> --- a/test/unit/swim.c
> +++ b/test/unit/swim.c
> @@ -374,33 +374,45 @@ swim_test_refute(void)
>  static void
>  swim_test_too_big_packet(void)
>  {
> -	swim_start_test(2);
> +	swim_start_test(3);
>  	int size = 50;
> +	double ack_timeout = 1;
> +	double first_dead_timeout = 20;
> +	double everywhere_dead_timeout = size * 3;
> +	int drop_id = size / 2;
> +
>  	struct swim_cluster *cluster = swim_cluster_new(size);
>  	for (int i = 1; i < size; ++i)
>  		swim_cluster_add_link(cluster, 0, i);
> -	is(swim_cluster_wait_fullmesh(cluster, size), 0, "despite S1 can not "\
> -	   "send all the %d members in a one packet, fullmesh is eventually "\
> -	   "reached", size);
> -	swim_cluster_set_ack_timeout(cluster, 1);
> -	int drop_id = size / 2;
> +
> +	is(swim_cluster_wait_fullmesh(cluster, size * 2), 0, "despite S1 can "\
> +	   "not send all the %d members in a one packet, fullmesh is "\
> +	   "eventually reached", size);
> +
> +	swim_cluster_set_ack_timeout(cluster, ack_timeout);
>  	swim_cluster_set_drop(cluster, drop_id, true);
> +	is(swim_cluster_wait_status_anywhere(cluster, drop_id, MEMBER_DEAD,
> +					     first_dead_timeout), 0,
> +	   "a dead member is detected in time not depending on cluster size");
>  	/*
> -	 * Dissemination of a detected failure takes long time
> -	 * without help of the component, intended for that.
> +	 * GC is off to simplify and speed up checks. When no GC
> +	 * the test is sure that it is safe to check for
> +	 * MEMBER_DEAD everywhere, because it is impossible that a
> +	 * member is considered dead in one place, but already
> +	 * deleted on another. Also, total member deletion takes
> +	 * linear time, because a member is deleted from an
> +	 * instance only when *that* instance will not receive
> +	 * some direct acks from the member. Deletion and
> +	 * additional pings are not triggered if a member dead
> +	 * status is received indirectly via dissemination or
> +	 * anti-entropy. Otherwise it could produce linear network
> +	 * load on the already weak member.
>  	 */
> -	double timeout = size * 10;
> -	int i = 0;
> -	for (; i < size; ++i) {
> -		double start = swim_time();
> -		if (i != drop_id &&
> -		   swim_cluster_wait_status(cluster, i, drop_id,
> -					    swim_member_status_MAX, timeout) != 0)
> -			break;
> -		timeout -= swim_time() - start;
> -	}
> -	is(i, size, "S%d drops all the packets - it should become dead",
> -	   drop_id + 1);
> +	swim_cluster_set_gc(cluster, SWIM_GC_OFF);
> +	is(swim_cluster_wait_status_everywhere(cluster, drop_id, MEMBER_DEAD,
> +					       everywhere_dead_timeout), 0,
> +	   "S%d death is eventually learned by everyone", drop_id + 1);
> +
>  	swim_cluster_delete(cluster);
>  	swim_finish_test();
>  }
> diff --git a/test/unit/swim.result b/test/unit/swim.result
> index 904f061f6..3393870c2 100644
> --- a/test/unit/swim.result
> +++ b/test/unit/swim.result
> @@ -94,9 +94,10 @@ ok 8 - subtests
>  ok 9 - subtests
>  	*** swim_test_basic_gossip: done ***
>  	*** swim_test_too_big_packet ***
> -    1..2
> +    1..3
>      ok 1 - despite S1 can not send all the 50 members in a one packet, fullmesh is eventually reached
> -    ok 2 - S26 drops all the packets - it should become dead
> +    ok 2 - a dead member is detected in time not depending on cluster size
> +    ok 3 - S26 death is eventually learned by everyone
>  ok 10 - subtests
>  	*** swim_test_too_big_packet: done ***
>  	*** swim_test_undead ***
> diff --git a/test/unit/swim_test_utils.c b/test/unit/swim_test_utils.c
> index bb413372c..277a73498 100644
> --- a/test/unit/swim_test_utils.c
> +++ b/test/unit/swim_test_utils.c
> @@ -361,6 +361,39 @@ swim_loop_check_member(struct swim_cluster *cluster, void *data)
>  	return true;
>  }
>  
> +/**
> + * Callback to check that a member matches a template on any
> + * instance in the cluster.
> + */
> +static bool
> +swim_loop_check_member_anywhere(struct swim_cluster *cluster, void *data)
> +{
> +	struct swim_member_template *t = (struct swim_member_template *) data;
> +	for (t->node_id = 0; t->node_id < cluster->size; ++t->node_id) {
> +		if (t->node_id != t->member_id &&
> +		    swim_loop_check_member(cluster, data))
> +			return true;
> +	}
> +	return false;
> +}
> +
> +/**
> + * Callback to check that a member matches a template on every
> + * instance in the cluster.
> + */
> +static bool
> +swim_loop_check_member_everywhere(struct swim_cluster *cluster, void *data)
> +{
> +	struct swim_member_template *t = (struct swim_member_template *) data;
> +	for (t->node_id = 0; t->node_id < cluster->size; ++t->node_id) {
> +		if (t->node_id != t->member_id &&
> +		    !swim_loop_check_member(cluster, data))
> +			return false;
> +	}
> +	return true;
> +}
> +
> +
>  int
>  swim_cluster_wait_status(struct swim_cluster *cluster, int node_id,
>  			 int member_id, enum swim_member_status status,
> @@ -383,6 +416,30 @@ swim_cluster_wait_incarnation(struct swim_cluster *cluster, int node_id,
>  	return swim_wait_timeout(timeout, cluster, swim_loop_check_member, &t);
>  }
>  
> +int
> +swim_cluster_wait_status_anywhere(struct swim_cluster *cluster, int member_id,
> +				  enum swim_member_status status,
> +				  double timeout)
> +{
> +	struct swim_member_template t;
> +	swim_member_template_create(&t, -1, member_id);
> +	swim_member_template_set_status(&t, status);
> +	return swim_wait_timeout(timeout, cluster,
> +				 swim_loop_check_member_anywhere, &t);
> +}
> +
> +int
> +swim_cluster_wait_status_everywhere(struct swim_cluster *cluster, int member_id,
> +				    enum swim_member_status status,
> +				    double timeout)
> +{
> +	struct swim_member_template t;
> +	swim_member_template_create(&t, -1, member_id);
> +	swim_member_template_set_status(&t, status);
> +	return swim_wait_timeout(timeout, cluster,
> +				 swim_loop_check_member_everywhere, &t);
> +}
> +
>  bool
>  swim_error_check_match(const char *msg)
>  {
> diff --git a/test/unit/swim_test_utils.h b/test/unit/swim_test_utils.h
> index d2ef00817..6e99b4879 100644
> --- a/test/unit/swim_test_utils.h
> +++ b/test/unit/swim_test_utils.h
> @@ -118,6 +118,26 @@ swim_cluster_wait_status(struct swim_cluster *cluster, int node_id,
>  			 int member_id, enum swim_member_status status,
>  			 double timeout);
>  
> +/**
> + * Wait until a member with id @a member_id is seen with @a status
> + * in the membership table of any instance in @a cluster. At most
> + * @a timeout seconds.
> + */
> +int
> +swim_cluster_wait_status_anywhere(struct swim_cluster *cluster, int member_id,
> +				  enum swim_member_status status,
> +				  double timeout);
> +
> +/**
> + * Wait until a member with id @a member_id is seen with @a status
> + * in the membership table of every instance in @a cluster. At
> + * most @a timeout seconds.
> + */
> +int
> +swim_cluster_wait_status_everywhere(struct swim_cluster *cluster, int member_id,
> +				    enum swim_member_status status,
> +				    double timeout);
> +
>  /**
>   * Wait until a member with id @a member_id is seen with @a
>   * incarnation in the membership table of a member with id @a
> -- 
> 2.17.2 (Apple Git-113)
> 

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH 5/5] swim: introduce dissemination component
  2019-04-08 20:13   ` [tarantool-patches] " Vladislav Shpilevoy
@ 2019-04-09  9:58     ` Konstantin Osipov
  2019-04-09 11:47       ` Vladislav Shpilevoy
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Osipov @ 2019-04-09  9:58 UTC (permalink / raw)
  To: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/04/09 00:29]:
> Kostja pushed some comment updates. I copy-pasted
> them below and fixed or reverted some of them with
> an explanation why.
> 
> Please, do not forget to answer on this email and others
> in the thread with an LGTM or new review fixes/comments.

I can only see one patch, not a stack of 5 patches. 

This one patch is OK to push with the accumulated fixes for the
comments. 

Please re-submit the remaining of the patch stack for review after
that.

> 

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH 3/5] test: speed up swim big cluster failure detection
  2019-04-09  8:43   ` [tarantool-patches] " Konstantin Osipov
@ 2019-04-09 11:47     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-09 11:47 UTC (permalink / raw)
  To: tarantool-patches, Konstantin Osipov



On 09/04/2019 11:43, Konstantin Osipov wrote:
> * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/04/05 16:12]:
> 
> The more I look at the tests the more confusing the test function
> names become. 
> 
> The namespace of test functions is clearly clashing with the main
> swim namespace, which makes the tests hard to read and follow:
> 
> it's unclear which function belongs to the tests harness and which
> to the swim itself. Please come up with a harness api prefix.

Short answer: 'swim_cluster_' is the harness API prefix. There are
only two harness' public functions not prefixed: swim_run_for() and
swim_error_check_match(), but they are just utilities, and not
related to a concrete set of swim instances (cluster). I've fixed
these two functions to be prefixed with 'swim_test_'.

Long answer below.

> 
> Please feel free to do it in a subsequent patch.

I do not understand, how is it supposed to help, because it is not
necessary to know which function are public, and which are not, to
understand what happens in the test. My logic is that a test should
be easy to read like a text, and to understand the scenario. And for
that we usually never prefix our test functions with special prefixes.
Even in unit tests. You still continue to create new rules so as to
forget about them afterwards, and create contradictory ones or just
break olds. And you never document any of your ideas.

Below you can find some examples of not prefixing test functions.
Some of them are simple, some of them (just like swim) are a part of
a harness.

Examples:
- unit/bitset_iterator.c, test functions, not prefixed:
  * bitsets_create
  * bitsets_destroy
  * nums_fill
  * nums_comparator
  * nums_sort
  * nums_shuffle

- unit/cbus.c, test functions, not prefixed:
  * do_nothing
  * flush_cb
  * finish_execution
  * worker_f
  * worker_start
  * worker_stop
  * do_forced_flush
  * do_some_event

- unit/cbus_stress.c, test functions and structures, not prefixed:
  * struct conn
  * struct thread
  * thread_create, thread_name, thread_func, thread_destroy,
    thread_connect, thread_disconnect, thread_connect_random,
    thread_disconnect_random, thread_send, thread_send_random
  * struct thread_msg
  * thread_msg_received_cb

  By the way, that test was authored by Vova and *committed by you*.

- unit/column_mask.c, test functions and structures, not prefixed:
  * struct tuple_op_template
  * struct tuple_update_template
  * struct tuple_template
  * tuple_new_raw
  * tuple_new_update
  * tuple_update_alloc_f
  * check_update_result
  
  That test was authored by me and *committed by you*.

- unit/csv.c, test functions and structures, not prefixed:
  * print_endl
  * print_field
  * buf_endl
  * buf_field
  * struct counter
  * line_counter
  * fieldsizes_counter
  * csv_out

- unit/fiber.c, test functions, not prefixed:
  * noop_f
  * cancel_f
  * exception_f
  * no_exception_f
  * cancel_dead_f
  * stack_expand
  
  Here the names were invented or patched *by you*.

- unit/histogram.c, test functions, not prefixed:
  * int64_cmp
  * int64_sort
  * gen_buckets
  * gen_rand_data
  * gen_rand_value

unit/say.c, test functions and structures, not prefixed:
  * parse_logger_type
  * parse_syslog_opts
  * format_func_custom
  * struct create_log
  * dummy_log

  These parts of the test were authored by Ilya and
  *committed by you*.

My favorite part - Vinyl. It is relatively new, and was
almost completely reviewed and committed by you.

unit/vy_iterators_helper.c/.h, test functions and structures,
not prefixed:
  * vy_new_simple_stmt
  * vy_mem_insert_template
  * vy_cache_insert_templates_chain
  * vy_cache_on_write_template
  * init_read_views_list
  * vy_stmt_are_same
  * struct vy_stmt_template

unit/vy_log.stub.c, test functions, not prefixed:
  * vy_log_next_id
  * vy_log_tx_begin
  * vy_log_tx_commit
  * vy_log_write
  * vy_recovery_lsm_by_index_id

unit/vy_point_lookup.c: write_run().

I skipped most of the small test files, and decided not
to scan non-unit tests. But apparently in non-unit tests
we never use prefixes as well. And evidently this lack of
useless padding-out-the-code prefixes does not aggravate
readability of the tests.

What is more, even now all non-test functions can be
identified by the first 'struct swim *' and
'struct swim_member *' parameters. Test functions are
prefixed with 'swim_cluster_' prefix. The only methods, not
prefixed with 'swim_cluster_' here were 'swim_run_for'
and 'swim_error_check_match'.

I renamed these two methods with a prefix 'swim_test_', and
changed 'swim_start/finish_test' macro to 'swim_test_start/finish'
to be consistent. Now *all* public test harness methods are
either prefixed with 'swim_cluster_' or with 'swim_test_'.

I will not send it as a follow-up patch in order to keep git
history as clean as possible. Only old tests I will fix
separately, before the main patch.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH 5/5] swim: introduce dissemination component
  2019-04-09  9:58     ` Konstantin Osipov
@ 2019-04-09 11:47       ` Vladislav Shpilevoy
  0 siblings, 0 replies; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-09 11:47 UTC (permalink / raw)
  To: tarantool-patches, Konstantin Osipov



On 09/04/2019 12:58, Konstantin Osipov wrote:
> * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/04/09 00:29]:
>> Kostja pushed some comment updates. I copy-pasted
>> them below and fixed or reverted some of them with
>> an explanation why.
>>
>> Please, do not forget to answer on this email and others
>> in the thread with an LGTM or new review fixes/comments.
> 
> I can only see one patch, not a stack of 5 patches. 

There is a problem with your email client. The whole patchset
has reached the online storage:
https://www.freelists.org/post/tarantool-patches/PATCH-05-swim-dissemination-component

Additionally, what is strange, you've answered on one of the
previous patches, but now you say, that you do not see them.
🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔🤔

> 
> This one patch is OK to push with the accumulated fixes for the
> comments. 

Unfortunately, it is blocked by the previous patches
in the stack.

> 
> Please re-submit the remaining of the patch stack for review after
> that.

As you wish.

> 
>>
> 
> -- 
> Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
> http://tarantool.io - www.twitter.com/kostja_osipov
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tarantool-patches] Re: [PATCH 0/5] swim dissemination component
  2019-04-05 11:57 [tarantool-patches] [PATCH 0/5] swim dissemination component Vladislav Shpilevoy
                   ` (4 preceding siblings ...)
  2019-04-05 11:57 ` [tarantool-patches] [PATCH 5/5] swim: introduce dissemination component Vladislav Shpilevoy
@ 2019-04-09 12:25 ` Vladislav Shpilevoy
  5 siblings, 0 replies; 12+ messages in thread
From: Vladislav Shpilevoy @ 2019-04-09 12:25 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

As I see, you've already pushed the old patchset, with
*two* commits above with review fixes, and without the
fixes you requested earlier. It reminds me this 'patchset':
73237320d8e40f8d4244c90ca4182f850c7cb402,
'sio: remove exceptions'. Good job. Well done. After all,
'git' is just an FTP client, right? It is not supposed to
push atomic and clean patches, to be able to use 'git blame',
to revert changes.

Talking of the mailing list, and our SOP on how to review
and push patches - they does not matter too. Great.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-04-09 12:25 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-05 11:57 [tarantool-patches] [PATCH 0/5] swim dissemination component Vladislav Shpilevoy
2019-04-05 11:57 ` [tarantool-patches] [PATCH 1/5] swim: encapsulate member bin info into a 'passport' Vladislav Shpilevoy
2019-04-05 11:57 ` [tarantool-patches] [PATCH 2/5] swim: make members array decoder be a separate function Vladislav Shpilevoy
2019-04-05 11:57 ` [tarantool-patches] [PATCH 3/5] test: speed up swim big cluster failure detection Vladislav Shpilevoy
2019-04-09  8:43   ` [tarantool-patches] " Konstantin Osipov
2019-04-09 11:47     ` Vladislav Shpilevoy
2019-04-05 11:57 ` [tarantool-patches] [PATCH 4/5] test: set packet drop rate instead of flag in swim tests Vladislav Shpilevoy
2019-04-05 11:57 ` [tarantool-patches] [PATCH 5/5] swim: introduce dissemination component Vladislav Shpilevoy
2019-04-08 20:13   ` [tarantool-patches] " Vladislav Shpilevoy
2019-04-09  9:58     ` Konstantin Osipov
2019-04-09 11:47       ` Vladislav Shpilevoy
2019-04-09 12:25 ` [tarantool-patches] Re: [PATCH 0/5] swim " Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox