[tarantool-patches] [PATCH v2 0/2] SWIM log TTD, and suspicious anti-entropy

Tarantool development patches archive
 help / color / mirror / Atom feed

* [tarantool-patches] [PATCH v2 0/2] SWIM log TTD, and suspicious anti-entropy
@ 2019-06-30 20:01 Vladislav Shpilevoy
  2019-06-30 20:01 ` [tarantool-patches] [PATCH v2 1/2] swim: disseminate event for log(cluster_size) steps Vladislav Shpilevoy
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Vladislav Shpilevoy @ 2019-06-30 20:01 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

The patchset is logically bound to another mail thread "SWIM log TTD, and TTL".
It does the same with TTD, but solves the problem of resurrection in another
way, which does not depend on time. See message of the second commit for
explanations.

V1: https://www.freelists.org/post/tarantool-patches/PATCH-02-SWIM-big-cluster-improvements-part-1
Changes in V2:
- Drop commit about preference of new events over old ones;
- Move logarithmic TTD in a separate commit;
- Make anti-entropy suspicious in order to prevent member resurrections;

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-4253-dissemination-speed-v2
Issue: https://github.com/tarantool/tarantool/issues/4253

Vladislav Shpilevoy (2):
  swim: disseminate event for log(cluster_size) steps
  swim: be suspicious when add new member

 src/lib/swim/swim.c   | 48 +++++++++++++++++++++++---
 test/unit/swim.c      | 79 ++++++++++++++++++++++++++++++++++++++-----
 test/unit/swim.result | 17 ++++++----
 3 files changed, 126 insertions(+), 18 deletions(-)

-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tarantool-patches] [PATCH v2 1/2] swim: disseminate event for log(cluster_size) steps
  2019-06-30 20:01 [tarantool-patches] [PATCH v2 0/2] SWIM log TTD, and suspicious anti-entropy Vladislav Shpilevoy
@ 2019-06-30 20:01 ` Vladislav Shpilevoy
  2019-07-03 19:06   ` [tarantool-patches] " Konstantin Osipov
  2019-06-30 20:01 ` [tarantool-patches] [PATCH v2 2/2] swim: be suspicious when add new member Vladislav Shpilevoy
  2019-07-05 22:45 ` [tarantool-patches] Re: [PATCH v2 0/2] SWIM log TTD, and suspicious anti-entropy Vladislav Shpilevoy
  2 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy @ 2019-06-30 20:01 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Before the patch there was a problem of events and anti-entropy
starvation, when a cluster generates so many events, that they
consume the whole UDP packet. A packet fits up to 26 events. If
during the event storm something important happens, that event is
likely to be lost, and not disseminated until the storm is over.

Sadly, there is no way to prevent a storm, but it can be made
much shorter. For that the patch makes TTD of events logarithmic
instead of linear of cluster size.

According to the SWIM paper and to experiments the logarithm is
really enough. Linear TTD was a redundant overkill.

When events live shorter, it does not solve a problem of the
events starvation - still some of them can be lost in case of a
storm. But it frees some space for anti-entropy, which can finish
dissemination of lost events.

Experiments in a simulation of a cluster with 100 nodes showed,
that a failure dissemination happened in ~110 steps if there is
a storm. Basically, no dissemination at all.

After the patch it is ~20 steps. So it is logarithmic as it
should be, although with a bigger constant than without a storm.

However the patch can't be pushed as is. Several tests are broken
now, because due to short TTD 'dead' members are dropped from
member tables too early. It creates a possibility, that another
node resurrects them back via anti-entropy.

Part of #4253
---
 src/lib/swim/swim.c | 19 ++++++++++++-
 test/unit/swim.c    | 68 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/src/lib/swim/swim.c b/src/lib/swim/swim.c
index bb9e9f519..9647094f5 100644
--- a/src/lib/swim/swim.c
+++ b/src/lib/swim/swim.c
@@ -616,7 +616,24 @@ swim_register_event(struct swim *swim, struct swim_member *member)
 		rlist_add_tail_entry(&swim->dissemination_queue, member,
 				     in_dissemination_queue);
 	}
-	member->status_ttd = mh_size(swim->members);
+	/*
+	 * Logarithm is a perfect number of disseminations of an
+	 * event.
+	 *
+	 * Firstly, it matches the dissemination speed.
+	 *
+	 * Secondly, bigger number of disseminations (for example,
+	 * linear) causes events and anti-entropy starvation in
+	 * big clusters, when lots of events occupy the whole UDP
+	 * packet, and factually the same packet content is being
+	 * sent for quite a long time. No randomness. Anti-entropy
+	 * does not get a chance to disseminate something new and
+	 * random. Bigger orders are redundant and harmful.
+	 *
+	 * Thirdly, logarithm is proved by the original
+	 * SWIM paper as the best option.
+	 */
+	member->status_ttd = ceil(log2(mh_size(swim->members)));
 	swim_cached_round_msg_invalidate(swim);
 }
 
diff --git a/test/unit/swim.c b/test/unit/swim.c
index 3486d3f73..d776a8231 100644
--- a/test/unit/swim.c
+++ b/test/unit/swim.c
@@ -43,6 +43,7 @@
 #include "swim_test_utils.h"
 #include "trigger.h"
 #include <fcntl.h>
+#include <math.h>
 
 /**
  * Test result is a real returned value of main_f. Fiber_join can
@@ -1152,10 +1153,74 @@ swim_test_generation(void)
 	swim_finish_test();
 }
 
+static void
+swim_test_dissemination_speed(void)
+{
+	swim_start_test(2);
+
+	int size = 100;
+	double ack_timeout = 0.1;
+	struct swim_cluster *cluster = swim_cluster_new(size);
+	swim_cluster_set_ack_timeout(cluster, ack_timeout);
+	swim_cluster_set_gc(cluster, SWIM_GC_OFF);
+	for (int i = 0; i < size; ++i) {
+		for (int j = i + 1; j < size; ++j)
+			swim_cluster_interconnect(cluster, i, j);
+	}
+	swim_cluster_set_drop(cluster, 0, 100);
+	fail_if(swim_cluster_wait_status_anywhere(cluster, 0,
+						  MEMBER_DEAD, size) != 0);
+	/*
+	 * Not a trivial problem - at start of a cluster there are
+	 * so many events, that they occupy a UDP packet fully.
+	 * All these events are 'a new member is added'. And
+	 * because of that other much more important events
+	 * starve. In this concrete test a new event 'member is
+	 * dead' starves. To beat that problem SWIM says that
+	 * events should be disseminated not longer than for a
+	 * O(log) round steps. In such a case all the events are
+	 * expired quite fast, and anti-entropy swiftly finishes
+	 * the job. Usually this test works in log * 2, log * 3
+	 * steps. Here it is log * 6 to avoid flakiness in some
+	 * extra rare and slow random cases, but to still check
+	 * for O(log) speed.
+	 */
+	is(swim_cluster_wait_status_everywhere(cluster, 0, MEMBER_DEAD,
+					       log2(size) * 6), 0,
+	   "dissemination work in log time even at the very start of a cluster");
+	swim_cluster_set_drop(cluster, 0, 0);
+	fail_if(swim_cluster_wait_status_everywhere(cluster, 0,
+						    MEMBER_ALIVE, size) != 0);
+	/*
+	 * Another big-cluster case. Assume, that something
+	 * happened and all the members generated an event. For
+	 * example, changed their payload. It creates a storm of
+	 * events, among which some important ones can be lost.
+	 * Such as a failure detection. The only solution again -
+	 * make the events as short living as possible in order to
+	 * faster free space in a UDP packet for other events and
+	 * for anti-entropy. The test below proves that even when
+	 * there is an event storm, failure dissemination still
+	 * works for O(log) time.
+	 */
+	swim_cluster_set_drop(cluster, 0, 100);
+	fail_if(swim_cluster_wait_status_anywhere(cluster, 0,
+						  MEMBER_DEAD, size) != 0);
+	for (int i = 0; i < size; ++i)
+		swim_cluster_member_set_payload(cluster, i, "", 0);
+	is(swim_cluster_wait_status_everywhere(cluster, 0, MEMBER_DEAD,
+					       log2(size) * 6), 0,
+	   "dissemination can withstand an event storm");
+
+	swim_cluster_delete(cluster);
+
+	swim_finish_test();
+}
+
 static int
 main_f(va_list ap)
 {
-	swim_start_test(22);
+	swim_start_test(23);
 
 	(void) ap;
 	swim_test_ev_init();
@@ -1183,6 +1248,7 @@ main_f(va_list ap)
 	swim_test_slow_net();
 	swim_test_triggers();
 	swim_test_generation();
+	swim_test_dissemination_speed();
 
 	swim_test_transport_free();
 	swim_test_ev_free();
-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tarantool-patches] [PATCH v2 2/2] swim: be suspicious when add new member
  2019-06-30 20:01 [tarantool-patches] [PATCH v2 0/2] SWIM log TTD, and suspicious anti-entropy Vladislav Shpilevoy
  2019-06-30 20:01 ` [tarantool-patches] [PATCH v2 1/2] swim: disseminate event for log(cluster_size) steps Vladislav Shpilevoy
@ 2019-06-30 20:01 ` Vladislav Shpilevoy
  2019-07-03 19:07   ` [tarantool-patches] " Konstantin Osipov
  2019-07-05 22:45 ` [tarantool-patches] Re: [PATCH v2 0/2] SWIM log TTD, and suspicious anti-entropy Vladislav Shpilevoy
  2 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy @ 2019-06-30 20:01 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

The previous commit solves one important problem with too long
event dissemination. Events could for too long time occupy the
whole UDP packet. Now they live log() time, but 'dead' and 'left'
members were bound to TTD. Such members were deleted after TTD
is 0.

Now they are deleted to early. Cluster nodes too early forget
about dead ones, and nodes not aware of death of the latters, can
accidentally resurrect them via anti-entropy. Cluster nodes need
to be suspicious when someone tells them to add a new not dead
member.

This patch makes SWIM add a new member in two cases only: manually
and if an ACK was received from it. A new member can't be added
indirectly via events and anti-entropy anymore. Instead, a ping is
sent to the members who are said to be new and alive. If ACK is
received directly from them, then they are added.

The patch does not affect updates. They are still indirect,
because if something has updated in an existing member, then it
is definitely alive.

Part of #4253
---
 src/lib/swim/swim.c   | 29 ++++++++++++++++++++++++++---
 test/unit/swim.c      | 11 ++++-------
 test/unit/swim.result | 17 +++++++++++------
 3 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/src/lib/swim/swim.c b/src/lib/swim/swim.c
index 9647094f5..6fbac00ec 100644
--- a/src/lib/swim/swim.c
+++ b/src/lib/swim/swim.c
@@ -1510,6 +1510,10 @@ swim_update_member(struct swim *swim, const struct swim_member_def *def,
  * @param swim SWIM instance to upsert into.
  * @param def Member definition to build a new member or update an
  *        existing one.
+ * @param is_direct True, if the member definition was received
+ *        directly from that member. Via ping or ack from him.
+ *        Otherwise the definition can't be trusted, if it states,
+ *        that a new member should be added.
  * @param[out] result A result member: a new, or an updated, or
  *        NULL in case of nothing has changed. For example, @a def
  *        was too old.
@@ -1520,7 +1524,7 @@ swim_update_member(struct swim *swim, const struct swim_member_def *def,
  */
 static int
 swim_upsert_member(struct swim *swim, const struct swim_member_def *def,
-		   struct swim_member **result)
+		   bool is_direct, struct swim_member **result)
 {
 	struct swim_member *member = swim_find_member(swim, &def->uuid);
 	if (member == NULL) {
@@ -1538,6 +1542,25 @@ swim_upsert_member(struct swim *swim, const struct swim_member_def *def,
 			 * forever.
 			 */
 			goto skip;
+		} else if (def->status < MEMBER_DEAD && ! is_direct) {
+			/*
+			 * If a member is not dead, then it can't
+			 * be added as simple as that. In fact it
+			 * can be already dead, but sender of that
+			 * definition didn't know that. The only
+			 * way to be sure - send him a ping. And
+			 * one exception - if the sender of the
+			 * definition is exactly that member, then
+			 * it can be trusted of course.
+			 */
+			struct swim_task *t = swim_task_new(swim_task_delete_cb,
+							    swim_task_delete_cb,
+							    "probe ping");
+			if (t == NULL)
+				return -1;
+			swim_send_ping(swim, t, &def->addr);
+			*result = NULL;
+			return 0;
 		}
 		*result = swim_new_member(swim, &def->addr, &def->uuid,
 					  def->status, &def->incarnation,
@@ -1597,7 +1620,7 @@ swim_process_members(struct swim *swim, const char *prefix,
 		struct swim_member *member;
 		if (swim_member_def_decode(&def, pos, end, prefix) != 0)
 			return -1;
-		if (swim_upsert_member(swim, &def, &member) != 0) {
+		if (swim_upsert_member(swim, &def, false, &member) != 0) {
 			/*
 			 * Not a critical error. Other members
 			 * still can be updated.
@@ -1639,7 +1662,7 @@ swim_process_failure_detection(struct swim *swim, const char **pos,
 	mdef.incarnation = def.incarnation;
 	mdef.uuid = *uuid;
 	struct swim_member *member;
-	if (swim_upsert_member(swim, &mdef, &member) != 0)
+	if (swim_upsert_member(swim, &mdef, true, &member) != 0)
 		return -1;
 	/*
 	 * It can be NULL, for example, in case of too old
diff --git a/test/unit/swim.c b/test/unit/swim.c
index d776a8231..db9441157 100644
--- a/test/unit/swim.c
+++ b/test/unit/swim.c
@@ -254,7 +254,7 @@ swim_test_add_remove(void)
 static void
 swim_test_basic_failure_detection(void)
 {
-	swim_start_test(9);
+	swim_start_test(8);
 	struct swim_cluster *cluster = swim_cluster_new(2);
 	swim_cluster_set_ack_timeout(cluster, 0.5);
 
@@ -275,11 +275,8 @@ swim_test_basic_failure_detection(void)
 	   "but it is dead after one more");
 
 	swim_run_for(1);
-	is(swim_cluster_member_status(cluster, 0, 1), MEMBER_DEAD, "after 2 "\
-	   "more unacks the member still is not deleted - dissemination TTD "\
-	   "keeps it");
 	is(swim_cluster_wait_status(cluster, 0, 1, swim_member_status_MAX, 2),
-	   0, "but it is dropped after 2 rounds when TTD gets 0");
+	   0, "it is dropped after one more round when TTD gets 0");
 
 	/*
 	 * After IO unblock pending messages will be processed all
@@ -584,12 +581,12 @@ swim_test_quit(void)
 	swim_cluster_interconnect(cluster, 1, 2);
 
 	swim_cluster_quit_node(cluster, 0);
-	swim_run_for(2);
+	swim_run_for(1);
 	is(swim_cluster_member_status(cluster, 2, 0), MEMBER_LEFT,
 	   "S3 sees S1 as left");
 	is(swim_cluster_member_status(cluster, 1, 0), swim_member_status_MAX,
 	   "S2 does not see S1 at all");
-	swim_run_for(2);
+	swim_run_for(1);
 	is(swim_cluster_member_status(cluster, 2, 0), swim_member_status_MAX,
 	   "after more time S1 is dropped from S3");
 	is(swim_cluster_member_status(cluster, 1, 0), swim_member_status_MAX,
diff --git a/test/unit/swim.result b/test/unit/swim.result
index 04a2778e6..1fde80132 100644
--- a/test/unit/swim.result
+++ b/test/unit/swim.result
@@ -1,5 +1,5 @@
 	*** main_f ***
-1..22
+1..23
 	*** swim_test_one_link ***
     1..6
     ok 1 - no rounds - no fullmesh
@@ -65,16 +65,15 @@ ok 4 - subtests
 ok 5 - subtests
 	*** swim_test_add_remove: done ***
 	*** swim_test_basic_failure_detection ***
-    1..9
+    1..8
     ok 1 - node is added as alive
     ok 2 - member still is not suspected after 1 noack
     ok 3 - but it is suspected after one more
     ok 4 - it is not dead after 2 more noacks
     ok 5 - but it is dead after one more
-    ok 6 - after 2 more unacks the member still is not deleted - dissemination TTD keeps it
-    ok 7 - but it is dropped after 2 rounds when TTD gets 0
-    ok 8 - fullmesh is restored
-    ok 9 - a member is added back on an ACK
+    ok 6 - it is dropped after one more round when TTD gets 0
+    ok 7 - fullmesh is restored
+    ok 8 - a member is added back on an ACK
 ok 6 - subtests
 	*** swim_test_basic_failure_detection: done ***
 	*** swim_test_probe ***
@@ -237,4 +236,10 @@ ok 21 - subtests
     ok 3 - S2 sees new generation of S1
 ok 22 - subtests
 	*** swim_test_generation: done ***
+	*** swim_test_dissemination_speed ***
+    1..2
+    ok 1 - dissemination work in log time even at the very start of a cluster
+    ok 2 - dissemination can withstand an event storm
+ok 23 - subtests
+	*** swim_test_dissemination_speed: done ***
 	*** main_f: done ***
-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tarantool-patches] Re: [PATCH v2 1/2] swim: disseminate event for log(cluster_size) steps
  2019-06-30 20:01 ` [tarantool-patches] [PATCH v2 1/2] swim: disseminate event for log(cluster_size) steps Vladislav Shpilevoy
@ 2019-07-03 19:06   ` Konstantin Osipov
  2019-07-03 23:30     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 10+ messages in thread
From: Konstantin Osipov @ 2019-07-03 19:06 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/07/01 10:04]:

lgtm

> Before the patch there was a problem of events and anti-entropy
> starvation, when a cluster generates so many events, that they
> consume the whole UDP packet. A packet fits up to 26 events. If
> during the event storm something important happens, that event is
> likely to be lost, and not disseminated until the storm is over.
> 
> Sadly, there is no way to prevent a storm, but it can be made
> much shorter. For that the patch makes TTD of events logarithmic
> instead of linear of cluster size.
> 
> According to the SWIM paper and to experiments the logarithm is
> really enough. Linear TTD was a redundant overkill.
> 
> When events live shorter, it does not solve a problem of the
> events starvation - still some of them can be lost in case of a
> storm. But it frees some space for anti-entropy, which can finish
> dissemination of lost events.
> 
> Experiments in a simulation of a cluster with 100 nodes showed,
> that a failure dissemination happened in ~110 steps if there is
> a storm. Basically, no dissemination at all.
> 
> After the patch it is ~20 steps. So it is logarithmic as it
> should be, although with a bigger constant than without a storm.
> 
> However the patch can't be pushed as is. Several tests are broken
> now, because due to short TTD 'dead' members are dropped from
> member tables too early. It creates a possibility, that another
> node resurrects them back via anti-entropy.
> 
> Part of #4253
> ---
>  src/lib/swim/swim.c | 19 ++++++++++++-
>  test/unit/swim.c    | 68 ++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 85 insertions(+), 2 deletions(-)
> 
> diff --git a/src/lib/swim/swim.c b/src/lib/swim/swim.c
> index bb9e9f519..9647094f5 100644
> --- a/src/lib/swim/swim.c
> +++ b/src/lib/swim/swim.c
> @@ -616,7 +616,24 @@ swim_register_event(struct swim *swim, struct swim_member *member)
>  		rlist_add_tail_entry(&swim->dissemination_queue, member,
>  				     in_dissemination_queue);
>  	}
> -	member->status_ttd = mh_size(swim->members);
> +	/*
> +	 * Logarithm is a perfect number of disseminations of an
> +	 * event.
> +	 *
> +	 * Firstly, it matches the dissemination speed.
> +	 *
> +	 * Secondly, bigger number of disseminations (for example,
> +	 * linear) causes events and anti-entropy starvation in
> +	 * big clusters, when lots of events occupy the whole UDP
> +	 * packet, and factually the same packet content is being
> +	 * sent for quite a long time. No randomness. Anti-entropy
> +	 * does not get a chance to disseminate something new and
> +	 * random. Bigger orders are redundant and harmful.
> +	 *
> +	 * Thirdly, logarithm is proved by the original
> +	 * SWIM paper as the best option.
> +	 */
> +	member->status_ttd = ceil(log2(mh_size(swim->members)));
>  	swim_cached_round_msg_invalidate(swim);
>  }
>  
> diff --git a/test/unit/swim.c b/test/unit/swim.c
> index 3486d3f73..d776a8231 100644
> --- a/test/unit/swim.c
> +++ b/test/unit/swim.c
> @@ -43,6 +43,7 @@
>  #include "swim_test_utils.h"
>  #include "trigger.h"
>  #include <fcntl.h>
> +#include <math.h>
>  
>  /**
>   * Test result is a real returned value of main_f. Fiber_join can
> @@ -1152,10 +1153,74 @@ swim_test_generation(void)
>  	swim_finish_test();
>  }
>  
> +static void
> +swim_test_dissemination_speed(void)
> +{
> +	swim_start_test(2);
> +
> +	int size = 100;
> +	double ack_timeout = 0.1;
> +	struct swim_cluster *cluster = swim_cluster_new(size);
> +	swim_cluster_set_ack_timeout(cluster, ack_timeout);
> +	swim_cluster_set_gc(cluster, SWIM_GC_OFF);
> +	for (int i = 0; i < size; ++i) {
> +		for (int j = i + 1; j < size; ++j)
> +			swim_cluster_interconnect(cluster, i, j);
> +	}
> +	swim_cluster_set_drop(cluster, 0, 100);
> +	fail_if(swim_cluster_wait_status_anywhere(cluster, 0,
> +						  MEMBER_DEAD, size) != 0);
> +	/*
> +	 * Not a trivial problem - at start of a cluster there are
> +	 * so many events, that they occupy a UDP packet fully.
> +	 * All these events are 'a new member is added'. And
> +	 * because of that other much more important events
> +	 * starve. In this concrete test a new event 'member is
> +	 * dead' starves. To beat that problem SWIM says that
> +	 * events should be disseminated not longer than for a
> +	 * O(log) round steps. In such a case all the events are
> +	 * expired quite fast, and anti-entropy swiftly finishes
> +	 * the job. Usually this test works in log * 2, log * 3
> +	 * steps. Here it is log * 6 to avoid flakiness in some
> +	 * extra rare and slow random cases, but to still check
> +	 * for O(log) speed.
> +	 */
> +	is(swim_cluster_wait_status_everywhere(cluster, 0, MEMBER_DEAD,
> +					       log2(size) * 6), 0,
> +	   "dissemination work in log time even at the very start of a cluster");
> +	swim_cluster_set_drop(cluster, 0, 0);
> +	fail_if(swim_cluster_wait_status_everywhere(cluster, 0,
> +						    MEMBER_ALIVE, size) != 0);
> +	/*
> +	 * Another big-cluster case. Assume, that something
> +	 * happened and all the members generated an event. For
> +	 * example, changed their payload. It creates a storm of
> +	 * events, among which some important ones can be lost.
> +	 * Such as a failure detection. The only solution again -
> +	 * make the events as short living as possible in order to
> +	 * faster free space in a UDP packet for other events and
> +	 * for anti-entropy. The test below proves that even when
> +	 * there is an event storm, failure dissemination still
> +	 * works for O(log) time.
> +	 */
> +	swim_cluster_set_drop(cluster, 0, 100);
> +	fail_if(swim_cluster_wait_status_anywhere(cluster, 0,
> +						  MEMBER_DEAD, size) != 0);
> +	for (int i = 0; i < size; ++i)
> +		swim_cluster_member_set_payload(cluster, i, "", 0);
> +	is(swim_cluster_wait_status_everywhere(cluster, 0, MEMBER_DEAD,
> +					       log2(size) * 6), 0,
> +	   "dissemination can withstand an event storm");
> +
> +	swim_cluster_delete(cluster);
> +
> +	swim_finish_test();
> +}
> +
>  static int
>  main_f(va_list ap)
>  {
> -	swim_start_test(22);
> +	swim_start_test(23);
>  
>  	(void) ap;
>  	swim_test_ev_init();
> @@ -1183,6 +1248,7 @@ main_f(va_list ap)
>  	swim_test_slow_net();
>  	swim_test_triggers();
>  	swim_test_generation();
> +	swim_test_dissemination_speed();
>  
>  	swim_test_transport_free();
>  	swim_test_ev_free();
> -- 
> 2.20.1 (Apple Git-117)
> 

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tarantool-patches] Re: [PATCH v2 2/2] swim: be suspicious when add new member
  2019-06-30 20:01 ` [tarantool-patches] [PATCH v2 2/2] swim: be suspicious when add new member Vladislav Shpilevoy
@ 2019-07-03 19:07   ` Konstantin Osipov
  2019-07-03 23:32     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 10+ messages in thread
From: Konstantin Osipov @ 2019-07-03 19:07 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/07/01 10:04]:
> The previous commit solves one important problem with too long
> event dissemination. Events could for too long time occupy the
> whole UDP packet. Now they live log() time, but 'dead' and 'left'
> members were bound to TTD. Such members were deleted after TTD
> is 0.

lgtm

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tarantool-patches] Re: [PATCH v2 1/2] swim: disseminate event for log(cluster_size) steps
  2019-07-03 19:06   ` [tarantool-patches] " Konstantin Osipov
@ 2019-07-03 23:30     ` Vladislav Shpilevoy
  2019-07-04  8:10       ` Konstantin Osipov
  0 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy @ 2019-07-03 23:30 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches

Hi! Thanks for the review!

>> diff --git a/src/lib/swim/swim.c b/src/lib/swim/swim.c
>> index bb9e9f519..9647094f5 100644
>> --- a/src/lib/swim/swim.c
>> +++ b/src/lib/swim/swim.c
>> @@ -616,7 +616,24 @@ swim_register_event(struct swim *swim, struct swim_member *member)
>>  		rlist_add_tail_entry(&swim->dissemination_queue, member,
>>  				     in_dissemination_queue);
>>  	}
>> -	member->status_ttd = mh_size(swim->members);
>> +	/*
>> +	 * Logarithm is a perfect number of disseminations of an
>> +	 * event.
>> +	 *
>> +	 * Firstly, it matches the dissemination speed.
>> +	 *
>> +	 * Secondly, bigger number of disseminations (for example,
>> +	 * linear) causes events and anti-entropy starvation in
>> +	 * big clusters, when lots of events occupy the whole UDP
>> +	 * packet, and factually the same packet content is being
>> +	 * sent for quite a long time. No randomness. Anti-entropy
>> +	 * does not get a chance to disseminate something new and
>> +	 * random. Bigger orders are redundant and harmful.
>> +	 *
>> +	 * Thirdly, logarithm is proved by the original
>> +	 * SWIM paper as the best option.
>> +	 */
>> +	member->status_ttd = ceil(log2(mh_size(swim->members)));

I've changed this place to

    member->status_ttd = ceil(log2(mh_size(swim->members))) + 1;

It allows to do not break the tests in this commit, and fixes a bug,
when status_ttd became negative for 'self'. Because ceil(log2(1)) = 0,
and on a next round step it became -1, -2, etc.

I didn't push the patch yet, if you have anything against that.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tarantool-patches] Re: [PATCH v2 2/2] swim: be suspicious when add new member
  2019-07-03 19:07   ` [tarantool-patches] " Konstantin Osipov
@ 2019-07-03 23:32     ` Vladislav Shpilevoy
  2019-07-04  8:10       ` Konstantin Osipov
  0 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy @ 2019-07-03 23:32 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches

Thanks for the review!

Since the previous commit now does not break the tests, I've
removed from this one all the test fixes, but added a new test:

> static void
> swim_test_suspect_new_members(void)
> {
> 	swim_start_test(2);
> 
> 	struct swim_cluster *cluster = swim_cluster_new(3);
> 	swim_cluster_set_ack_timeout(cluster, 1);
> 	swim_cluster_interconnect(cluster, 0, 1);
> 	swim_cluster_interconnect(cluster, 1, 2);
> 
> 	swim_cluster_set_drop(cluster, 0, 100);
> 	swim_cluster_block_io(cluster, 2);
> 	is(swim_cluster_wait_status(cluster, 1, 0, swim_member_status_MAX, 15),
> 	   0, "S2 dropped S1 as dead");
> 	swim_cluster_unblock_io(cluster, 2);
> 	swim_run_for(1);
> 	is(swim_cluster_member_status(cluster, 2, 0), swim_member_status_MAX,
> 	   "S3 didn't add S1 from S2's messages, because S1 didn't answer "\
> 	   "on a ping");
> 
> 	swim_cluster_delete(cluster);
> 
> 	swim_finish_test();
> }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tarantool-patches] Re: [PATCH v2 1/2] swim: disseminate event for log(cluster_size) steps
  2019-07-03 23:30     ` Vladislav Shpilevoy
@ 2019-07-04  8:10       ` Konstantin Osipov
  0 siblings, 0 replies; 10+ messages in thread
From: Konstantin Osipov @ 2019-07-04  8:10 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/07/04 10:15]:
> >> +	/*
> >> +	 * Logarithm is a perfect number of disseminations of an
> >> +	 * event.
> >> +	 *
> >> +	 * Firstly, it matches the dissemination speed.
> >> +	 *
> >> +	 * Secondly, bigger number of disseminations (for example,
> >> +	 * linear) causes events and anti-entropy starvation in
> >> +	 * big clusters, when lots of events occupy the whole UDP
> >> +	 * packet, and factually the same packet content is being
> >> +	 * sent for quite a long time. No randomness. Anti-entropy
> >> +	 * does not get a chance to disseminate something new and
> >> +	 * random. Bigger orders are redundant and harmful.
> >> +	 *
> >> +	 * Thirdly, logarithm is proved by the original
> >> +	 * SWIM paper as the best option.
> >> +	 */
> >> +	member->status_ttd = ceil(log2(mh_size(swim->members)));
> 
> I've changed this place to
> 
>     member->status_ttd = ceil(log2(mh_size(swim->members))) + 1;
> 
> It allows to do not break the tests in this commit, and fixes a bug,
> when status_ttd became negative for 'self'. Because ceil(log2(1)) = 0,
> and on a next round step it became -1, -2, etc.
> 
> I didn't push the patch yet, if you have anything against that.

go ahead and push

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tarantool-patches] Re: [PATCH v2 2/2] swim: be suspicious when add new member
  2019-07-03 23:32     ` Vladislav Shpilevoy
@ 2019-07-04  8:10       ` Konstantin Osipov
  0 siblings, 0 replies; 10+ messages in thread
From: Konstantin Osipov @ 2019-07-04  8:10 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/07/04 10:15]:
> Thanks for the review!
> 
> Since the previous commit now does not break the tests, I've
> removed from this one all the test fixes, but added a new test:

Thanks!


-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tarantool-patches] Re: [PATCH v2 0/2] SWIM log TTD, and suspicious anti-entropy
  2019-06-30 20:01 [tarantool-patches] [PATCH v2 0/2] SWIM log TTD, and suspicious anti-entropy Vladislav Shpilevoy
  2019-06-30 20:01 ` [tarantool-patches] [PATCH v2 1/2] swim: disseminate event for log(cluster_size) steps Vladislav Shpilevoy
  2019-06-30 20:01 ` [tarantool-patches] [PATCH v2 2/2] swim: be suspicious when add new member Vladislav Shpilevoy
@ 2019-07-05 22:45 ` Vladislav Shpilevoy
  2 siblings, 0 replies; 10+ messages in thread
From: Vladislav Shpilevoy @ 2019-07-05 22:45 UTC (permalink / raw)
  To: tarantool-patches; +Cc: kostja

Pushed to the master.

On 30/06/2019 22:01, Vladislav Shpilevoy wrote:
> The patchset is logically bound to another mail thread "SWIM log TTD, and TTL".
> It does the same with TTD, but solves the problem of resurrection in another
> way, which does not depend on time. See message of the second commit for
> explanations.
> 
> V1: https://www.freelists.org/post/tarantool-patches/PATCH-02-SWIM-big-cluster-improvements-part-1
> Changes in V2:
> - Drop commit about preference of new events over old ones;
> - Move logarithmic TTD in a separate commit;
> - Make anti-entropy suspicious in order to prevent member resurrections;
> 
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-4253-dissemination-speed-v2
> Issue: https://github.com/tarantool/tarantool/issues/4253
> 
> Vladislav Shpilevoy (2):
>   swim: disseminate event for log(cluster_size) steps
>   swim: be suspicious when add new member
> 
>  src/lib/swim/swim.c   | 48 +++++++++++++++++++++++---
>  test/unit/swim.c      | 79 ++++++++++++++++++++++++++++++++++++++-----
>  test/unit/swim.result | 17 ++++++----
>  3 files changed, 126 insertions(+), 18 deletions(-)
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-07-05 22:43 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-30 20:01 [tarantool-patches] [PATCH v2 0/2] SWIM log TTD, and suspicious anti-entropy Vladislav Shpilevoy
2019-06-30 20:01 ` [tarantool-patches] [PATCH v2 1/2] swim: disseminate event for log(cluster_size) steps Vladislav Shpilevoy
2019-07-03 19:06   ` [tarantool-patches] " Konstantin Osipov
2019-07-03 23:30     ` Vladislav Shpilevoy
2019-07-04  8:10       ` Konstantin Osipov
2019-06-30 20:01 ` [tarantool-patches] [PATCH v2 2/2] swim: be suspicious when add new member Vladislav Shpilevoy
2019-07-03 19:07   ` [tarantool-patches] " Konstantin Osipov
2019-07-03 23:32     ` Vladislav Shpilevoy
2019-07-04  8:10       ` Konstantin Osipov
2019-07-05 22:45 ` [tarantool-patches] Re: [PATCH v2 0/2] SWIM log TTD, and suspicious anti-entropy Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox