[tarantool-patches] Re: [PATCH 3/5] test: speed up swim big cluster failure detection

Konstantin Osipov kostja at tarantool.org
Tue Apr 9 11:43:03 MSK 2019


* Vladislav Shpilevoy <v.shpilevoy at tarantool.org> [19/04/05 16:12]:

The more I look at the tests, the more confusing the test function
names become.

The namespace of the test functions clearly clashes with the main
swim namespace, which makes the tests hard to read and follow:
it's unclear which function belongs to the test harness and which
to swim itself. Please come up with a harness API prefix.

Please feel free to do it in a subsequent patch.

> The test checks that if a member has failed in a big cluster, it
> is eventually deleted from all instances. But it takes too much
> real time despite usage of virtual time.
> 
> This is because total deletion of a member takes
> O(N + ack_timeout * 5) time: N to wait until every member has
> pinged the failed one at least once, + 3 * ack_timeout to learn
> that it is dead, and + 2 * ack_timeout to drop it. Of course, this
> is an upper bound, and usually it is faster, but not by much. For
> example, on a cluster of size 50 it easily takes 55 virtual
> seconds.
> 
> In contrast, just learning that a member is dead on every
> instance takes O(log(N)) according to the SWIM paper. In the
> same test with a 50-instance cluster it takes ~15 virtual seconds
> to disseminate the 'dead' status of the failed member to every
> instance, and that is even without the dissemination component,
> with anti-entropy only.
> 
> Looking ahead: the subsequent patches test that with the
> dissemination component it takes only ~6 virtual seconds.
> 
> In summary, without losing test coverage it is much faster to
> turn off SWIM GC and wait until the failed member looks dead on
> all instances.
> 
> Part of #3234
> ---
>  test/unit/swim.c            | 52 ++++++++++++++++++++-------------
>  test/unit/swim.result       |  5 ++--
>  test/unit/swim_test_utils.c | 57 +++++++++++++++++++++++++++++++++++++
>  test/unit/swim_test_utils.h | 20 +++++++++++++
>  4 files changed, 112 insertions(+), 22 deletions(-)
> 
> diff --git a/test/unit/swim.c b/test/unit/swim.c
> index 860d3211e..d77225f6c 100644
> --- a/test/unit/swim.c
> +++ b/test/unit/swim.c
> @@ -374,33 +374,45 @@ swim_test_refute(void)
>  static void
>  swim_test_too_big_packet(void)
>  {
> -	swim_start_test(2);
> +	swim_start_test(3);
>  	int size = 50;
> +	double ack_timeout = 1;
> +	double first_dead_timeout = 20;
> +	double everywhere_dead_timeout = size * 3;
> +	int drop_id = size / 2;
> +
>  	struct swim_cluster *cluster = swim_cluster_new(size);
>  	for (int i = 1; i < size; ++i)
>  		swim_cluster_add_link(cluster, 0, i);
> -	is(swim_cluster_wait_fullmesh(cluster, size), 0, "despite S1 can not "\
> -	   "send all the %d members in a one packet, fullmesh is eventually "\
> -	   "reached", size);
> -	swim_cluster_set_ack_timeout(cluster, 1);
> -	int drop_id = size / 2;
> +
> +	is(swim_cluster_wait_fullmesh(cluster, size * 2), 0, "despite S1 can "\
> +	   "not send all the %d members in a one packet, fullmesh is "\
> +	   "eventually reached", size);
> +
> +	swim_cluster_set_ack_timeout(cluster, ack_timeout);
>  	swim_cluster_set_drop(cluster, drop_id, true);
> +	is(swim_cluster_wait_status_anywhere(cluster, drop_id, MEMBER_DEAD,
> +					     first_dead_timeout), 0,
> +	   "a dead member is detected in time not depending on cluster size");
>  	/*
> -	 * Dissemination of a detected failure takes long time
> -	 * without help of the component, intended for that.
> +	 * GC is off to simplify and speed up checks. When no GC
> +	 * the test is sure that it is safe to check for
> +	 * MEMBER_DEAD everywhere, because it is impossible that a
> +	 * member is considered dead in one place, but already
> +	 * deleted on another. Also, total member deletion takes
> +	 * linear time, because a member is deleted from an
> +	 * instance only when *that* instance will not receive
> +	 * some direct acks from the member. Deletion and
> +	 * additional pings are not triggered if a member dead
> +	 * status is received indirectly via dissemination or
> +	 * anti-entropy. Otherwise it could produce linear network
> +	 * load on the already weak member.
>  	 */
> -	double timeout = size * 10;
> -	int i = 0;
> -	for (; i < size; ++i) {
> -		double start = swim_time();
> -		if (i != drop_id &&
> -		   swim_cluster_wait_status(cluster, i, drop_id,
> -					    swim_member_status_MAX, timeout) != 0)
> -			break;
> -		timeout -= swim_time() - start;
> -	}
> -	is(i, size, "S%d drops all the packets - it should become dead",
> -	   drop_id + 1);
> +	swim_cluster_set_gc(cluster, SWIM_GC_OFF);
> +	is(swim_cluster_wait_status_everywhere(cluster, drop_id, MEMBER_DEAD,
> +					       everywhere_dead_timeout), 0,
> +	   "S%d death is eventually learned by everyone", drop_id + 1);
> +
>  	swim_cluster_delete(cluster);
>  	swim_finish_test();
>  }
> diff --git a/test/unit/swim.result b/test/unit/swim.result
> index 904f061f6..3393870c2 100644
> --- a/test/unit/swim.result
> +++ b/test/unit/swim.result
> @@ -94,9 +94,10 @@ ok 8 - subtests
>  ok 9 - subtests
>  	*** swim_test_basic_gossip: done ***
>  	*** swim_test_too_big_packet ***
> -    1..2
> +    1..3
>      ok 1 - despite S1 can not send all the 50 members in a one packet, fullmesh is eventually reached
> -    ok 2 - S26 drops all the packets - it should become dead
> +    ok 2 - a dead member is detected in time not depending on cluster size
> +    ok 3 - S26 death is eventually learned by everyone
>  ok 10 - subtests
>  	*** swim_test_too_big_packet: done ***
>  	*** swim_test_undead ***
> diff --git a/test/unit/swim_test_utils.c b/test/unit/swim_test_utils.c
> index bb413372c..277a73498 100644
> --- a/test/unit/swim_test_utils.c
> +++ b/test/unit/swim_test_utils.c
> @@ -361,6 +361,39 @@ swim_loop_check_member(struct swim_cluster *cluster, void *data)
>  	return true;
>  }
>  
> +/**
> + * Callback to check that a member matches a template on any
> + * instance in the cluster.
> + */
> +static bool
> +swim_loop_check_member_anywhere(struct swim_cluster *cluster, void *data)
> +{
> +	struct swim_member_template *t = (struct swim_member_template *) data;
> +	for (t->node_id = 0; t->node_id < cluster->size; ++t->node_id) {
> +		if (t->node_id != t->member_id &&
> +		    swim_loop_check_member(cluster, data))
> +			return true;
> +	}
> +	return false;
> +}
> +
> +/**
> + * Callback to check that a member matches a template on every
> + * instance in the cluster.
> + */
> +static bool
> +swim_loop_check_member_everywhere(struct swim_cluster *cluster, void *data)
> +{
> +	struct swim_member_template *t = (struct swim_member_template *) data;
> +	for (t->node_id = 0; t->node_id < cluster->size; ++t->node_id) {
> +		if (t->node_id != t->member_id &&
> +		    !swim_loop_check_member(cluster, data))
> +			return false;
> +	}
> +	return true;
> +}
> +
> +
>  int
>  swim_cluster_wait_status(struct swim_cluster *cluster, int node_id,
>  			 int member_id, enum swim_member_status status,
> @@ -383,6 +416,30 @@ swim_cluster_wait_incarnation(struct swim_cluster *cluster, int node_id,
>  	return swim_wait_timeout(timeout, cluster, swim_loop_check_member, &t);
>  }
>  
> +int
> +swim_cluster_wait_status_anywhere(struct swim_cluster *cluster, int member_id,
> +				  enum swim_member_status status,
> +				  double timeout)
> +{
> +	struct swim_member_template t;
> +	swim_member_template_create(&t, -1, member_id);
> +	swim_member_template_set_status(&t, status);
> +	return swim_wait_timeout(timeout, cluster,
> +				 swim_loop_check_member_anywhere, &t);
> +}
> +
> +int
> +swim_cluster_wait_status_everywhere(struct swim_cluster *cluster, int member_id,
> +				    enum swim_member_status status,
> +				    double timeout)
> +{
> +	struct swim_member_template t;
> +	swim_member_template_create(&t, -1, member_id);
> +	swim_member_template_set_status(&t, status);
> +	return swim_wait_timeout(timeout, cluster,
> +				 swim_loop_check_member_everywhere, &t);
> +}
> +
>  bool
>  swim_error_check_match(const char *msg)
>  {
> diff --git a/test/unit/swim_test_utils.h b/test/unit/swim_test_utils.h
> index d2ef00817..6e99b4879 100644
> --- a/test/unit/swim_test_utils.h
> +++ b/test/unit/swim_test_utils.h
> @@ -118,6 +118,26 @@ swim_cluster_wait_status(struct swim_cluster *cluster, int node_id,
>  			 int member_id, enum swim_member_status status,
>  			 double timeout);
>  
> +/**
> + * Wait until a member with id @a member_id is seen with @a status
> + * in the membership table of any instance in @a cluster. At most
> + * @a timeout seconds.
> + */
> +int
> +swim_cluster_wait_status_anywhere(struct swim_cluster *cluster, int member_id,
> +				  enum swim_member_status status,
> +				  double timeout);
> +
> +/**
> + * Wait until a member with id @a member_id is seen with @a status
> + * in the membership table of every instance in @a cluster. At
> + * most @a timeout seconds.
> + */
> +int
> +swim_cluster_wait_status_everywhere(struct swim_cluster *cluster, int member_id,
> +				    enum swim_member_status status,
> +				    double timeout);
> +
>  /**
>   * Wait until a member with id @a member_id is seen with @a
>   * incarnation in the membership table of a member with id @a
> -- 
> 2.17.2 (Apple Git-113)
> 

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov