From: Vladislav Shpilevoy
Subject: [tarantool-patches] [PATCH 3/5] test: speed up swim big cluster failure detection
Date: Fri, 5 Apr 2019 14:57:29 +0300
Message-Id: <13b859581390bf1f202096fbaa2405204fcbb4dc.1554465150.git.v.shpilevoy@tarantool.org>
To: tarantool-patches@freelists.org
Cc: kostja@tarantool.org

The test checks that if a member has failed in a big cluster, it
is eventually deleted from all instances. But it takes too much
real time despite the use of virtual time.

This is because total deletion of a member takes
O(N + ack_timeout * 5) time: N to wait until every member has
pinged the failed one at least once, plus 3 * ack_timeout to learn
that it is dead, plus 2 * ack_timeout to drop it. Of course, this
is an upper bound, and usually it is faster, but not by much. For
example, on a cluster of size 50 it easily takes 55 virtual
seconds.

On the contrary, just learning that a member is dead on every
instance takes O(log(N)) according to the SWIM paper. In the same
test with a 50-instance cluster it takes ~15 virtual seconds to
disseminate the 'dead' status of the failed member to every
instance, and that is even without the dissemination component,
with anti-entropy only. Looking ahead, the subsequent patches show
that with the dissemination component it takes only ~6 virtual
seconds.

In summary, without losing test coverage, it is much faster to
turn off SWIM GC and wait until the failed member looks dead on
all instances.

Part of #3234
---
 test/unit/swim.c            | 52 ++++++++++++++++++++-------------
 test/unit/swim.result       |  5 ++--
 test/unit/swim_test_utils.c | 57 +++++++++++++++++++++++++++++++++++++
 test/unit/swim_test_utils.h | 20 +++++++++++++
 4 files changed, 112 insertions(+), 22 deletions(-)

diff --git a/test/unit/swim.c b/test/unit/swim.c
index 860d3211e..d77225f6c 100644
--- a/test/unit/swim.c
+++ b/test/unit/swim.c
@@ -374,33 +374,45 @@ swim_test_refute(void)
 static void
 swim_test_too_big_packet(void)
 {
-	swim_start_test(2);
+	swim_start_test(3);
 	int size = 50;
+	double ack_timeout = 1;
+	double first_dead_timeout = 20;
+	double everywhere_dead_timeout = size * 3;
+	int drop_id = size / 2;
+
 	struct swim_cluster *cluster = swim_cluster_new(size);
 	for (int i = 1; i < size; ++i)
 		swim_cluster_add_link(cluster, 0, i);
-	is(swim_cluster_wait_fullmesh(cluster, size), 0, "despite S1 can not "\
-	   "send all the %d members in a one packet, fullmesh is eventually "\
-	   "reached", size);
-	swim_cluster_set_ack_timeout(cluster, 1);
-	int drop_id = size / 2;
+
+	is(swim_cluster_wait_fullmesh(cluster, size * 2), 0, "despite S1 can "\
+	   "not send all the %d members in a one packet, fullmesh is "\
+	   "eventually reached", size);
+
+	swim_cluster_set_ack_timeout(cluster, ack_timeout);
 	swim_cluster_set_drop(cluster, drop_id, true);
+	is(swim_cluster_wait_status_anywhere(cluster, drop_id, MEMBER_DEAD,
+					     first_dead_timeout), 0,
+	   "a dead member is detected in time not depending on cluster size");
 	/*
-	 * Dissemination of a detected failure takes long time
-	 * without help of the component, intended for that.
+	 * GC is off to simplify and speed up checks. When no GC
+	 * the test is sure that it is safe to check for
+	 * MEMBER_DEAD everywhere, because it is impossible that a
+	 * member is considered dead in one place, but already
+	 * deleted on another. Also, total member deletion takes
+	 * linear time, because a member is deleted from an
+	 * instance only when *that* instance will not receive
+	 * some direct acks from the member. Deletion and
+	 * additional pings are not triggered if a member dead
+	 * status is received indirectly via dissemination or
+	 * anti-entropy. Otherwise it could produce linear network
+	 * load on the already weak member.
 	 */
-	double timeout = size * 10;
-	int i = 0;
-	for (; i < size; ++i) {
-		double start = swim_time();
-		if (i != drop_id &&
-		    swim_cluster_wait_status(cluster, i, drop_id,
-					     swim_member_status_MAX, timeout) != 0)
-			break;
-		timeout -= swim_time() - start;
-	}
-	is(i, size, "S%d drops all the packets - it should become dead",
-	   drop_id + 1);
+	swim_cluster_set_gc(cluster, SWIM_GC_OFF);
+	is(swim_cluster_wait_status_everywhere(cluster, drop_id, MEMBER_DEAD,
+					      everywhere_dead_timeout), 0,
+	   "S%d death is eventually learned by everyone", drop_id + 1);
+
 	swim_cluster_delete(cluster);
 	swim_finish_test();
 }
diff --git a/test/unit/swim.result b/test/unit/swim.result
index 904f061f6..3393870c2 100644
--- a/test/unit/swim.result
+++ b/test/unit/swim.result
@@ -94,9 +94,10 @@ ok 8 - subtests
 ok 9 - subtests
 	*** swim_test_basic_gossip: done ***
 	*** swim_test_too_big_packet ***
-    1..2
+    1..3
     ok 1 - despite S1 can not send all the 50 members in a one packet, fullmesh is eventually reached
-    ok 2 - S26 drops all the packets - it should become dead
+    ok 2 - a dead member is detected in time not depending on cluster size
+    ok 3 - S26 death is eventually learned by everyone
 ok 10 - subtests
 	*** swim_test_too_big_packet: done ***
 	*** swim_test_undead ***
diff --git a/test/unit/swim_test_utils.c b/test/unit/swim_test_utils.c
index bb413372c..277a73498 100644
--- a/test/unit/swim_test_utils.c
+++ b/test/unit/swim_test_utils.c
@@ -361,6 +361,39 @@ swim_loop_check_member(struct swim_cluster *cluster, void *data)
 	return true;
 }
 
+/**
+ * Callback to check that a member matches a template on any
+ * instance in the cluster.
+ */
+static bool
+swim_loop_check_member_anywhere(struct swim_cluster *cluster, void *data)
+{
+	struct swim_member_template *t = (struct swim_member_template *) data;
+	for (t->node_id = 0; t->node_id < cluster->size; ++t->node_id) {
+		if (t->node_id != t->member_id &&
+		    swim_loop_check_member(cluster, data))
+			return true;
+	}
+	return false;
+}
+
+/**
+ * Callback to check that a member matches a template on every
+ * instance in the cluster.
+ */
+static bool
+swim_loop_check_member_everywhere(struct swim_cluster *cluster, void *data)
+{
+	struct swim_member_template *t = (struct swim_member_template *) data;
+	for (t->node_id = 0; t->node_id < cluster->size; ++t->node_id) {
+		if (t->node_id != t->member_id &&
+		    !swim_loop_check_member(cluster, data))
+			return false;
+	}
+	return true;
+}
+
+
 int
 swim_cluster_wait_status(struct swim_cluster *cluster, int node_id,
 			 int member_id, enum swim_member_status status,
@@ -383,6 +416,30 @@ swim_cluster_wait_incarnation(struct swim_cluster *cluster, int node_id,
 	return swim_wait_timeout(timeout, cluster, swim_loop_check_member, &t);
 }
 
+int
+swim_cluster_wait_status_anywhere(struct swim_cluster *cluster, int member_id,
+				  enum swim_member_status status,
+				  double timeout)
+{
+	struct swim_member_template t;
+	swim_member_template_create(&t, -1, member_id);
+	swim_member_template_set_status(&t, status);
+	return swim_wait_timeout(timeout, cluster,
+				 swim_loop_check_member_anywhere, &t);
+}
+
+int
+swim_cluster_wait_status_everywhere(struct swim_cluster *cluster, int member_id,
+				   enum swim_member_status status,
+				   double timeout)
+{
+	struct swim_member_template t;
+	swim_member_template_create(&t, -1, member_id);
+	swim_member_template_set_status(&t, status);
+	return swim_wait_timeout(timeout, cluster,
+				 swim_loop_check_member_everywhere, &t);
+}
+
 bool
 swim_error_check_match(const char *msg)
 {
diff --git a/test/unit/swim_test_utils.h b/test/unit/swim_test_utils.h
index d2ef00817..6e99b4879 100644
--- a/test/unit/swim_test_utils.h
+++ b/test/unit/swim_test_utils.h
@@ -118,6 +118,26 @@ swim_cluster_wait_status(struct swim_cluster *cluster, int node_id,
 			 int member_id, enum swim_member_status status,
 			 double timeout);
 
+/**
+ * Wait until a member with id @a member_id is seen with @a status
+ * in the membership table of any instance in @a cluster. At most
+ * @a timeout seconds.
+ */
+int
+swim_cluster_wait_status_anywhere(struct swim_cluster *cluster, int member_id,
+				  enum swim_member_status status,
+				  double timeout);
+
+/**
+ * Wait until a member with id @a member_id is seen with @a status
+ * in the membership table of every instance in @a cluster. At
+ * most @a timeout seconds.
+ */
+int
+swim_cluster_wait_status_everywhere(struct swim_cluster *cluster, int member_id,
+				   enum swim_member_status status,
+				   double timeout);
+
 /**
  * Wait until a member with id @a member_id is seen with @a
  * incarnation in the membership table of a member with id @a
-- 
2.17.2 (Apple Git-113)
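
For illustration only, and not part of the patch: the two new callbacks
amount to an "any" and an "all" predicate over every instance's view of
one member, with the member's own instance skipped. A minimal standalone
sketch of that shape follows. It assumes a toy view matrix in place of
the real struct swim_cluster, and every toy_* name is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the real SWIM test types. */
enum toy_status { TOY_ALIVE = 0, TOY_DEAD };
enum { TOY_SIZE = 4 };

struct toy_cluster {
	/* view[i][j] is the status instance i currently has for member j. */
	enum toy_status view[TOY_SIZE][TOY_SIZE];
};

/* True if at least one other instance sees member_id with the status. */
bool
toy_seen_anywhere(const struct toy_cluster *c, int member_id,
		  enum toy_status status)
{
	for (int node_id = 0; node_id < TOY_SIZE; ++node_id) {
		if (node_id != member_id &&
		    c->view[node_id][member_id] == status)
			return true;
	}
	return false;
}

/* True only if every other instance sees member_id with the status. */
bool
toy_seen_everywhere(const struct toy_cluster *c, int member_id,
		    enum toy_status status)
{
	for (int node_id = 0; node_id < TOY_SIZE; ++node_id) {
		if (node_id != member_id &&
		    c->view[node_id][member_id] != status)
			return false;
	}
	return true;
}
```

In this sketch the "anywhere" predicate turns true as soon as one
instance marks the member dead, while the "everywhere" predicate needs
all the remaining instances to agree. That mirrors the two-step check
order in the test: first detection somewhere, then full agreement.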