From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id EAC112C19D for ; Wed, 24 Apr 2019 16:28:57 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id SbZCk5IWy_wU for ; Wed, 24 Apr 2019 16:28:57 -0400 (EDT) Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 88F432C0C5 for ; Wed, 24 Apr 2019 16:28:57 -0400 (EDT) Subject: [tarantool-patches] Re: [PATCH 6/6] swim: introduce suspicion References: <0810d2237700cc59a23559f083cd238c1a88d8ec.1556116199.git.v.shpilevoy@tarantool.org> <20190424170101.GF13687@atlas> From: Vladislav Shpilevoy Message-ID: <3819d551-e050-0fca-fe66-f30b064f22e6@tarantool.org> Date: Wed, 24 Apr 2019 23:28:55 +0300 MIME-Version: 1.0 In-Reply-To: <20190424170101.GF13687@atlas> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: Konstantin Osipov Cc: tarantool-patches@freelists.org On 24/04/2019 20:01, Konstantin Osipov wrote: > * Vladislav Shpilevoy [19/04/24 18:50]: >> + /** >> + * Number of attempts to reach out a member via another >> + * members when it did not answer on a regular ping . >> + */ >> + INDIRECT_PING_COUNT = 2, > > As far as I can see it's not the number of attempts, it's the > number of cluster members involved as proxies for sending an > indirect ping. The fact that you attempt an indirect ping only > once before switching member state to dead is secondary. > > This has to do with lack of a description in changeset or code > comment of indirect pings strategy. Sorry, you are right. They are not 'attempts' strictly speaking. New comment: ================================================================== */ NO_ACKS_TO_GC = 2, /** - * Number of attempts to reach out a member via another - * members when it did not answer on a regular ping . + * Number of pings sent indirectly to a member via other + * members when it did not answer on a regular ping. The + * messages are sent in parallel and via different + * members. */ INDIRECT_PING_COUNT = 2, }; ================================================================== > >> +/** >> + * Put the member into a list of ACK waiters. @a hop_count says >> + * how many hops from one member to another the ACK is expected to >> + * do. >> + */ >> static void >> -swim_wait_ack(struct swim *swim, struct swim_member *member) >> +swim_wait_ack(struct swim *swim, struct swim_member *member, int hop_count) >> { >> if (heap_node_is_stray(&member->in_wait_ack_heap)) { >> - member->ping_deadline = swim_time() + swim->wait_ack_tick.at; >> + double timeout = swim->wait_ack_tick.at * hop_count; > > Is hop_count ever greater than two? If not, I would not use int > hop_count, but use bool is_indirect_ping, and add a comment > explaining that in case of indirect pings the number of hops is > exactly two, so we need to increase the timeout. Ok, I do not mind. ================================================================== diff --git a/src/lib/swim/swim.c b/src/lib/swim/swim.c index af9b59cbb..24229bf0c 100644 --- a/src/lib/swim/swim.c +++ b/src/lib/swim/swim.c @@ -493,10 +493,18 @@ swim_cached_round_msg_invalidate(struct swim *swim) * do. */ static void -swim_wait_ack(struct swim *swim, struct swim_member *member, int hop_count) +swim_wait_ack(struct swim *swim, struct swim_member *member, + bool was_ping_indirect) { if (heap_node_is_stray(&member->in_wait_ack_heap)) { - double timeout = swim->wait_ack_tick.at * hop_count; + double timeout = swim->wait_ack_tick.at; + /* + * Direct ping is two hops: PING + ACK. + * Indirect ping is four hops: PING, FORWARD PING, + * ACK, FORWARD ACK. This is why x2 for indirects. + */ + if (was_ping_indirect) + timeout *= 2; member->ping_deadline = swim_time() + timeout; wait_ack_heap_insert(&swim->wait_ack_heap, member); swim_ev_timer_start(loop(), &swim->wait_ack_tick); @@ -627,7 +635,7 @@ swim_ping_task_complete(struct swim_task *task, struct swim *swim = swim_by_scheduler(scheduler); struct swim_member *m = container_of(task, struct swim_member, ping_task); - swim_wait_ack(swim, m, 1); + swim_wait_ack(swim, m, false); } /** Free member's resources. */ @@ -1082,7 +1090,7 @@ swim_complete_step(struct swim_task *task, * dissemination and failure detection * sections. */ - swim_wait_ack(swim, m, 1); + swim_wait_ack(swim, m, false); swim_decrease_event_ttd(swim); } } @@ -1239,7 +1247,7 @@ swim_iping_task_complete(struct swim_task *base_task, struct swim_member *m = swim_find_member(swim, &b->dst_uuid); if (m != NULL) - swim_wait_ack(swim, m, 2); + swim_wait_ack(swim, m, true); } swim_iping_task_delete(t); } ================================================================== > >> +/** >> + * When indirect pings are sent, each is represented by this >> + * structure. >> + */ >> +struct swim_iping_task { >> + /** Base task used by the scheduler. */ >> + struct swim_task base; >> + /** >> + * Reference to a block of other indirect ping tasks sent >> + * at the same moment. Used to decide whether it is needed >> + * to start waiting for an ACK, and on which member. >> + */ >> + struct swim_iping_block *block; > > I don't understand this comment, why do you need a reference? > > I would not use it at all, seems like we have a separate data > structure when existing structures would do just fine. > Talking of block - we have decided to get rid of it. To understand why, I should explain why was it needed. 1) To wait for an ACK I need a struct swim_member to put it into ack waiter queue. But in task_complete() callback I don't know the member nor its UUID. struct swim_task is too encapsulated inside the transport level. Iping block stores the UUID of sender. 2) It is possible, that a first indirect ACK was sent and we got a result faster than managed to send a next indirect ACK. In such a case we will wait for the second ACK, even having a response on the first one. We've decided that block can be removed if UUID is added to struct swim_task, and if we mark ACK and ping with timestamps. But I found another solution for ACK/ping timing problem. When a next indirect ping is sent, lets watch at swim_member status. If it is not suspected anymore, it means, that an ACK had already been received while this ping was waiting for EV_WRITE. I will not paste here diff of this change since it is too big. I would rather send V2 of this patchset. >> switch (def.type) { >> case SWIM_FD_MSG_PING: >> - if (! swim_task_is_scheduled(&member->ack_task)) >> + if (proxy != NULL) { >> + if (swim_send_indirect_ack(swim, &member->addr, >> + proxy) != 0) >> + diag_log(); >> + } else if (! swim_task_is_scheduled(&member->ack_task)) { >> swim_send_ack(swim, &member->ack_task, &member->addr); >> + } >> break; > > Why do you ever need to distinguish direct and indirect acks? Why not > respond to the sender address, while preserving the routing table > from the ping request, assuming the sender can figure out whether > the response is addressed to him directly or it should proxy it > further based on the received proxy section of the response? Such a long sentence, do not know how to split it in parts. Ok. Firstly, I do not distinguish indirect ACKs when they are received by the pinger. This is thanks to the transport level full isolation. I do not even have a special request type like 'PING_REQUEST' or something, like classical implementation does. In theory, I could even send usual anti-entropy and dissemination messages indirectly. Secondly, to preserve routing table I need to distinguish direct from indirect ACKs on sender side, because routing table is generated only when proxy address is set. I can't just say 'swim_send_ack(swim, member)' and let the transport level decide whether it should use proxy or not. The transport level doesn't have an address-table to choose proxies, as well as it does not know, that this message is a response to another message, which in turn was received via proxy. Only SWIM core has this information. So it should give the transport level a hint via 'swim_task_proxy()' method. swim_send_ack and swim_send_indirect_ack are just wrappers around swim_send_fd_msg(), passing different proxy values. The former sets proxy NULL, the latter not NULL. That's all. > > -- > Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 > http://tarantool.io - www.twitter.com/kostja_osipov >