From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> To: Konstantin Osipov <kostja@tarantool.org> Cc: tarantool-patches@freelists.org Subject: [tarantool-patches] Re: [PATCH 6/6] swim: introduce suspicion Date: Wed, 24 Apr 2019 23:28:55 +0300 [thread overview] Message-ID: <3819d551-e050-0fca-fe66-f30b064f22e6@tarantool.org> (raw) In-Reply-To: <20190424170101.GF13687@atlas> On 24/04/2019 20:01, Konstantin Osipov wrote: > * Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/04/24 18:50]: >> + /** >> + * Number of attempts to reach out a member via another >> + * members when it did not answer on a regular ping . >> + */ >> + INDIRECT_PING_COUNT = 2, > > As far as I can see it's not the number of attempts, it's the > number of cluster members involved as proxies for sending an > indirect ping. The fact that you attempt an indirect ping only > once before switching member state to dead is secondary. > > This has to do with lack of a description in changeset or code > comment of indirect pings strategy. Sorry, you are right. They are not 'attempts' strictly speaking. New comment: ================================================================== */ NO_ACKS_TO_GC = 2, /** - * Number of attempts to reach out a member via another - * members when it did not answer on a regular ping . + * Number of pings sent indirectly to a member via other + * members when it did not answer on a regular ping. The + * messages are sent in parallel and via different + * members. */ INDIRECT_PING_COUNT = 2, }; ================================================================== > >> +/** >> + * Put the member into a list of ACK waiters. @a hop_count says >> + * how many hops from one member to another the ACK is expected to >> + * do. >> + */ >> static void >> -swim_wait_ack(struct swim *swim, struct swim_member *member) >> +swim_wait_ack(struct swim *swim, struct swim_member *member, int hop_count) >> { >> if (heap_node_is_stray(&member->in_wait_ack_heap)) { >> - member->ping_deadline = swim_time() + swim->wait_ack_tick.at; >> + double timeout = swim->wait_ack_tick.at * hop_count; > > Is hop_count ever greater than two? If not, I would not use int > hop_count, but use bool is_indirect_ping, and add a comment > explaining that in case of indirect pings the number of hops is > exactly two, so we need to increase the timeout. Ok, I do not mind. ================================================================== diff --git a/src/lib/swim/swim.c b/src/lib/swim/swim.c index af9b59cbb..24229bf0c 100644 --- a/src/lib/swim/swim.c +++ b/src/lib/swim/swim.c @@ -493,10 +493,18 @@ swim_cached_round_msg_invalidate(struct swim *swim) * do. */ static void -swim_wait_ack(struct swim *swim, struct swim_member *member, int hop_count) +swim_wait_ack(struct swim *swim, struct swim_member *member, + bool was_ping_indirect) { if (heap_node_is_stray(&member->in_wait_ack_heap)) { - double timeout = swim->wait_ack_tick.at * hop_count; + double timeout = swim->wait_ack_tick.at; + /* + * Direct ping is two hops: PING + ACK. + * Indirect ping is four hops: PING, FORWARD PING, + * ACK, FORWARD ACK. This is why x2 for indirects. + */ + if (was_ping_indirect) + timeout *= 2; member->ping_deadline = swim_time() + timeout; wait_ack_heap_insert(&swim->wait_ack_heap, member); swim_ev_timer_start(loop(), &swim->wait_ack_tick); @@ -627,7 +635,7 @@ swim_ping_task_complete(struct swim_task *task, struct swim *swim = swim_by_scheduler(scheduler); struct swim_member *m = container_of(task, struct swim_member, ping_task); - swim_wait_ack(swim, m, 1); + swim_wait_ack(swim, m, false); } /** Free member's resources. */ @@ -1082,7 +1090,7 @@ swim_complete_step(struct swim_task *task, * dissemination and failure detection * sections. */ - swim_wait_ack(swim, m, 1); + swim_wait_ack(swim, m, false); swim_decrease_event_ttd(swim); } } @@ -1239,7 +1247,7 @@ swim_iping_task_complete(struct swim_task *base_task, struct swim_member *m = swim_find_member(swim, &b->dst_uuid); if (m != NULL) - swim_wait_ack(swim, m, 2); + swim_wait_ack(swim, m, true); } swim_iping_task_delete(t); } ================================================================== > >> +/** >> + * When indirect pings are sent, each is represented by this >> + * structure. >> + */ >> +struct swim_iping_task { >> + /** Base task used by the scheduler. */ >> + struct swim_task base; >> + /** >> + * Reference to a block of other indirect ping tasks sent >> + * at the same moment. Used to decide whether it is needed >> + * to start waiting for an ACK, and on which member. >> + */ >> + struct swim_iping_block *block; > > I don't understand this comment, why do you need a reference? > > I would not use it at all, seems like we have a separate data > structure when existing structures would do just fine. > Talking of block - we have decided to get rid of it. To understand why, I should explain why was it needed. 1) To wait for an ACK I need a struct swim_member to put it into ack waiter queue. But in task_complete() callback I don't know the member nor its UUID. struct swim_task is too encapsulated inside the transport level. Iping block stores the UUID of sender. 2) It is possible, that a first indirect ACK was sent and we got a result faster than managed to send a next indirect ACK. In such a case we will wait for the second ACK, even having a response on the first one. We've decided that block can be removed if UUID is added to struct swim_task, and if we mark ACK and ping with timestamps. But I found another solution for ACK/ping timing problem. When a next indirect ping is sent, lets watch at swim_member status. If it is not suspected anymore, it means, that an ACK had already been received while this ping was waiting for EV_WRITE. I will not paste here diff of this change since it is too big. I would rather send V2 of this patchset. >> switch (def.type) { >> case SWIM_FD_MSG_PING: >> - if (! swim_task_is_scheduled(&member->ack_task)) >> + if (proxy != NULL) { >> + if (swim_send_indirect_ack(swim, &member->addr, >> + proxy) != 0) >> + diag_log(); >> + } else if (! swim_task_is_scheduled(&member->ack_task)) { >> swim_send_ack(swim, &member->ack_task, &member->addr); >> + } >> break; > > Why do you ever need to distinguish direct and indirect acks? Why not > respond to the sender address, while preserving the routing table > from the ping request, assuming the sender can figure out whether > the response is addressed to him directly or it should proxy it > further based on the received proxy section of the response? Such a long sentence, do not know how to split it in parts. Ok. Firstly, I do not distinguish indirect ACKs when they are received by the pinger. This is thanks to the transport level full isolation. I do not even have a special request type like 'PING_REQUEST' or something, like classical implementation does. In theory, I could even send usual anti-entropy and dissemination messages indirectly. Secondly, to preserve routing table I need to distinguish direct from indirect ACKs on sender side, because routing table is generated only when proxy address is set. I can't just say 'swim_send_ack(swim, member)' and let the transport level decide whether it should use proxy or not. The transport level doesn't have an address-table to choose proxies, as well as it does not know, that this message is a response to another message, which in turn was received via proxy. Only SWIM core has this information. So it should give the transport level a hint via 'swim_task_proxy()' method. swim_send_ack and swim_send_indirect_ack are just wrappers around swim_send_fd_msg(), passing different proxy values. The former sets proxy NULL, the latter not NULL. That's all. > > -- > Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 > http://tarantool.io - www.twitter.com/kostja_osipov >
next prev parent reply other threads:[~2019-04-24 20:28 UTC|newest] Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-04-24 14:36 [tarantool-patches] [PATCH 0/6] swim suspicion Vladislav Shpilevoy 2019-04-24 14:36 ` [tarantool-patches] [PATCH 1/6] test: rename swim_cluster_node to swim_cluster_member Vladislav Shpilevoy 2019-04-24 16:37 ` [tarantool-patches] " Konstantin Osipov 2019-04-24 14:36 ` [tarantool-patches] [PATCH 2/6] test: remove swim packet filter destructors Vladislav Shpilevoy 2019-04-24 16:37 ` [tarantool-patches] " Konstantin Osipov 2019-04-24 14:36 ` [tarantool-patches] [PATCH 3/6] test: introduce swim packet filter by destination address Vladislav Shpilevoy 2019-04-24 16:38 ` [tarantool-patches] " Konstantin Osipov 2019-04-24 14:36 ` [tarantool-patches] [PATCH 4/6] swim: wrap sio_strfaddr() Vladislav Shpilevoy 2019-04-24 16:40 ` [tarantool-patches] " Konstantin Osipov 2019-04-24 20:23 ` Vladislav Shpilevoy 2019-04-25 10:34 ` Konstantin Osipov 2019-04-25 13:50 ` Vladislav Shpilevoy 2019-04-24 14:36 ` [tarantool-patches] [PATCH 5/6] swim: introduce routing Vladislav Shpilevoy 2019-04-24 16:46 ` [tarantool-patches] " Konstantin Osipov 2019-04-24 20:25 ` Vladislav Shpilevoy 2019-04-25 10:39 ` Konstantin Osipov 2019-04-25 13:50 ` Vladislav Shpilevoy 2019-04-25 13:57 ` Konstantin Osipov 2019-04-24 14:36 ` [tarantool-patches] [PATCH 6/6] swim: introduce suspicion Vladislav Shpilevoy 2019-04-24 17:01 ` [tarantool-patches] " Konstantin Osipov 2019-04-24 20:28 ` Vladislav Shpilevoy [this message] 2019-04-25 10:42 ` Konstantin Osipov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=3819d551-e050-0fca-fe66-f30b064f22e6@tarantool.org \ --to=v.shpilevoy@tarantool.org \ --cc=kostja@tarantool.org \ --cc=tarantool-patches@freelists.org \ --subject='[tarantool-patches] Re: [PATCH 6/6] swim: introduce suspicion' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox