From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id D7D7F2C2CE for ; Wed, 24 Apr 2019 13:01:04 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id vAR1G59ffIse for ; Wed, 24 Apr 2019 13:01:04 -0400 (EDT) Received: from smtp14.mail.ru (smtp14.mail.ru [94.100.181.95]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 93B6A2C1E5 for ; Wed, 24 Apr 2019 13:01:04 -0400 (EDT) Date: Wed, 24 Apr 2019 20:01:01 +0300 From: Konstantin Osipov Subject: [tarantool-patches] Re: [PATCH 6/6] swim: introduce suspicion Message-ID: <20190424170101.GF13687@atlas> References: <0810d2237700cc59a23559f083cd238c1a88d8ec.1556116199.git.v.shpilevoy@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <0810d2237700cc59a23559f083cd238c1a88d8ec.1556116199.git.v.shpilevoy@tarantool.org> Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: Vladislav Shpilevoy Cc: tarantool-patches@freelists.org * Vladislav Shpilevoy [19/04/24 18:50]: > + /** > + * Number of attempts to reach out a member via another > + * members when it did not answer on a regular ping . > + */ > + INDIRECT_PING_COUNT = 2, As far as I can see it's not the number of attempts, it's the number of cluster members involved as proxies for sending an indirect ping. The fact that you attempt an indirect ping only once before switching member state to dead is secondary. This has to do with lack of a description in changeset or code comment of indirect pings strategy. > +/** > + * Put the member into a list of ACK waiters. @a hop_count says > + * how many hops from one member to another the ACK is expected to > + * do. > + */ > static void > -swim_wait_ack(struct swim *swim, struct swim_member *member) > +swim_wait_ack(struct swim *swim, struct swim_member *member, int hop_count) > { > if (heap_node_is_stray(&member->in_wait_ack_heap)) { > - member->ping_deadline = swim_time() + swim->wait_ack_tick.at; > + double timeout = swim->wait_ack_tick.at * hop_count; Is hop_count ever greater than two? If not, I would not use int hop_count, but use bool is_indirect_ping, and add a comment explaining that in case of indirect pings the number of hops is exactly two, so we need to increase the timeout. > +/** > + * When indirect pings are sent, each is represented by this > + * structure. > + */ > +struct swim_iping_task { > + /** Base task used by the scheduler. */ > + struct swim_task base; > + /** > + * Reference to a block of other indirect ping tasks sent > + * at the same moment. Used to decide whether it is needed > + * to start waiting for an ACK, and on which member. > + */ > + struct swim_iping_block *block; I don't understand this comment, why do you need a reference? I would not use it at all, seems like we have a separate data structure when existing structures would do just fine. > +}; > + > +/** > + * An array of indirect ping tasks sent simultaneously via > + * different proxies. The block contains meta information allowing > + * to 1) start waiting for an ACK only after first successful > + * send; 2) determine which member has sent the pings - UUID is > + * needed for that, inet address is not enough. > + * > + * The block is deleted when the last task is finished. > + */ > +struct swim_iping_block { > + /** Array of indirect ping tasks. */ > + struct swim_iping_task tasks[INDIRECT_PING_COUNT]; > + /** > + * UUID of the destination member. Used to set ping > + * deadline in that member when at least one is sent > + * successfully. > + */ > + struct tt_uuid dst_uuid; > + /** > + * The flag is used to wait for an ACK only once after a > + * first ping is sent to protect from a case, when an ACK > + * is received faster, than the last ping is sent. Then > + * the whole indirect ping block should be considered > + * acked. > + */ Why care if indirect ping block is acknowledged or not? What matters is member status, and it is changed when we get an ack. In other words, I'd drop the whole idea with blocks and just send multiple independent pings. > switch (def.type) { > case SWIM_FD_MSG_PING: > - if (! swim_task_is_scheduled(&member->ack_task)) > + if (proxy != NULL) { > + if (swim_send_indirect_ack(swim, &member->addr, > + proxy) != 0) > + diag_log(); > + } else if (! swim_task_is_scheduled(&member->ack_task)) { > swim_send_ack(swim, &member->ack_task, &member->addr); > + } > break; Why do you ever need to distinguish direct and indirect acks? Why not respond to the sender address, while preserving the routing table from the ping request, assuming the sender can figure out whether the response is addressed to him directly or it should proxy it further based on the received proxy section of the response? -- Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 http://tarantool.io - www.twitter.com/kostja_osipov