[tarantool-patches] [PATCH v3 2/6] [RAW] swim: introduce failure detection component

Konstantin Osipov kostja at tarantool.org
Wed Jan 9 16:48:09 MSK 2019


* Vladislav Shpilevoy <v.shpilevoy at tarantool.org> [18/12/29 15:07]:
>  enum {
>  	/** How often to send membership messages and pings. */
>  	HEARTBEAT_RATE_DEFAULT = 1,
> +	/**
> +	 * If a ping was sent, it is considered to be lost after
> +	 * this time without an ack.
> +	 */
> +	ACK_TIMEOUT = 1,

The timeout should be configurable. A reasonable default looks
closer to at least 30 seconds: we have many cases in
(malfunctioning) production where long requests stall the event
loop for 10-15 seconds, and such requests should not lead to
membership storms.
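A minimal sketch of a configurable timeout. It assumes a hypothetical `struct swim_cfg` with an `ack_timeout` field and a setter that falls back to a 30-second default when given a non-positive value. None of these names come from the patch.

```c
#include <stdbool.h>

/* Hypothetical default, following the "at least 30 seconds" idea. */
#define SWIM_ACK_TIMEOUT_DEFAULT 30.0

/* Hypothetical configuration holder, not part of the patch. */
struct swim_cfg {
	/** Seconds to wait for an ack before a ping is counted as lost. */
	double ack_timeout;
};

/** Fill the configuration with defaults. */
static inline void
swim_cfg_create(struct swim_cfg *cfg)
{
	cfg->ack_timeout = SWIM_ACK_TIMEOUT_DEFAULT;
}

/**
 * Set the ack timeout. A non-positive value resets it to the
 * default. Returns true if the user value was accepted.
 */
static inline bool
swim_cfg_set_ack_timeout(struct swim_cfg *cfg, double timeout)
{
	if (timeout <= 0) {
		cfg->ack_timeout = SWIM_ACK_TIMEOUT_DEFAULT;
		return false;
	}
	cfg->ack_timeout = timeout;
	return true;
}
```

With the value kept in a config struct, operators of an installation with known event loop stalls could raise it without rebuilding.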

> +	/**
> +	 * If a member has not been responding to pings this
> +	 * number of times, it is considered to be dead.
> +	 */
> +	NO_ACKS_TO_DEAD = 3,
> +	/**
> +	 * If a not pinned member confirmed to be dead, it is
> +	 * removed from the membership after at least this number
> +	 * of failed pings.
> +	 */
> +	NO_ACKS_TO_GC = NO_ACKS_TO_DEAD + 2,
>  };
>  
> +	bool is_pinned;
> +	/** Growing number to refute old messages. */

Reject, or perhaps ignore?

"Refute" is usually used for arguments in a heated conversation.
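To illustrate the "ignore" reading: a message carrying an incarnation lower than the one already known for the member is stale and can simply be dropped. A minimal sketch, assuming a hypothetical `struct member_state` with a 64-bit incarnation; the helper name is illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-member state; only the relevant field. */
struct member_state {
	/** Growing number used to ignore outdated messages. */
	uint64_t incarnation;
};

/**
 * Apply the sender's incarnation from an incoming message.
 * Returns false if the message is older than what is already
 * known and should be ignored, true otherwise.
 */
static inline bool
member_accept_incarnation(struct member_state *m, uint64_t incarnation)
{
	if (incarnation < m->incarnation)
		return false; /* Stale message: ignore it. */
	m->incarnation = incarnation;
	return true;
}
```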

> +	/**
> +	 * How many pings did not receive an ack in a row. After
> +	 * a threshold the instance is marked as dead. After more
> +	 * it is removed from the table (if not pinned).
> +	 */
> +	int failed_pings;

These are more like unacknowledged pings. Have they failed? Maybe.
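A rough sketch of how an "unacknowledged pings" counter would drive the two thresholds from the patch (`NO_ACKS_TO_DEAD`, `NO_ACKS_TO_GC`). The statuses, the struct, and both callbacks are hypothetical and only illustrate the counting logic.

```c
#include <stdbool.h>

/* Thresholds copied from the patch under review. */
enum {
	NO_ACKS_TO_DEAD = 3,
	NO_ACKS_TO_GC = NO_ACKS_TO_DEAD + 2,
};

/* Hypothetical statuses, for illustration only. */
enum member_status {
	MEMBER_ALIVE,
	MEMBER_DEAD,
};

/* Hypothetical member with the counter renamed. */
struct member {
	enum member_status status;
	bool is_pinned;
	/** Pings sent in a row without an ack. */
	int unacked_pings;
};

/**
 * Called when the ack timeout for the latest ping expires.
 * Returns true if the member should be removed from the table.
 */
static inline bool
member_on_ack_timeout(struct member *m)
{
	++m->unacked_pings;
	if (m->unacked_pings >= NO_ACKS_TO_DEAD)
		m->status = MEMBER_DEAD;
	return !m->is_pinned && m->unacked_pings >= NO_ACKS_TO_GC;
}

/** Called when an ack arrives: the counter restarts from zero. */
static inline void
member_on_ack(struct member *m)
{
	m->unacked_pings = 0;
	m->status = MEMBER_ALIVE;
}
```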

> +	/** When the latest ping was sent to this member. */
> +	double ping_ts;

last_ping_time? Why use double and not ev_tstamp?
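For reference, libev's timestamp type is `ev_tstamp` (declared in `ev.h` as a `double`), so the suggestion changes readability rather than representation. A minimal sketch of an expiry check using it; the typedef is repeated locally only so the snippet compiles without libev, and `struct member_ping` plus the helper are hypothetical.

```c
#include <stdbool.h>

/*
 * libev declares "typedef double ev_tstamp;" in ev.h. Repeated
 * here only so the sketch compiles without libev.
 */
typedef double ev_tstamp;

/* Hypothetical member fields, for illustration only. */
struct member_ping {
	/** When the latest ping was sent; 0 if none is pending. */
	ev_tstamp last_ping_time;
};

/**
 * True if a ping is outstanding and no ack arrived within
 * ack_timeout seconds. now would normally come from the event
 * loop clock, e.g. ev_now() in libev.
 */
static inline bool
member_ping_is_expired(const struct member_ping *m, ev_tstamp now,
		       ev_tstamp ack_timeout)
{
	return m->last_ping_time > 0 &&
	       now - m->last_ping_time >= ack_timeout;
}
```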

> +	/** Type of the failure detection message: ping or ack. */
> +	SWIM_FD_MSG_TYPE,
> +	/**
> +	 * Incarnation of the sender. To make the member alive if
> +	 * it was considered to be dead, but ping/ack with greater
> +	 * incarnation was received from it.
> +	 */
> +	SWIM_FD_INCARNATION,

Uhm, FD is too commonly used for a file descriptor. Please use a
different name. Why not simply SWIM_PING and SWIM_ACK?
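One possible shape for the rename, with the message types named directly instead of carrying an FD prefix. The enum, the string table, and the helper are a hypothetical sketch, not code from the patch.

```c
/** Failure detection message types, without the FD prefix. */
enum swim_msg_type {
	SWIM_PING = 0,
	SWIM_ACK,
	swim_msg_type_MAX,
};

/* Human-readable names, indexed by enum swim_msg_type. */
static const char *swim_msg_type_strs[] = {"ping", "ack"};

/** Name of a message type, or "unknown" if out of range. */
static inline const char *
swim_msg_type_str(enum swim_msg_type type)
{
	if ((unsigned) type < swim_msg_type_MAX)
		return swim_msg_type_strs[type];
	return "unknown";
}
```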

>  struct swim_transport swim_udp_transport = {
>  	/* .send_round_msg = */ swim_udp_send_msg,
> +	/* .send_ping = */ swim_udp_send_msg,
> +	/* .send_ack = */ swim_udp_send_msg,

Why do you need a separate transport API for ping/ack sends?
Shouldn't send/recv be enough? This is a transport layer; it
should be unaware of protocol details.
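A sketch of the split argued for here: the transport exposes only byte-level send/recv, and ping/ack are encoded by the protocol layer on top of it. All names and the one-byte encoding are hypothetical; `ssize_t` assumes a POSIX system.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical destination address, kept minimal. */
struct swim_addr {
	unsigned port;
};

/* Protocol-agnostic transport: it only moves bytes. */
struct swim_transport {
	ssize_t (*send)(void *ctx, const void *data, size_t size,
			const struct swim_addr *dst);
	ssize_t (*recv)(void *ctx, void *buf, size_t size,
			struct swim_addr *src);
	void *ctx;
};

enum swim_msg_type { SWIM_PING, SWIM_ACK };

/* Protocol layer: encode the message, then hand raw bytes over. */
static inline ssize_t
swim_send_msg_type(struct swim_transport *t, enum swim_msg_type type,
		   const struct swim_addr *dst)
{
	unsigned char msg = (unsigned char) type;
	return t->send(t->ctx, &msg, sizeof(msg), dst);
}

static inline ssize_t
swim_send_ping(struct swim_transport *t, const struct swim_addr *dst)
{
	return swim_send_msg_type(t, SWIM_PING, dst);
}

static inline ssize_t
swim_send_ack(struct swim_transport *t, const struct swim_addr *dst)
{
	return swim_send_msg_type(t, SWIM_ACK, dst);
}
```

A test transport then only has to implement two byte-level callbacks, regardless of how many message kinds the protocol grows.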


-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov


