[tarantool-patches] [PATCH v3 2/6] [RAW] swim: introduce failure detection component
Konstantin Osipov
kostja at tarantool.org
Wed Jan 9 16:48:09 MSK 2019
* Vladislav Shpilevoy <v.shpilevoy at tarantool.org> [18/12/29 15:07]:
> enum {
> /** How often to send membership messages and pings. */
> HEARTBEAT_RATE_DEFAULT = 1,
> + /**
> + * If a ping was sent, it is considered to be lost after
> + * this time without an ack.
> + */
> + ACK_TIMEOUT = 1,
The timeout should be configurable. A reasonable default looks
more close 30 seconds at least - we have many cases in
(malfunctioning) production when long requests stall the event
loop for 10-15 seconds, such requests should not lead to
membership storms.
> + /**
> + * If a member has not been responding to pings this
> + * number of times, it is considered to be dead.
> + */
> + NO_ACKS_TO_DEAD = 3,
> + /**
> + * If a not pinned member confirmed to be dead, it is
> + * removed from the membership after at least this number
> + * of failed pings.
> + */
> + NO_ACKS_TO_GC = NO_ACKS_TO_DEAD + 2,
> };
>
> + bool is_pinned;
> + /** Growing number to refute old messages. */
reject, or perhaps ignore?
refute is usually used for arguments in a heated conversation.
> + /**
> + * How many pings did not receive an ack in a row. After
> + * a threshold the instance is marked as dead. After more
> + * it is removed from the table (if not pinned).
> + */
> + int failed_pings;
These are more like unacknowledged pings. Have they failed? Maybe.
> + /** When the latest ping was sent to this member. */
> + double ping_ts;
last_ping_time? Why use double and not ev_time_t?
> + /** Type of the failure detection message: ping or ack. */
> + SWIM_FD_MSG_TYPE,
> + /**
> + * Incarnation of the sender. To make the member alive if
> + * it was considered to be dead, but ping/ack with greater
> + * incarnation was received from it.
> + */
> + SWIM_FD_INCARNATION,
Uhm, FD is too commonly used for a file descriptor. Please use a
different name. Why not simply SWIM_PING and SWIM_ACK?
> struct swim_transport swim_udp_transport = {
> /* .send_round_msg = */ swim_udp_send_msg,
> + /* .send_ping = */ swim_udp_send_msg,
> + /* .send_ack = */ swim_udp_send_msg,
Why do you need a separate transport api for ack/ping sends?
Shouldn't send/recv be enough? This is a transport layer, it
should be unaware of protocol details.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
More information about the Tarantool-patches
mailing list