From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id EC446311C2 for ; Thu, 20 Jun 2019 17:22:58 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id gqnfttXBDfXN for ; Thu, 20 Jun 2019 17:22:58 -0400 (EDT) Received: from smtp62.i.mail.ru (smtp62.i.mail.ru [217.69.128.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 63B91311AC for ; Thu, 20 Jun 2019 17:22:58 -0400 (EDT) From: Vladislav Shpilevoy Subject: [tarantool-patches] [PATCH 1/2] swim: encapsulate incarnation behind 'age' Date: Thu, 20 Jun 2019 23:23:24 +0200 Message-Id: In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org Cc: kostja@tarantool.org Traditional SWIM describes member age as incarnation - monotonically growing number to refute false gossips. But it is not enough in the real world because of necessity to detect restarts. Incarnations are not persisted, and even being persistent it won't help without addition of new incarnation-like attributes. This patch encapsulates incarnation into an 'age' to simplify further work around this area. Part of #4280 --- src/lib/swim/swim.c | 260 ++++++++++++++++++++---------------- src/lib/swim/swim_proto.c | 54 +++++--- src/lib/swim/swim_proto.h | 85 ++++++++---- src/lua/swim.lua | 7 +- test/unit/swim.c | 13 +- test/unit/swim_test_utils.c | 23 ++-- test/unit/swim_test_utils.h | 11 +- 7 files changed, 265 insertions(+), 188 deletions(-) diff --git a/src/lib/swim/swim.c b/src/lib/swim/swim.c index 2b37d41e0..b55945a34 100644 --- a/src/lib/swim/swim.c +++ b/src/lib/swim/swim.c @@ -213,6 +213,24 @@ swim_uuid_hash(const struct tt_uuid *uuid) return mh_strn_hash((const char *) uuid, UUID_LEN); } +/** + * Compare two age values. + * @retval <0 l < r. + * @retval >0 l > r. + * @retval =0 l == r. + */ +static inline int +swim_age_cmp(const struct swim_age *l, const struct swim_age *r, + enum swim_ev_mask *diff) +{ + if (l->incarnation == r->incarnation) { + *diff = 0; + return 0; + } + *diff = SWIM_EV_NEW_INCARNATION; + return l->incarnation < r->incarnation ? -1 : 1; +} + /** * A cluster member description. This structure describes the * last known state of an instance. This state is updated @@ -302,19 +320,17 @@ struct swim_member { * further. Otherwise @a payload is suspected to be * outdated and can be updated in two cases only: * - * 1) when it is received with a bigger incarnation from - * anywhere; + * 1) when it is received with a newer age from anywhere; * - * 2) when it is received with the same incarnation, but - * local payload is outdated. + * 2) when it is received with the same age, but local + * payload is outdated. * - * A payload can become outdated, if anyhow a new - * incarnation of the member has been learned, but not a - * new payload. For example, a message with new payload - * could be lost, and at the same time this instance - * responds to a ping with newly incarnated ack. The ack - * receiver will learn the new incarnation, but not the - * new payload. + * A payload can become outdated, if anyhow a new age of + * the member has been learned, but not a new payload. For + * example, a message with the new payload could be lost, + * and at the same time this instance responds to a ping + * with a newly aged ack. The ack receiver will learn the + * new age, but not the new payload. * * In this case it can't be said exactly whether the * member has updated payload, or another attribute. The @@ -353,18 +369,17 @@ struct swim_member { * Failure detection component */ /** - * A monotonically growing number to refute old member's - * state, characterized by a triplet - * {incarnation, status, address}. + * A monotonically growing age value to refute old + * member's state such as status, address, payload etc. */ - uint64_t incarnation; + struct swim_age age; /** * How many recent pings did not receive an ack while the * member was in the current status. When this number * reaches a configured threshold the instance is marked * as dead. After a few more unacknowledged it is removed * from the member table. This counter is reset on each - * acknowledged ping, status or incarnation change. + * acknowledged ping, status or age change. */ int unacknowledged_pings; /** @@ -578,7 +593,7 @@ swim_register_event(struct swim *swim, struct swim_member *member) /** * Make all needed actions to process a member's update like a - * change of its status, or incarnation, or both. + * change of its status, or age, or both. */ static void swim_on_member_update(struct swim *swim, struct swim_member *member, @@ -619,19 +634,20 @@ swim_has_pending_events(struct swim *swim) } /** - * Update status and incarnation of the member if needed. Statuses - * are compared as a compound key: {incarnation, status}. So @a - * new_status can override an old one only if its incarnation is - * greater, or the same, but its status is "bigger". Statuses are - * compared by their identifier, so "alive" < "dead". This - * protects from the case when a member is detected as dead on one - * instance, but overridden by another instance with the same - * incarnation's "alive" message. + * Update status and age of the member if needed. Statuses are + * compared as a compound key: {age, status}. So @a new_status + * overrides any older value, or of the same age, but if + * @a new_status status is "bigger". + * + * Statuses are compared by their identifier, so "alive" < "dead". + * This protects from the case when a member is detected as dead + * on one instance, but overridden by another instance with an + * "alive" message of the same age. */ static inline void -swim_update_member_inc_status(struct swim *swim, struct swim_member *member, +swim_update_member_age_status(struct swim *swim, struct swim_member *member, enum swim_member_status new_status, - uint64_t incarnation) + const struct swim_age *age) { /* * Source of truth about self is this instance and it is @@ -639,16 +655,16 @@ swim_update_member_inc_status(struct swim *swim, struct swim_member *member, * separately. */ assert(member != swim->self); - if (member->incarnation < incarnation) { - enum swim_ev_mask events = SWIM_EV_NEW_INCARNATION; + enum swim_ev_mask events; + int cmp = swim_age_cmp(&member->age, age, &events); + if (cmp < 0) { if (new_status != member->status) { events |= SWIM_EV_NEW_STATUS; member->status = new_status; } - member->incarnation = incarnation; + member->age = *age; swim_on_member_update(swim, member, events); - } else if (member->incarnation == incarnation && - member->status < new_status) { + } else if (cmp == 0 && member->status < new_status) { member->status = new_status; swim_on_member_update(swim, member, SWIM_EV_NEW_STATUS); } @@ -760,7 +776,7 @@ swim_member_delete(struct swim_member *member) /** Create a new member. It is not registered anywhere here. */ static struct swim_member * swim_member_new(const struct sockaddr_in *addr, const struct tt_uuid *uuid, - enum swim_member_status status, uint64_t incarnation) + enum swim_member_status status, const struct swim_age *age) { struct swim_member *member = (struct swim_member *) calloc(1, sizeof(*member)); @@ -776,7 +792,7 @@ swim_member_new(const struct sockaddr_in *addr, const struct tt_uuid *uuid, rlist_create(&member->in_round_queue); /* Failure detection component. */ - member->incarnation = incarnation; + member->age = *age; heap_node_create(&member->in_wait_ack_heap); swim_task_create(&member->ack_task, NULL, NULL, "ack"); swim_task_create(&member->ping_task, swim_ping_task_complete, NULL, @@ -835,7 +851,8 @@ swim_find_member(struct swim *swim, const struct tt_uuid *uuid) static struct swim_member * swim_new_member(struct swim *swim, const struct sockaddr_in *addr, const struct tt_uuid *uuid, enum swim_member_status status, - uint64_t incarnation, const char *payload, int payload_size) + const struct swim_age *age, const char *payload, + int payload_size) { int new_bsize = sizeof(swim->shuffled[0]) * (mh_size(swim->members) + 1); @@ -855,8 +872,7 @@ swim_new_member(struct swim *swim, const struct sockaddr_in *addr, "wait_ack_heap"); return NULL; } - struct swim_member *member = - swim_member_new(addr, uuid, status, incarnation); + struct swim_member *member = swim_member_new(addr, uuid, status, age); if (member == NULL) return NULL; assert(swim_find_member(swim, uuid) == NULL); @@ -962,7 +978,7 @@ swim_encode_member(struct swim_packet *packet, struct swim_member *m, if (pos == NULL) return -1; swim_passport_bin_fill(passport, &m->addr, &m->uuid, m->status, - m->incarnation, encode_payload); + &m->age, encode_payload); memcpy(pos, passport, sizeof(*passport)); if (encode_payload) { pos += sizeof(*passport); @@ -1043,8 +1059,7 @@ swim_encode_failure_detection(struct swim *swim, struct swim_packet *packet, char *pos = swim_packet_alloc(packet, size); if (pos == NULL) return 0; - swim_fd_header_bin_create(&fd_header_bin, type, - swim->self->incarnation); + swim_fd_header_bin_create(&fd_header_bin, type, &swim->self->age); memcpy(pos, &fd_header_bin, size); return 1; } @@ -1405,21 +1420,23 @@ swim_update_member_addr(struct swim *swim, struct swim_member *member, /** * Update an existing member with a new definition. It is expected - * that @a def has an incarnation not older that @a member has. + * that @a def is not older than @a member. */ static inline void swim_update_member(struct swim *swim, const struct swim_member_def *def, struct swim_member *member) { assert(member != swim->self); - assert(def->incarnation >= member->incarnation); + enum swim_ev_mask events; + int cmp = swim_age_cmp(&def->age, &member->age, &events); + assert(cmp >= 0); /* * Payload update rules are simple: it can be updated - * either if the new payload has a bigger incarnation, or - * the same incarnation, but local payload is outdated. + * either if the new payload is younger, or of the same + * age, but local payload is outdated. */ bool update_payload = false; - if (def->incarnation > member->incarnation) { + if (cmp > 0) { if (! swim_inaddr_eq(&def->addr, &member->addr)) swim_update_member_addr(swim, member, &def->addr); if (def->payload_size >= 0) @@ -1435,8 +1452,7 @@ swim_update_member(struct swim *swim, const struct swim_member_def *def, /* Not such a critical error. */ diag_log(); } - swim_update_member_inc_status(swim, member, def->status, - def->incarnation); + swim_update_member_age_status(swim, member, def->status, &def->age); } /** @@ -1475,37 +1491,38 @@ swim_upsert_member(struct swim *swim, const struct swim_member_def *def, goto skip; } *result = swim_new_member(swim, &def->addr, &def->uuid, - def->status, def->incarnation, + def->status, &def->age, def->payload, def->payload_size); return *result != NULL ? 0 : -1; } *result = member; struct swim_member *self = swim->self; + enum swim_ev_mask events; + int cmp = swim_age_cmp(&def->age, &member->age, &events); if (member != self) { - if (def->incarnation < member->incarnation) + if (cmp < 0) goto skip; swim_update_member(swim, def, member); return 0; } /* - * It is possible that other instances know a bigger - * incarnation of this instance - such thing happens when - * the instance restarts and loses its local incarnation - * number. It will be restored by receiving dissemination - * and anti-entropy messages about self. + * It is possible that other instances know a bigger age + * of this instance - such thing happens when the instance + * restarts and loses its local age value. It will be + * restored by receiving dissemination and anti-entropy + * messages about self. */ - if (self->incarnation < def->incarnation) { - self->incarnation = def->incarnation; - swim_on_member_update(swim, self, SWIM_EV_NEW_INCARNATION); + if (cmp > 0) { + self->age = def->age; + swim_on_member_update(swim, self, events); } - if (def->status != MEMBER_ALIVE && - def->incarnation == self->incarnation) { + if (def->status != MEMBER_ALIVE && cmp == 0) { /* * In the cluster a gossip exists that this * instance is not alive. Refute this information * with a bigger incarnation. */ - self->incarnation++; + self->age.incarnation++; swim_on_member_update(swim, self, SWIM_EV_NEW_INCARNATION); } return 0; @@ -1569,33 +1586,33 @@ swim_process_failure_detection(struct swim *swim, const char **pos, swim_fd_msg_type_strs[def.type]); swim_member_def_create(&mdef); mdef.addr = *src; - mdef.incarnation = def.incarnation; + mdef.age = def.age; mdef.uuid = *uuid; struct swim_member *member; if (swim_upsert_member(swim, &mdef, &member) != 0) return -1; /* - * It can be NULL, for example, in case of too old - * incarnation of the failure detection request. For the - * obvious reasons we do not use outdated ACKs. But we - * also ignore outdated pings. It is because 1) we need to - * be consistent in neglect of all old messages; 2) if a - * ping is considered old, then after it was sent, this - * SWIM instance has already interacted with the sender - * and learned its new incarnation. + * It can be NULL, for example, in case of too old failure + * detection request. For the obvious reasons we do not + * use outdated ACKs. But we also ignore outdated pings. + * It is because 1) we need to be consistent in neglect of + * all old messages; 2) if a ping is considered old, then + * after it was sent, this SWIM instance has already + * interacted with the sender and learned its new age. */ if (member == NULL) return 0; /* - * It is well known fact, that SWIM compares statuses as - * compound keys {incarnation, status}. If inc1 == inc2, - * but status1 > status2, nothing should happen. But it - * works for anti-entropy only, when the new status is - * received indirectly, as a gossip. Here is a different - * case - this message was received from the member - * directly, and evidently it is alive. + * It is a well known fact, that SWIM compares statuses as + * compound keys {age, status}. If age1 == age2, but + * status1 > status2, nothing should happen. But it works + * for anti-entropy only, when the new status is received + * indirectly, as a gossip. Here is a different case - + * this message was received from the member directly, and + * evidently it is alive. */ - if (def.incarnation == member->incarnation && + enum swim_ev_mask events; + if (swim_age_cmp(&def.age, &member->age, &events) == 0 && member->status != MEMBER_ALIVE) { member->status = MEMBER_ALIVE; swim_on_member_update(swim, member, SWIM_EV_NEW_STATUS); @@ -1650,15 +1667,22 @@ swim_process_quit(struct swim *swim, const char **pos, const char *end, diag_set(SwimError, "%s map of size 1 is expected", prefix); return -1; } - uint64_t tmp; - if (swim_decode_uint(pos, end, &tmp, prefix, "a key") != 0) - return -1; - if (tmp != SWIM_QUIT_INCARNATION) { - diag_set(SwimError, "%s a key should be incarnation", prefix); - return -1; + struct swim_age age; + uint64_t key; + for (uint32_t i = 0; i < size; ++i) { + if (swim_decode_uint(pos, end, &key, prefix, "a key") != 0) + return -1; + switch (key) { + case SWIM_QUIT_INCARNATION: + if (swim_decode_uint(pos, end, &age.incarnation, prefix, + "incarnation") != 0) + return -1; + break; + default: + diag_set(SwimError, "%s unknown key", prefix); + return -1; + } } - if (swim_decode_uint(pos, end, &tmp, prefix, "incarnation") != 0) - return -1; struct swim_member *m = swim_find_member(swim, uuid); if (m == NULL) return 0; @@ -1666,11 +1690,14 @@ swim_process_quit(struct swim *swim, const char **pos, const char *end, * Check for 'self' in case this instance took UUID of a * quited instance. */ + enum swim_ev_mask events; if (m != swim->self) { - swim_update_member_inc_status(swim, m, MEMBER_LEFT, tmp); - } else if (tmp >= m->incarnation) { - m->incarnation = tmp + 1; - swim_on_member_update(swim, m, SWIM_EV_NEW_INCARNATION); + swim_update_member_age_status(swim, m, MEMBER_LEFT, &age); + } else if (swim_age_cmp(&age, &m->age, &events) >= 0) { + m->age = age; + ++m->age.incarnation; + swim_on_member_update(swim, m, events | + SWIM_EV_NEW_INCARNATION); } return 0; } @@ -1887,7 +1914,8 @@ swim_cfg(struct swim *swim, const char *uri, double heartbeat_rate, struct sockaddr_in addr; if (uri != NULL && swim_uri_to_addr(uri, &addr, prefix) != 0) return -1; - bool is_first_cfg = swim->self == NULL; + struct swim_member *self = swim->self; + bool is_first_cfg = self == NULL; struct swim_member *new_self = NULL; if (is_first_cfg) { if (uuid == NULL || tt_uuid_is_nil(uuid) || uri == NULL) { @@ -1895,21 +1923,23 @@ swim_cfg(struct swim *swim, const char *uri, double heartbeat_rate, "a first config", prefix); return -1; } - swim->self = swim_new_member(swim, &addr, uuid, MEMBER_ALIVE, - 0, NULL, 0); - if (swim->self == NULL) + struct swim_age age; + swim_age_create(&age, 0); + new_self = swim_new_member(swim, &addr, uuid, MEMBER_ALIVE, + &age, NULL, 0); + if (new_self == NULL) return -1; } else if (uuid == NULL || tt_uuid_is_nil(uuid)) { - uuid = &swim->self->uuid; - } else if (! tt_uuid_is_equal(uuid, &swim->self->uuid)) { + uuid = &self->uuid; + } else if (! tt_uuid_is_equal(uuid, &self->uuid)) { if (swim_find_member(swim, uuid) != NULL) { diag_set(SwimError, "%s a member with such UUID "\ "already exists", prefix); return -1; } - new_self = swim_new_member(swim, &swim->self->addr, uuid, - MEMBER_ALIVE, 0, swim->self->payload, - swim->self->payload_size); + new_self = swim_new_member(swim, &self->addr, uuid, + MEMBER_ALIVE, &self->age, + self->payload, self->payload_size); if (new_self == NULL) return -1; } @@ -1919,12 +1949,8 @@ swim_cfg(struct swim *swim, const char *uri, double heartbeat_rate, * was not changed. */ if (swim_scheduler_bind(&swim->scheduler, &addr) != 0) { - if (is_first_cfg) { - swim_delete_member(swim, swim->self); - swim->self = NULL; - } else if (new_self != NULL) { + if (new_self != NULL) swim_delete_member(swim, new_self); - } return -1; } /* @@ -1937,7 +1963,7 @@ swim_cfg(struct swim *swim, const char *uri, double heartbeat_rate, tt_sprintf("SWIM event handler %d", swim_fd(swim))); } else { - addr = swim->self->addr; + addr = self->addr; } struct ev_timer *t = &swim->round_tick; struct ev_loop *l = swim_loop(); @@ -1954,15 +1980,17 @@ swim_cfg(struct swim *swim, const char *uri, double heartbeat_rate, } if (new_self != NULL) { - swim->self->status = MEMBER_LEFT; - swim_on_member_update(swim, swim->self, SWIM_EV_NEW_STATUS); + if (self != NULL) { + self->status = MEMBER_LEFT; + swim_on_member_update(swim, self, SWIM_EV_NEW_STATUS); + } swim->self = new_self; + self = new_self; } - if (! swim_inaddr_eq(&addr, &swim->self->addr)) { - swim->self->incarnation++; - swim_on_member_update(swim, swim->self, - SWIM_EV_NEW_INCARNATION); - swim_update_member_addr(swim, swim->self, &addr); + if (! swim_inaddr_eq(&addr, &self->addr)) { + self->age.incarnation++; + swim_on_member_update(swim, self, SWIM_EV_NEW_INCARNATION); + swim_update_member_addr(swim, self, &addr); } if (gc_mode != SWIM_GC_DEFAULT) swim->gc_mode = gc_mode; @@ -1994,7 +2022,7 @@ swim_set_payload(struct swim *swim, const char *payload, int payload_size) struct swim_member *self = swim->self; if (swim_update_member_payload(swim, self, payload, payload_size) != 0) return -1; - self->incarnation++; + self->age.incarnation++; swim_on_member_update(swim, self, SWIM_EV_NEW_INCARNATION); return 0; } @@ -2013,7 +2041,9 @@ swim_add_member(struct swim *swim, const char *uri, const struct tt_uuid *uuid) return -1; struct swim_member *member = swim_find_member(swim, uuid); if (member == NULL) { - member = swim_new_member(swim, &addr, uuid, MEMBER_ALIVE, 0, + struct swim_age age; + swim_age_create(&age, 0); + member = swim_new_member(swim, &addr, uuid, MEMBER_ALIVE, &age, NULL, -1); return member == NULL ? -1 : 0; } @@ -2173,7 +2203,7 @@ swim_encode_quit(struct swim *swim, struct swim_packet *packet) char *pos = swim_packet_alloc(packet, sizeof(bin)); if (pos == NULL) return 0; - swim_quit_bin_create(&bin, swim->self->incarnation); + swim_quit_bin_create(&bin, &swim->self->age); memcpy(pos, &bin, sizeof(bin)); return 1; } @@ -2268,7 +2298,7 @@ swim_member_uuid(const struct swim_member *member) uint64_t swim_member_incarnation(const struct swim_member *member) { - return member->incarnation; + return member->age.incarnation; } const char * diff --git a/src/lib/swim/swim_proto.c b/src/lib/swim/swim_proto.c index 938631e49..c42d67c0a 100644 --- a/src/lib/swim/swim_proto.c +++ b/src/lib/swim/swim_proto.c @@ -169,6 +169,24 @@ swim_check_inaddr_not_empty(const struct sockaddr_in *addr, const char *prefix, return -1; } +/** + * Create a binary age structure. It requires key values by which + * the age fields should be marked. + */ +static inline void +swim_age_bin_create(struct swim_age_bin *bin, uint8_t incarnation_key) +{ + bin->k_incarnation = incarnation_key; + bin->m_incarnation = 0xcf; +} + +/** Fill already created @a bin with a real age value. */ +static inline void +swim_age_bin_fill(struct swim_age_bin *bin, const struct swim_age *age) +{ + bin->v_incarnation = mp_bswap_u64(age->incarnation); +} + /** * Create a binary address structure. It requires explicit IP and * port keys specification since the keys depend on the component @@ -248,7 +266,7 @@ swim_decode_member_key(enum swim_member_key key, const char **pos, return -1; break; case SWIM_MEMBER_INCARNATION: - if (swim_decode_uint(pos, end, &def->incarnation, prefix, + if (swim_decode_uint(pos, end, &def->age.incarnation, prefix, "member incarnation") != 0) return -1; break; @@ -308,17 +326,17 @@ swim_src_uuid_bin_create(struct swim_src_uuid_bin *header, void swim_fd_header_bin_create(struct swim_fd_header_bin *header, - enum swim_fd_msg_type type, uint64_t incarnation) + enum swim_fd_msg_type type, + const struct swim_age *age) { header->k_header = SWIM_FAILURE_DETECTION; - header->m_header = 0x82; + header->m_header = 0x81 + SWIM_AGE_BIN_SIZE; header->k_type = SWIM_FD_MSG_TYPE; header->v_type = type; - header->k_incarnation = SWIM_FD_INCARNATION; - header->m_incarnation = 0xcf; - header->v_incarnation = mp_bswap_u64(incarnation); + swim_age_bin_create(&header->age, SWIM_FD_INCARNATION); + swim_age_bin_fill(&header->age, age); } int @@ -353,7 +371,7 @@ swim_failure_detection_def_decode(struct swim_failure_detection_def *def, def->type = key; break; case SWIM_FD_INCARNATION: - if (swim_decode_uint(pos, end, &def->incarnation, + if (swim_decode_uint(pos, end, &def->age.incarnation, prefix, "incarnation") != 0) return -1; break; @@ -401,24 +419,24 @@ swim_passport_bin_create(struct swim_passport_bin *passport) passport->k_uuid = SWIM_MEMBER_UUID; passport->m_uuid = 0xc4; passport->m_uuid_len = UUID_LEN; - passport->k_incarnation = SWIM_MEMBER_INCARNATION; - passport->m_incarnation = 0xcf; + swim_age_bin_create(&passport->age, SWIM_MEMBER_INCARNATION); } void swim_passport_bin_fill(struct swim_passport_bin *passport, const struct sockaddr_in *addr, const struct tt_uuid *uuid, - enum swim_member_status status, uint64_t incarnation, - bool encode_payload) + enum swim_member_status status, + const struct swim_age *age, bool encode_payload) { - int map_size = 3 + SWIM_INADDR_BIN_SIZE + encode_payload; + int map_size = 2 + SWIM_AGE_BIN_SIZE + SWIM_INADDR_BIN_SIZE + + encode_payload; assert(mp_sizeof_map(map_size) == 1); passport->m_header = 0x80 | map_size; passport->v_status = status; swim_inaddr_bin_fill(&passport->addr, addr); memcpy(passport->v_uuid, uuid, UUID_LEN); - passport->v_incarnation = mp_bswap_u64(incarnation); + swim_age_bin_fill(&passport->age, age); } void @@ -556,13 +574,13 @@ swim_meta_def_decode(struct swim_meta_def *def, const char **pos, } void -swim_quit_bin_create(struct swim_quit_bin *header, uint64_t incarnation) +swim_quit_bin_create(struct swim_quit_bin *header, const struct swim_age *age) { header->k_quit = SWIM_QUIT; - header->m_quit = 0x81; - header->k_incarnation = SWIM_QUIT_INCARNATION; - header->m_incarnation = 0xcf; - header->v_incarnation = mp_bswap_u64(incarnation); + assert(mp_sizeof_map(SWIM_AGE_BIN_SIZE) == 1); + header->m_quit = 0x80 | SWIM_AGE_BIN_SIZE; + swim_age_bin_create(&header->age, SWIM_QUIT_INCARNATION); + swim_age_bin_fill(&header->age, age); } void diff --git a/src/lib/swim/swim_proto.h b/src/lib/swim/swim_proto.h index 482d79fb1..acd266db4 100644 --- a/src/lib/swim/swim_proto.h +++ b/src/lib/swim/swim_proto.h @@ -112,6 +112,47 @@ enum { * +-------------------------------------------------------------+ */ +/** + * Age is attached to every sent piece of information, and is used + * to reject/rewrite old data. + */ +struct swim_age { + /** + * Incarnation is a volatile fully automatic part, which + * is used to refute incorrect and rewrite old information + * during SWIM instance life cycle. + */ + uint64_t incarnation; +}; + +/** Initialize age. */ +static inline void +swim_age_create(struct swim_age *age, uint64_t incarnation) +{ + age->incarnation = incarnation; +} + +enum { + /** + * Number of keys in the age binary structure. Structures + * storing an age should use this size so as to correctly + * encode MessagePack map header. + */ + SWIM_AGE_BIN_SIZE = 1, +}; + +/** + * Binary age structure. It is expected that parent structure is a + * map. + */ +struct PACKED swim_age_bin { + /** mp_encode_uint(incarnation key) */ + uint8_t k_incarnation; + /** mp_encode_uint(64bit incarnation) */ + uint8_t m_incarnation; + uint64_t v_incarnation; +}; + /** * SWIM member attributes from anti-entropy and dissemination * messages. @@ -119,7 +160,7 @@ enum { struct swim_member_def { struct tt_uuid uuid; struct sockaddr_in addr; - uint64_t incarnation; + struct swim_age age; enum swim_member_status status; const char *payload; int payload_size; @@ -184,9 +225,9 @@ enum swim_fd_key { /** Type of the failure detection message: ping or ack. */ SWIM_FD_MSG_TYPE, /** - * Incarnation of the sender. To make the member alive if - * it was considered dead, but ping/ack with greater - * incarnation was received from it. + * Age of the sender. To make the member alive if it was + * considered dead, but a newer ping/ack was received from + * it. */ SWIM_FD_INCARNATION, }; @@ -212,24 +253,22 @@ struct PACKED swim_fd_header_bin { /** mp_encode_uint(enum swim_fd_msg_type) */ uint8_t v_type; - /** mp_encode_uint(SWIM_FD_INCARNATION) */ - uint8_t k_incarnation; - /** mp_encode_uint(64bit incarnation) */ - uint8_t m_incarnation; - uint64_t v_incarnation; + /** SWIM_FD_INCARNATION */ + struct swim_age_bin age; }; /** Initialize failure detection section. */ void swim_fd_header_bin_create(struct swim_fd_header_bin *header, - enum swim_fd_msg_type type, uint64_t incarnation); + enum swim_fd_msg_type type, + const struct swim_age *age); /** A decoded failure detection message. */ struct swim_failure_detection_def { /** Type of the message. */ enum swim_fd_msg_type type; - /** Incarnation of the sender. */ - uint64_t incarnation; + /** Age of the sender. */ + struct swim_age age; }; /** @@ -339,11 +378,8 @@ struct PACKED swim_passport_bin { uint8_t m_uuid_len; uint8_t v_uuid[UUID_LEN]; - /** mp_encode_uint(SWIM_MEMBER_INCARNATION) */ - uint8_t k_incarnation; - /** mp_encode_uint(64bit incarnation) */ - uint8_t m_incarnation; - uint64_t v_incarnation; + /** SWIM_MEMBER_INCARNATION */ + struct swim_age_bin age; }; /** @@ -384,8 +420,8 @@ void swim_passport_bin_fill(struct swim_passport_bin *passport, const struct sockaddr_in *addr, const struct tt_uuid *uuid, - enum swim_member_status status, uint64_t incarnation, - bool encode_payload); + enum swim_member_status status, + const struct swim_age *age, bool encode_payload); /** }}} Anti-entropy component */ @@ -561,7 +597,7 @@ swim_route_bin_create(struct swim_route_bin *route, /** }}} Meta component */ enum swim_quit_key { - /** Incarnation to ignore old quit messages. */ + /** Age to ignore old quit messages. */ SWIM_QUIT_INCARNATION = 0, }; @@ -572,16 +608,13 @@ struct PACKED swim_quit_bin { /** mp_encode_map(1) */ uint8_t m_quit; - /** mp_encode_uint(SWIM_QUIT_INCARNATION) */ - uint8_t k_incarnation; - /** mp_encode_uint(64bit incarnation) */ - uint8_t m_incarnation; - uint64_t v_incarnation; + /** SWIM_QUIT_INCARNATION */ + struct swim_age_bin age; }; /** Initialize quit section. */ void -swim_quit_bin_create(struct swim_quit_bin *header, uint64_t incarnation); +swim_quit_bin_create(struct swim_quit_bin *header, const struct swim_age *age); /** * Helpers to decode some values - map, array, etc with diff --git a/src/lua/swim.lua b/src/lua/swim.lua index 4f91ac233..e411a397d 100644 --- a/src/lua/swim.lua +++ b/src/lua/swim.lua @@ -360,13 +360,12 @@ end -- -- This function caches its result. It means, that only first call -- actually decodes cdata payload. All the next calls return --- pointer to the same result, until payload is changed with a new --- incarnation. +-- pointer to the same result, until a newer payload appears. -- local function swim_member_payload(m) local ptr = swim_check_member(m, 'member:payload()') - -- Two keys are needed. Incarnation is not enough, because a - -- new incarnation can be disseminated earlier than a new + -- Both the age and the flag are needed. Age is not enough, + -- because a new age can be disseminated earlier than a new -- payload. For example, via ACK messages. local key1 = capi.swim_member_incarnation(ptr) local key2 = capi.swim_member_is_payload_up_to_date(ptr) diff --git a/test/unit/swim.c b/test/unit/swim.c index 0977e0969..bffc0985d 100644 --- a/test/unit/swim.c +++ b/test/unit/swim.c @@ -391,15 +391,15 @@ swim_test_refute(void) fail_if(swim_cluster_wait_status(cluster, 0, 1, MEMBER_SUSPECTED, 4) != 0); swim_cluster_set_drop(cluster, 1, 0); - is(swim_cluster_wait_incarnation(cluster, 1, 1, 1, 1), 0, + is(swim_cluster_wait_age(cluster, 1, 1, 1, 1), 0, "S2 increments its own incarnation to refute its suspicion"); - is(swim_cluster_wait_incarnation(cluster, 0, 1, 1, 1), 0, + is(swim_cluster_wait_age(cluster, 0, 1, 1, 1), 0, "new incarnation has reached S1 with a next round message"); swim_cluster_restart_node(cluster, 1); is(swim_cluster_member_incarnation(cluster, 1, 1), 0, "after restart S2's incarnation is 0 again"); - is(swim_cluster_wait_incarnation(cluster, 1, 1, 1, 1), 0, + is(swim_cluster_wait_age(cluster, 1, 1, 1, 1), 0, "S2 learned its old bigger incarnation 1 from S0"); swim_cluster_delete(cluster); @@ -527,7 +527,7 @@ swim_test_quit(void) * old LEFT status. */ swim_cluster_restart_node(cluster, 0); - is(swim_cluster_wait_incarnation(cluster, 0, 0, 1, 2), 0, + is(swim_cluster_wait_age(cluster, 0, 0, 1, 2), 0, "quited member S1 has returned and refuted the old status"); fail_if(swim_cluster_wait_fullmesh(cluster, 2) != 0); /* @@ -557,9 +557,8 @@ swim_test_quit(void) /* Now allow S2 to get the 'self-quit' message. */ swim_cluster_unblock_io(cluster, 1); - is(swim_cluster_wait_incarnation(cluster, 1, 1, 2, 0), 0, - "S2 finally got 'quit' message from S1, but with its 'own' UUID - "\ - "refute it") + is(swim_cluster_wait_age(cluster, 1, 1, 2, 0), 0, "S2 finally got "\ + "'quit' message from S1, but with its 'own' UUID - refute it") swim_cluster_delete(cluster); /** diff --git a/test/unit/swim_test_utils.c b/test/unit/swim_test_utils.c index 463c62390..e646c3a47 100644 --- a/test/unit/swim_test_utils.c +++ b/test/unit/swim_test_utils.c @@ -709,10 +709,10 @@ struct swim_member_template { bool need_check_status; enum swim_member_status status; /** - * True, if the incarnation should be checked to be equal - * to @a incarnation. + * True, if the age should be checked to be equal to a + * target. */ - bool need_check_incarnation; + bool need_check_age; uint64_t incarnation; /** * True, if the payload should be checked to be equal to @@ -747,13 +747,13 @@ swim_member_template_set_status(struct swim_member_template *t, /** * Set that the member template should be used to check member - * incarnation. + * age. */ static inline void -swim_member_template_set_incarnation(struct swim_member_template *t, - uint64_t incarnation) +swim_member_template_set_age(struct swim_member_template *t, + uint64_t incarnation) { - t->need_check_incarnation = true; + t->need_check_age = true; t->incarnation = incarnation; } @@ -793,7 +793,7 @@ swim_loop_check_member(struct swim_cluster *cluster, void *data) } if (t->need_check_status && status != t->status) return false; - if (t->need_check_incarnation && incarnation != t->incarnation) + if (t->need_check_age && incarnation != t->incarnation) return false; if (t->need_check_payload && (payload_size != t->payload_size || @@ -847,13 +847,12 @@ swim_cluster_wait_status(struct swim_cluster *cluster, int node_id, } int -swim_cluster_wait_incarnation(struct swim_cluster *cluster, int node_id, - int member_id, uint64_t incarnation, - double timeout) +swim_cluster_wait_age(struct swim_cluster *cluster, int node_id, int member_id, + uint64_t incarnation, double timeout) { struct swim_member_template t; swim_member_template_create(&t, node_id, member_id); - swim_member_template_set_incarnation(&t, incarnation); + swim_member_template_set_age(&t, incarnation); return swim_wait_timeout(timeout, cluster, swim_loop_check_member, &t); } diff --git a/test/unit/swim_test_utils.h b/test/unit/swim_test_utils.h index fde84e39b..309636b87 100644 --- a/test/unit/swim_test_utils.h +++ b/test/unit/swim_test_utils.h @@ -218,14 +218,13 @@ swim_cluster_wait_status_everywhere(struct swim_cluster *cluster, int member_id, double timeout); /** - * Wait until a member with id @a member_id is seen with @a - * incarnation in the membership table of a member with id @a - * node_id. At most @a timeout seconds. + * Wait until a member with id @a member_id is seen with a needed + * age in the membership table of a member with id @a node_id. At + * most @a timeout seconds. */ int -swim_cluster_wait_incarnation(struct swim_cluster *cluster, int node_id, - int member_id, uint64_t incarnation, - double timeout); +swim_cluster_wait_age(struct swim_cluster *cluster, int node_id, int member_id, + uint64_t incarnation, double timeout); /** * Wait until a member with id @a member_id is seen with -- 2.20.1 (Apple Git-117)