From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 329FA2965D for ; Thu, 7 Mar 2019 07:33:54 -0500 (EST) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id wYaxOsHF8iX3 for ; Thu, 7 Mar 2019 07:33:54 -0500 (EST) Received: from smtpng1.m.smailru.net (smtpng1.m.smailru.net [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 58D9629656 for ; Thu, 7 Mar 2019 07:33:53 -0500 (EST) Subject: [tarantool-patches] Re: [PATCH 1/1] swim: introduce SWIM's anti-entropy component References: <5deb15eab37345ba1abc8772a4c78308d8e8f9c1.1551884926.git.v.shpilevoy@tarantool.org> <20190307102231.GB5263@chai> From: Vladislav Shpilevoy Message-ID: <8cab7faa-9991-7554-4659-8b62d141184d@tarantool.org> Date: Thu, 7 Mar 2019 15:33:50 +0300 MIME-Version: 1.0 In-Reply-To: <20190307102231.GB5263@chai> Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: Konstantin Osipov Cc: tarantool-patches@freelists.org On 07/03/2019 13:22, Konstantin Osipov wrote: > * Vladislav Shpilevoy [19/03/07 11:19]: > >> diff --git a/cmake/profile.cmake b/cmake/profile.cmake >> index 0ba31fa2c..70fa74694 100644 >> --- a/cmake/profile.cmake >> +++ b/cmake/profile.cmake >> @@ -1,5 +1,5 @@ >> set(CMAKE_REQUIRED_FLAGS "-fprofile-arcs -ftest-coverage") >> -check_library_exists("" __gcov_flush "" HAVE_GCOV) >> +set(HAVE_GCOV 1) >> set(CMAKE_REQUIRED_FLAGS "") > > Stray change, please remove. It is removed. You ignored my first follow-up email. Again. > >> + /** >> + * Swim instance is its own iterator. Such iterators are >> + * obviously unstable and fragile, but it is good enough >> + * for a fast non-yielding scan of members table. Each new >> + * public API iterator iterates through member table using >> + * this variable as a storage of the current position. >> + */ > > This comment is unclear, Since you rejected my previous edition of > this comment, please check with Vladimir Davydov and write a > better comment than that. Your previous edition was just wrong. It was not unclear. Talking of Vova, I cite you: 'вова смотрит то, что важно вове'. Since obviously Vova does not care about that subsystem, I do not see any sense to ask him to review SWIM or any part of it again. Nonetheless, I just a bit tired to argue with you about each comment, so I replaced it with your version (wrong). @@ -276,11 +276,13 @@ struct swim { */ struct swim_scheduler scheduler; /** - * Swim instance is its own iterator. Such iterators are - * obviously unstable and fragile, but it is good enough - * for a fast non-yielding scan of members table. Each new - * public API iterator iterates through member table using - * this variable as a storage of the current position. + * An offset of this instance in the hash table. Is used + * to iterate in the hash table starting from this swim + * instance. Such iteration is unstable between yields + * (i.e. member positions may change when the table is + * resized after an incoming event), but still is useful + * for a fast non-yielding scan of the member table + * starting from this instance. */ mh_int_t iterator; > >> +/** >> + * Encode anti-entropy header and random members data as many as >> + * possible to the end of the packet. >> + * @retval 0 Not error, but nothing is encoded. >> + * @retval 1 Something is encoded. >> + */ > > This comment is confusing. You actually return the number of > members added to messagepack map. You can add 0 or 1 member. > Please rewrite. I do not see anything confusing. You just said the same what is written here. But as you wish. @@ -464,8 +464,7 @@ swim_new_round(struct swim *swim) /** * Encode anti-entropy header and random members data as many as * possible to the end of the packet. - * @retval 0 Not error, but nothing is encoded. - * @retval 1 Something is encoded. + * @retval Number of key-values added to the packet's root map. */ static int swim_encode_anti_entropy(struct swim *swim, struct swim_packet *packet) > >> +static int >> +swim_encode_anti_entropy(struct swim *swim, struct swim_packet *packet) > > >> + struct swim_member_bin member_bin; >> + int size = sizeof(ae_header_bin); >> + char *header = swim_packet_reserve(packet, size); >> + if (header == NULL) >> + return 0; > > I don't understand why this error can ever be ignored. Because it is not an error, I repeat it again, third time. Any swim_packet_reserve(), swim_packet_allow() can not return an error - they do not allocate anything, they just propagate some pointers. Anti-entropy is not a mandatory component, and it is ok, if it is not encoded because of too big dissemination component. > The result of > this check does not depend on environment: each time you invoke > this function your header data won't fit in the packet. This And nevertheless it fits. I do not understand what do you mean. Header data fits, and the protocol works. It is proven by tests. I reserve space for header data before encoding all the components. > violates the basic assumptions you make in the protocol, i.e. that > you can fit at least one member in the packet. Please either Where did you see such an assumption? Anti-entropy is an **optional** component. If the dissemination component is too big, then it is ok to omit anti-entropy. > explain how you can ignore such a basic problem as when you first > ever header assert/panic. > >> + char *pos = header; >> + swim_member_bin_create(&member_bin); >> + struct mh_swim_table_t *t = swim->members; >> + int i = 0, member_count = mh_size(t); >> + int rnd = swim_scaled_rand(0, member_count - 1); >> + for (mh_int_t rc = mh_swim_table_random(t, rnd), end = mh_end(t); >> + i < member_count; ++i) { >> + struct swim_member *m = *mh_swim_table_node(t, rc); >> + int new_size = size + sizeof(member_bin); >> + if (swim_packet_reserve(packet, new_size) == NULL) >> + break; > > Same here. You should assert/panic that you can fit at least one member > in the datagram. Otherwise the protocol is unusable, and this > error condition does not depend on environment - it's a > programming bug. Read my answers above. Anti-entropy is not a linchpin of the protocol, and it is ok to do not encode it. Talking of assertions, I added them to swim_encode_round_msg() which checks that the packet now always has at least two sections: SRC_UUID and ANTI_ENTROPY. It is not so in the next patches though and changes later. > >> +/** >> + * Encode source UUID. >> + * @retval 0 Not error, but nothing is encoded. >> + * @retval 1 Something is encoded. >> + */ >> +static inline int >> +swim_encode_src_uuid(struct swim *swim, struct swim_packet *packet) >> +{ > > This comment is also confusing. Take a look at the usage: > Please update the comment. As you wish. @@ -511,8 +510,7 @@ swim_encode_anti_entropy(struct swim *swim, struct swim_packet *packet) /** * Encode source UUID. - * @retval 0 Not error, but nothing is encoded. - * @retval 1 Something is encoded. + * @retval Number of key-values added to the packet's root map. */ static inline int swim_encode_src_uuid(struct swim *swim, struct swim_packet *packet) > >> + int map_size = 0; >> + map_size += swim_encode_src_uuid(swim, packet); >> + map_size += swim_encode_anti_entropy(swim, packet); > >> + >> +/** >> + * Update or create a member by its definition, received from a > > Update or create is "replace", or upsert in Tarantool vocabulary. Update or create are English words not reserved by Tarantool DML/DDL. > Update does nothing if no member exists. This is why I wrote 'or create'. > The choice of the name is also confusing since you chose to return > NULL on error. > Please rename. I presented that name to you earlier and it was ok for you. But as you wish, I renamed it to swim_upsert_member, if we are talking in "Tarantool Vocabulary". @@ -657,7 +657,7 @@ swim_update_member_addr(struct swim *swim, struct swim_member *member, * @retval New member, or updated old member. */ static struct swim_member * -swim_update_member(struct swim *swim, const struct swim_member_def *def) +swim_upsert_member(struct swim *swim, const struct swim_member_def *def) { struct swim_member *member = swim_find_member(swim, &def->uuid); if (member == NULL) { @@ -683,7 +683,7 @@ swim_process_anti_entropy(struct swim *swim, const char **pos, const char *end) struct swim_member_def def; if (swim_member_def_decode(&def, pos, end, prefix) != 0) return -1; - if (swim_update_member(swim, &def) == NULL) { + if (swim_upsert_member(swim, &def) == NULL) { /* * Not a critical error. Other members * still can be updated. > >> + if (swim_update_member(swim, &def) == NULL) { >> + /* >> + * Not a critical error. Other members >> + * still can be updated. >> + */ >> + diag_log(); > > You're not using the return value. I assume it will be used in a > subsequent patch. Otherwise please return {0,-1} It is not used in a subsequent patch, and here I do not return -1 because I want to process other members. Fail of one member update should not reject the whole packet. > >> + } >> + } >> + return 0; >> + if (size == 0) { >> + diag_set(SwimError, "%s body can not be empty", prefix); >> + goto error; >> + } > > Are you sure you want to use a single exception for both protocol > and transport errors? Yes, I am sure. > Why would you ever have a protocol error > in production? It is impossible if the code was tested enough. And it wonders me even more why should I care about a special type of error for SWIM protocol violations. > It suggests that protocol errors, generally, should > be treated differently from transport errors. Nothing to be done > in this patch about it, to be addressed in a subsequent patch. > >> +void >> +swim_delete(struct swim *swim) >> +{ >> + swim_scheduler_destroy(&swim->scheduler); >> + swim_ev_timer_stop(loop(), &swim->round_tick); >> + swim_task_destroy(&swim->round_step_task); >> + mh_int_t node = mh_first(swim->members); >> + while (node != mh_end(swim->members)) { >> + struct swim_member *m = >> + *mh_swim_table_node(swim->members, node); >> + swim_delete_member(swim, m); >> + node = mh_first(swim->members); >> + } > > This is O(n^2) from the number of members, since the hash table is > not generally shrinking after you delete a member from it. Please > make it linear: go over all mhash members, delete them with > swim_member_delete(), then delete the mhash itself. I still do not understand how do you select 'hot paths'. This place is called maybe 1-2 times per instance lifetime. And you propose to pollute it with broken encapsulation of removing the members directly bypassing swim instance. I have some assertions in swim_member_delete that it is already removed from the swim instance, and you force me to crutch that assertions here. But as you wish, again. @@ -932,12 +932,11 @@ swim_delete(struct swim *swim) swim_scheduler_destroy(&swim->scheduler); swim_ev_timer_stop(loop(), &swim->round_tick); swim_task_destroy(&swim->round_step_task); - mh_int_t node = mh_first(swim->members); - while (node != mh_end(swim->members)) {\ + mh_int_t node; + mh_foreach(swim->members, node) { struct swim_member *m = *mh_swim_table_node(swim->members, node); - swim_delete_member(swim, m); - node = mh_first(swim->members); + rlist_del_entry(m, in_round_queue); + swim_member_delete(m); } mh_swim_table_delete(swim->members); free(swim->shuffled); > >> +struct swim_iterator * >> +swim_iterator_open(struct swim *swim) >> +{ >> + assert(swim_is_configured(swim)); >> + swim->iterator = mh_first(swim->members); >> + return (struct swim_iterator *) swim; >> +} > > What are you trying to save on by fusing the iterator in? A The same question about swim_delete() - what are you trying to save in swim_delete() and why? > separate data structure? I am trying to save memory allocation for the iterator, because it is likely to be called often. Much more often than swim_delete. > What's wrong with a separate iterator > object pointing copied by-value? This code is more complicated > than necessary. To be addressed in a separate patch. It is smaller than with additional special iterators instead of one simple number mh_int_t. I do not think that it makes any sense to create standalone and stable iterators until an explicit request. > >> +static inline int >> +swim_decode_ip(struct sockaddr_in *address, const char **pos, const char *end, >> + const char *prefix, const char *param_name) >> +{ >> + uint64_t ip; >> + if (swim_decode_uint(pos, end, &ip, prefix, param_name) != 0) >> + return -1; >> + if (ip > UINT32_MAX) { >> + diag_set(SwimError, "%s %s is an invalid IP address", prefix, >> + param_name); >> + return -1; >> + } >> + address->sin_addr.s_addr = ip; > > I believe you should convert host order to network order when > assigning. Please add htons. I send port and ip in network order and do not need htons here, but as you wish. @@ -92,7 +92,7 @@ swim_decode_ip(struct sockaddr_in *address, const char **pos, const char *end, param_name); return -1; } - address->sin_addr.s_addr = ip; + address->sin_addr.s_addr = htonl(ip); return 0; } @@ -108,7 +108,7 @@ swim_decode_port(struct sockaddr_in *address, const char **pos, const char *end, param_name); return -1; } - address->sin_port = port; + address->sin_port = htons(port); return 0; } @@ -244,8 +244,8 @@ swim_member_bin_fill(struct swim_member_bin *header, enum swim_member_status status) { header->v_status = status; - header->v_addr = mp_bswap_u32(addr->sin_addr.s_addr); - header->v_port = mp_bswap_u16(addr->sin_port); + header->v_addr = mp_bswap_u32(ntohl(addr->sin_addr.s_addr)); + header->v_port = mp_bswap_u16(ntohs(addr->sin_port)); memcpy(header->v_uuid, uuid, UUID_LEN); } @@ -273,10 +273,10 @@ swim_meta_header_bin_create(struct swim_meta_header_bin *header, header->v_version = mp_bswap_u32(tarantool_version_id()); header->k_addr = SWIM_META_SRC_ADDRESS; header->m_addr = 0xce; - header->v_addr = mp_bswap_u32(src->sin_addr.s_addr); + header->v_addr = mp_bswap_u32(ntohl(src->sin_addr.s_addr)); header->k_port = SWIM_META_SRC_PORT; header->m_port = 0xcd; - header->v_port = mp_bswap_u16(src->sin_port); + header->v_port = mp_bswap_u16(ntohs(src->sin_port)); } > >> +static int >> +main_f(va_list ap) >> +{ >> + swim_start_test(5); >> + >> + (void) ap; >> + struct ev_loop *loop = loop(); >> + swim_test_ev_init(); >> + swim_test_transport_init(); >> + >> + swim_test_one_link(); >> + swim_test_sequence(); >> + swim_test_uuid_update(); >> + swim_test_cfg(); >> + swim_test_add_remove(); >> + >> + swim_test_transport_free(); >> + swim_test_ev_free(); >> + >> + test_result = check_plan(); >> + footer(); >> + return 0; >> +} > > Please also add to the test plan (in a separate patch): > > - hitting the boundary of udp datagram (i.e. case when > swim_packet_reserve() returns error) > - packet loss in the network - 5%, 10%, 20%, 50%, 90% > - packet loss from a single swim member > - one or few members "lagging" behind the rest. > - speed at which a detected failure spreads through the network > - a "flaky" member which is on and off > > I understand anti-entropy itself does not exhibit too much > testable properties, so this test plan is good enough for the > first patch. This test plan is partially covered now. But other examples require a notion of 'dead' member. The fist commit introduces only 'alive' members. In this commit, on anti-entropy level you do not know what it is 'on', 'off', 'down', 'up', 'loss level'. It appears in the failure detection component. I added your plan to my backlog. > > Please fix the above comments and push. Since I did not fix some of the above comments, I am waiting for your response before push. > > -- > Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 > http://tarantool.io - www.twitter.com/kostja_osipov >