From: Vladislav Shpilevoy
Subject: [PATCH v3 0/6] SWIM draft
Date: Sat, 29 Dec 2018 13:14:09 +0300
To: tarantool-patches@freelists.org
Cc: vdavydov.dev@gmail.com, kostja@tarantool.org

The first commit message contains comprehensive information about SWIM, which I will not duplicate here. This is only a description of the patchset.

SWIM consists of two main components, dissemination and failure detection, and one additional component, anti-entropy. The patchset introduces them one by one in the first three commits. The next two commits are technical improvements. The last commit allows a SWIM user to attach its own data to dissemination and anti-entropy component messages.

Note that these commits contain bugs and typos, and have no tests. The goal of this review is a high-level approval of the API, so that I can start writing tests. Nonetheless, here I describe some known bugs and open questions:

1. I tried to avoid allocating the most frequently used swim_tasks when not necessary, and stored ping_task and ack_task in struct swim_member as attributes to reuse them. But now I do not think the performance win is worth it, because it complicates task destruction. I am planning to always allocate/delete swim_task. This is also required for indirect pings/acks, where I cannot allocate tasks in advance.
2. ACKs can now be lost. I start waiting for an ACK as soon as a PING task is scheduled, which is not correct: technically, the PING may not have been sent yet, so swim_check_acks can mistakenly treat it as lost. I am planning to fix it by saving the last received ACK timestamp in struct member and starting to wait for an ACK *after* the PING task is finished, *but only if* the member has not received an ACK already. This problem originally arises from the fact that a PING may not be the only packet in a message, so I cannot always safely start waiting for an ACK in the swim_task.complete callback: an ACK can arrive after the PING packet is sent, but before the whole message is sent.
   The GitLab Lua version of SWIM does not have this problem, since it has no multi-packet support.
3. In swim_round_step_complete() it is unsafe to assume that the member at the head of the round queue is still the same as it was when the task was scheduled. I am planning to just do the shift in swim_round_step_begin().
4. There is a problem with 'immortal' members. When a member is declared dead, its state is disseminated to other members, and it is deleted from the table. But other members can bring it back via the anti-entropy component. The member then still has the 'dead' status, but is never deleted. I am planning to fix it by not adding dead members received via the anti-entropy component to the local table.

Open questions:

1. Should a timestamp be added to each PING/ACK in addition to the incarnation? It protects from the case when an ACK is accidentally duplicated, or arrives with the same incarnation but too late. The GitLab Lua version does it, but the protocol, as far as I remember, does not specify it.

Also, the code is very obfuscated in some places and still needs renaming and refactoring in most places, and some diffs need to be moved between commits. Sorry.

http://github.com/tarantool/tarantool/tree/gerold103/gh-3234-swim
https://github.com/tarantool/tarantool/issues/3234

Changes in v3:
- packets can carry arbitrary payload;
- socket reading/writing related routines and structures are moved to swim_scheduler.

Changes in v2:
- new API with explicit member addition and removal;
- ability to create multiple SWIM instances per Tarantool process;
- multi-packet sending of one SWIM message.
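A minimal standalone sketch of the fixes planned in known bugs 2 and 4, combined with the PING/ACK timestamp from open question 1, might look like the following. It is only an illustration under stated assumptions: all names (struct sketch_member, sketch_*) are hypothetical and do not exist in the patchset, timestamps are plain caller-supplied doubles, and the ACK is assumed to echo the timestamp of the PING it answers (i.e. open question 1 is answered "yes").

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: none of these names exist in the patchset.
 * Timestamps are caller-supplied seconds, assumed monotonic.
 */
enum sketch_status { SKETCH_ALIVE, SKETCH_DEAD };

struct sketch_member {
	/* When the last PING was put into an outgoing message. */
	double ping_scheduled_at;
	/* When the last matching ACK arrived; negative if never. */
	double last_ack_at;
	/* True while an ACK is awaited and the deadline is armed. */
	bool is_waiting_ack;
	double ack_deadline;
};

static void
sketch_member_create(struct sketch_member *m)
{
	m->ping_scheduled_at = -1;
	m->last_ack_at = -1;
	m->is_waiting_ack = false;
	m->ack_deadline = 0;
}

/* Bug 2: scheduling a PING only records the time, no waiting yet. */
static void
sketch_ping_scheduled(struct sketch_member *m, double now)
{
	m->ping_scheduled_at = now;
}

/*
 * Bug 2: the whole message carrying the PING has been sent. Arm
 * the ACK deadline only if no ACK arrived while it was being sent.
 */
static void
sketch_ping_sent(struct sketch_member *m, double now, double timeout)
{
	assert(timeout > 0);
	if (m->last_ack_at >= m->ping_scheduled_at)
		return;
	m->is_waiting_ack = true;
	m->ack_deadline = now + timeout;
}

/*
 * Open question 1: the ACK echoes the timestamp of the PING it
 * answers. ACKs for an older PING (late or duplicated) are rejected.
 */
static bool
sketch_ack_received(struct sketch_member *m, double now, double echoed)
{
	if (echoed != m->ping_scheduled_at)
		return false;
	m->last_ack_at = now;
	m->is_waiting_ack = false;
	return true;
}

/* True if an armed ACK deadline has passed. */
static bool
sketch_ack_is_lost(const struct sketch_member *m, double now)
{
	return m->is_waiting_ack && now > m->ack_deadline;
}

/* Bug 4: anti-entropy never re-adds a member known to be dead. */
static bool
sketch_anti_entropy_accepts(enum sketch_status status)
{
	return status != SKETCH_DEAD;
}
```

In this sketch, a periodic check such as swim_check_acks would correspond to calling sketch_ack_is_lost() for each member. Arming the deadline in the "sent" step rather than the "scheduled" step is what keeps a slow multi-packet send from being counted as a lost PING, and the echoed timestamp lets a late ACK for a previous PING be ignored instead of clearing the current wait.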
V1: https://www.freelists.org/post/tarantool-patches/PATCH-05-SWIM
V2: https://www.freelists.org/post/tarantool-patches/PATCH-v2-06-SWIM

Vladislav Shpilevoy (6):
  [RAW] swim: introduce SWIM's anti-entropy component
  [RAW] swim: introduce failure detection component
  [RAW] swim: introduce a dissemination component
  [RAW] swim: keep encoded round message cached
  [RAW] swim: send one UDP packet per EV_WRITE event
  [RAW] swim: introduce payload

 src/CMakeLists.txt            |    3 +-
 src/evio.c                    |    3 +-
 src/evio.h                    |    4 +
 src/lib/CMakeLists.txt        |    1 +
 src/lib/swim/CMakeLists.txt   |    7 +
 src/lib/swim/swim.c           | 1653 +++++++++++++++++++++++++++++++++
 src/lib/swim/swim.h           |   99 ++
 src/lib/swim/swim_io.c        |  180 ++++
 src/lib/swim/swim_io.h        |  316 +++++++
 src/lib/swim/swim_transport.h |   66 ++
 src/lua/init.c                |    2 +
 src/lua/swim.c                |  244 +++++
 src/lua/swim.h                |   47 +
 13 files changed, 2622 insertions(+), 3 deletions(-)
 create mode 100644 src/lib/swim/CMakeLists.txt
 create mode 100644 src/lib/swim/swim.c
 create mode 100644 src/lib/swim/swim.h
 create mode 100644 src/lib/swim/swim_io.c
 create mode 100644 src/lib/swim/swim_io.h
 create mode 100644 src/lib/swim/swim_transport.h
 create mode 100644 src/lua/swim.c
 create mode 100644 src/lua/swim.h

-- 
2.17.2 (Apple Git-113)