[tarantool-patches] [PATCH 0/2] SWIM big cluster improvements, part 1

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Sun Jun 30 02:45:01 MSK 2019


This is the first patchset to make SWIM more usable in big clusters. Most of the
problems arise from an incorrect interpretation of the original SWIM paper and
from the way its extensions were adapted.

The two commits improve the current version of dissemination without any
dramatic changes. The first one is quite trivial.

The second one is open for discussion. To understand the proposals below, please
read the commit first, and then return here.

Ok, assume you know the problem the second commit solves. There is another
option: to protect against spurious resurrections, let's treat the anti-entropy
section with suspicion and not add new members from it as is. Instead, if we
see a new member in the anti-entropy section, let's send it a ping: just call
probe_member() on its URI, and don't add it to the member table yet. If it is
really alive, it will respond with an ACK, and then we add it to the table.

Basically, we would add a new member to the table in two cases only: the user
called add_member(), or an ACK was received.
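
To make the rule concrete, here is a minimal sketch of it on a toy member
table. It is not code from swim.c: struct toy_table and all toy_*() functions
are hypothetical names invented for illustration, and the real probe_member()
call is replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { MAX_MEMBERS = 16 };

/* Toy member table. Hypothetical, not the structures from swim.c. */
struct toy_table {
	const char *uri[MAX_MEMBERS];
	int count;
	/* How many pings were sent to members not in the table. */
	int probes_sent;
};

static bool
toy_has_member(const struct toy_table *t, const char *uri)
{
	for (int i = 0; i < t->count; ++i) {
		if (strcmp(t->uri[i], uri) == 0)
			return true;
	}
	return false;
}

/* Case 1: the user explicitly asks to add a member. */
static void
toy_add_member(struct toy_table *t, const char *uri)
{
	assert(uri != NULL);
	if (!toy_has_member(t, uri) && t->count < MAX_MEMBERS)
		t->uri[t->count++] = uri;
}

/*
 * Anti-entropy mentions a member. An unknown member is not added,
 * only pinged. The counter stands in for probe_member(uri).
 */
static void
toy_on_anti_entropy(struct toy_table *t, const char *uri)
{
	if (!toy_has_member(t, uri))
		t->probes_sent++;
}

/* Case 2: an ACK shows the member responds, so it is added. */
static void
toy_on_ack(struct toy_table *t, const char *uri)
{
	toy_add_member(t, uri);
}
```

With this sketch, a member seen only in anti-entropy stays out of the table
until toy_on_ack() is called for it.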

The problem is that this puts additional load on the member if it is really
alive: everyone will start pinging it. We could try to wait a random timeout
before sending a ping, or send it with a certain probability, but I am not sure
it would be simpler than the current solution with a limbo queue.
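
For illustration only, the "certain probability" variant could look like the
sketch below. The formula is an assumption, not something from the patch: each
node pings with probability wanted_probes / cluster_size, so the new member gets
about wanted_probes pings in total regardless of cluster size. The function
name and both parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether this node pings a member it has just heard of.
 * 'roll' is a uniform random number in [0, 1) supplied by the
 * caller, which keeps the function deterministic and testable.
 */
static bool
toy_should_probe(int cluster_size, int wanted_probes, double roll)
{
	assert(cluster_size > 0 && wanted_probes > 0);
	/* In a small cluster everyone pings. */
	if (cluster_size <= wanted_probes)
		return true;
	return roll < (double)wanted_probes / cluster_size;
}
```

Even this small version needs a tuning knob and a random source, which is the
extra complexity weighed against the limbo queue above.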

Besides, we can't protect against spurious resurrections 100%, even with the
solution above. This is because of UDP: we can receive a very old ACK message
from a member which was deleted a long time ago, and we will add it as a new,
alive member. So is it worth trying to ping each new member before adding it?
After all, in the most frequent case a new member is alive.

Just a note: the choice of how to prevent resurrections does not affect
performance.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-4253-dissemination-speed
Issue: https://github.com/tarantool/tarantool/issues/4253

Vladislav Shpilevoy (2):
  swim: prefer new events for dissemination over old
  swim: disseminate event for log(cluster_size) steps

 src/lib/swim/swim.c   | 130 +++++++++++++++++++++++++++++++++++++++---
 test/unit/swim.c      |  72 ++++++++++++++++++++++-
 test/unit/swim.result |   8 ++-
 3 files changed, 199 insertions(+), 11 deletions(-)

-- 
2.20.1 (Apple Git-117)




