From: Vladislav Shpilevoy
Subject: [tarantool-patches] [PATCH 0/2] SWIM big cluster improvements, part 1
Date: Sun, 30 Jun 2019 01:45:01 +0200
To: tarantool-patches@freelists.org
Cc: kostja@tarantool.org

This is the first patchset to make SWIM more usable in big clusters.
Most of the problems arise from an incorrect interpretation of the
original SWIM paper and from how its extensions were adapted.

The two commits improve the current version of dissemination without
any dramatic changes. The first one is quite trivial. The second one is
open to discussion.

To understand the proposals below, please read the second commit first,
and then return here.

Ok, assume you know the problem the second commit solves. But there is
another option: to protect from spurious resurrections, let's treat the
anti-entropy section with suspicion and not add new members from it as
is. Instead, if we see a new member in the anti-entropy section, let's
send it a ping.
Just call probe_member() on its URI, and don't add it to the member
table yet. If it is really alive, it will respond with an ACK, and then
we add it to the table. Basically, we add a new member to the table in
two cases only: a user called add_member(), or an ACK was received.

One problem is that this puts additional load on the member if it is
really alive: everyone will start pinging it. We could try waiting a
random timeout before sending a ping, or sending it only with a certain
probability, but I am not sure that would be simpler than the current
solution with a limbo queue.

Besides, we can't protect from spurious resurrections 100%, even with
the solution above. This is because of UDP: we can receive a very old
ACK message from a member that was deleted a long time ago, and we will
add it as a new and alive member.

So is it worth trying to send pings to each new member before adding
it? After all, in the most frequent case a new member is alive.

Just a note: the choice of how to prevent resurrections does not affect
performance.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-4253-dissemination-speed
Issue: https://github.com/tarantool/tarantool/issues/4253

Vladislav Shpilevoy (2):
  swim: prefer new events for dissemination over old
  swim: disseminate event for log(cluster_size) steps

 src/lib/swim/swim.c   | 130 +++++++++++++++++++++++++++++++++++++++---
 test/unit/swim.c      |  72 ++++++++++++++++++++++-
 test/unit/swim.result |   8 ++-
 3 files changed, 199 insertions(+), 11 deletions(-)

-- 
2.20.1 (Apple Git-117)
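
For reference, the probe-before-add alternative discussed in the letter
above can be sketched roughly as follows. This is only an illustration
under assumptions, not code from the patchset: struct member_table and
the on_*() handlers are hypothetical names, the table is a fixed array,
and the ping is only counted instead of being sent (the counter stands
in for a probe_member() call).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MAX_MEMBERS = 16, URI_LEN = 32 };

/* Hypothetical, simplified member table; not the real SWIM structures. */
struct member_table {
    char uri[MAX_MEMBERS][URI_LEN];
    size_t count;
    /* How many pings were "sent" to unknown members. */
    size_t probes_sent;
};

static bool
table_has(const struct member_table *t, const char *uri)
{
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->uri[i], uri) == 0)
            return true;
    }
    return false;
}

static bool
table_add(struct member_table *t, const char *uri)
{
    if (table_has(t, uri))
        return true;
    if (t->count == MAX_MEMBERS || strlen(uri) >= URI_LEN)
        return false;
    strcpy(t->uri[t->count++], uri);
    return true;
}

/* Case 1: explicit user request (stands in for add_member()). */
static bool
on_user_add_member(struct member_table *t, const char *uri)
{
    return table_add(t, uri);
}

/*
 * A member is seen in an anti-entropy section. Unknown members are
 * not added, they are only probed.
 */
static void
on_anti_entropy_member(struct member_table *t, const char *uri)
{
    if (table_has(t, uri))
        return;
    /* Stand-in for probe_member(): only count the ping. */
    t->probes_sent++;
}

/*
 * Case 2: an ACK arrived, so the sender is treated as alive. Note that
 * a very old ACK delivered late over UDP would take this path too,
 * which is the limitation mentioned in the letter.
 */
static bool
on_ack(struct member_table *t, const char *uri)
{
    return table_add(t, uri);
}
```

With a zero-initialized table, on_anti_entropy_member() leaves count at
0 and only bumps probes_sent. The member appears in the table after
on_ack() or on_user_add_member() is called for its URI.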