Subject: [tarantool-patches] Re: [PATCH 2/2] swim: disseminate event for log(cluster_size) steps
From: Vladislav Shpilevoy
Date: Sun, 30 Jun 2019 18:24:13 +0200
Message-ID: <0ef559cb-c69a-fb21-d3e4-527beac009e1@tarantool.org>
In-Reply-To: <20190630065527.GC18621@atlas>
References: <6ec1fe0f2d9f22d5d254bf9b434ccbd72fa72eb7.1561851087.git.v.shpilevoy@tarantool.org> <20190630065527.GC18621@atlas>
To: Konstantin Osipov
Cc: tarantool-patches@freelists.org

On 30/06/2019 08:55, Konstantin Osipov wrote:
> * Vladislav Shpilevoy [19/06/30 09:04]:
>
> I don'
>> Before the patch there was a problem of events and anti-entropy
>> starvation, when a cluster generates so many events, that they
>> consume the whole UDP packet. If during the event storm something
>> important happens, that event is likely to be lost, and not
>> disseminated until the storm is over.
>>
>> Sadly, there is no way to prevent a storm, but it can be made
>> much shorter. For that the patch makes the TTD of events
>> logarithmic instead of linear in the cluster size.
>>
>> According to the SWIM paper and to the experiments, the logarithm
>> is really enough. Linear TTD was overkill.
>>
>> Shorter-lived events do not solve the problem of event
>> starvation: some of them can still be lost in case of a storm.
>> But they free some space for anti-entropy, which can finish
>> dissemination of the lost events.
>>
>> Experiments in a simulation of a cluster with 100 nodes showed
>> that a failure dissemination happened in ~110 steps if there is
>> a storm. That is linear in the cluster size, which is the worst
>> case. After the patch it is ~20 steps. So it is logarithmic as it
>> should be, although with a bigger constant than without a storm.
>
> You say nothing in this commit about the limbo queue. I have
> serious doubts about your manipulations with it. The patch needs
> to be split into pieces, each addressing its own problem and
> having a test. Now I only see 1 test for so many changes.
>

The limbo queue makes no sense before logarithmic TTD. I can add it
in a separate commit, but then the first commit, with log TTD, will
fail some tests. They will be fixed only in the second commit, which
adds the limbo queue.

If you don't like the limbo queue idea, then reply to my cover
letter in this thread, where I explain what alternatives exist for
the limbo queue.
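
To illustrate the difference between linear and logarithmic TTD
discussed in the quoted commit message, here is a minimal sketch. It
is not the code from the patch: the function names and the exact
formula (ceil(log2(cluster_size)) + 1) are assumptions chosen for the
example, and the sketch only compares how long a single event would
stay in outgoing packets, not the storm measurements quoted above.

```python
def linear_ttd(cluster_size: int) -> int:
    """Hypothetical pre-patch policy: an event is sent for as many
    steps as there are members in the cluster."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be positive")
    return cluster_size


def log_ttd(cluster_size: int) -> int:
    """Hypothetical post-patch policy: an event is sent for
    ceil(log2(cluster_size)) + 1 steps. The '+ 1' and the base of the
    logarithm are assumptions made for this illustration."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be positive")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1 and avoids
    # floating point rounding.
    return (cluster_size - 1).bit_length() + 1


if __name__ == "__main__":
    for size in (2, 10, 100, 1000):
        print(size, linear_ttd(size), log_ttd(size))
```

With these assumed formulas an event in a 100-node cluster occupies
packet space for 8 steps instead of 100, which is the space the
commit message says is freed for anti-entropy.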