[tarantool-patches] Re: [PATCH 2/2] swim: disseminate event for log(cluster_size) steps

Konstantin Osipov kostja at tarantool.org
Sun Jun 30 09:55:27 MSK 2019


* Vladislav Shpilevoy <v.shpilevoy at tarantool.org> [19/06/30 09:04]:

> Before the patch there was a problem of event and anti-entropy
> starvation: a cluster can generate so many events that they
> consume the whole UDP packet. If something important happens
> during such an event storm, that event is likely to be lost and
> not disseminated until the storm is over.
> 
> Sadly, there is no way to prevent a storm, but it can be made
> much shorter. For that the patch makes the TTD of events
> logarithmic instead of linear in the cluster size.
> 
> According to the SWIM paper and to the experiments, a logarithmic
> TTD is enough. Linear TTD was overkill.
> 
> A shorter event lifetime does not solve the problem of event
> starvation: some events can still be lost during a storm. But it
> frees some space for anti-entropy, which can finish
> dissemination of lost events.
> 
> Experiments in a simulation of a cluster with 100 nodes showed
> that during a storm a failure was disseminated in ~110 steps.
> Linear dissemination time is the worst part of the problem.
> After the patch it takes ~20 steps. So it is logarithmic as it
> should be, although with a bigger constant than without a storm.
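
To make the linear vs. logarithmic difference concrete, here is a
minimal sketch of the two TTD rules. It is not the code from the patch:
the function names, the base-2 logarithm, and the `factor` parameter are
assumptions for illustration, since the commit message does not state
the exact formula.

```python
def linear_ttd(cluster_size: int) -> int:
    """TTD that grows linearly with cluster size (pre-patch behaviour
    as described in the commit message; exact formula assumed)."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be >= 1")
    return cluster_size


def log_ttd(cluster_size: int, factor: int = 1) -> int:
    """Hypothetical logarithmic TTD: factor * ceil(log2(cluster_size)).

    (n - 1).bit_length() equals ceil(log2(n)) for n >= 1 and avoids
    floating-point rounding. The result is clamped to at least 1 so
    that an event is sent at least once.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be >= 1")
    return max(1, factor * (cluster_size - 1).bit_length())


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, linear_ttd(n), log_ttd(n))
```

With 100 nodes the linear rule keeps every event in outgoing packets for
100 rounds, while the logarithmic rule under these assumptions keeps it
for 7, which is what frees packet space for anti-entropy.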

The commit message says nothing about the limbo queue. I have
serious doubts about your manipulation of it. The patch needs to be
split into pieces, each addressing its own problem and having its
own test. Now I see only one test for so many changes.

-- 
Konstantin Osipov, Moscow, Russia



