From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp14.mail.ru (smtp14.mail.ru [94.100.181.95])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id AB203469719
 for ; Tue, 6 Oct 2020 10:30:59 +0300 (MSK)
References: <20201002103312.23042-1-sergepetrenko@tarantool.org>
 <2a1cf4cc-6ab3-f5b0-774b-8a6e04be095a@tarantool.org>
From: Serge Petrenko
Message-ID: <5a94f44a-62d8-1ded-bc46-f85c1d99284f@tarantool.org>
Date: Tue, 6 Oct 2020 10:30:58 +0300
MIME-Version: 1.0
In-Reply-To: <2a1cf4cc-6ab3-f5b0-774b-8a6e04be095a@tarantool.org>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Content-Language: en-GB
Subject: Re: [Tarantool-patches] [PATCH] raft: add a test with synchronous replication
List-Id: Tarantool development patches
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org

06.10.2020 00:40, Vladislav Shpilevoy wrote:
> Hi! Thanks for the fixes!

Thanks for the review!

>
>>>> + | ---
>>>> + | ...
>>>> diff --git a/test/replication/election_replica.lua b/test/replication/election_replica.lua
>>>> index 36ea1f077..887d8a2a0 100644
>>>> --- a/test/replication/election_replica.lua
>>>> +++ b/test/replication/election_replica.lua
>>>> @@ -19,8 +20,11 @@ box.cfg({
>>>>       replication_timeout = 0.1,
>>>>       election_is_enabled = true,
>>>>       election_is_candidate = true,
>>>> -    election_timeout = 0.1,
>>>> -    replication_synchro_quorum = 3,
>>>> +    -- Should be at least as big as replication_disconnect_timeout, which is
>>>> +    -- 4 * replication_timeout.
>>>> +    election_timeout = 0.4,
>>> 2. Why? Election timeout has nothing to do with disconnect. It is about
>>> split vote. This also will slow down raft_basic.test.lua, which is not
>>> supposed to be long.
>>> For heartbeat timeouts Raft already uses
>>> replication_disconnect_timeout = replication_timeout * 4.
>> I've seen cases when a leader is elected, but doesn't send out the is_leader flag
>> in time, so new elections start over and over again. This only happened when the
>> tests were run in parallel, so the problem was probably in high load.
> It should not be a problem. 100ms is enough to eventually elect a leader when the
> instances run on the same machine. Several election attempts should not lead to
> a test fail. Because even 0.4 may lead to that. It is not a guaranteed protection.
>
>> So, my logic was that if we wait for 4 times replication timeout for the leader to
>> come back why not wait for 4 * replication timeout for the leader to establish
>> its leadership.
>>
>> I mean, if it's considered a normal situation when a leader disappears for not more
>> than 4 * replication_timeout, and this doesn't trigger an election, why should
>> elections end before at least 4 * replication_timeout seconds pass?
> Because it is safe to retry it, and it is normal due to split vote possibility.
>
>> By the way, the raft paper doesn't have a separate leader disconnect timeout. The
>> same election timeout is used for this purpose. So that's another argument for
>> setting election_timeout to at least 4 * replication_timeout.
> But I see your point. I started a discussion with other participants. It is
> likely we will remove election_timeout option and use replication death timeout
> instead.

This might be reasonable. It looks like detecting a split vote and ending
an election early isn't that hard since the instances send out their votes
to every cluster member.

>
> Also we will probably drop election_is_enabled and election_is_candidate, and
> replace them with a new option election_mode, which is a string: either 'off',
> or 'candidate', or 'voter'. Another alternative - 'off' / 'on' / 'voter'.
> Or 'voter' -> 'only_vote'. Idk yet.
> Anyway it looks better than 2 flags, I think.

Yeah, sounds good.

>
> The patch LGTM. However it seems you didn't push the update on the branch.

Oh, my bad. Fixed now.

--
Serge Petrenko
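
To illustrate the remark above about ending an election early on a split vote, here is a minimal sketch in Python. It is not Tarantool's implementation: the function name `election_outcome`, its arguments, and the returned labels are all hypothetical. It assumes every instance sees the votes cast in the current term and that each instance votes at most once per term.

```python
from collections import Counter


def election_outcome(votes, cluster_size, quorum):
    """Classify the current term from the votes seen so far.

    votes: dict mapping voter id -> candidate id (hypothetical shape).
    Returns ("won", candidate), ("split", None) or ("undecided", None).
    """
    tally = Counter(votes.values())
    # Instances that have not voted yet in this term.
    remaining = cluster_size - len(votes)
    if tally:
        leader, best = tally.most_common(1)[0]
    else:
        leader, best = None, 0
    if best >= quorum:
        return ("won", leader)
    # Even if every outstanding vote went to the strongest candidate,
    # the quorum would not be reached: nobody can win this term.
    if best + remaining < quorum:
        return ("split", None)
    return ("undecided", None)
```

With three instances and a quorum of two, `election_outcome({1: 1, 2: 2, 3: 3}, 3, 2)` reports a split as soon as the third vote arrives, so a new term could be started without waiting for the election timeout to expire.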