From: Serge Petrenko
Date: Mon, 5 Oct 2020 11:52:42 +0300
Subject: Re: [Tarantool-patches] [PATCH] raft: add a test with synchronous replication
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org
References: <20201002103312.23042-1-sergepetrenko@tarantool.org>

04.10.2020 16:54, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!

Hi! Thanks for the review!

>
>> diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
>> new file mode 100644
>> index 000000000..1bf13d7bc
>> --- /dev/null
>> +++ b/test/replication/election_qsync.result
>> @@ -0,0 +1,125 @@
>> +SERVERS = {'election_replica1', 'election_replica2', 'election_replica3'}
>> + | ---
>> + | ...
>> +test_run:create_cluster(SERVERS, "replication", {args='2'})
>> + | ---
>> + | ...
>> +test_run:wait_fullmesh(SERVERS)
>> + | ---
>> + | ...
>> +
>> +nrs = {true, true, true}
>
> 1. What is 'nrs'?

I'll add a comment. nrs is a parameter passed to get_leader(). nrs = {true, true, true}
means "check whether any of the 3 instances is a leader". Once I kill the former leader,
I set nrs[former_leader_nr] = false and wait until one of the two remaining instances
becomes a leader. nrs is short for "numbers", I guess.

>
>> + | ---
>> + | ...
>> diff --git a/test/replication/election_replica.lua b/test/replication/election_replica.lua
>> index 36ea1f077..887d8a2a0 100644
>> --- a/test/replication/election_replica.lua
>> +++ b/test/replication/election_replica.lua
>> @@ -19,8 +20,11 @@ box.cfg({
>>      replication_timeout = 0.1,
>>      election_is_enabled = true,
>>      election_is_candidate = true,
>> -    election_timeout = 0.1,
>> -    replication_synchro_quorum = 3,
>> +    -- Should be at least as big as replication_disconnect_timeout, which is
>> +    -- 4 * replication_timeout.
>> +    election_timeout = 0.4,
>
> 2. Why? Election timeout has nothing to do with disconnect. It is about
> split vote. This also will slow down raft_basic.test.lua, which is not
> supposed to be long. For heartbeat timeouts Raft already uses
> replication_disconnect_timeout = replication_timeout * 4.

I've seen cases when a leader is elected but doesn't send out the is_leader flag in time,
so new elections start over and over again. This only happened when the tests were run in
parallel, so the problem was probably high load.

So my logic was: if we wait 4 * replication_timeout for a leader to come back, why not
wait 4 * replication_timeout for a leader to establish its leadership? I mean, if it is
considered a normal situation when a leader disappears for no more than
4 * replication_timeout, and this doesn't trigger an election, why should an election end
before at least 4 * replication_timeout seconds pass?

By the way, the Raft paper doesn't have a separate leader disconnect timeout: the same
election timeout is used for that purpose. So that's another argument for setting
election_timeout to at least 4 * replication_timeout.
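To make the relation concrete, here is a minimal Lua sketch using the values from
election_replica.lua (replication_disconnect_timeout is not set directly here, it is just
the derived heartbeat-loss threshold you mention):

-- Heartbeat interval used by the test instances.
local replication_timeout = 0.1
-- A leader is considered gone after 4 missed heartbeat intervals.
local replication_disconnect_timeout = 4 * replication_timeout  -- 0.4
-- Let one election round last at least as long as it takes to
-- notice that the old leader disappeared.
local election_timeout = replication_disconnect_timeout         -- 0.4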
Speaking of raft_basic.test.lua becoming slow, let's pass election_timeout as an argument
to the replica, just like I do for replication_synchro_quorum. Here are the changes:

diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
index 1bf13d7bc..9497b37bf 100644
--- a/test/replication/election_qsync.result
+++ b/test/replication/election_qsync.result
@@ -44,13 +44,16 @@ test_run:cmd('setopt delimiter ""');
 SERVERS = {'election_replica1', 'election_replica2', 'election_replica3'}
  | ---
  | ...
-test_run:create_cluster(SERVERS, "replication", {args='2'})
+test_run:create_cluster(SERVERS, "replication", {args='2 0.4'})
  | ---
  | ...
 test_run:wait_fullmesh(SERVERS)
  | ---
  | ...
 
+-- Any of the three instances may be the leader now.
+-- When the former leader is killed, we expect one of the two instances left
+-- to become a leader, so nrs[former_leader_nr] = false.
 nrs = {true, true, true}
  | ---
  | ...
@@ -94,7 +97,7 @@ for i = 1,10 do
     c:eval('box.cfg{replication_synchro_timeout=1000}')
     c.space._schema:replace{'smth'}
     c.space.test:get{i}
-    test_run:cmd('start server '..old_leader..' with wait=True, wait_load=True, args="2"')
+    test_run:cmd('start server '..old_leader..' with wait=True, wait_load=True, args="2 0.4"')
     nrs[old_leader_nr] = true
     old_leader_nr = new_leader_nr
     old_leader = new_leader
diff --git a/test/replication/election_qsync.test.lua b/test/replication/election_qsync.test.lua
index f069c71bb..bca1b20c7 100644
--- a/test/replication/election_qsync.test.lua
+++ b/test/replication/election_qsync.test.lua
@@ -29,9 +29,12 @@ end;
 test_run:cmd('setopt delimiter ""');
 
 SERVERS = {'election_replica1', 'election_replica2', 'election_replica3'}
-test_run:create_cluster(SERVERS, "replication", {args='2'})
+test_run:create_cluster(SERVERS, "replication", {args='2 0.4'})
 test_run:wait_fullmesh(SERVERS)
 
+-- Any of the three instances may be the leader now.
+-- When the former leader is killed, we expect one of the two instances left
+-- to become a leader, so nrs[former_leader_nr] = false.
 nrs = {true, true, true}
 old_leader_nr = get_leader(nrs)
 old_leader = 'election_replica'..old_leader_nr
@@ -58,7 +61,7 @@ for i = 1,10 do
     c:eval('box.cfg{replication_synchro_timeout=1000}')
     c.space._schema:replace{'smth'}
     c.space.test:get{i}
-    test_run:cmd('start server '..old_leader..' with wait=True, wait_load=True, args="2"')
+    test_run:cmd('start server '..old_leader..' with wait=True, wait_load=True, args="2 0.4"')
     nrs[old_leader_nr] = true
     old_leader_nr = new_leader_nr
     old_leader = new_leader
diff --git a/test/replication/election_replica.lua b/test/replication/election_replica.lua
index 887d8a2a0..b7d1aebe7 100644
--- a/test/replication/election_replica.lua
+++ b/test/replication/election_replica.lua
@@ -3,6 +3,7 @@
 local INSTANCE_ID = string.match(arg[0], "%d")
 local SOCKET_DIR = require('fio').cwd()
 local SYNCHRO_QUORUM = arg[1] and tonumber(arg[1]) or 3
+local ELECTION_TIMEOUT = arg[2] and tonumber(arg[2]) or 0.1
 
 local function instance_uri(instance_id)
     return SOCKET_DIR..'/election_replica'..instance_id..'.sock';
@@ -20,9 +21,7 @@ box.cfg({
     replication_timeout = 0.1,
     election_is_enabled = true,
     election_is_candidate = true,
-    -- Should be at least as big as replication_disconnect_timeout, which is
-    -- 4 * replication_timeout.
-    election_timeout = 0.4,
+    election_timeout = ELECTION_TIMEOUT,
     replication_synchro_quorum = SYNCHRO_QUORUM,
     replication_synchro_timeout = 0.1,
     -- To reveal more election logs.

-- 
Serge Petrenko