From: Serge Petrenko
Date: Mon, 5 Oct 2020 11:52:42 +0300
Subject: Re: [Tarantool-patches] [PATCH] raft: add a test with synchronous replication
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org
References: <20201002103312.23042-1-sergepetrenko@tarantool.org>

04.10.2020 16:54, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!

Hi! Thanks for the review!

>
>> diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
>> new file mode 100644
>> index 000000000..1bf13d7bc
>> --- /dev/null
>> +++ b/test/replication/election_qsync.result
>> @@ -0,0 +1,125 @@
>> +SERVERS = {'election_replica1', 'election_replica2', 'election_replica3'}
>> + | ---
>> + | ...
>> +test_run:create_cluster(SERVERS, "replication", {args='2'})
>> + | ---
>> + | ...
>> +test_run:wait_fullmesh(SERVERS)
>> + | ---
>> + | ...
>> +
>> +nrs = {true, true, true}
>
> 1. What is 'nrs'?

I'll add a comment. nrs is a parameter passed to get_leader(). nrs = {true, true, true}
means "check whether any of the 3 instances is a leader". Once I kill the former leader,
I set nrs[former_leader_nr] = false and wait until one of the two remaining instances
becomes a leader. nrs is short for "numbers", I guess.

>
>> + | ---
>> + | ...
>> diff --git a/test/replication/election_replica.lua b/test/replication/election_replica.lua
>> index 36ea1f077..887d8a2a0 100644
>> --- a/test/replication/election_replica.lua
>> +++ b/test/replication/election_replica.lua
>> @@ -19,8 +20,11 @@ box.cfg({
>>      replication_timeout = 0.1,
>>      election_is_enabled = true,
>>      election_is_candidate = true,
>> -    election_timeout = 0.1,
>> -    replication_synchro_quorum = 3,
>> +    -- Should be at least as big as replication_disconnect_timeout, which is
>> +    -- 4 * replication_timeout.
>> +    election_timeout = 0.4,
>
> 2. Why? Election timeout has nothing to do with disconnect. It is about
> split vote. This also will slow down raft_basic.test.lua, which is not
> supposed to be long. For heartbeat timeouts Raft already uses
> replication_disconnect_timeout = replication_timeout * 4.

I've seen cases when a leader is elected but doesn't send out the is_leader flag in time,
so new elections start over and over again. This only happened when the tests were run in
parallel, so the problem was probably high load.

So my logic was: if we wait 4 * replication_timeout for a leader to come back, why not
wait 4 * replication_timeout for a leader to establish its leadership? I mean, if it is
considered a normal situation when a leader disappears for no more than
4 * replication_timeout, and this doesn't trigger an election, why should an election end
before at least 4 * replication_timeout seconds pass?

By the way, the Raft paper doesn't have a separate leader disconnect timeout: the same
election timeout is used for that purpose. So that's another argument for setting
election_timeout to at least 4 * replication_timeout.
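To make the relation concrete, here is a minimal Lua sketch using the values from
election_replica.lua (replication_disconnect_timeout is not set directly here, it is just
the derived heartbeat-loss threshold you mention):

-- Heartbeat interval used by the test instances.
local replication_timeout = 0.1
-- A leader is considered gone after 4 missed heartbeat intervals.
local replication_disconnect_timeout = 4 * replication_timeout  -- 0.4
-- Let one election round last at least as long as it takes to
-- notice that the old leader disappeared.
local election_timeout = replication_disconnect_timeout         -- 0.4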
Speaking of raft_basic.test.lua becoming slow, let's pass election_timeout as an argument
to the replica, just like I do for replication_synchro_quorum. Here are the changes:

diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
index 1bf13d7bc..9497b37bf 100644
--- a/test/replication/election_qsync.result
+++ b/test/replication/election_qsync.result
@@ -44,13 +44,16 @@ test_run:cmd('setopt delimiter ""');
 SERVERS = {'election_replica1', 'election_replica2', 'election_replica3'}
  | ---
  | ...
-test_run:create_cluster(SERVERS, "replication", {args='2'})
+test_run:create_cluster(SERVERS, "replication", {args='2 0.4'})
  | ---
  | ...
 test_run:wait_fullmesh(SERVERS)
  | ---
  | ...
 
+-- Any of the three instances may be the leader now.
+-- When the former leader is killed, we expect one of the two instances left
+-- to become a leader, so nrs[former_leader_nr] = false.
 nrs = {true, true, true}
  | ---
  | ...
@@ -94,7 +97,7 @@ for i = 1,10 do
     c:eval('box.cfg{replication_synchro_timeout=1000}')
     c.space._schema:replace{'smth'}
     c.space.test:get{i}
-    test_run:cmd('start server '..old_leader..' with wait=True, wait_load=True, args="2"')
+    test_run:cmd('start server '..old_leader..' with wait=True, wait_load=True, args="2 0.4"')
     nrs[old_leader_nr] = true
     old_leader_nr = new_leader_nr
     old_leader = new_leader
diff --git a/test/replication/election_qsync.test.lua b/test/replication/election_qsync.test.lua
index f069c71bb..bca1b20c7 100644
--- a/test/replication/election_qsync.test.lua
+++ b/test/replication/election_qsync.test.lua
@@ -29,9 +29,12 @@ end;
 test_run:cmd('setopt delimiter ""');
 
 SERVERS = {'election_replica1', 'election_replica2', 'election_replica3'}
-test_run:create_cluster(SERVERS, "replication", {args='2'})
+test_run:create_cluster(SERVERS, "replication", {args='2 0.4'})
 test_run:wait_fullmesh(SERVERS)
 
+-- Any of the three instances may be the leader now.
+-- When the former leader is killed, we expect one of the two instances left
+-- to become a leader, so nrs[former_leader_nr] = false.
 nrs = {true, true, true}
 old_leader_nr = get_leader(nrs)
 old_leader = 'election_replica'..old_leader_nr
@@ -58,7 +61,7 @@ for i = 1,10 do
     c:eval('box.cfg{replication_synchro_timeout=1000}')
     c.space._schema:replace{'smth'}
     c.space.test:get{i}
-    test_run:cmd('start server '..old_leader..' with wait=True, wait_load=True, args="2"')
+    test_run:cmd('start server '..old_leader..' with wait=True, wait_load=True, args="2 0.4"')
     nrs[old_leader_nr] = true
     old_leader_nr = new_leader_nr
     old_leader = new_leader
diff --git a/test/replication/election_replica.lua b/test/replication/election_replica.lua
index 887d8a2a0..b7d1aebe7 100644
--- a/test/replication/election_replica.lua
+++ b/test/replication/election_replica.lua
@@ -3,6 +3,7 @@
 local INSTANCE_ID = string.match(arg[0], "%d")
 local SOCKET_DIR = require('fio').cwd()
 local SYNCHRO_QUORUM = arg[1] and tonumber(arg[1]) or 3
+local ELECTION_TIMEOUT = arg[2] and tonumber(arg[2]) or 0.1
 
 local function instance_uri(instance_id)
     return SOCKET_DIR..'/election_replica'..instance_id..'.sock';
@@ -20,9 +21,7 @@ box.cfg({
     replication_timeout = 0.1,
     election_is_enabled = true,
     election_is_candidate = true,
-    -- Should be at least as big as replication_disconnect_timeout, which is
-    -- 4 * replication_timeout.
-    election_timeout = 0.4,
+    election_timeout = ELECTION_TIMEOUT,
     replication_synchro_quorum = SYNCHRO_QUORUM,
     replication_synchro_timeout = 0.1,
     -- To reveal more election logs.

-- 
Serge Petrenko