Tarantool development patches archive
From: Serge Petrenko <sergepetrenko@tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] raft: add a test with synchronous replication
Date: Mon, 5 Oct 2020 11:52:42 +0300	[thread overview]
Message-ID: <e8035fec-ed79-9e9a-9e27-041fd49ae7b5@tarantool.org> (raw)
In-Reply-To: <acac499f-b8e3-d9d0-9a73-aa76e8cbcf08@tarantool.org>


On 04.10.2020 16:54, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
Hi! Thanks for the review!
>
>> diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
>> new file mode 100644
>> index 000000000..1bf13d7bc
>> --- /dev/null
>> +++ b/test/replication/election_qsync.result
>> @@ -0,0 +1,125 @@
>> +SERVERS = {'election_replica1', 'election_replica2', 'election_replica3'}
>> + | ---
>> + | ...
>> +test_run:create_cluster(SERVERS, "replication", {args='2'})
>> + | ---
>> + | ...
>> +test_run:wait_fullmesh(SERVERS)
>> + | ---
>> + | ...
>> +
>> +nrs = {true, true, true}
> 1. What is 'nrs'?

I'll add a comment. nrs is a parameter passed to get_leader().
nrs = {true, true, true} means check whether any of the 3 instances is a
leader. Once I kill the former leader, nrs[former_leader_nr] = false, and
I wait till one of the two instances left becomes a leader.

nrs is short for "numbers", I guess.
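To make the bookkeeping concrete, here is a self-contained sketch of the flow
(get_leader() here is only a stub standing in for the test's helper, which in
the real test polls the instances and blocks until one of the allowed ones
reports itself leader):

```lua
-- Sketch of the nrs bookkeeping. get_leader() is a stub in place of the
-- test's helper (the real one polls election state on each instance).
local function get_leader(nrs)
    -- Stub: return the first instance number still allowed by nrs.
    for nr, allowed in ipairs(nrs) do
        if allowed then return nr end
    end
end

local nrs = {true, true, true}        -- any of the 3 may become leader
local old_leader_nr = get_leader(nrs) -- wait for some leader to appear
nrs[old_leader_nr] = false            -- a killed leader can't be re-elected
local new_leader_nr = get_leader(nrs) -- wait for one of the remaining two
nrs[old_leader_nr] = true             -- re-enable it once it is restarted
```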

>
>> + | ---
>> + | ...
>> diff --git a/test/replication/election_replica.lua b/test/replication/election_replica.lua
>> index 36ea1f077..887d8a2a0 100644
>> --- a/test/replication/election_replica.lua
>> +++ b/test/replication/election_replica.lua
>> @@ -19,8 +20,11 @@ box.cfg({
>>       replication_timeout = 0.1,
>>       election_is_enabled = true,
>>       election_is_candidate = true,
>> -    election_timeout = 0.1,
>> -    replication_synchro_quorum = 3,
>> +    -- Should be at least as big as replication_disconnect_timeout, which is
>> +    -- 4 * replication_timeout.
>> +    election_timeout = 0.4,
> 2. Why? Election timeout has nothing to do with disconnect. It is about
> split vote. This also will slow down raft_basic.test.lua, which is not
> supposed to be long. For heartbeat timeouts Raft already uses
> replication_disconnect_timeout = replication_timeout * 4.

I've seen cases where a leader is elected, but doesn't send out the
is_leader flag in time, so new elections start over and over again. This
only happened when the tests were run in parallel, so the problem was
probably high load.

So, my logic was that if we wait for 4 * replication_timeout for the
leader to come back, why not wait for 4 * replication_timeout for the
leader to establish its leadership.

I mean, if it's considered a normal situation when a leader disappears
for not more than 4 * replication_timeout, and this doesn't trigger an
election, why should elections end before at least
4 * replication_timeout seconds pass?

By the way, the Raft paper doesn't have a separate leader disconnect
timeout. The same election timeout is used for this purpose. So that's
another argument for setting election_timeout to at least
4 * replication_timeout.

Speaking of raft_basic.test.lua becoming slow, let's pass
election_timeout as an argument to the replica, just like I do it for
replication_synchro_quorum.
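To spell out the arithmetic behind the 0.4 value (a minimal config fragment
using the values from this patch):

```lua
-- Timeout relationship argued above, with the values from this patch.
local replication_timeout = 0.1
-- Raft declares a leader dead only after the disconnect timeout:
local replication_disconnect_timeout = 4 * replication_timeout  -- 0.4
-- Hence election_timeout should be at least 0.4, so a live but slow
-- leader can re-assert itself before a new election round starts:
local election_timeout = 0.4
assert(election_timeout >= replication_disconnect_timeout)
```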


Here are the changes:


diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
index 1bf13d7bc..9497b37bf 100644
--- a/test/replication/election_qsync.result
+++ b/test/replication/election_qsync.result
@@ -44,13 +44,16 @@ test_run:cmd('setopt delimiter ""');
  SERVERS = {'election_replica1', 'election_replica2', 'election_replica3'}
   | ---
   | ...
-test_run:create_cluster(SERVERS, "replication", {args='2'})
+test_run:create_cluster(SERVERS, "replication", {args='2 0.4'})
   | ---
   | ...
  test_run:wait_fullmesh(SERVERS)
   | ---
   | ...

+-- Any of the three instances may be the leader now.
+-- When the former leader is killed, we expect one of the two instances left
+-- to become a leader, so nrs[former_leader_nr] = false.
  nrs = {true, true, true}
   | ---
   | ...
@@ -94,7 +97,7 @@ for i = 1,10 do
      c:eval('box.cfg{replication_synchro_timeout=1000}')
      c.space._schema:replace{'smth'}
      c.space.test:get{i}
-    test_run:cmd('start server '..old_leader..' with wait=True, wait_load=True, args="2"')
+    test_run:cmd('start server '..old_leader..' with wait=True, wait_load=True, args="2 0.4"')
      nrs[old_leader_nr] = true
      old_leader_nr = new_leader_nr
      old_leader = new_leader
diff --git a/test/replication/election_qsync.test.lua b/test/replication/election_qsync.test.lua
index f069c71bb..bca1b20c7 100644
--- a/test/replication/election_qsync.test.lua
+++ b/test/replication/election_qsync.test.lua
@@ -29,9 +29,12 @@ end;
  test_run:cmd('setopt delimiter ""');

  SERVERS = {'election_replica1', 'election_replica2', 'election_replica3'}
-test_run:create_cluster(SERVERS, "replication", {args='2'})
+test_run:create_cluster(SERVERS, "replication", {args='2 0.4'})
  test_run:wait_fullmesh(SERVERS)

+-- Any of the three instances may be the leader now.
+-- When the former leader is killed, we expect one of the two instances left
+-- to become a leader, so nrs[former_leader_nr] = false.
  nrs = {true, true, true}
  old_leader_nr = get_leader(nrs)
  old_leader = 'election_replica'..old_leader_nr
@@ -58,7 +61,7 @@ for i = 1,10 do
      c:eval('box.cfg{replication_synchro_timeout=1000}')
      c.space._schema:replace{'smth'}
      c.space.test:get{i}
-    test_run:cmd('start server '..old_leader..' with wait=True, wait_load=True, args="2"')
+    test_run:cmd('start server '..old_leader..' with wait=True, wait_load=True, args="2 0.4"')
      nrs[old_leader_nr] = true
      old_leader_nr = new_leader_nr
      old_leader = new_leader
diff --git a/test/replication/election_replica.lua b/test/replication/election_replica.lua
index 887d8a2a0..b7d1aebe7 100644
--- a/test/replication/election_replica.lua
+++ b/test/replication/election_replica.lua
@@ -3,6 +3,7 @@
  local INSTANCE_ID = string.match(arg[0], "%d")
  local SOCKET_DIR = require('fio').cwd()
  local SYNCHRO_QUORUM = arg[1] and tonumber(arg[1]) or 3
+local ELECTION_TIMEOUT = arg[2] and tonumber(arg[2]) or 0.1

  local function instance_uri(instance_id)
      return SOCKET_DIR..'/election_replica'..instance_id..'.sock';
@@ -20,9 +21,7 @@ box.cfg({
      replication_timeout = 0.1,
      election_is_enabled = true,
      election_is_candidate = true,
-    -- Should be at least as big as replication_disconnect_timeout, which is
-    -- 4 * replication_timeout.
-    election_timeout = 0.4,
+    election_timeout = ELECTION_TIMEOUT,
      replication_synchro_quorum = SYNCHRO_QUORUM,
      replication_synchro_timeout = 0.1,
      -- To reveal more election logs.


-- 
Serge Petrenko

Thread overview: 6+ messages
2020-10-02 10:33 Serge Petrenko
2020-10-04 13:54 ` Vladislav Shpilevoy
2020-10-05  8:52   ` Serge Petrenko [this message]
2020-10-05 21:40     ` Vladislav Shpilevoy
2020-10-06  7:30       ` Serge Petrenko
2020-10-06 10:04 ` Kirill Yukhin
