* [Tarantool-patches] [PATCH 0/2] Raft slow tests
From: Vladislav Shpilevoy @ 2020-11-06 23:46 UTC
To: tarantool-patches, sergepetrenko

The patchset speeds up the election tests, which could previously run for
almost a minute.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5499-slow-raft-test
Issue: https://github.com/tarantool/tarantool/issues/5499

Vladislav Shpilevoy (2):
  test: fix a typo in election_basic
  test: speed up election_qsync

 test/replication/election_basic.result   | 2 +-
 test/replication/election_basic.test.lua | 2 +-
 test/replication/election_qsync.result   | 6 ++++++
 test/replication/election_qsync.test.lua | 4 ++++
 4 files changed, 12 insertions(+), 2 deletions(-)

-- 
2.21.1 (Apple Git-122.3)

* [Tarantool-patches] [PATCH 1/2] test: fix a typo in election_basic
From: Vladislav Shpilevoy @ 2020-11-06 23:46 UTC
To: tarantool-patches, sergepetrenko

Because of the typo, the election timeout was not reset to its
default value at the end of the test. It stayed at 1000, so the
election tests that ran afterwards could take extremely long.

Part of #5499
---
 test/replication/election_basic.result   | 2 +-
 test/replication/election_basic.test.lua | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/test/replication/election_basic.result b/test/replication/election_basic.result
index 03917c7e4..4d7d33f2b 100644
--- a/test/replication/election_basic.result
+++ b/test/replication/election_basic.result
@@ -6,7 +6,7 @@ test_run = require('test_run').new()
 -- gh-1146: Raft protocol for automated leader election.
 --
 
-old_election_timeout = box.cfg_election_timeout
+old_election_timeout = box.cfg.election_timeout
  | ---
  | ...
 
diff --git a/test/replication/election_basic.test.lua b/test/replication/election_basic.test.lua
index 1b4bb8d27..821f73cea 100644
--- a/test/replication/election_basic.test.lua
+++ b/test/replication/election_basic.test.lua
@@ -3,7 +3,7 @@ test_run = require('test_run').new()
 -- gh-1146: Raft protocol for automated leader election.
 --
 
-old_election_timeout = box.cfg_election_timeout
+old_election_timeout = box.cfg.election_timeout
 
 -- Election is turned off by default.
 assert(box.cfg.election_mode == 'off')
-- 
2.21.1 (Apple Git-122.3)

* Re: [Tarantool-patches] [PATCH 1/2] test: fix a typo in election_basic
From: Serge Petrenko @ 2020-11-09 9:01 UTC
To: Vladislav Shpilevoy, tarantool-patches

07.11.2020 02:46, Vladislav Shpilevoy wrote:
> Because of the typo, the election timeout was not reset to its
> default value at the end of the test. It stayed at 1000, so the
> election tests that ran afterwards could take extremely long.

Hi! Good catch, LGTM.

>
> Part of #5499
> ---
>  test/replication/election_basic.result   | 2 +-
>  test/replication/election_basic.test.lua | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/test/replication/election_basic.result b/test/replication/election_basic.result
> index 03917c7e4..4d7d33f2b 100644
> --- a/test/replication/election_basic.result
> +++ b/test/replication/election_basic.result
> @@ -6,7 +6,7 @@ test_run = require('test_run').new()
>  -- gh-1146: Raft protocol for automated leader election.
>  --
>
> -old_election_timeout = box.cfg_election_timeout
> +old_election_timeout = box.cfg.election_timeout
>   | ---
>   | ...
>
> diff --git a/test/replication/election_basic.test.lua b/test/replication/election_basic.test.lua
> index 1b4bb8d27..821f73cea 100644
> --- a/test/replication/election_basic.test.lua
> +++ b/test/replication/election_basic.test.lua
> @@ -3,7 +3,7 @@ test_run = require('test_run').new()
>  -- gh-1146: Raft protocol for automated leader election.
>  --
>
> -old_election_timeout = box.cfg_election_timeout
> +old_election_timeout = box.cfg.election_timeout
>
>  -- Election is turned off by default.
>  assert(box.cfg.election_mode == 'off')

-- 
Serge Petrenko

* [Tarantool-patches] [PATCH 2/2] test: speed up election_qsync
From: Vladislav Shpilevoy @ 2020-11-06 23:46 UTC
To: tarantool-patches, sergepetrenko

The test used the default election timeout of 5 seconds. That made
the test run for at least 5 seconds when testing re-election:
5 seconds to ensure the old leader is dead, plus some time to
perform the election.

Closes #5499
---
 test/replication/election_qsync.result   | 6 ++++++
 test/replication/election_qsync.test.lua | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
index 086b17686..abf43201f 100644
--- a/test/replication/election_qsync.result
+++ b/test/replication/election_qsync.result
@@ -9,6 +9,9 @@ box.schema.user.grant('guest', 'super')
 old_election_mode = box.cfg.election_mode
  | ---
  | ...
+old_election_timeout = box.cfg.election_timeout
+ | ---
+ | ...
 old_replication_synchro_timeout = box.cfg.replication_synchro_timeout
  | ---
  | ...
@@ -62,6 +65,7 @@ fiber = require('fiber')
 -- Replication timeout is small to speed up a first election start.
 box.cfg{ \
     election_mode = 'candidate', \
+    election_timeout = 0.1, \
     replication_synchro_quorum = 3, \
     replication_synchro_timeout = 1000000, \
     replication_timeout = 0.1, \
@@ -116,6 +120,7 @@ box.cfg{replication_synchro_timeout = 1000000}
 -- up notice of the old leader death.
 box.cfg{ \
     election_mode = 'candidate', \
+    election_timeout = 0.1, \
     replication_timeout = 0.01, \
 }
  | ---
@@ -143,6 +148,7 @@ test_run:cmd('delete server replica')
  | ...
 box.cfg{ \
     election_mode = old_election_mode, \
+    election_timeout = old_election_timeout, \
     replication_timeout = old_replication_timeout, \
     replication = old_replication, \
     replication_synchro_timeout = old_replication_synchro_timeout, \
diff --git a/test/replication/election_qsync.test.lua b/test/replication/election_qsync.test.lua
index 6a80f4859..f668a9d78 100644
--- a/test/replication/election_qsync.test.lua
+++ b/test/replication/election_qsync.test.lua
@@ -2,6 +2,7 @@ test_run = require('test_run').new()
 box.schema.user.grant('guest', 'super')
 
 old_election_mode = box.cfg.election_mode
+old_election_timeout = box.cfg.election_timeout
 old_replication_synchro_timeout = box.cfg.replication_synchro_timeout
 old_replication_timeout = box.cfg.replication_timeout
 old_replication = box.cfg.replication
@@ -30,6 +31,7 @@ fiber = require('fiber')
 -- Replication timeout is small to speed up a first election start.
 box.cfg{ \
     election_mode = 'candidate', \
+    election_timeout = 0.1, \
     replication_synchro_quorum = 3, \
     replication_synchro_timeout = 1000000, \
     replication_timeout = 0.1, \
@@ -59,6 +61,7 @@ box.cfg{replication_synchro_timeout = 1000000}
 -- up notice of the old leader death.
 box.cfg{ \
     election_mode = 'candidate', \
+    election_timeout = 0.1, \
     replication_timeout = 0.01, \
 }
 
@@ -70,6 +73,7 @@ box.space.test:drop()
 test_run:cmd('delete server replica')
 box.cfg{ \
     election_mode = old_election_mode, \
+    election_timeout = old_election_timeout, \
     replication_timeout = old_replication_timeout, \
     replication = old_replication, \
     replication_synchro_timeout = old_replication_synchro_timeout, \
-- 
2.21.1 (Apple Git-122.3)

* Re: [Tarantool-patches] [PATCH 2/2] test: speed up election_qsync
From: Serge Petrenko @ 2020-11-09 9:20 UTC
To: Vladislav Shpilevoy, tarantool-patches

07.11.2020 02:46, Vladislav Shpilevoy wrote:
> The test used the default election timeout of 5 seconds. That made
> the test run for at least 5 seconds when testing re-election:
> 5 seconds to ensure the old leader is dead, plus some time to
> perform the election.
>
> Closes #5499
> ---

Hi! Thanks for the patch!

Please see one question below.

>  test/replication/election_qsync.result   | 6 ++++++
>  test/replication/election_qsync.test.lua | 4 ++++
>  2 files changed, 10 insertions(+)
>
> diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
> index 086b17686..abf43201f 100644
> --- a/test/replication/election_qsync.result
> +++ b/test/replication/election_qsync.result
> @@ -9,6 +9,9 @@ box.schema.user.grant('guest', 'super')
>  old_election_mode = box.cfg.election_mode
>   | ---
>   | ...
> +old_election_timeout = box.cfg.election_timeout
> + | ---
> + | ...
>  old_replication_synchro_timeout = box.cfg.replication_synchro_timeout
>   | ---
>   | ...
> @@ -62,6 +65,7 @@ fiber = require('fiber')
>  -- Replication timeout is small to speed up a first election start.

I checked, and the test speeds up a lot with this patch. But I don't
understand why. We have only two instances, and only one of them is a
candidate. Thanks to the small replication_timeout, the election starts
shortly after the old leader dies. election_timeout isn't involved here,
AFAICS. Am I missing something?
Even when the test is restarted, there shouldn't be any 're-elections'.

>  box.cfg{ \
>      election_mode = 'candidate', \
> +    election_timeout = 0.1, \
>      replication_synchro_quorum = 3, \
>      replication_synchro_timeout = 1000000, \
>      replication_timeout = 0.1, \
> @@ -116,6 +120,7 @@ box.cfg{replication_synchro_timeout = 1000000}
>  -- up notice of the old leader death.
>  box.cfg{ \
>      election_mode = 'candidate', \
> +    election_timeout = 0.1, \
>      replication_timeout = 0.01, \
>  }
>   | ---
> @@ -143,6 +148,7 @@ test_run:cmd('delete server replica')
>   | ...
>  box.cfg{ \
>      election_mode = old_election_mode, \
> +    election_timeout = old_election_timeout, \
>      replication_timeout = old_replication_timeout, \
>      replication = old_replication, \
>      replication_synchro_timeout = old_replication_synchro_timeout, \
> diff --git a/test/replication/election_qsync.test.lua b/test/replication/election_qsync.test.lua
> index 6a80f4859..f668a9d78 100644
> --- a/test/replication/election_qsync.test.lua
> +++ b/test/replication/election_qsync.test.lua
> @@ -2,6 +2,7 @@ test_run = require('test_run').new()
>  box.schema.user.grant('guest', 'super')
>
>  old_election_mode = box.cfg.election_mode
> +old_election_timeout = box.cfg.election_timeout
>  old_replication_synchro_timeout = box.cfg.replication_synchro_timeout
>  old_replication_timeout = box.cfg.replication_timeout
>  old_replication = box.cfg.replication
> @@ -30,6 +31,7 @@ fiber = require('fiber')
>  -- Replication timeout is small to speed up a first election start.
>  box.cfg{ \
>      election_mode = 'candidate', \
> +    election_timeout = 0.1, \
>      replication_synchro_quorum = 3, \
>      replication_synchro_timeout = 1000000, \
>      replication_timeout = 0.1, \
> @@ -59,6 +61,7 @@ box.cfg{replication_synchro_timeout = 1000000}
>  -- up notice of the old leader death.
>  box.cfg{ \
>      election_mode = 'candidate', \
> +    election_timeout = 0.1, \
>      replication_timeout = 0.01, \
>  }
>
> @@ -70,6 +73,7 @@ box.space.test:drop()
>  test_run:cmd('delete server replica')
>  box.cfg{ \
>      election_mode = old_election_mode, \
> +    election_timeout = old_election_timeout, \
>      replication_timeout = old_replication_timeout, \
>      replication = old_replication, \
>      replication_synchro_timeout = old_replication_synchro_timeout, \

-- 
Serge Petrenko

* Re: [Tarantool-patches] [PATCH 2/2] test: speed up election_qsync
From: Vladislav Shpilevoy @ 2020-11-09 22:36 UTC
To: Serge Petrenko, tarantool-patches

Hi! Thanks for the review!

>> @@ -62,6 +65,7 @@ fiber = require('fiber')
>>  -- Replication timeout is small to speed up a first election start.
>
> I checked, and the test speeds up a lot with this patch. But I don't
> understand why. We have only two instances, and only one of them is a
> candidate. Thanks to the small replication_timeout, the election starts
> shortly after the old leader dies. election_timeout isn't involved here,
> AFAICS. Am I missing something?
> Even when the test is restarted, there shouldn't be any 're-elections'.

That is a very good question! Honestly, I didn't think of it. Strangely,
when I did the patch, I just "believed" this is what I should do.

I did some digging now, and there is a simple explanation. And a bug. So
it is really good that you looked at this patch skeptically.

Here is what is happening in the test, in short:

	master: create_replica()
	master: set_mode('voter')

	replica: set_mode('candidate')
	replica: wait_leader()

When the instance is clean, the master's term is 1 and this term does not
have votes yet. When the replica is started, it votes for itself and gets
elected quite fast.

This happens fast even without this patch when you run this test
separately.

But when the test runs after some other Raft tests, the term on the leader
is not 1. It can be 3, 4, or more. When the replica is attached, the
leader does not send its state to the replica, because election is disabled.

So the replica starts from term 1, tries to elect itself, and is ignored
because its term is too old. It waits for the election timeout and tries
again. This is repeated as many times as the term value.

This is why the tests running at the end were always slower.

There is a bug. The first node should have sent its state when its election
mode was set to 'voter'. But it didn't.

I reworked the patch so that now the timeout is, on the contrary, set to
1000000 seconds. And the test hangs indefinitely unless we send the Raft
state when election is enabled.

The new patch:

====================
    raft: send state when state machine is started

    Raft didn't broadcast its state when the state machine was
    started. It could lead to the state never being sent until some
    other node generated a term number bigger than the local one.

    That happened when a node participated in some elections,
    accumulated a big term number, then the election was turned off,
    and a new replica was connected in a 'candidate' state. Then the
    first node was configured to be a 'voter'.

    The first node didn't send anything to the replica, because at
    the moment of its connection the election was off.

    So the replica started from term 1, tried to start elections in
    this term, but was ignored by the first node. It waited for
    election timeout, bumped the term to 2, and the process was
    repeated until the replica reached the first node's term + 1. It
    could take a very long time.

    The patch fixes it so that Raft now broadcasts its state when it
    is enabled, to cover the replicas connected while it was disabled.

    Closes #5499

diff --git a/src/box/raft.c b/src/box/raft.c
index 914b0d68f..28ca74cb5 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -877,6 +877,14 @@ raft_sm_start(void)
 		raft_sm_wait_leader_found();
 	}
 	box_update_ro_summary();
+	/*
+	 * Nothing changed. But when raft was stopped, its state wasn't sent to
+	 * replicas. At least this was happening at the moment of this being
+	 * written. On the other hand, this instance may have a term bigger than
+	 * any other term in the cluster. And if it wouldn't share the term, it
+	 * would ignore all the messages, including vote requests.
+	 */
+	raft_schedule_broadcast();
 }
 
 static void
diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
index 086b17686..cb349efcc 100644
--- a/test/replication/election_qsync.result
+++ b/test/replication/election_qsync.result
@@ -9,6 +9,9 @@ box.schema.user.grant('guest', 'super')
 old_election_mode = box.cfg.election_mode
  | ---
  | ...
+old_election_timeout = box.cfg.election_timeout
+ | ---
+ | ...
 old_replication_synchro_timeout = box.cfg.replication_synchro_timeout
  | ---
  | ...
@@ -60,8 +63,11 @@ fiber = require('fiber')
  | ---
  | ...
 -- Replication timeout is small to speed up a first election start.
+-- Election timeout is set to a huge value to ensure the election does not hang
+-- anywhere. Indeed, there can't be a split-vote when candidate is only one.
 box.cfg{ \
     election_mode = 'candidate', \
+    election_timeout = 1000000, \
     replication_synchro_quorum = 3, \
     replication_synchro_timeout = 1000000, \
     replication_timeout = 0.1, \
@@ -114,8 +120,11 @@ box.cfg{replication_synchro_timeout = 1000000}
 -- Configure separately from synchro timeout not to depend on the order of
 -- synchro and election options appliance. Replication timeout is tiny to speed
 -- up notice of the old leader death.
+-- Election timeout is set to a huge value to ensure the election does not hang
+-- anywhere. Indeed, there can't be a split-vote when candidate is only one.
 box.cfg{ \
     election_mode = 'candidate', \
+    election_timeout = 1000000, \
     replication_timeout = 0.01, \
 }
  | ---
@@ -143,6 +152,7 @@ test_run:cmd('delete server replica')
  | ...
 box.cfg{ \
     election_mode = old_election_mode, \
+    election_timeout = old_election_timeout, \
     replication_timeout = old_replication_timeout, \
     replication = old_replication, \
     replication_synchro_timeout = old_replication_synchro_timeout, \
diff --git a/test/replication/election_qsync.test.lua b/test/replication/election_qsync.test.lua
index 6a80f4859..eb89e5b79 100644
--- a/test/replication/election_qsync.test.lua
+++ b/test/replication/election_qsync.test.lua
@@ -2,6 +2,7 @@ test_run = require('test_run').new()
 box.schema.user.grant('guest', 'super')
 
 old_election_mode = box.cfg.election_mode
+old_election_timeout = box.cfg.election_timeout
 old_replication_synchro_timeout = box.cfg.replication_synchro_timeout
 old_replication_timeout = box.cfg.replication_timeout
 old_replication = box.cfg.replication
@@ -28,8 +29,11 @@ box.cfg{election_mode = 'voter'}
 test_run:switch('replica')
 fiber = require('fiber')
 -- Replication timeout is small to speed up a first election start.
+-- Election timeout is set to a huge value to ensure the election does not hang
+-- anywhere. Indeed, there can't be a split-vote when candidate is only one.
 box.cfg{ \
     election_mode = 'candidate', \
+    election_timeout = 1000000, \
     replication_synchro_quorum = 3, \
     replication_synchro_timeout = 1000000, \
     replication_timeout = 0.1, \
@@ -57,8 +61,11 @@ box.cfg{replication_synchro_timeout = 1000000}
 -- Configure separately from synchro timeout not to depend on the order of
 -- synchro and election options appliance. Replication timeout is tiny to speed
 -- up notice of the old leader death.
+-- Election timeout is set to a huge value to ensure the election does not hang
+-- anywhere. Indeed, there can't be a split-vote when candidate is only one.
 box.cfg{ \
     election_mode = 'candidate', \
+    election_timeout = 1000000, \
     replication_timeout = 0.01, \
 }
 
@@ -70,6 +77,7 @@ box.space.test:drop()
 test_run:cmd('delete server replica')
 box.cfg{ \
     election_mode = old_election_mode, \
+    election_timeout = old_election_timeout, \
     replication_timeout = old_replication_timeout, \
     replication = old_replication, \
     replication_synchro_timeout = old_replication_synchro_timeout, \

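Editorial illustration of the slowdown described in the message above. This is a toy model, not Tarantool code: the function names are hypothetical, and the rule it encodes is only the one stated in the commit message, namely that the voter ignores vote requests until the candidate's term reaches the voter's term + 1, and that every ignored attempt costs one election timeout before the candidate bumps its term and retries. Whether the very first vote request carries term 1 or term 2 is left as a parameter, since the thread does not settle it.

```c
/*
 * Toy model of the term catch-up loop. Hypothetical names, not part of
 * src/box/raft.c.
 *
 * voter_term:    term accumulated by the first node in earlier elections.
 * campaign_term: term carried by the candidate's first vote request.
 *
 * Returns how many election timeouts the candidate waits through before
 * its request carries a term greater than voter_term.
 */
int
timeouts_before_election(int voter_term, int campaign_term)
{
	int waits = 0;
	while (campaign_term <= voter_term) {
		/* Request ignored: wait one timeout, bump the term, retry. */
		++waits;
		++campaign_term;
	}
	return waits;
}

/* Seconds spent waiting on ignored attempts for a given election_timeout. */
double
election_delay(int voter_term, int campaign_term, double election_timeout)
{
	return timeouts_before_election(voter_term, campaign_term) *
	       election_timeout;
}
```

Under these assumptions, a voter at term 4 and a first request in term 2 (both numbers picked only for illustration) give `timeouts_before_election(4, 2) == 3`, so about 15 seconds with the default 5-second timeout. If the candidate has already learned term 4 from a broadcast and campaigns in term 5, `timeouts_before_election(4, 5) == 0`, which is the case the `raft_schedule_broadcast()` call is meant to produce. The same model also suggests why a timeout left at 1000 by the election_basic typo was so costly: each ignored attempt would then cost 1000 seconds.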
* Re: [Tarantool-patches] [PATCH 2/2] test: speed up election_qsync
From: Serge Petrenko @ 2020-11-10 7:44 UTC
To: Vladislav Shpilevoy, tarantool-patches

10.11.2020 01:36, Vladislav Shpilevoy wrote:
> Hi! Thanks for the review!
>
>>> @@ -62,6 +65,7 @@ fiber = require('fiber')
>>>  -- Replication timeout is small to speed up a first election start.
>> I checked, and the test speeds up a lot with this patch. But I don't
>> understand why. We have only two instances, and only one of them is a
>> candidate. Thanks to the small replication_timeout, the election starts
>> shortly after the old leader dies. election_timeout isn't involved here,
>> AFAICS. Am I missing something?
>> Even when the test is restarted, there shouldn't be any 're-elections'.
> That is a very good question! Honestly, I didn't think of it. Strangely,
> when I did the patch, I just "believed" this is what I should do.
>
> I did some digging now, and there is a simple explanation. And a bug. So
> it is really good that you looked at this patch skeptically.
>
> Here is what is happening in the test, in short:
>
> 	master: create_replica()
> 	master: set_mode('voter')
>
> 	replica: set_mode('candidate')
> 	replica: wait_leader()
>
> When the instance is clean, the master's term is 1 and this term does not
> have votes yet. When the replica is started, it votes for itself and gets
> elected quite fast.
>
> This happens fast even without this patch when you run this test
> separately.
>
> But when the test runs after some other Raft tests, the term on the leader
> is not 1. It can be 3, 4, or more. When the replica is attached, the
> leader does not send its state to the replica, because election is disabled.
>
> So the replica starts from term 1, tries to elect itself, and is ignored
> because its term is too old. It waits for the election timeout and tries
> again. This is repeated as many times as the term value.
>
> This is why the tests running at the end were always slower.
>
> There is a bug. The first node should have sent its state when its election
> mode was set to 'voter'. But it didn't.
>
> I reworked the patch so that now the timeout is, on the contrary, set to
> 1000000 seconds. And the test hangs indefinitely unless we send the Raft
> state when election is enabled.

Thanks for all the digging and the explanation!

The new patch LGTM.

>
> The new patch:
>
> ====================
>     raft: send state when state machine is started
>
>     Raft didn't broadcast its state when the state machine was
>     started. It could lead to the state never being sent until some
>     other node generated a term number bigger than the local one.
>
>     That happened when a node participated in some elections,
>     accumulated a big term number, then the election was turned off,
>     and a new replica was connected in a 'candidate' state. Then the
>     first node was configured to be a 'voter'.
>
>     The first node didn't send anything to the replica, because at
>     the moment of its connection the election was off.
>
>     So the replica started from term 1, tried to start elections in
>     this term, but was ignored by the first node. It waited for
>     election timeout, bumped the term to 2, and the process was
>     repeated until the replica reached the first node's term + 1. It
>     could take a very long time.
>
>     The patch fixes it so that Raft now broadcasts its state when it
>     is enabled, to cover the replicas connected while it was disabled.
>
>     Closes #5499
>
> diff --git a/src/box/raft.c b/src/box/raft.c
> index 914b0d68f..28ca74cb5 100644
> --- a/src/box/raft.c
> +++ b/src/box/raft.c
> @@ -877,6 +877,14 @@ raft_sm_start(void)
>  		raft_sm_wait_leader_found();
>  	}
>  	box_update_ro_summary();
> +	/*
> +	 * Nothing changed. But when raft was stopped, its state wasn't sent to
> +	 * replicas. At least this was happening at the moment of this being
> +	 * written. On the other hand, this instance may have a term bigger than
> +	 * any other term in the cluster. And if it wouldn't share the term, it
> +	 * would ignore all the messages, including vote requests.
> +	 */
> +	raft_schedule_broadcast();
>  }
>
>  static void
> diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
> index 086b17686..cb349efcc 100644
> --- a/test/replication/election_qsync.result
> +++ b/test/replication/election_qsync.result
> @@ -9,6 +9,9 @@ box.schema.user.grant('guest', 'super')
>  old_election_mode = box.cfg.election_mode
>   | ---
>   | ...
> +old_election_timeout = box.cfg.election_timeout
> + | ---
> + | ...
>  old_replication_synchro_timeout = box.cfg.replication_synchro_timeout
>   | ---
>   | ...
> @@ -60,8 +63,11 @@ fiber = require('fiber')
>   | ---
>   | ...
>  -- Replication timeout is small to speed up a first election start.
> +-- Election timeout is set to a huge value to ensure the election does not hang
> +-- anywhere. Indeed, there can't be a split-vote when candidate is only one.
>  box.cfg{ \
>      election_mode = 'candidate', \
> +    election_timeout = 1000000, \
>      replication_synchro_quorum = 3, \
>      replication_synchro_timeout = 1000000, \
>      replication_timeout = 0.1, \
> @@ -114,8 +120,11 @@ box.cfg{replication_synchro_timeout = 1000000}
>  -- Configure separately from synchro timeout not to depend on the order of
>  -- synchro and election options appliance. Replication timeout is tiny to speed
>  -- up notice of the old leader death.
> +-- Election timeout is set to a huge value to ensure the election does not hang
> +-- anywhere. Indeed, there can't be a split-vote when candidate is only one.
>  box.cfg{ \
>      election_mode = 'candidate', \
> +    election_timeout = 1000000, \
>      replication_timeout = 0.01, \
>  }
>   | ---
> @@ -143,6 +152,7 @@ test_run:cmd('delete server replica')
>   | ...
>  box.cfg{ \
>      election_mode = old_election_mode, \
> +    election_timeout = old_election_timeout, \
>      replication_timeout = old_replication_timeout, \
>      replication = old_replication, \
>      replication_synchro_timeout = old_replication_synchro_timeout, \
> diff --git a/test/replication/election_qsync.test.lua b/test/replication/election_qsync.test.lua
> index 6a80f4859..eb89e5b79 100644
> --- a/test/replication/election_qsync.test.lua
> +++ b/test/replication/election_qsync.test.lua
> @@ -2,6 +2,7 @@ test_run = require('test_run').new()
>  box.schema.user.grant('guest', 'super')
>
>  old_election_mode = box.cfg.election_mode
> +old_election_timeout = box.cfg.election_timeout
>  old_replication_synchro_timeout = box.cfg.replication_synchro_timeout
>  old_replication_timeout = box.cfg.replication_timeout
>  old_replication = box.cfg.replication
> @@ -28,8 +29,11 @@ box.cfg{election_mode = 'voter'}
>  test_run:switch('replica')
>  fiber = require('fiber')
>  -- Replication timeout is small to speed up a first election start.
> +-- Election timeout is set to a huge value to ensure the election does not hang
> +-- anywhere. Indeed, there can't be a split-vote when candidate is only one.
>  box.cfg{ \
>      election_mode = 'candidate', \
> +    election_timeout = 1000000, \
>      replication_synchro_quorum = 3, \
>      replication_synchro_timeout = 1000000, \
>      replication_timeout = 0.1, \
> @@ -57,8 +61,11 @@ box.cfg{replication_synchro_timeout = 1000000}
>  -- Configure separately from synchro timeout not to depend on the order of
>  -- synchro and election options appliance. Replication timeout is tiny to speed
>  -- up notice of the old leader death.
> +-- Election timeout is set to a huge value to ensure the election does not hang
> +-- anywhere. Indeed, there can't be a split-vote when candidate is only one.
>  box.cfg{ \
>      election_mode = 'candidate', \
> +    election_timeout = 1000000, \
>      replication_timeout = 0.01, \
>  }
>
> @@ -70,6 +77,7 @@ box.space.test:drop()
>  test_run:cmd('delete server replica')
>  box.cfg{ \
>      election_mode = old_election_mode, \
> +    election_timeout = old_election_timeout, \
>      replication_timeout = old_replication_timeout, \
>      replication = old_replication, \
>      replication_synchro_timeout = old_replication_synchro_timeout, \

-- 
Serge Petrenko

* Re: [Tarantool-patches] [PATCH 0/2] Raft slow tests
From: Alexander V. Tikhonov @ 2020-11-10 21:11 UTC
To: Vladislav Shpilevoy
Cc: tarantool-patches

Hi Vlad,

I've checked all results in gitlab-ci, and no new degradations were found [1].
The patch LGTM.

[1] - https://gitlab.com/tarantool/tarantool/-/pipelines/213912679

On Sat, Nov 07, 2020 at 12:46:31AM +0100, Vladislav Shpilevoy wrote:
> The patchset speeds up the election tests, which could previously run for
> almost a minute.
>
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5499-slow-raft-test
> Issue: https://github.com/tarantool/tarantool/issues/5499
>
> Vladislav Shpilevoy (2):
>   test: fix a typo in election_basic
>   test: speed up election_qsync
>
>  test/replication/election_basic.result   | 2 +-
>  test/replication/election_basic.test.lua | 2 +-
>  test/replication/election_qsync.result   | 6 ++++++
>  test/replication/election_qsync.test.lua | 4 ++++
>  4 files changed, 12 insertions(+), 2 deletions(-)
>
> --
> 2.21.1 (Apple Git-122.3)
>

* Re: [Tarantool-patches] [PATCH 0/2] Raft slow tests
From: Vladislav Shpilevoy @ 2020-11-10 22:05 UTC
To: tarantool-patches, sergepetrenko

Pushed to master and 2.6.