From: Serge Petrenko <sergepetrenko@tarantool.org> To: Sergey Bronnikov <sergeyb@tarantool.org> Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH 3/3] replication: add test with random leaders promotion and demotion Date: Tue, 1 Sep 2020 16:03:16 +0300 [thread overview] Message-ID: <9b889490-07da-9afc-cf2e-16d6700d255c@tarantool.org> (raw) In-Reply-To: <20200831100539.GA94343@pony.bronevichok.ru> 31.08.2020 13:05, Sergey Bronnikov пишет: > Sergey, thanks for review! > See my comments inline, test updated in a branch. Hi! Thanks for the fixes. See my comments below. > > On 13:49 Thu 27 Aug , Serge Petrenko wrote: >> Hi! Thanks for the fixes! >> I’m pasting parts of the patch below to comment on. >> >> +box.cfg({ >> + listen = instance_uri(INSTANCE_ID); >> + replication_connect_quorum = 3; >> + replication = { >> + instance_uri(1); >> + instance_uri(2); >> + instance_uri(3); >> + instance_uri(4); >> + instance_uri(5); >> + }; >> +}) >> >> >> * You should either omit `replication_connect_quorum` at all, or set it to 5. >> Omitting it will have the same effect. >> I think you meant `replication_synchro_quorum` here, then it makes sense >> to set it to 3. Also `replication_synchro_timeout` should be set here, >> I’ll mention it again below. > removed replication_connect_quorum at all + +box.once("bootstrap", function() + local test_run = require('test_run').new() + box.cfg{replication_synchro_timeout = 1000, replication_synchro_quorum = 5} + box.cfg{read_only = false} + box.schema.user.grant("guest", 'replication') +end) Since you moved space creation out of `box.once` call, you don't need test_run here. >> + >> +NUM_INSTANCES = 5 >> +BROKEN_QUORUM = NUM_INSTANCES + 1 >> + >> >> >> * BROKEN_QUORUM assigned but never used. > removed > >> >> + >> +test_run:create_cluster(SERVERS, "replication", {args="0.1"}) >> +test_run:wait_fullmesh(SERVERS) >> >> >> * You’re passing some argument to qsync1, … qsync5 instances, but you never use it. > removed argument > >> >> +current_leader_id = 1 >> +test_run:switch(SERVERS[current_leader_id]) >> +box.cfg{replication_synchro_timeout=1} >> >> >> * You should set `replication_synchro_timeout` on every instance, not only >> on qsync1 so you better move this box.cfg call to the instance file. >> Besides, the timeout should be bigger (much bigger), like Vlad said. >> We typically use 30 seconds for various replication timeouts. >> It’s fairly common when a test is stable on your machine, but is flaky on >> testing machines. > increased timeout up to 1000 and set synchro settings in a qsync.lua file > >> +test_run:switch('default') >> + >> +-- Testcase body. >> +for i=1,10 do \ >> + new_leader_id = random(current_leader_id, 1, #SERVERS) \ >> + test_run:switch(SERVERS[new_leader_id]) \ >> + box.cfg{read_only=false} \ >> + f1 = fiber.create(function() \ >> + pcall(box.space.sync:truncate{}) \ >> + end) \ >> >> >> * Why put truncate call in a separate fiber? >> >> Why use truncate at all? You may just replace all your `insert` calls below >> with `replace`, and then truncate won’t be needed. This is up to you >> though. > As far as I remember original idea was to cleanup before next insert. > I rewrote a body of test and keep all inserted values in a space and > after loop check amount of inserted values. > >> >> + f2 = fiber.create(function() \ >> + for i=1,10000 do box.space.sync:insert{i} end \ >> + end) \ >> >> * So you’re testing a case when a leader has some unconfirmed transactions in >> limbo and then a leader change happens. Then you need to call >> `clear_synchro_queue` on a new leader to wait for confirmation of old txns. Otherwise >> the new leader fails to insert its data, but the test doesn’t show this, because you >> don’t check fiber state or `insert()` return values. > added clear_synchro_queue() call to updated test > >> + test_run:switch('default') \ >> + test_run:switch(SERVERS[current_leader_id]) \ >> >> >> * The 2 lines above are useless. > removed in updated test: > > -- Testcase body. > for i=1,100 do \ > test_run:eval(SERVERS[current_leader_id], \ > "box.cfg{replication_synchro_quorum=6}") \ > test_run:eval(SERVERS[current_leader_id], \ > string.format("box.space.sync:insert{%d}", i)) \ > new_leader_id = random(current_leader_id, 1, #SERVERS) \ > test_run:eval(SERVERS[new_leader_id], \ > "box.cfg{replication_synchro_quorum=3}") \ > test_run:eval(SERVERS[new_leader_id], "box.ctl.clear_synchro_queue()") \ > replica = random(new_leader_id, 1, #SERVERS) \ > test_run:eval(SERVERS[replica], \ > string.format("box.space.sync:get{%d}", i)) \ It is not checked whether get returns any data or not. You only make a call but never check the return value. It'd also be nice to check whether the insert on the old leader (the one with quorum = 6) succeeds after the new leader executes `clear_synchro_queue`. > test_run:switch('default') \ > current_leader_id = new_leader_id \ > end > > test_run:switch('qsync1') > box.space.sync:count() -- 100 > >> >> + box.cfg{read_only=true} \ >> + test_run:switch('default') \ >> + current_leader_id = new_leader_id \ >> +end >> + >> +-- Teardown. >> -- >> Serge Petrenko -- Serge Petrenko
next prev parent reply other threads:[~2020-09-01 13:03 UTC|newest] Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top [not found] <cover.1594314820.git.sergeyb@tarantool.org> 2020-07-09 17:16 ` [Tarantool-patches] [PATCH 1/3] replication: print number of txs in limbo before its clear sergeyb 2020-07-09 20:07 ` Vladislav Shpilevoy 2020-07-10 12:55 ` Sergey Bronnikov 2020-07-20 21:59 ` Vladislav Shpilevoy 2020-07-09 17:16 ` [Tarantool-patches] [PATCH 2/3] replication: test clear_synchro_queue function sergeyb 2020-07-20 22:00 ` Vladislav Shpilevoy 2020-08-25 12:49 ` Sergey Bronnikov 2020-08-26 7:31 ` Serge Petrenko 2020-08-26 14:48 ` Sergey Bronnikov 2020-07-09 17:16 ` [Tarantool-patches] [PATCH 3/3] replication: add test with random leaders promotion and demotion sergeyb 2020-07-09 20:07 ` Vladislav Shpilevoy 2020-07-10 8:05 ` Sergey Bronnikov 2020-07-20 22:01 ` Vladislav Shpilevoy 2020-08-26 14:45 ` Sergey Bronnikov 2020-08-27 10:49 ` Serge Petrenko 2020-08-31 10:05 ` Sergey Bronnikov 2020-09-01 13:03 ` Serge Petrenko [this message] 2020-11-12 9:32 ` Sergey Bronnikov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=9b889490-07da-9afc-cf2e-16d6700d255c@tarantool.org \ --to=sergepetrenko@tarantool.org \ --cc=sergeyb@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH 3/3] replication: add test with random leaders promotion and demotion' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox