[Tarantool-patches] [PATCH 3/3] replication: add test with random leaders promotion and demotion
Sergey Bronnikov
sergeyb at tarantool.org
Mon Aug 31 13:05:39 MSK 2020
Sergey, thanks for the review!
See my comments inline; the test has been updated on the branch.
On 13:49 Thu 27 Aug, Serge Petrenko wrote:
>
> Hi! Thanks for the fixes!
> I’m pasting parts of the patch below to comment on.
>
> +box.cfg({
> +    listen = instance_uri(INSTANCE_ID);
> +    replication_connect_quorum = 3;
> +    replication = {
> +        instance_uri(1);
> +        instance_uri(2);
> +        instance_uri(3);
> +        instance_uri(4);
> +        instance_uri(5);
> +    };
> +})
>
>
> * You should either omit `replication_connect_quorum` altogether, or set it to 5.
> Omitting it will have the same effect.
> I think you meant `replication_synchro_quorum` here, then it makes sense
> to set it to 3. Also `replication_synchro_timeout` should be set here,
> I’ll mention it again below.
Removed replication_connect_quorum entirely.
> +
> +NUM_INSTANCES = 5
> +BROKEN_QUORUM = NUM_INSTANCES + 1
> +
>
>
> * BROKEN_QUORUM assigned but never used.
removed
>
> +
> +test_run:create_cluster(SERVERS, "replication", {args="0.1"})
> +test_run:wait_fullmesh(SERVERS)
>
>
> * You’re passing some argument to qsync1, … qsync5 instances, but you never use it.
Removed the argument.
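For reference, the bring-up now shrinks to the usual two calls (a sketch, assuming
the stock test_run helpers):

-- No per-instance argument is passed any more.
test_run:create_cluster(SERVERS, "replication")
test_run:wait_fullmesh(SERVERS)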
>
> +current_leader_id = 1
> +test_run:switch(SERVERS[current_leader_id])
> +box.cfg{replication_synchro_timeout=1}
>
>
> * You should set `replication_synchro_timeout` on every instance, not only
> on qsync1 so you better move this box.cfg call to the instance file.
> Besides, the timeout should be bigger (much bigger), like Vlad said.
> We typically use 30 seconds for various replication timeouts.
> It’s fairly common when a test is stable on your machine, but is flaky on
> testing machines.
Increased the timeout to 1000 and moved the synchro settings into the qsync.lua instance file.
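Roughly, the instance file now carries the whole configuration, something like this
(a sketch of qsync.lua with the values discussed above, not the literal file):

-- Every instance gets the same synchro settings, so a leader switch
-- does not depend on per-testcase box.cfg calls.
box.cfg({
    listen = instance_uri(INSTANCE_ID);
    replication = {
        instance_uri(1);
        instance_uri(2);
        instance_uri(3);
        instance_uri(4);
        instance_uri(5);
    };
    replication_synchro_quorum = 3;
    replication_synchro_timeout = 1000;
})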
> +test_run:switch('default')
> +
> +-- Testcase body.
> +for i=1,10 do \
> +    new_leader_id = random(current_leader_id, 1, #SERVERS) \
> +    test_run:switch(SERVERS[new_leader_id]) \
> +    box.cfg{read_only=false} \
> +    f1 = fiber.create(function() \
> +        pcall(box.space.sync:truncate{}) \
> +    end) \
>
>
> * Why put truncate call in a separate fiber?
>
> Why use truncate at all? You may just replace all your `insert` calls below
> with `replace`, and then truncate won’t be needed. This is up to you
> though.
As far as I remember, the original idea was to clean up the space before the next insert.
I have rewritten the test body: all inserted values are now kept in the space,
and after the loop the test checks the number of inserted values.
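For the record, the replace-based variant you suggest would look roughly like this
(a sketch, not what the updated test does):

-- replace overwrites a tuple with the same primary key instead of
-- raising an error, so no preliminary truncate would be needed:
for i = 1, 10000 do box.space.sync:replace{i} end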
>
> +    f2 = fiber.create(function() \
> +        for i=1,10000 do box.space.sync:insert{i} end \
> +    end) \
>
> * So you’re testing a case when a leader has some unconfirmed transactions in
> limbo and then a leader change happens. Then you need to call
> `clear_synchro_queue` on a new leader to wait for confirmation of old txns. Otherwise
> the new leader fails to insert its data, but the test doesn’t show this, because you
> don’t check fiber state or `insert()` return values.
Added a clear_synchro_queue() call to the updated test.
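The hand-off on the new leader now boils down to roughly this (a sketch; the real
test drives it through test_run:eval, see the pasted body below):

-- On the instance being promoted:
box.cfg{replication_synchro_quorum = 3}  -- restore a reachable quorum
box.ctl.clear_synchro_queue()            -- wait for the old leader's pending
                                         -- synchro transactions to be confirmed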
> +    test_run:switch('default') \
> +    test_run:switch(SERVERS[current_leader_id]) \
>
>
> * The 2 lines above are useless.
Removed; the testcase body now looks like this in the updated test:
-- Testcase body.
for i=1,100 do \
    test_run:eval(SERVERS[current_leader_id], \
                  "box.cfg{replication_synchro_quorum=6}") \
    test_run:eval(SERVERS[current_leader_id], \
                  string.format("box.space.sync:insert{%d}", i)) \
    new_leader_id = random(current_leader_id, 1, #SERVERS) \
    test_run:eval(SERVERS[new_leader_id], \
                  "box.cfg{replication_synchro_quorum=3}") \
    test_run:eval(SERVERS[new_leader_id], "box.ctl.clear_synchro_queue()") \
    replica = random(new_leader_id, 1, #SERVERS) \
    test_run:eval(SERVERS[replica], \
                  string.format("box.space.sync:get{%d}", i)) \
    test_run:switch('default') \
    current_leader_id = new_leader_id \
end
test_run:switch('qsync1')
box.space.sync:count() -- 100
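The teardown stays the standard one (a sketch, assuming the usual test_run helpers):

-- Teardown.
test_run:switch('default')
test_run:drop_cluster(SERVERS)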
>
> +    box.cfg{read_only=true} \
> +    test_run:switch('default') \
> +    current_leader_id = new_leader_id \
> +end
> +
> +-- Teardown.
> --
> Serge Petrenko
--
sergeyb@