[Tarantool-patches] [PATCH v2 6/7] replication: tolerate synchro rollback during final join
Serge Petrenko
sergepetrenko at tarantool.org
Sat Mar 27 22:23:46 MSK 2021
26.03.2021 23:49, Vladislav Shpilevoy wrote:
> Thanks for working on this!
>
>> diff --git a/test/replication/gh-5566-final-join-synchro.result b/test/replication/gh-5566-final-join-synchro.result
>> new file mode 100644
>> index 000000000..32749bf12
>> --- /dev/null
>> +++ b/test/replication/gh-5566-final-join-synchro.result
>> @@ -0,0 +1,139 @@
>> +-- test-run result file version 2
>> +test_run = require('test_run').new()
>> + | ---
>> + | ...
>> +
>> +--
>> +-- gh-5566 replica tolerates synchronous transactions in final join stream.
>> +--
>> +_ = box.schema.space.create('sync', {is_sync=true})
>> + | ---
>> + | ...
>> +_ = box.space.sync:create_index('pk')
>> + | ---
>> + | ...
>> +
>> +box.schema.user.grant('guest', 'replication')
>> + | ---
>> + | ...
>> +box.schema.user.grant('guest', 'write', 'space', 'sync')
>> + | ---
>> + | ...
>> +
>> +-- Part 1. Make sure a joining instance tolerates synchronous rows in final join
>> +-- stream.
>> +trig = function()\
>> + box.space.sync:replace{1}\
>> +end
> You might need to increase the synchro timeout because the default can
> be flaky.
I think it's fine, since quorum is 1.
Do you think we can wait for a WAL write for more than 5 seconds?
Anyway, let's change it to a bigger value while we're at it.
I've also changed wait_log's timeout to 60 seconds:
==========================
diff --git a/test/replication/gh-5566-final-join-synchro.result b/test/replication/gh-5566-final-join-synchro.result
index 32749bf12..a09882ba6 100644
--- a/test/replication/gh-5566-final-join-synchro.result
+++ b/test/replication/gh-5566-final-join-synchro.result
@@ -35,7 +35,10 @@ _ = box.space._cluster:on_replace(trig)
orig_synchro_quorum = box.cfg.replication_synchro_quorum
| ---
| ...
-box.cfg{replication_synchro_quorum=1}
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum=1, replication_synchro_timeout=60}
| ---
| ...
@@ -73,9 +76,6 @@ test_run:cmd('delete server replica')
-- Part 2. Make sure master aborts final join if insert to _cluster is rolled
-- back and replica is capable of retrying it.
-orig_synchro_timeout = box.cfg.replication_synchro_timeout
- | ---
- | ...
-- Make the trigger we used above fail with no quorum.
box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.01}
| ---
@@ -91,7 +91,7 @@ test_run:cmd('start server replica with wait=False')
| - true
| ...
-test_run:wait_log('replica', 'ER_SYNC_QUORUM_TIMEOUT', nil, 10)
+test_run:wait_log('replica', 'ER_SYNC_QUORUM_TIMEOUT', nil, 60)
| ---
| - ER_SYNC_QUORUM_TIMEOUT
| ...
diff --git a/test/replication/gh-5566-final-join-synchro.test.lua b/test/replication/gh-5566-final-join-synchro.test.lua
index 14302f6e6..2db2c742f 100644
--- a/test/replication/gh-5566-final-join-synchro.test.lua
+++ b/test/replication/gh-5566-final-join-synchro.test.lua
@@ -18,7 +18,8 @@ end
_ = box.space._cluster:on_replace(trig)
orig_synchro_quorum = box.cfg.replication_synchro_quorum
-box.cfg{replication_synchro_quorum=1}
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+box.cfg{replication_synchro_quorum=1, replication_synchro_timeout=60}
test_run:cmd('create server replica with rpl_master=default,\
script="replication/replica.lua"')
@@ -33,7 +34,6 @@ test_run:cmd('delete server replica')
-- Part 2. Make sure master aborts final join if insert to _cluster is rolled
-- back and replica is capable of retrying it.
-orig_synchro_timeout = box.cfg.replication_synchro_timeout
-- Make the trigger we used above fail with no quorum.
box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.01}
-- Try to join the replica once again.
@@ -41,7 +41,7 @@ test_run:cmd('create server replica with rpl_master=default,\
script="replication/replica.lua"')
test_run:cmd('start server replica with wait=False')
-test_run:wait_log('replica', 'ER_SYNC_QUORUM_TIMEOUT', nil, 10)
+test_run:wait_log('replica', 'ER_SYNC_QUORUM_TIMEOUT', nil, 60)
-- Remove the trigger to let the replica connect.
box.space._cluster:on_replace(nil, trig)
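
For readers skimming the diff, the save/override/restore pattern it relies on can be sketched roughly as follows. This is only an illustration: it assumes a running Tarantool instance where box.cfg{} has already been called, and the final restore step is an assumption about the test's cleanup, which is not part of the hunks above.

```lua
-- Sketch only: save the current synchro settings before overriding them.
local orig_synchro_quorum = box.cfg.replication_synchro_quorum
local orig_synchro_timeout = box.cfg.replication_synchro_timeout

-- Part 1 settings from the diff: quorum 1 plus a generous timeout, so that
-- a slow WAL write on a loaded machine does not make the test flaky.
box.cfg{replication_synchro_quorum = 1, replication_synchro_timeout = 60}

-- ... test body goes here ...

-- Assumed cleanup (not shown in the hunks above): restore the saved values.
box.cfg{
    replication_synchro_quorum = orig_synchro_quorum,
    replication_synchro_timeout = orig_synchro_timeout,
}
```

Saving both values once, before Part 1, is what lets Part 2 override the timeout again without a second save.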
--
Serge Petrenko