[Tarantool-patches] [PATCH v2 6/7] replication: tolerate synchro rollback during final join

Serge Petrenko sergepetrenko at tarantool.org
Sat Mar 27 22:23:46 MSK 2021



26.03.2021 23:49, Vladislav Shpilevoy wrote:
> Thanks for working on this!
>
>> diff --git a/test/replication/gh-5566-final-join-synchro.result b/test/replication/gh-5566-final-join-synchro.result
>> new file mode 100644
>> index 000000000..32749bf12
>> --- /dev/null
>> +++ b/test/replication/gh-5566-final-join-synchro.result
>> @@ -0,0 +1,139 @@
>> +-- test-run result file version 2
>> +test_run = require('test_run').new()
>> + | ---
>> + | ...
>> +
>> +--
>> +-- gh-5566 replica tolerates synchronous transactions in final join stream.
>> +--
>> +_ = box.schema.space.create('sync', {is_sync=true})
>> + | ---
>> + | ...
>> +_ = box.space.sync:create_index('pk')
>> + | ---
>> + | ...
>> +
>> +box.schema.user.grant('guest', 'replication')
>> + | ---
>> + | ...
>> +box.schema.user.grant('guest', 'write', 'space', 'sync')
>> + | ---
>> + | ...
>> +
>> +-- Part 1. Make sure a joining instance tolerates synchronous rows in final join
>> +-- stream.
>> +trig = function()\
>> +    box.space.sync:replace{1}\
>> +end
> You might need to increase the synchro timeout because the default can
> be flaky.

I think it's fine, since the quorum is 1: the transaction only has to wait
for the master's own WAL write. Do you really think a WAL write can take
longer than 5 seconds (the default timeout)?
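
To make the argument concrete, here is a toy plain-Lua model of when a
synchronous transaction commits. It is a simplification, not Tarantool's
actual code: it assumes the master's own WAL write counts as one
confirmation toward replication_synchro_quorum, and that the transaction
rolls back with ER_SYNC_QUORUM_TIMEOUT unless enough confirmations arrive
within replication_synchro_timeout. The function sync_commit_outcome and
its inputs are hypothetical.

```lua
-- Toy model of a synchronous transaction's commit decision (hypothetical,
-- not Tarantool's implementation). Each entry of ack_delays is the delay in
-- seconds of one confirmation; the master's own WAL write counts as one.
local function sync_commit_outcome(ack_delays, quorum, timeout)
    -- Keep only the confirmations that arrive within the timeout.
    local in_time = {}
    for _, delay in ipairs(ack_delays) do
        if delay <= timeout then
            table.insert(in_time, delay)
        end
    end
    if #in_time < quorum then
        return 'rollback', 'ER_SYNC_QUORUM_TIMEOUT'
    end
    -- The transaction commits when the quorum-th fastest confirmation arrives.
    table.sort(in_time)
    return 'commit', in_time[quorum]
end

-- Part 1: quorum 1, only the master's WAL write (say 2 ms) is needed.
print(sync_commit_outcome({0.002}, 1, 60))   -- 'commit', 0.002
-- Part 2: quorum 2, but no replica is connected yet to confirm.
print(sync_commit_outcome({0.002}, 2, 0.01)) -- 'rollback', 'ER_SYNC_QUORUM_TIMEOUT'
```

With quorum 1 the outcome depends only on local WAL latency, so Part 1 can
time out only if a single WAL write is slower than the timeout itself.
Part 2 relies on the opposite case, where no replica can confirm in time.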

Anyway, let's raise it to 60 seconds while we're at it.
I've also raised wait_log's timeout from 10 to 60 seconds:

==========================

diff --git a/test/replication/gh-5566-final-join-synchro.result b/test/replication/gh-5566-final-join-synchro.result
index 32749bf12..a09882ba6 100644
--- a/test/replication/gh-5566-final-join-synchro.result
+++ b/test/replication/gh-5566-final-join-synchro.result
@@ -35,7 +35,10 @@ _ = box.space._cluster:on_replace(trig)
  orig_synchro_quorum = box.cfg.replication_synchro_quorum
   | ---
   | ...
-box.cfg{replication_synchro_quorum=1}
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum=1, replication_synchro_timeout=60}
   | ---
   | ...

@@ -73,9 +76,6 @@ test_run:cmd('delete server replica')

  -- Part 2. Make sure master aborts final join if insert to _cluster is rolled
  -- back and replica is capable of retrying it.
-orig_synchro_timeout = box.cfg.replication_synchro_timeout
- | ---
- | ...
  -- Make the trigger we used above fail with no quorum.
  box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.01}
   | ---
@@ -91,7 +91,7 @@ test_run:cmd('start server replica with wait=False')
   | - true
   | ...

-test_run:wait_log('replica', 'ER_SYNC_QUORUM_TIMEOUT', nil, 10)
+test_run:wait_log('replica', 'ER_SYNC_QUORUM_TIMEOUT', nil, 60)
   | ---
   | - ER_SYNC_QUORUM_TIMEOUT
   | ...
diff --git a/test/replication/gh-5566-final-join-synchro.test.lua b/test/replication/gh-5566-final-join-synchro.test.lua
index 14302f6e6..2db2c742f 100644
--- a/test/replication/gh-5566-final-join-synchro.test.lua
+++ b/test/replication/gh-5566-final-join-synchro.test.lua
@@ -18,7 +18,8 @@ end
  _ = box.space._cluster:on_replace(trig)

  orig_synchro_quorum = box.cfg.replication_synchro_quorum
-box.cfg{replication_synchro_quorum=1}
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+box.cfg{replication_synchro_quorum=1, replication_synchro_timeout=60}

  test_run:cmd('create server replica with rpl_master=default,\
script="replication/replica.lua"')
@@ -33,7 +34,6 @@ test_run:cmd('delete server replica')

  -- Part 2. Make sure master aborts final join if insert to _cluster is rolled
  -- back and replica is capable of retrying it.
-orig_synchro_timeout = box.cfg.replication_synchro_timeout
  -- Make the trigger we used above fail with no quorum.
  box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.01}
  -- Try to join the replica once again.
@@ -41,7 +41,7 @@ test_run:cmd('create server replica with rpl_master=default,\
script="replication/replica.lua"')
  test_run:cmd('start server replica with wait=False')

-test_run:wait_log('replica', 'ER_SYNC_QUORUM_TIMEOUT', nil, 10)
+test_run:wait_log('replica', 'ER_SYNC_QUORUM_TIMEOUT', nil, 60)
  -- Remove the trigger to let the replica connect.
  box.space._cluster:on_replace(nil, trig)
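
Why orig_synchro_timeout has to move up: Part 1 now changes the timeout too,
so a snapshot taken only at the start of Part 2 would record 60 instead of
the original value. Below is a minimal plain-Lua sketch of the
"save before the first change" pattern. Here cfg is a hypothetical table
standing in for box.cfg, cfg_override is a hypothetical helper, and the
option values are illustrative only.

```lua
-- Hypothetical stand-in for box.cfg; values are illustrative only.
local cfg = {replication_synchro_quorum = 1, replication_synchro_timeout = 5}

-- Hypothetical helper: apply `changes` to cfg and return a function that
-- puts back the values that were in effect before this call.
local function cfg_override(changes)
    local saved = {}
    for key, value in pairs(changes) do
        saved[key] = cfg[key]
        cfg[key] = value
    end
    return function()
        for key, value in pairs(saved) do
            cfg[key] = value
        end
    end
end

-- Part 1: the originals are captured here, before the first change.
local restore = cfg_override({replication_synchro_quorum = 1,
                              replication_synchro_timeout = 60})

-- A snapshot taken only at the start of Part 2 sees Part 1's value.
local late_snapshot = cfg.replication_synchro_timeout -- 60, not 5

-- Part 2 tightens the settings (in the real test this makes the
-- _cluster trigger fail for lack of quorum).
cfg.replication_synchro_quorum = 2
cfg.replication_synchro_timeout = 0.01

-- Cleanup: both options go back to their values from before Part 1.
restore()
print(cfg.replication_synchro_quorum, cfg.replication_synchro_timeout) -- 1, 5
```

The test itself just keeps plain orig_* variables; the closure is only one
way to package the same idea.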

-- 
Serge Petrenko


