[Tarantool-patches] [PATCH 1/7] replication: always send raft state to subscribers

Serge Petrenko sergepetrenko at tarantool.org
Fri Jun 18 00:00:03 MSK 2021



15.06.2021 23:53, Vladislav Shpilevoy wrote:
> Thanks for working on this!
>
> On 10.06.2021 15:32, Serge Petrenko via Tarantool-patches wrote:
>> Tarantool used to send out raft state on subscribe only when raft was
>> enabled. This was a safeguard against partially-upgraded clusters,
>> where some nodes had no clue about Raft messages and couldn't handle
>> them properly.
>>
>> Actually, Raft state should be sent out always. For example, promote
>> bumps Raft term, even when raft is disabled, and it's important that
>> everyone in cluster has the same term, for the sake of promote at least.
>>
>> So, send out Raft state to every subscriber with version >= 2.6.0
>> (that's when Raft was introduced).
>>
>> Closes #5438
>> ---
>>  src/box/box.cc | 11 +--
>>  test/replication/gh-5438-raft-state.result | 73 ++++++++++++++++++++
>>  test/replication/gh-5438-raft-state.test.lua | 30 ++++++++
> The test still works when I revert box.cc changes. 

Hi! Thanks for pointing this out!

Yep, this happened because of raft broadcasts.
When the relay is off, it still saves the latest broadcast message from raft
and delivers it as soon as the replica reconnects.
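
To illustrate why the old test passed even with the box.cc change reverted, here is a toy model of the behaviour described above. It is only a sketch: `ToyRelay`, `RaftMsg` and all method names are hypothetical and are not the actual relay code. The point is that a broadcast arriving while the replica is offline is remembered, and only the most recent one is delivered on the next subscribe.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for a Raft state message.
struct RaftMsg {
    uint64_t term;
};

// Toy relay: remembers the most recent broadcast while the replica is
// disconnected. Not the real struct relay from src/box/relay.cc.
class ToyRelay {
public:
    // Called for every raft broadcast on the master side.
    void broadcast(const RaftMsg &msg)
    {
        if (connected_)
            sent_.push_back(msg);
        else
            pending_ = msg; // overwrite: only the latest state matters
    }

    // Called when the replica subscribes or resubscribes.
    void on_subscribe()
    {
        connected_ = true;
        if (pending_) {
            sent_.push_back(*pending_);
            pending_.reset();
        }
    }

    // Called when the replica goes away.
    void on_disconnect() { connected_ = false; }

    // Messages that actually reached the replica, in order.
    const std::vector<RaftMsg> &sent() const { return sent_; }

private:
    bool connected_ = false;
    std::optional<RaftMsg> pending_;
    std::vector<RaftMsg> sent_;
};
```

In this model a replica that was registered, stopped, and restarted gets the new term through the saved broadcast alone, which is why the reworked test creates the replica only after the term bump.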

Thanks to this, I've also fixed raft broadcasts possibly being sent to 
old instances.

This could only happen after the patch which persists the latest raft
broadcast, and only if raft had been turned on for some time. So this wasn't
a big deal for upgrading installations, because they wouldn't turn raft on.
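
The version gate itself can be sketched in isolation. This is a minimal illustration, assuming versions are packed one byte per component (my understanding of what the `version_id()` helper does; treat the packing as an assumption). `should_forward_raft` is a hypothetical name, the real check is inlined in relay_subscribe_f().

```cpp
#include <cstdint>

// Assumption: major, minor and patch each occupy one byte, so that
// comparing packed values orders versions correctly.
constexpr uint32_t version_id(unsigned major, unsigned minor, unsigned patch)
{
    return (((major << 8) | minor) << 8) | patch;
}

// Hypothetical predicate mirroring the condition in the diff:
// forward raft broadcasts only to non-anonymous replicas that run
// 2.6.0 or newer, i.e. a version that knows about Raft messages.
constexpr bool should_forward_raft(uint32_t peer_version, bool is_anon)
{
    return !is_anon && peer_version >= version_id(2, 6, 0);
}
```

With this check, a 2.5.x replica never gets a saved broadcast, regardless of whether raft was ever enabled on the master.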

Anyway, this issue is fixed as well, and I made the test fail without the patch.
Here's the diff:

================================================

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 1196b65b9..5d72cab13 100644
diff --git a/src/box/relay.cc b/src/box/relay.cc
index b1571b361..b47767769 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -760,7 +760,7 @@ relay_subscribe_f(va_list ap)
            &relay->relay_pipe, NULL, NULL, cbus_process);

      struct relay_is_raft_enabled_msg raft_enabler;
-    if (!relay->replica->anon)
+    if (!relay->replica->anon && relay->version_id >= version_id(2, 6, 0))
          relay_send_is_raft_enabled(relay, &raft_enabler, true);

      /*
@@ -842,7 +842,7 @@ relay_subscribe_f(va_list ap)
          cpipe_push(&relay->tx_pipe, &relay->status_msg.msg);
      }

-    if (!relay->replica->anon)
+    if (!relay->replica->anon && relay->version_id >= version_id(2, 6, 0))
          relay_send_is_raft_enabled(relay, &raft_enabler, false);

      /*
diff --git a/test/replication/gh-5438-raft-state.result b/test/replication/gh-5438-raft-state.result
index 7982796a8..6985f026a 100644
--- a/test/replication/gh-5438-raft-state.result
+++ b/test/replication/gh-5438-raft-state.result
@@ -6,26 +6,6 @@ test_run = require('test_run').new()
  --
  -- gh-5428 send out Raft state to subscribers, even when Raft is disabled.
  --
-box.schema.user.grant('guest', 'replication')
- | ---
- | ...
-test_run:cmd('create server replica with rpl_master=default,\
- script="replication/replica.lua"')
- | ---
- | - true
- | ...
-test_run:cmd('start server replica')
- | ---
- | - true
- | ...
-test_run:wait_lsn('replica', 'default')
- | ---
- | ...
-test_run:cmd('stop server replica')
- | ---
- | - true
- | ...
-
  -- Bump Raft term while the replica's offline.
  term = box.info.election.term
   | ---
@@ -41,10 +21,19 @@ test_run:wait_cond(function() return box.info.election.term > term end)
   | - true
   | ...

--- Make sure the replica receives new term on resubscribe.
+-- Make sure the replica receives new term on subscribe.
  box.cfg{election_mode = 'off'}
   | ---
   | ...
+
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
  test_run:cmd('start server replica')
   | ---
   | - true
@@ -56,6 +45,7 @@ end)
   | ---
   | - true
   | ...
+
  -- Cleanup.
  box.cfg{election_mode = old_election_mode}
   | ---
diff --git a/test/replication/gh-5438-raft-state.test.lua b/test/replication/gh-5438-raft-state.test.lua
index 179f4b1c9..60c3366c1 100644
--- a/test/replication/gh-5438-raft-state.test.lua
+++ b/test/replication/gh-5438-raft-state.test.lua
@@ -3,26 +3,24 @@ test_run = require('test_run').new()
  --
  -- gh-5428 send out Raft state to subscribers, even when Raft is disabled.
  --
-box.schema.user.grant('guest', 'replication')
-test_run:cmd('create server replica with rpl_master=default,\
- script="replication/replica.lua"')
-test_run:cmd('start server replica')
-test_run:wait_lsn('replica', 'default')
-test_run:cmd('stop server replica')
-
  -- Bump Raft term while the replica's offline.
  term = box.info.election.term
  old_election_mode = box.cfg.election_mode
  box.cfg{election_mode = 'candidate'}
  test_run:wait_cond(function() return box.info.election.term > term end)

--- Make sure the replica receives new term on resubscribe.
+-- Make sure the replica receives new term on subscribe.
  box.cfg{election_mode = 'off'}
+
+box.schema.user.grant('guest', 'replication')
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
  test_run:cmd('start server replica')
  test_run:wait_cond(function()\
      return test_run:eval('replica', 'return box.info.election.term')[1] ==\
             box.info.election.term\
  end)
+
  -- Cleanup.
  box.cfg{election_mode = old_election_mode}
  test_run:cmd('stop server replica')

================================================



-- Serge Petrenko

