- * [Tarantool-patches] [PATCH 1/7] replication: always send raft state to subscribers
  2021-06-10 13:32 [Tarantool-patches] [PATCH 0/7] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
@ 2021-06-10 13:32 ` Serge Petrenko via Tarantool-patches
  2021-06-10 16:47   ` Cyrill Gorcunov via Tarantool-patches
  2021-06-15 20:53   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 2/7] replication: forbid implicit limbo owner transition Serge Petrenko via Tarantool-patches
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-10 13:32 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
Tarantool used to send out raft state on subscribe only when raft was
enabled. This was a safeguard against partially-upgraded clusters, where
some nodes had no clue about Raft messages and couldn't handle them
properly.
Actually, Raft state should be sent out always. For example, promote
bumps Raft term, even when raft is disabled, and it's important that
everyone in cluster has the same term, for the sake of promote at least.
So, send out Raft state to every subscriber with version >= 2.6.0
(that's when Raft was introduced).
Closes #5438
---
 src/box/box.cc                               | 11 +--
 test/replication/gh-5438-raft-state.result   | 73 ++++++++++++++++++++
 test/replication/gh-5438-raft-state.test.lua | 30 ++++++++
 test/replication/suite.cfg                   |  1 +
 4 files changed, 110 insertions(+), 5 deletions(-)
 create mode 100644 test/replication/gh-5438-raft-state.result
 create mode 100644 test/replication/gh-5438-raft-state.test.lua
diff --git a/src/box/box.cc b/src/box/box.cc
index 6dc991dc8..b9f3fab32 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -82,6 +82,7 @@
 #include "msgpack.h"
 #include "raft.h"
 #include "trivia/util.h"
+#include "version.h"
 
 static char status[64] = "unknown";
 
@@ -2811,13 +2812,13 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
 		 tt_uuid_str(&replica_uuid), sio_socketname(io->fd));
 	say_info("remote vclock %s local vclock %s",
 		 vclock_to_string(&replica_clock), vclock_to_string(&vclock));
-	if (raft_is_enabled(box_raft())) {
+	if (replica_version_id >= version_id(2, 6, 0) && !anon) {
 		/*
 		 * Send out the current raft state of the instance. Don't do
-		 * that if Raft is disabled. It can be that a part of the
-		 * cluster still contains old versions, which can't handle Raft
-		 * messages. So when it is disabled, its network footprint
-		 * should be 0.
+		 * that if the remote instance is old. It can be that a part of
+		 * the cluster still contains old versions, which can't handle
+		 * Raft messages. Raft's network footprint should be 0 as seen
+		 * by such instances.
 		 */
 		struct raft_request req;
 		box_raft_checkpoint_remote(&req);
diff --git a/test/replication/gh-5438-raft-state.result b/test/replication/gh-5438-raft-state.result
new file mode 100644
index 000000000..7982796a8
--- /dev/null
+++ b/test/replication/gh-5438-raft-state.result
@@ -0,0 +1,73 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+--
+-- gh-5428 send out Raft state to subscribers, even when Raft is disabled.
+--
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica')
+ | ---
+ | - true
+ | ...
+test_run:wait_lsn('replica', 'default')
+ | ---
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+
+-- Bump Raft term while the replica's offline.
+term = box.info.election.term
+ | ---
+ | ...
+old_election_mode = box.cfg.election_mode
+ | ---
+ | ...
+box.cfg{election_mode = 'candidate'}
+ | ---
+ | ...
+test_run:wait_cond(function() return box.info.election.term > term end)
+ | ---
+ | - true
+ | ...
+
+-- Make sure the replica receives new term on resubscribe.
+box.cfg{election_mode = 'off'}
+ | ---
+ | ...
+test_run:cmd('start server replica')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function()\
+    return test_run:eval('replica', 'return box.info.election.term')[1] ==\
+           box.info.election.term\
+end)
+ | ---
+ | - true
+ | ...
+-- Cleanup.
+box.cfg{election_mode = old_election_mode}
+ | ---
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
diff --git a/test/replication/gh-5438-raft-state.test.lua b/test/replication/gh-5438-raft-state.test.lua
new file mode 100644
index 000000000..179f4b1c9
--- /dev/null
+++ b/test/replication/gh-5438-raft-state.test.lua
@@ -0,0 +1,30 @@
+test_run = require('test_run').new()
+
+--
+-- gh-5428 send out Raft state to subscribers, even when Raft is disabled.
+--
+box.schema.user.grant('guest', 'replication')
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+test_run:cmd('start server replica')
+test_run:wait_lsn('replica', 'default')
+test_run:cmd('stop server replica')
+
+-- Bump Raft term while the replica's offline.
+term = box.info.election.term
+old_election_mode = box.cfg.election_mode
+box.cfg{election_mode = 'candidate'}
+test_run:wait_cond(function() return box.info.election.term > term end)
+
+-- Make sure the replica receives new term on resubscribe.
+box.cfg{election_mode = 'off'}
+test_run:cmd('start server replica')
+test_run:wait_cond(function()\
+    return test_run:eval('replica', 'return box.info.election.term')[1] ==\
+           box.info.election.term\
+end)
+-- Cleanup.
+box.cfg{election_mode = old_election_mode}
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 27eab20c2..46de2e6c4 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -19,6 +19,7 @@
     "gh-5213-qsync-applier-order-3.test.lua": {},
     "gh-5426-election-on-off.test.lua": {},
     "gh-5433-election-restart-recovery.test.lua": {},
+    "gh-5438-raft-state.test.lua": {},
     "gh-5445-leader-inconsistency.test.lua": {},
     "gh-5506-election-on-off.test.lua": {},
     "once.test.lua": {},
-- 
2.30.1 (Apple Git-130)
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 1/7] replication: always send raft state to subscribers
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 1/7] replication: always send raft state to subscribers Serge Petrenko via Tarantool-patches
@ 2021-06-10 16:47   ` Cyrill Gorcunov via Tarantool-patches
  2021-06-11  8:43     ` Serge Petrenko via Tarantool-patches
  2021-06-15 20:53   ` Vladislav Shpilevoy via Tarantool-patches
  1 sibling, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-06-10 16:47 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: v.shpilevoy, tarantool-patches
On Thu, Jun 10, 2021 at 04:32:51PM +0300, Serge Petrenko wrote:
> -	if (raft_is_enabled(box_raft())) {
> +	if (replica_version_id >= version_id(2, 6, 0) && !anon) {
>  		/*
>  		 * Send out the current raft state of the instance. Don't do
> -		 * that if Raft is disabled. It can be that a part of the
> -		 * cluster still contains old versions, which can't handle Raft
> -		 * messages. So when it is disabled, its network footprint
> -		 * should be 0.
> +		 * that if the remote instance is old. It can be that a part of
> +		 * the cluster still contains old versions, which can't handle
> +		 * Raft messages. Raft's network footprint should be 0 as seen
> +		 * by such instances.
>  		 */
Serge, why can't we send raft state for anon replicas as well? As far
as I understand anon replicas do receive raft updates from WAL notifications,
right? So this is somehow inconsistent that WAL's based raft updates are
reaching anon replicas while initial state from subscribe state is not,
or I miss something?
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 1/7] replication: always send raft state to subscribers
  2021-06-10 16:47   ` Cyrill Gorcunov via Tarantool-patches
@ 2021-06-11  8:43     ` Serge Petrenko via Tarantool-patches
  2021-06-11  8:44       ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-11  8:43 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: v.shpilevoy, tarantool-patches
10.06.2021 19:47, Cyrill Gorcunov пишет:
> On Thu, Jun 10, 2021 at 04:32:51PM +0300, Serge Petrenko wrote:
>> -	if (raft_is_enabled(box_raft())) {
>> +	if (replica_version_id >= version_id(2, 6, 0) && !anon) {
>>   		/*
>>   		 * Send out the current raft state of the instance. Don't do
>> -		 * that if Raft is disabled. It can be that a part of the
>> -		 * cluster still contains old versions, which can't handle Raft
>> -		 * messages. So when it is disabled, its network footprint
>> -		 * should be 0.
>> +		 * that if the remote instance is old. It can be that a part of
>> +		 * the cluster still contains old versions, which can't handle
>> +		 * Raft messages. Raft's network footprint should be 0 as seen
>> +		 * by such instances.
>>   		 */
> Serge, why can't we send raft state for anon replicas as well? As far
> as I understand anon replicas do receive raft updates from WAL notifications,
> right? So this is somehow inconsistent that WAL's based raft updates are
> reaching anon replicas while initial state from subscribe state is not,
> or I miss something?
No, anon replicas don't receive Raft notifications at all.
See relay_send_is_raft_enabled() calls.
Also raft state is persisted as a local row, and it's not replicated
directly via WAL.
All raft state updates are pushed to peers via relay_push_raft().
So, I wanted to stay consistent here, i.e. not send raft state to 
anonymous replicas.
Besides, anonymous replicas can't enable elections at all, so there's no 
point in
sending the updates to them.
-- 
Serge Petrenko
^ permalink raw reply	[flat|nested] 34+ messages in thread
 
- * Re: [Tarantool-patches] [PATCH 1/7] replication: always send raft state to subscribers
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 1/7] replication: always send raft state to subscribers Serge Petrenko via Tarantool-patches
  2021-06-10 16:47   ` Cyrill Gorcunov via Tarantool-patches
@ 2021-06-15 20:53   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
  1 sibling, 1 reply; 34+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-15 20:53 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for working on this!
On 10.06.2021 15:32, Serge Petrenko via Tarantool-patches wrote:
> Tarantool used to send out raft state on subscribe only when raft was
> enabled. This was a safeguard against partially-upgraded clusters, where
> some nodes had no clue about Raft messages and couldn't handle them
> properly.
> 
> Actually, Raft state should be sent out always. For example, promote
> bumps Raft term, even when raft is disabled, and it's important that
> everyone in cluster has the same term, for the sake of promote at least.
> 
> So, send out Raft state to every subscriber with version >= 2.6.0
> (that's when Raft was introduced).
> 
> Closes #5438
> ---
>  src/box/box.cc                               | 11 +--
>  test/replication/gh-5438-raft-state.result   | 73 ++++++++++++++++++++
>  test/replication/gh-5438-raft-state.test.lua | 30 ++++++++
The test still works when I revert box.cc changes.
^ permalink raw reply	[flat|nested] 34+ messages in thread 
- * Re: [Tarantool-patches] [PATCH 1/7] replication: always send raft state to subscribers
  2021-06-15 20:53   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
  0 siblings, 0 replies; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-17 21:00 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
15.06.2021 23:53, Vladislav Shpilevoy пишет:
> Thanks for working on this! On 10.06.2021 15:32, Serge Petrenko via 
> Tarantool-patches wrote:
>> Tarantool used to send out raft state on subscribe only when raft was 
>> enabled. This was a safeguard against partially-upgraded clusters, 
>> where some nodes had no clue about Raft messages and couldn't handle 
>> them properly. Actually, Raft state should be sent out always. For 
>> example, promote bumps Raft term, even when raft is disabled, and 
>> it's important that everyone in cluster has the same term, for the 
>> sake of promote at least. So, send out Raft state to every subscriber 
>> with version >= 2.6.0 (that's when Raft was introduced). Closes #5438 
>> --- src/box/box.cc | 11 +-- 
>> test/replication/gh-5438-raft-state.result | 73 ++++++++++++++++++++ 
>> test/replication/gh-5438-raft-state.test.lua | 30 ++++++++ 
> The test still works when I revert box.cc changes. 
Hi! Thanks for pointing this out!
Yep, this happened because of raft broadcasts.
When relay is off, it still saves the lates broadcast message from raft,
and delivers it as soon as replica reconnects.
Thanks to this, I've also fixed raft broadcasts possibly being sent to 
old instances.
This could only happen after the patch which persists the latest raft 
broadcast and only
if raft was turned on for some time. So this wasn't such a big deal for 
upgrading installations,
because they wouldn't turn raft on.
Anyway, this issue is fixed as well and I made the test fail without the 
patch.
Here's the diff:
================================================
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 1196b65b9..5d72cab13 100644
diff --git a/src/box/relay.cc b/src/box/relay.cc
index b1571b361..b47767769 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -760,7 +760,7 @@ relay_subscribe_f(va_list ap)
            &relay->relay_pipe, NULL, NULL, cbus_process);
      struct relay_is_raft_enabled_msg raft_enabler;
-    if (!relay->replica->anon)
+    if (!relay->replica->anon && relay->version_id >= version_id(2, 6, 0))
          relay_send_is_raft_enabled(relay, &raft_enabler, true);
      /*
@@ -842,7 +842,7 @@ relay_subscribe_f(va_list ap)
          cpipe_push(&relay->tx_pipe, &relay->status_msg.msg);
      }
-    if (!relay->replica->anon)
+    if (!relay->replica->anon && relay->version_id >= version_id(2, 6, 0))
          relay_send_is_raft_enabled(relay, &raft_enabler, false);
      /*
diff --git a/test/replication/gh-5438-raft-state.result 
b/test/replication/gh-5438-raft-state.result
index 7982796a8..6985f026a 100644
--- a/test/replication/gh-5438-raft-state.result
+++ b/test/replication/gh-5438-raft-state.result
@@ -6,26 +6,6 @@ test_run = require('test_run').new()
  --
  -- gh-5428 send out Raft state to subscribers, even when Raft is disabled.
  --
-box.schema.user.grant('guest', 'replication')
- | ---
- | ...
-test_run:cmd('create server replica with rpl_master=default,\
- script="replication/replica.lua"')
- | ---
- | - true
- | ...
-test_run:cmd('start server replica')
- | ---
- | - true
- | ...
-test_run:wait_lsn('replica', 'default')
- | ---
- | ...
-test_run:cmd('stop server replica')
- | ---
- | - true
- | ...
-
  -- Bump Raft term while the replica's offline.
  term = box.info.election.term
   | ---
@@ -41,10 +21,19 @@ test_run:wait_cond(function() return 
box.info.election.term > term end)
   | - true
   | ...
--- Make sure the replica receives new term on resubscribe.
+-- Make sure the replica receives new term on subscribe.
  box.cfg{election_mode = 'off'}
   | ---
   | ...
+
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
  test_run:cmd('start server replica')
   | ---
   | - true
@@ -56,6 +45,7 @@ end)
   | ---
   | - true
   | ...
+
  -- Cleanup.
  box.cfg{election_mode = old_election_mode}
   | ---
diff --git a/test/replication/gh-5438-raft-state.test.lua 
b/test/replication/gh-5438-raft-state.test.lua
index 179f4b1c9..60c3366c1 100644
--- a/test/replication/gh-5438-raft-state.test.lua
+++ b/test/replication/gh-5438-raft-state.test.lua
@@ -3,26 +3,24 @@ test_run = require('test_run').new()
  --
  -- gh-5428 send out Raft state to subscribers, even when Raft is disabled.
  --
-box.schema.user.grant('guest', 'replication')
-test_run:cmd('create server replica with rpl_master=default,\
- script="replication/replica.lua"')
-test_run:cmd('start server replica')
-test_run:wait_lsn('replica', 'default')
-test_run:cmd('stop server replica')
-
  -- Bump Raft term while the replica's offline.
  term = box.info.election.term
  old_election_mode = box.cfg.election_mode
  box.cfg{election_mode = 'candidate'}
  test_run:wait_cond(function() return box.info.election.term > term end)
--- Make sure the replica receives new term on resubscribe.
+-- Make sure the replica receives new term on subscribe.
  box.cfg{election_mode = 'off'}
+
+box.schema.user.grant('guest', 'replication')
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
  test_run:cmd('start server replica')
  test_run:wait_cond(function()\
      return test_run:eval('replica', 'return 
box.info.election.term')[1] ==\
             box.info.election.term\
  end)
+
  -- Cleanup.
  box.cfg{election_mode = old_election_mode}
  test_run:cmd('stop server replica')
================================================
-- Serge Petrenko
^ permalink raw reply	[flat|nested] 34+ messages in thread
 
 
- * [Tarantool-patches] [PATCH 2/7] replication: forbid implicit limbo owner transition
  2021-06-10 13:32 [Tarantool-patches] [PATCH 0/7] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 1/7] replication: always send raft state to subscribers Serge Petrenko via Tarantool-patches
@ 2021-06-10 13:32 ` Serge Petrenko via Tarantool-patches
  2021-06-15 20:55   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-18 22:49   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 3/7] txn_limbo: fix promote term filtering Serge Petrenko via Tarantool-patches
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-10 13:32 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
Forbid limbo ownership transition without an explicit promote.
Make it so that synchronous transactions may be committed only after it
is claimed by some instance via a PROMOTE request.
Make everyone but the limbo owner read-only even when the limbo is
empty.
Part-of #6034
@TarantoolBot document
Title: synchronous replication changes
`box.info.synchro.queue` receives a new field: `owner`. It's a replica
id of the instance owning the synchronous transaction limbo.
Once some instance owns the limbo, every other instance becomes
read-only. When the limbo is unclaimed, e.g.
`box.info.synchro.queue.owner` is `0`, everyone may be writeable, but
cannot create synchronous transactions.
In order to claim or re-claim the limbo, you have to issue
`box.ctl.promote()` on the instance you wish to promote.
When elections are enabled, the instance issues `box.ctl.promote()`
automatically once it wins the elections.
---
 src/box/errcode.h                             |   1 +
 src/box/lua/info.c                            |   4 +-
 src/box/txn_limbo.c                           |  35 ++---
 test/box/alter.result                         |   2 +-
 test/box/error.result                         |   1 +
 .../gh-5140-qsync-casc-rollback.result        |   3 +
 .../gh-5140-qsync-casc-rollback.test.lua      |   1 +
 .../gh-5144-qsync-dup-confirm.result          |   3 +
 .../gh-5144-qsync-dup-confirm.test.lua        |   1 +
 .../gh-5163-qsync-restart-crash.result        |   3 +
 .../gh-5163-qsync-restart-crash.test.lua      |   1 +
 .../gh-5167-qsync-rollback-snap.result        |   3 +
 .../gh-5167-qsync-rollback-snap.test.lua      |   1 +
 .../gh-5195-qsync-replica-write.result        |   7 +-
 .../gh-5195-qsync-replica-write.test.lua      |   5 +-
 .../gh-5213-qsync-applier-order-3.result      |   6 +
 .../gh-5213-qsync-applier-order-3.test.lua    |   2 +
 .../gh-5213-qsync-applier-order.result        |   3 +
 .../gh-5213-qsync-applier-order.test.lua      |   1 +
 .../replication/gh-5288-qsync-recovery.result |   3 +
 .../gh-5288-qsync-recovery.test.lua           |   1 +
 .../gh-5298-qsync-recovery-snap.result        |   3 +
 .../gh-5298-qsync-recovery-snap.test.lua      |   1 +
 ...sync-clear-synchro-queue-commit-all.result |   3 +
 ...nc-clear-synchro-queue-commit-all.test.lua |   1 +
 test/replication/gh-5440-qsync-ro.result      | 133 ------------------
 test/replication/gh-5440-qsync-ro.test.lua    |  53 -------
 .../gh-5446-qsync-eval-quorum.result          |   3 +
 .../gh-5446-qsync-eval-quorum.test.lua        |   1 +
 .../gh-5566-final-join-synchro.result         |   3 +
 .../gh-5566-final-join-synchro.test.lua       |   1 +
 .../gh-5874-qsync-txn-recovery.result         |   3 +
 .../gh-5874-qsync-txn-recovery.test.lua       |   1 +
 .../gh-6057-qsync-confirm-async-no-wal.result |   4 +
 ...h-6057-qsync-confirm-async-no-wal.test.lua |   2 +
 test/replication/hang_on_synchro_fail.result  |   3 +
 .../replication/hang_on_synchro_fail.test.lua |   1 +
 test/replication/qsync_advanced.result        |   9 ++
 test/replication/qsync_advanced.test.lua      |   3 +
 test/replication/qsync_basic.result           |  64 +--------
 test/replication/qsync_basic.test.lua         |  24 +---
 test/replication/qsync_errinj.result          |   3 +
 test/replication/qsync_errinj.test.lua        |   1 +
 test/replication/qsync_snapshots.result       |   3 +
 test/replication/qsync_snapshots.test.lua     |   1 +
 test/replication/qsync_with_anon.result       |   3 +
 test/replication/qsync_with_anon.test.lua     |   1 +
 47 files changed, 114 insertions(+), 301 deletions(-)
 delete mode 100644 test/replication/gh-5440-qsync-ro.result
 delete mode 100644 test/replication/gh-5440-qsync-ro.test.lua
diff --git a/src/box/errcode.h b/src/box/errcode.h
index d93820e96..e75f54a01 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -277,6 +277,7 @@ struct errcode_record {
 	/*222 */_(ER_QUORUM_WAIT,		"Couldn't wait for quorum %d: %s") \
 	/*223 */_(ER_INTERFERING_PROMOTE,	"Instance with replica id %u was promoted first") \
 	/*224 */_(ER_RAFT_DISABLED,		"Elections were turned off while running box.ctl.promote()")\
+	/*225 */_(ER_LIMBO_UNCLAIMED,		"Synchronous transaction limbo doesn't belong to any instance")\
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/lua/info.c b/src/box/lua/info.c
index 0eb48b823..7e3cd0b7d 100644
--- a/src/box/lua/info.c
+++ b/src/box/lua/info.c
@@ -611,9 +611,11 @@ lbox_info_synchro(struct lua_State *L)
 
 	/* Queue information. */
 	struct txn_limbo *queue = &txn_limbo;
-	lua_createtable(L, 0, 1);
+	lua_createtable(L, 0, 2);
 	lua_pushnumber(L, queue->len);
 	lua_setfield(L, -2, "len");
+	lua_pushnumber(L, queue->owner_id);
+	lua_setfield(L, -2, "owner");
 	lua_setfield(L, -2, "queue");
 
 	return 1;
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index dae6d2df4..53233add3 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -55,7 +55,8 @@ txn_limbo_create(struct txn_limbo *limbo)
 bool
 txn_limbo_is_ro(struct txn_limbo *limbo)
 {
-	return limbo->owner_id != instance_id && !txn_limbo_is_empty(limbo);
+	return limbo->owner_id != REPLICA_ID_NIL &&
+	       limbo->owner_id != instance_id;
 }
 
 struct txn_limbo_entry *
@@ -94,18 +95,13 @@ txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn)
 	}
 	if (id == 0)
 		id = instance_id;
-	bool make_ro = false;
-	if (limbo->owner_id != id) {
-		if (rlist_empty(&limbo->queue)) {
-			limbo->owner_id = id;
-			limbo->confirmed_lsn = 0;
-			if (id != instance_id)
-				make_ro = true;
-		} else {
-			diag_set(ClientError, ER_UNCOMMITTED_FOREIGN_SYNC_TXNS,
-				 limbo->owner_id);
-			return NULL;
-		}
+	if  (limbo->owner_id == REPLICA_ID_NIL) {
+		diag_set(ClientError, ER_LIMBO_UNCLAIMED);
+		return NULL;
+	} else if (limbo->owner_id != id) {
+		diag_set(ClientError, ER_UNCOMMITTED_FOREIGN_SYNC_TXNS,
+			 limbo->owner_id);
+		return NULL;
 	}
 	size_t size;
 	struct txn_limbo_entry *e = region_alloc_object(&txn->region,
@@ -121,12 +117,6 @@ txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn)
 	e->is_rollback = false;
 	rlist_add_tail_entry(&limbo->queue, e, in_queue);
 	limbo->len++;
-	/*
-	 * We added new entries from a remote instance to an empty limbo.
-	 * Time to make this instance read-only.
-	 */
-	if (make_ro)
-		box_update_ro_summary();
 	return e;
 }
 
@@ -427,9 +417,6 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
 		assert(e->txn->signature >= 0);
 		txn_complete_success(e->txn);
 	}
-	/* Update is_ro once the limbo is clear. */
-	if (txn_limbo_is_empty(limbo))
-		box_update_ro_summary();
 }
 
 /**
@@ -477,9 +464,6 @@ txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
 		if (e == last_rollback)
 			break;
 	}
-	/* Update is_ro once the limbo is clear. */
-	if (txn_limbo_is_empty(limbo))
-		box_update_ro_summary();
 }
 
 void
@@ -510,6 +494,7 @@ txn_limbo_read_promote(struct txn_limbo *limbo, uint32_t replica_id,
 	txn_limbo_read_rollback(limbo, lsn + 1);
 	assert(txn_limbo_is_empty(&txn_limbo));
 	limbo->owner_id = replica_id;
+	box_update_ro_summary();
 	limbo->confirmed_lsn = 0;
 }
 
diff --git a/test/box/alter.result b/test/box/alter.result
index a7bffce10..b903bf0d0 100644
--- a/test/box/alter.result
+++ b/test/box/alter.result
@@ -1464,7 +1464,7 @@ assert(s.is_sync)
 ...
 s:replace{1}
 ---
-- error: Quorum collection for a synchronous transaction is timed out
+- error: Synchronous transaction limbo doesn't belong to any instance
 ...
 -- When not specified or nil - ignored.
 s:alter({is_sync = nil})
diff --git a/test/box/error.result b/test/box/error.result
index cc8cbaaa9..6420f92bc 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -443,6 +443,7 @@ t;
  |   222: box.error.QUORUM_WAIT
  |   223: box.error.INTERFERING_PROMOTE
  |   224: box.error.RAFT_DISABLED
+ |   225: box.error.LIMBO_UNCLAIMED
  | ...
 
 test_run:cmd("setopt delimiter ''");
diff --git a/test/replication/gh-5140-qsync-casc-rollback.result b/test/replication/gh-5140-qsync-casc-rollback.result
index da77631dd..0eec3f10a 100644
--- a/test/replication/gh-5140-qsync-casc-rollback.result
+++ b/test/replication/gh-5140-qsync-casc-rollback.result
@@ -73,6 +73,9 @@ _ = box.schema.space.create('async', {is_sync=false, engine = engine})
 _ = _:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 -- Write something to flush the master state to replica.
 box.space.sync:replace{1}
  | ---
diff --git a/test/replication/gh-5140-qsync-casc-rollback.test.lua b/test/replication/gh-5140-qsync-casc-rollback.test.lua
index 69fc9ad02..d45390aa6 100644
--- a/test/replication/gh-5140-qsync-casc-rollback.test.lua
+++ b/test/replication/gh-5140-qsync-casc-rollback.test.lua
@@ -48,6 +48,7 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = _:create_index('pk')
 _ = box.schema.space.create('async', {is_sync=false, engine = engine})
 _ = _:create_index('pk')
+box.ctl.promote()
 -- Write something to flush the master state to replica.
 box.space.sync:replace{1}
 
diff --git a/test/replication/gh-5144-qsync-dup-confirm.result b/test/replication/gh-5144-qsync-dup-confirm.result
index 9d265d9ff..01c7501f1 100644
--- a/test/replication/gh-5144-qsync-dup-confirm.result
+++ b/test/replication/gh-5144-qsync-dup-confirm.result
@@ -46,6 +46,9 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = _:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 -- Remember the current LSN. In the end, when the following synchronous
 -- transaction is committed, result LSN should be this value +2: for the
diff --git a/test/replication/gh-5144-qsync-dup-confirm.test.lua b/test/replication/gh-5144-qsync-dup-confirm.test.lua
index 01a8351e0..9a295192a 100644
--- a/test/replication/gh-5144-qsync-dup-confirm.test.lua
+++ b/test/replication/gh-5144-qsync-dup-confirm.test.lua
@@ -19,6 +19,7 @@ box.cfg{replication_synchro_quorum = 2, replication_synchro_timeout = 1000}
 
 _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = _:create_index('pk')
+box.ctl.promote()
 
 -- Remember the current LSN. In the end, when the following synchronous
 -- transaction is committed, result LSN should be this value +2: for the
diff --git a/test/replication/gh-5163-qsync-restart-crash.result b/test/replication/gh-5163-qsync-restart-crash.result
index e57bc76d1..8c9d43a9b 100644
--- a/test/replication/gh-5163-qsync-restart-crash.result
+++ b/test/replication/gh-5163-qsync-restart-crash.result
@@ -16,6 +16,9 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 box.space.sync:replace{1}
  | ---
diff --git a/test/replication/gh-5163-qsync-restart-crash.test.lua b/test/replication/gh-5163-qsync-restart-crash.test.lua
index d5aca4749..a0c2533c6 100644
--- a/test/replication/gh-5163-qsync-restart-crash.test.lua
+++ b/test/replication/gh-5163-qsync-restart-crash.test.lua
@@ -7,6 +7,7 @@ engine = test_run:get_cfg('engine')
 --
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 
 box.space.sync:replace{1}
 test_run:cmd('restart server default')
diff --git a/test/replication/gh-5167-qsync-rollback-snap.result b/test/replication/gh-5167-qsync-rollback-snap.result
index 06f58526c..ddb3212ad 100644
--- a/test/replication/gh-5167-qsync-rollback-snap.result
+++ b/test/replication/gh-5167-qsync-rollback-snap.result
@@ -41,6 +41,9 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 -- Write something to flush the current master's state to replica.
 _ = box.space.sync:insert{1}
  | ---
diff --git a/test/replication/gh-5167-qsync-rollback-snap.test.lua b/test/replication/gh-5167-qsync-rollback-snap.test.lua
index 475727e61..1590a515a 100644
--- a/test/replication/gh-5167-qsync-rollback-snap.test.lua
+++ b/test/replication/gh-5167-qsync-rollback-snap.test.lua
@@ -16,6 +16,7 @@ fiber = require('fiber')
 box.cfg{replication_synchro_quorum = 2, replication_synchro_timeout = 1000}
 _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 -- Write something to flush the current master's state to replica.
 _ = box.space.sync:insert{1}
 _ = box.space.sync:delete{1}
diff --git a/test/replication/gh-5195-qsync-replica-write.result b/test/replication/gh-5195-qsync-replica-write.result
index 85e00e6ed..fcf96cd1a 100644
--- a/test/replication/gh-5195-qsync-replica-write.result
+++ b/test/replication/gh-5195-qsync-replica-write.result
@@ -40,6 +40,9 @@ _ = box.schema.space.create('sync', {engine = engine, is_sync = true})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 box.cfg{replication_synchro_timeout = 1000, replication_synchro_quorum = 3}
  | ---
@@ -71,12 +74,12 @@ test_run:wait_lsn('replica', 'default')
  | ---
  | ...
 -- Normal DML is blocked - the limbo is not empty and does not belong to the
--- replica. But synchro queue cleanup also does a WAL write, and propagates LSN
+-- replica. But promote also does a WAL write, and propagates LSN
 -- of the instance.
 box.cfg{replication_synchro_timeout = 0.001}
  | ---
  | ...
-box.ctl.clear_synchro_queue()
+box.ctl.promote()
  | ---
  | ...
 
diff --git a/test/replication/gh-5195-qsync-replica-write.test.lua b/test/replication/gh-5195-qsync-replica-write.test.lua
index 64c48be99..751ceee69 100644
--- a/test/replication/gh-5195-qsync-replica-write.test.lua
+++ b/test/replication/gh-5195-qsync-replica-write.test.lua
@@ -17,6 +17,7 @@ test_run:cmd('start server replica with wait=True, wait_load=True')
 --
 _ = box.schema.space.create('sync', {engine = engine, is_sync = true})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 
 box.cfg{replication_synchro_timeout = 1000, replication_synchro_quorum = 3}
 lsn = box.info.lsn
@@ -30,10 +31,10 @@ test_run:wait_cond(function() return box.info.lsn == lsn end)
 test_run:switch('replica')
 test_run:wait_lsn('replica', 'default')
 -- Normal DML is blocked - the limbo is not empty and does not belong to the
--- replica. But synchro queue cleanup also does a WAL write, and propagates LSN
+-- replica. But promote also does a WAL write, and propagates LSN
 -- of the instance.
 box.cfg{replication_synchro_timeout = 0.001}
-box.ctl.clear_synchro_queue()
+box.ctl.promote()
 
 test_run:switch('default')
 -- Wait second ACK receipt.
diff --git a/test/replication/gh-5213-qsync-applier-order-3.result b/test/replication/gh-5213-qsync-applier-order-3.result
index bcb18b5c0..90608dcdc 100644
--- a/test/replication/gh-5213-qsync-applier-order-3.result
+++ b/test/replication/gh-5213-qsync-applier-order-3.result
@@ -45,6 +45,9 @@ s = box.schema.space.create('test', {is_sync = true})
 _ = s:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 test_run:cmd('create server replica1 with rpl_master=default,\
               script="replication/replica1.lua"')
@@ -179,6 +182,9 @@ box.cfg{
 -- Replica2 takes the limbo ownership and sends the transaction to the replica1.
 -- Along with the CONFIRM from the default node, which is still not applied
 -- on the replica1.
+box.ctl.promote()
+ | ---
+ | ...
 fiber = require('fiber')
  | ---
  | ...
diff --git a/test/replication/gh-5213-qsync-applier-order-3.test.lua b/test/replication/gh-5213-qsync-applier-order-3.test.lua
index 37b569da7..669ab3677 100644
--- a/test/replication/gh-5213-qsync-applier-order-3.test.lua
+++ b/test/replication/gh-5213-qsync-applier-order-3.test.lua
@@ -30,6 +30,7 @@ box.schema.user.grant('guest', 'super')
 
 s = box.schema.space.create('test', {is_sync = true})
 _ = s:create_index('pk')
+box.ctl.promote()
 
 test_run:cmd('create server replica1 with rpl_master=default,\
               script="replication/replica1.lua"')
@@ -90,6 +91,7 @@ box.cfg{
 -- Replica2 takes the limbo ownership and sends the transaction to the replica1.
 -- Along with the CONFIRM from the default node, which is still not applied
 -- on the replica1.
+box.ctl.promote()
 fiber = require('fiber')
 f = fiber.new(function() box.space.test:replace{2} end)
 
diff --git a/test/replication/gh-5213-qsync-applier-order.result b/test/replication/gh-5213-qsync-applier-order.result
index a8c24c289..3e1f2871b 100644
--- a/test/replication/gh-5213-qsync-applier-order.result
+++ b/test/replication/gh-5213-qsync-applier-order.result
@@ -29,6 +29,9 @@ s = box.schema.space.create('test', {is_sync = true})
 _ = s:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 test_run:cmd('create server replica with rpl_master=default,\
               script="replication/gh-5213-replica.lua"')
diff --git a/test/replication/gh-5213-qsync-applier-order.test.lua b/test/replication/gh-5213-qsync-applier-order.test.lua
index f1eccfa84..010c08ca7 100644
--- a/test/replication/gh-5213-qsync-applier-order.test.lua
+++ b/test/replication/gh-5213-qsync-applier-order.test.lua
@@ -14,6 +14,7 @@ box.schema.user.grant('guest', 'super')
 
 s = box.schema.space.create('test', {is_sync = true})
 _ = s:create_index('pk')
+box.ctl.promote()
 
 test_run:cmd('create server replica with rpl_master=default,\
               script="replication/gh-5213-replica.lua"')
diff --git a/test/replication/gh-5288-qsync-recovery.result b/test/replication/gh-5288-qsync-recovery.result
index dc0babef6..42662cd38 100644
--- a/test/replication/gh-5288-qsync-recovery.result
+++ b/test/replication/gh-5288-qsync-recovery.result
@@ -12,6 +12,9 @@ s = box.schema.space.create('sync', {is_sync = true})
 _ = s:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 s:insert{1}
  | ---
  | - [1]
diff --git a/test/replication/gh-5288-qsync-recovery.test.lua b/test/replication/gh-5288-qsync-recovery.test.lua
index 00bff7b87..24727d59e 100644
--- a/test/replication/gh-5288-qsync-recovery.test.lua
+++ b/test/replication/gh-5288-qsync-recovery.test.lua
@@ -5,6 +5,7 @@ test_run = require('test_run').new()
 --
 s = box.schema.space.create('sync', {is_sync = true})
 _ = s:create_index('pk')
+box.ctl.promote()
 s:insert{1}
 box.snapshot()
 test_run:cmd('restart server default')
diff --git a/test/replication/gh-5298-qsync-recovery-snap.result b/test/replication/gh-5298-qsync-recovery-snap.result
index 922831552..ac6ccbc36 100644
--- a/test/replication/gh-5298-qsync-recovery-snap.result
+++ b/test/replication/gh-5298-qsync-recovery-snap.result
@@ -17,6 +17,9 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 for i = 1, 10 do box.space.sync:replace{i} end
  | ---
  | ...
diff --git a/test/replication/gh-5298-qsync-recovery-snap.test.lua b/test/replication/gh-5298-qsync-recovery-snap.test.lua
index 187f60d75..30f975c6c 100644
--- a/test/replication/gh-5298-qsync-recovery-snap.test.lua
+++ b/test/replication/gh-5298-qsync-recovery-snap.test.lua
@@ -8,6 +8,7 @@ engine = test_run:get_cfg('engine')
 --
 _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 for i = 1, 10 do box.space.sync:replace{i} end
 
 -- Local rows could affect this by increasing the signature.
diff --git a/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.result b/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.result
index 2699231e5..20fab4072 100644
--- a/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.result
+++ b/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.result
@@ -49,6 +49,9 @@ _ = box.schema.space.create('test', {is_sync=true})
 _ = box.space.test:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 -- Fill the limbo with pending entries. 3 mustn't receive them yet.
 test_run:cmd('stop server election_replica3')
diff --git a/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.test.lua b/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.test.lua
index 03705d96c..ec0f1d77e 100644
--- a/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.test.lua
+++ b/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.test.lua
@@ -21,6 +21,7 @@ box.ctl.wait_rw()
 
 _ = box.schema.space.create('test', {is_sync=true})
 _ = box.space.test:create_index('pk')
+box.ctl.promote()
 
 -- Fill the limbo with pending entries. 3 mustn't receive them yet.
 test_run:cmd('stop server election_replica3')
diff --git a/test/replication/gh-5440-qsync-ro.result b/test/replication/gh-5440-qsync-ro.result
deleted file mode 100644
index 1ece26a42..000000000
--- a/test/replication/gh-5440-qsync-ro.result
+++ /dev/null
@@ -1,133 +0,0 @@
--- test-run result file version 2
---
--- gh-5440 everyone but the limbo owner is read-only on non-empty limbo.
---
-env = require('test_run')
- | ---
- | ...
-test_run = env.new()
- | ---
- | ...
-fiber = require('fiber')
- | ---
- | ...
-
-box.schema.user.grant('guest', 'replication')
- | ---
- | ...
-test_run:cmd('create server replica with rpl_master=default, script="replication/replica.lua"')
- | ---
- | - true
- | ...
-test_run:cmd('start server replica with wait=True, wait_load=True')
- | ---
- | - true
- | ...
-
-_ = box.schema.space.create('test', {is_sync=true})
- | ---
- | ...
-_ = box.space.test:create_index('pk')
- | ---
- | ...
-
-old_synchro_quorum = box.cfg.replication_synchro_quorum
- | ---
- | ...
-old_synchro_timeout = box.cfg.replication_synchro_timeout
- | ---
- | ...
-
--- Make sure that the master stalls on commit leaving the limbo non-empty.
-box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=1000}
- | ---
- | ...
-
-f = fiber.new(function() box.space.test:insert{1} end)
- | ---
- | ...
-f:status()
- | ---
- | - suspended
- | ...
-
--- Wait till replica's limbo is non-empty.
-test_run:wait_lsn('replica', 'default')
- | ---
- | ...
-test_run:cmd('switch replica')
- | ---
- | - true
- | ...
-
-box.info.ro
- | ---
- | - true
- | ...
-box.space.test:insert{2}
- | ---
- | - error: Can't modify data because this instance is in read-only mode.
- | ...
-success = false
- | ---
- | ...
-f = require('fiber').new(function() box.ctl.wait_rw() success = true end)
- | ---
- | ...
-f:status()
- | ---
- | - suspended
- | ...
-
-test_run:cmd('switch default')
- | ---
- | - true
- | ...
-
--- Empty the limbo.
-box.cfg{replication_synchro_quorum=2}
- | ---
- | ...
-
-test_run:cmd('switch replica')
- | ---
- | - true
- | ...
-
-test_run:wait_cond(function() return success end)
- | ---
- | - true
- | ...
-box.info.ro
- | ---
- | - false
- | ...
--- Should succeed now.
-box.space.test:insert{2}
- | ---
- | - [2]
- | ...
-
--- Cleanup.
-test_run:cmd('switch default')
- | ---
- | - true
- | ...
-box.cfg{replication_synchro_quorum=old_synchro_quorum,\
-        replication_synchro_timeout=old_synchro_timeout}
- | ---
- | ...
-box.space.test:drop()
- | ---
- | ...
-test_run:cmd('stop server replica')
- | ---
- | - true
- | ...
-test_run:cmd('delete server replica')
- | ---
- | - true
- | ...
-box.schema.user.revoke('guest', 'replication')
- | ---
- | ...
diff --git a/test/replication/gh-5440-qsync-ro.test.lua b/test/replication/gh-5440-qsync-ro.test.lua
deleted file mode 100644
index d63ec9c1e..000000000
--- a/test/replication/gh-5440-qsync-ro.test.lua
+++ /dev/null
@@ -1,53 +0,0 @@
---
--- gh-5440 everyone but the limbo owner is read-only on non-empty limbo.
---
-env = require('test_run')
-test_run = env.new()
-fiber = require('fiber')
-
-box.schema.user.grant('guest', 'replication')
-test_run:cmd('create server replica with rpl_master=default, script="replication/replica.lua"')
-test_run:cmd('start server replica with wait=True, wait_load=True')
-
-_ = box.schema.space.create('test', {is_sync=true})
-_ = box.space.test:create_index('pk')
-
-old_synchro_quorum = box.cfg.replication_synchro_quorum
-old_synchro_timeout = box.cfg.replication_synchro_timeout
-
--- Make sure that the master stalls on commit leaving the limbo non-empty.
-box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=1000}
-
-f = fiber.new(function() box.space.test:insert{1} end)
-f:status()
-
--- Wait till replica's limbo is non-empty.
-test_run:wait_lsn('replica', 'default')
-test_run:cmd('switch replica')
-
-box.info.ro
-box.space.test:insert{2}
-success = false
-f = require('fiber').new(function() box.ctl.wait_rw() success = true end)
-f:status()
-
-test_run:cmd('switch default')
-
--- Empty the limbo.
-box.cfg{replication_synchro_quorum=2}
-
-test_run:cmd('switch replica')
-
-test_run:wait_cond(function() return success end)
-box.info.ro
--- Should succeed now.
-box.space.test:insert{2}
-
--- Cleanup.
-test_run:cmd('switch default')
-box.cfg{replication_synchro_quorum=old_synchro_quorum,\
-        replication_synchro_timeout=old_synchro_timeout}
-box.space.test:drop()
-test_run:cmd('stop server replica')
-test_run:cmd('delete server replica')
-box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/gh-5446-qsync-eval-quorum.result b/test/replication/gh-5446-qsync-eval-quorum.result
index 5f83b248c..fe9868651 100644
--- a/test/replication/gh-5446-qsync-eval-quorum.result
+++ b/test/replication/gh-5446-qsync-eval-quorum.result
@@ -88,6 +88,9 @@ s = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = s:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 -- Only one master node -> 1/2 + 1 = 1
 s:insert{1} -- should pass
diff --git a/test/replication/gh-5446-qsync-eval-quorum.test.lua b/test/replication/gh-5446-qsync-eval-quorum.test.lua
index 6b9e324ed..d8fe6cccf 100644
--- a/test/replication/gh-5446-qsync-eval-quorum.test.lua
+++ b/test/replication/gh-5446-qsync-eval-quorum.test.lua
@@ -37,6 +37,7 @@ end
 -- Create a sync space we will operate on
 s = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = s:create_index('pk')
+box.ctl.promote()
 
 -- Only one master node -> 1/2 + 1 = 1
 s:insert{1} -- should pass
diff --git a/test/replication/gh-5566-final-join-synchro.result b/test/replication/gh-5566-final-join-synchro.result
index a09882ba6..e43e03b1c 100644
--- a/test/replication/gh-5566-final-join-synchro.result
+++ b/test/replication/gh-5566-final-join-synchro.result
@@ -12,6 +12,9 @@ _ = box.schema.space.create('sync', {is_sync=true})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 box.schema.user.grant('guest', 'replication')
  | ---
diff --git a/test/replication/gh-5566-final-join-synchro.test.lua b/test/replication/gh-5566-final-join-synchro.test.lua
index 2db2c742f..0f22a0321 100644
--- a/test/replication/gh-5566-final-join-synchro.test.lua
+++ b/test/replication/gh-5566-final-join-synchro.test.lua
@@ -5,6 +5,7 @@ test_run = require('test_run').new()
 --
 _ = box.schema.space.create('sync', {is_sync=true})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 
 box.schema.user.grant('guest', 'replication')
 box.schema.user.grant('guest', 'write', 'space', 'sync')
diff --git a/test/replication/gh-5874-qsync-txn-recovery.result b/test/replication/gh-5874-qsync-txn-recovery.result
index 73f903ca7..85eb89e04 100644
--- a/test/replication/gh-5874-qsync-txn-recovery.result
+++ b/test/replication/gh-5874-qsync-txn-recovery.result
@@ -31,6 +31,9 @@ sync = box.schema.create_space('sync', {is_sync = true, engine = engine})
 _ = sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 -- The transaction fails, but is written to the log anyway.
 box.begin() async:insert{1} sync:insert{1} box.commit()
diff --git a/test/replication/gh-5874-qsync-txn-recovery.test.lua b/test/replication/gh-5874-qsync-txn-recovery.test.lua
index f35eb68de..cf9753851 100644
--- a/test/replication/gh-5874-qsync-txn-recovery.test.lua
+++ b/test/replication/gh-5874-qsync-txn-recovery.test.lua
@@ -12,6 +12,7 @@ async = box.schema.create_space('async', {engine = engine})
 _ = async:create_index('pk')
 sync = box.schema.create_space('sync', {is_sync = true, engine = engine})
 _ = sync:create_index('pk')
+box.ctl.promote()
 
 -- The transaction fails, but is written to the log anyway.
 box.begin() async:insert{1} sync:insert{1} box.commit()
diff --git a/test/replication/gh-6057-qsync-confirm-async-no-wal.result b/test/replication/gh-6057-qsync-confirm-async-no-wal.result
index 23c77729b..4e4fc7576 100644
--- a/test/replication/gh-6057-qsync-confirm-async-no-wal.result
+++ b/test/replication/gh-6057-qsync-confirm-async-no-wal.result
@@ -40,6 +40,10 @@ _ = s2:create_index('pk')
  | ---
  | ...
 
+box.ctl.promote()
+ | ---
+ | ...
+
 errinj = box.error.injection
  | ---
  | ...
diff --git a/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua b/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua
index a11ddc042..c2d9d290b 100644
--- a/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua
+++ b/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua
@@ -21,6 +21,8 @@ _ = s:create_index('pk')
 s2 = box.schema.create_space('test2')
 _ = s2:create_index('pk')
 
+box.ctl.promote()
+
 errinj = box.error.injection
 
 function create_hanging_async_after_confirm(sync_key, async_key1, async_key2)   \
diff --git a/test/replication/hang_on_synchro_fail.result b/test/replication/hang_on_synchro_fail.result
index 9f6fac00b..b73406368 100644
--- a/test/replication/hang_on_synchro_fail.result
+++ b/test/replication/hang_on_synchro_fail.result
@@ -19,6 +19,9 @@ _ = box.schema.space.create('sync', {is_sync=true})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 old_synchro_quorum = box.cfg.replication_synchro_quorum
  | ---
diff --git a/test/replication/hang_on_synchro_fail.test.lua b/test/replication/hang_on_synchro_fail.test.lua
index 6c3b09fab..549ae17f6 100644
--- a/test/replication/hang_on_synchro_fail.test.lua
+++ b/test/replication/hang_on_synchro_fail.test.lua
@@ -8,6 +8,7 @@ box.schema.user.grant('guest', 'replication')
 
 _ = box.schema.space.create('sync', {is_sync=true})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 
 old_synchro_quorum = box.cfg.replication_synchro_quorum
 box.cfg{replication_synchro_quorum=3}
diff --git a/test/replication/qsync_advanced.result b/test/replication/qsync_advanced.result
index 94b19b1f2..509e5c2a7 100644
--- a/test/replication/qsync_advanced.result
+++ b/test/replication/qsync_advanced.result
@@ -72,6 +72,9 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 -- Testcase body.
 box.space.sync:insert{1} -- success
  | ---
@@ -468,6 +471,9 @@ box.space.sync:select{} -- 1
 box.cfg{read_only=false} -- promote replica to master
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 test_run:switch('default')
  | ---
  | - true
@@ -508,6 +514,9 @@ test_run:switch('default')
 box.cfg{read_only=false}
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 test_run:switch('replica')
  | ---
  | - true
diff --git a/test/replication/qsync_advanced.test.lua b/test/replication/qsync_advanced.test.lua
index 058ece602..5bc54794b 100644
--- a/test/replication/qsync_advanced.test.lua
+++ b/test/replication/qsync_advanced.test.lua
@@ -30,6 +30,7 @@ test_run:switch('default')
 box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 -- Testcase body.
 box.space.sync:insert{1} -- success
 test_run:cmd('switch replica')
@@ -170,6 +171,7 @@ box.space.sync:select{} -- 1
 test_run:switch('replica')
 box.space.sync:select{} -- 1
 box.cfg{read_only=false} -- promote replica to master
+box.ctl.promote()
 test_run:switch('default')
 box.cfg{read_only=true} -- demote master to replica
 test_run:switch('replica')
@@ -181,6 +183,7 @@ box.space.sync:select{} -- 1, 2
 -- Revert cluster configuration.
 test_run:switch('default')
 box.cfg{read_only=false}
+box.ctl.promote()
 test_run:switch('replica')
 box.cfg{read_only=true}
 -- Testcase cleanup.
diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
index 7e711ba13..08474269e 100644
--- a/test/replication/qsync_basic.result
+++ b/test/replication/qsync_basic.result
@@ -14,6 +14,9 @@ s1.is_sync
 pk = s1:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 box.begin() s1:insert({1}) s1:insert({2}) box.commit()
  | ---
  | ...
@@ -637,67 +640,6 @@ box.space.sync:count()
  | - 0
  | ...
 
---
--- gh-5445: NOPs bypass the limbo for the sake of vclock bumps from foreign
--- instances, but also works for local rows.
---
-test_run:switch('default')
- | ---
- | - true
- | ...
-box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
- | ---
- | ...
-f = fiber.create(function() box.space.sync:replace{1} end)
- | ---
- | ...
-test_run:wait_lsn('replica', 'default')
- | ---
- | ...
-
-test_run:switch('replica')
- | ---
- | - true
- | ...
-function skip_row() return nil end
- | ---
- | ...
-old_lsn = box.info.lsn
- | ---
- | ...
-_ = box.space.sync:before_replace(skip_row)
- | ---
- | ...
-box.space.sync:replace{2}
- | ---
- | ...
-box.space.sync:before_replace(nil, skip_row)
- | ---
- | ...
-assert(box.space.sync:get{2} == nil)
- | ---
- | - true
- | ...
-assert(box.space.sync:get{1} ~= nil)
- | ---
- | - true
- | ...
-
-test_run:switch('default')
- | ---
- | - true
- | ...
-box.cfg{replication_synchro_quorum = 2}
- | ---
- | ...
-test_run:wait_cond(function() return f:status() == 'dead' end)
- | ---
- | - true
- | ...
-box.space.sync:truncate()
- | ---
- | ...
-
 --
 -- gh-5191: test box.info.synchro interface. For
 -- this sake we stop the replica and initiate data
diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
index 75c9b222b..6a49e2b01 100644
--- a/test/replication/qsync_basic.test.lua
+++ b/test/replication/qsync_basic.test.lua
@@ -6,6 +6,7 @@
 s1 = box.schema.create_space('test1', {is_sync = true})
 s1.is_sync
 pk = s1:create_index('pk')
+box.ctl.promote()
 box.begin() s1:insert({1}) s1:insert({2}) box.commit()
 s1:select{}
 
@@ -248,29 +249,6 @@ for i = 1, 100 do box.space.sync:delete{i} end
 test_run:cmd('switch replica')
 box.space.sync:count()
 
---
--- gh-5445: NOPs bypass the limbo for the sake of vclock bumps from foreign
--- instances, but also works for local rows.
---
-test_run:switch('default')
-box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
-f = fiber.create(function() box.space.sync:replace{1} end)
-test_run:wait_lsn('replica', 'default')
-
-test_run:switch('replica')
-function skip_row() return nil end
-old_lsn = box.info.lsn
-_ = box.space.sync:before_replace(skip_row)
-box.space.sync:replace{2}
-box.space.sync:before_replace(nil, skip_row)
-assert(box.space.sync:get{2} == nil)
-assert(box.space.sync:get{1} ~= nil)
-
-test_run:switch('default')
-box.cfg{replication_synchro_quorum = 2}
-test_run:wait_cond(function() return f:status() == 'dead' end)
-box.space.sync:truncate()
-
 --
 -- gh-5191: test box.info.synchro interface. For
 -- this sake we stop the replica and initiate data
diff --git a/test/replication/qsync_errinj.result b/test/replication/qsync_errinj.result
index 635bcf939..837802847 100644
--- a/test/replication/qsync_errinj.result
+++ b/test/replication/qsync_errinj.result
@@ -35,6 +35,9 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 --
 -- gh-5100: slow ACK sending shouldn't stun replica for the
diff --git a/test/replication/qsync_errinj.test.lua b/test/replication/qsync_errinj.test.lua
index 6a9fd3e1a..556c897c4 100644
--- a/test/replication/qsync_errinj.test.lua
+++ b/test/replication/qsync_errinj.test.lua
@@ -12,6 +12,7 @@ test_run:cmd('start server replica with wait=True, wait_load=True')
 
 _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 
 --
 -- gh-5100: slow ACK sending shouldn't stun replica for the
diff --git a/test/replication/qsync_snapshots.result b/test/replication/qsync_snapshots.result
index cafdd63c8..f6b86ed70 100644
--- a/test/replication/qsync_snapshots.result
+++ b/test/replication/qsync_snapshots.result
@@ -57,6 +57,9 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 -- Testcase body.
 box.space.sync:insert{1}
  | ---
diff --git a/test/replication/qsync_snapshots.test.lua b/test/replication/qsync_snapshots.test.lua
index 590610974..e5f70e46a 100644
--- a/test/replication/qsync_snapshots.test.lua
+++ b/test/replication/qsync_snapshots.test.lua
@@ -23,6 +23,7 @@ test_run:switch('default')
 box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 -- Testcase body.
 box.space.sync:insert{1}
 box.space.sync:select{} -- 1
diff --git a/test/replication/qsync_with_anon.result b/test/replication/qsync_with_anon.result
index 6a2952a32..5ec99c1ef 100644
--- a/test/replication/qsync_with_anon.result
+++ b/test/replication/qsync_with_anon.result
@@ -57,6 +57,9 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 -- Testcase body.
 test_run:switch('default')
  | ---
diff --git a/test/replication/qsync_with_anon.test.lua b/test/replication/qsync_with_anon.test.lua
index d7ecaa107..28e08697e 100644
--- a/test/replication/qsync_with_anon.test.lua
+++ b/test/replication/qsync_with_anon.test.lua
@@ -22,6 +22,7 @@ test_run:switch('default')
 box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 -- Testcase body.
 test_run:switch('default')
 box.space.sync:insert{1} -- success
-- 
2.30.1 (Apple Git-130)
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 2/7] replication: forbid implicit limbo owner transition
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 2/7] replication: forbid implicit limbo owner transition Serge Petrenko via Tarantool-patches
@ 2021-06-15 20:55   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
  2021-06-18 22:49   ` Vladislav Shpilevoy via Tarantool-patches
  1 sibling, 1 reply; 34+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-15 20:55 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Good job on the patch!
See 4 comments below.
On 10.06.2021 15:32, Serge Petrenko via Tarantool-patches wrote:
> Forbid limbo ownership transition without an explicit promote.
> Make it so that synchronous transactions may be committed only after it
> is claimed by some instance via a PROMOTE request.
> 
> Make everyone but the limbo owner read-only even when the limbo is
> empty.
> 
> Part-of #6034
> 
> @TarantoolBot document
> Title: synchronous replication changes
> 
> `box.info.synchro.queue` receives a new field: `owner`. It's a replica
> id of the instance owning the synchronous transaction limbo.
> 
> Once some instance owns the limbo, every other instance becomes
> read-only. When the limbo is unclaimed, e.g.
> `box.info.synchro.queue.owner` is `0`, everyone may be writeable, but
> cannot create synchronous transactions.
> 
> In order to claim or re-claim the limbo, you have to issue
> `box.ctl.promote()` on the instance you wish to promote.
> 
> When elections are enabled, the instance issues `box.ctl.promote()`
> automatically once it wins the elections.
1. It might be not a good idea to mention the limbo in the public
documentation. It is not described there anyhow, and raises the
question "what is limbo?".
Maybe better replace "limbo" with "queue" everywhere in the
public texts such as doc requests, changelogs, and error
messages.
> diff --git a/src/box/errcode.h b/src/box/errcode.h
> index d93820e96..e75f54a01 100644
> --- a/src/box/errcode.h
> +++ b/src/box/errcode.h
> @@ -277,6 +277,7 @@ struct errcode_record {
>  	/*222 */_(ER_QUORUM_WAIT,		"Couldn't wait for quorum %d: %s") \
>  	/*223 */_(ER_INTERFERING_PROMOTE,	"Instance with replica id %u was promoted first") \
>  	/*224 */_(ER_RAFT_DISABLED,		"Elections were turned off while running box.ctl.promote()")\
> +	/*225 */_(ER_LIMBO_UNCLAIMED,		"Synchronous transaction limbo doesn't belong to any instance")\
2. The same as above - lets not mention limbo in the public
space. Might be ER_SYNC_UNCLAIMED, or ER_SYNC_QUEUE_UNCLAIMED,
or a similar option. And in the error message:
	transaction limbo -> transaction queue
> diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
> index 75c9b222b..6a49e2b01 100644
> --- a/test/replication/qsync_basic.test.lua
> +++ b/test/replication/qsync_basic.test.lua
> @@ -248,29 +249,6 @@ for i = 1, 100 do box.space.sync:delete{i} end
>  test_run:cmd('switch replica')
>  box.space.sync:count()
>  
> ---
> --- gh-5445: NOPs bypass the limbo for the sake of vclock bumps from foreign
> --- instances, but also works for local rows.
3. Please, try to keep this test somehow. You can try to perform
these nops locally on the default node. Otherwise is_fully_nop in
txn_journal_entry_new() remains untested.
4. When I tried to run the tests, I got a crash:
[010] replication/gh-5195-qsync-replica-write.test.l> memtx           
[010] 
[010] [Instance "replica" killed by signal: 6 (SIGABRT)]
[010] 
[010] Last 15 lines of Tarantool Log file [Instance "replica"][/Users/gerold/Work/Repositories/tarantool/test/var/010_replication/replica.log]:
[010] 2021-06-15 22:13:04.821 [40345] main/112/applier/unix/:/Users/gerold/Work/Repositories/tarantool/test/var/010_replication/master.socket-iproto I> RAFT: message {term: 1, state: follower} from 1
[010] 2021-06-15 22:13:04.824 [40345] main/113/applierw/unix/:/Users/gerold/Work/Repositories/tarantool/test/var/010_replication/master.socket-iproto C> leaving orphan mode
[010] 2021-06-15 22:13:04.824 [40345] main/103/replica I> replica set sync complete
[010] 2021-06-15 22:13:04.824 [40345] main/103/replica C> leaving orphan mode
[010] 2021-06-15 22:13:04.825 [40345] main/103/replica I> set 'log_level' configuration option to 5
[010] 2021-06-15 22:13:04.825 [40345] main/107/checkpoint_daemon I> scheduled next checkpoint for Tue Jun 15 23:20:29 2021
[010] 2021-06-15 22:13:04.825 [40345] main/103/replica I> set 'replication_timeout' configuration option to 0.1
[010] 2021-06-15 22:13:04.825 [40345] main/103/replica I> set 'memtx_memory' configuration option to 107374182
[010] 2021-06-15 22:13:04.826 [40345] main/103/replica I> set 'replication_sync_timeout' configuration option to 100
[010] 2021-06-15 22:13:04.826 [40345] main/103/replica I> set 'listen' configuration option to "\/Users\/gerold\/Work\/Repositories\/tarantool\/test\/var\/010_replication\/replica.socket-iproto"
[010] 2021-06-15 22:13:04.826 [40345] main/103/replica I> set 'replication' configuration option to ["\/Users\/gerold\/Work\/Repositories\/tarantool\/test\/var\/010_replication\/master.socket-iproto"]
[010] 2021-06-15 22:13:04.826 [40345] main/103/replica I> set 'log_format' configuration option to "plain"
[010] 2021-06-15 22:13:04.827 [40345] main C> entering the event loop
[010] 2021-06-15 22:13:04.903 [40345] main/128/console/unix/: I> set 'replication_synchro_timeout' configuration option to 0.001
[010] Assertion failed: (txn_limbo_is_empty(&txn_limbo)), function box_promote, file /Users/gerold/Work/Repositories/tarantool/src/box/box.cc, line 1678.
And a wrong result:
[005] replication/gh-5298-qsync-recovery-snap.test.l> vinyl           [ fail ]
[005] 
[005] Test failed! Result content mismatch:
[005] --- replication/gh-5298-qsync-recovery-snap.result	Tue Jun 15 21:48:57 2021
[005] +++ var/rejects/replication/gh-5298-qsync-recovery-snap.reject	Tue Jun 15 22:13:44 2021
[005] @@ -46,7 +46,7 @@
[005]  -- Could hang if the limbo would incorrectly handle the snapshot end.
[005]  box.space.sync:replace{11}
[005]   | ---
[005] - | - [11]
[005] + | - error: Synchronous transaction limbo doesn't belong to any instance
[005]   | ...
[005]  
[005]  old_synchro_quorum = box.cfg.replication_synchro_quorum
[005] @@ -64,7 +64,7 @@
[005]   | ...
[005]  box.space.sync:replace{12}
[005]   | ---
[005] - | - error: Quorum collection for a synchronous transaction is timed out
[005] + | - error: Synchronous transaction limbo doesn't belong to any instance
[005]   | ...
[005]  
[005]  box.cfg{                                                                        \
[005] @@ -75,18 +75,16 @@
[005]   | ...
[005]  box.space.sync:replace{13}
[005]   | ---
[005] - | - [13]
[005] + | - error: Synchronous transaction limbo doesn't belong to any instance
[005]   | ...
[005]  box.space.sync:get({11})
[005]   | ---
[005] - | - [11]
[005]   | ...
[005]  box.space.sync:get({12})
[005]   | ---
[005]   | ...
[005]  box.space.sync:get({13})
[005]   | ---
[005] - | - [13]
[005]   | ...
[005]  
[005]  box.cfg{                                                                        \
[005] 
Then I stopped.
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 2/7] replication: forbid implicit limbo owner transition
  2021-06-15 20:55   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
  0 siblings, 0 replies; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-17 21:00 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
15.06.2021 23:55, Vladislav Shpilevoy пишет:
> Good job on the patch!
>
> See 4 comments below.
Thanks for the review!
> On 10.06.2021 15:32, Serge Petrenko via Tarantool-patches wrote:
>> Forbid limbo ownership transition without an explicit promote.
>> Make it so that synchronous transactions may be committed only after it
>> is claimed by some instance via a PROMOTE request.
>>
>> Make everyone but the limbo owner read-only even when the limbo is
>> empty.
>>
>> Part-of #6034
>>
>> @TarantoolBot document
>> Title: synchronous replication changes
>>
>> `box.info.synchro.queue` receives a new field: `owner`. It's a replica
>> id of the instance owning the synchronous transaction limbo.
>>
>> Once some instance owns the limbo, every other instance becomes
>> read-only. When the limbo is unclaimed, e.g.
>> `box.info.synchro.queue.owner` is `0`, everyone may be writeable, but
>> cannot create synchronous transactions.
>>
>> In order to claim or re-claim the limbo, you have to issue
>> `box.ctl.promote()` on the instance you wish to promote.
>>
>> When elections are enabled, the instance issues `box.ctl.promote()`
>> automatically once it wins the elections.
> 1. It might be not a good idea to mention the limbo in the public
> documentation. It is not described there anyhow, and raises the
> question "what is limbo?".
>
> Maybe better replace "limbo" with "queue" everywhere in the
> public texts such as doc requests, changelogs, and error
> messages.
Ok. Let it be 'synchronous transaction queue', or simply 'the queue' then.
>> diff --git a/src/box/errcode.h b/src/box/errcode.h
>> index d93820e96..e75f54a01 100644
>> --- a/src/box/errcode.h
>> +++ b/src/box/errcode.h
>> @@ -277,6 +277,7 @@ struct errcode_record {
>>   	/*222 */_(ER_QUORUM_WAIT,		"Couldn't wait for quorum %d: %s") \
>>   	/*223 */_(ER_INTERFERING_PROMOTE,	"Instance with replica id %u was promoted first") \
>>   	/*224 */_(ER_RAFT_DISABLED,		"Elections were turned off while running box.ctl.promote()")\
>> +	/*225 */_(ER_LIMBO_UNCLAIMED,		"Synchronous transaction limbo doesn't belong to any instance")\
> 2. The same as above - lets not mention limbo in the public
> space. Might be ER_SYNC_UNCLAIMED, or ER_SYNC_QUEUE_UNCLAIMED,
> or a similar option. And in the error message:
>
> 	transaction limbo -> transaction queue
No problem.
>> diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
>> index 75c9b222b..6a49e2b01 100644
>> --- a/test/replication/qsync_basic.test.lua
>> +++ b/test/replication/qsync_basic.test.lua
>> @@ -248,29 +249,6 @@ for i = 1, 100 do box.space.sync:delete{i} end
>>   test_run:cmd('switch replica')
>>   box.space.sync:count()
>>   
>> ---
>> --- gh-5445: NOPs bypass the limbo for the sake of vclock bumps from foreign
>> --- instances, but also works for local rows.
> 3. Please, try to keep this test somehow. You can try to perform
> these nops locally on the default node. Otherwise is_fully_nop in
> txn_journal_entry_new() remains untested.
Ok, I've reworked the test, so please find an incremental diff below.
> 4. When I tried to run the tests, I got a crash:
I believe you've run the tests on this exact commit?
If yes, then all the crashes are fixed in the following commits.
This is quite confusing, indeed, so let me reorder everything
so that fixes come earlier than the commit which has some issues to fix.
> [010] replication/gh-5195-qsync-replica-write.test.l> memtx
> [010]
> [010] [Instance "replica" killed by signal: 6 (SIGABRT)]
> [010]
> [010] Last 15 lines of Tarantool Log file [Instance "replica"][/Users/gerold/Work/Repositories/tarantool/test/var/010_replication/replica.log]:
> [010] 2021-06-15 22:13:04.821 [40345] main/112/applier/unix/:/Users/gerold/Work/Repositories/tarantool/test/var/010_replication/master.socket-iproto I> RAFT: message {term: 1, state: follower} from 1
> [010] 2021-06-15 22:13:04.824 [40345] main/113/applierw/unix/:/Users/gerold/Work/Repositories/tarantool/test/var/010_replication/master.socket-iproto C> leaving orphan mode
> [010] 2021-06-15 22:13:04.824 [40345] main/103/replica I> replica set sync complete
> [010] 2021-06-15 22:13:04.824 [40345] main/103/replica C> leaving orphan mode
> [010] 2021-06-15 22:13:04.825 [40345] main/103/replica I> set 'log_level' configuration option to 5
> [010] 2021-06-15 22:13:04.825 [40345] main/107/checkpoint_daemon I> scheduled next checkpoint for Tue Jun 15 23:20:29 2021
> [010] 2021-06-15 22:13:04.825 [40345] main/103/replica I> set 'replication_timeout' configuration option to 0.1
> [010] 2021-06-15 22:13:04.825 [40345] main/103/replica I> set 'memtx_memory' configuration option to 107374182
> [010] 2021-06-15 22:13:04.826 [40345] main/103/replica I> set 'replication_sync_timeout' configuration option to 100
> [010] 2021-06-15 22:13:04.826 [40345] main/103/replica I> set 'listen' configuration option to "\/Users\/gerold\/Work\/Repositories\/tarantool\/test\/var\/010_replication\/replica.socket-iproto"
> [010] 2021-06-15 22:13:04.826 [40345] main/103/replica I> set 'replication' configuration option to ["\/Users\/gerold\/Work\/Repositories\/tarantool\/test\/var\/010_replication\/master.socket-iproto"]
> [010] 2021-06-15 22:13:04.826 [40345] main/103/replica I> set 'log_format' configuration option to "plain"
> [010] 2021-06-15 22:13:04.827 [40345] main C> entering the event loop
> [010] 2021-06-15 22:13:04.903 [40345] main/128/console/unix/: I> set 'replication_synchro_timeout' configuration option to 0.001
> [010] Assertion failed: (txn_limbo_is_empty(&txn_limbo)), function box_promote, file /Users/gerold/Work/Repositories/tarantool/src/box/box.cc, line 1678.
>
> And a wrong result:
>
> [005] replication/gh-5298-qsync-recovery-snap.test.l> vinyl           [ fail ]
> [005]
> [005] Test failed! Result content mismatch:
> [005] --- replication/gh-5298-qsync-recovery-snap.result	Tue Jun 15 21:48:57 2021
> [005] +++ var/rejects/replication/gh-5298-qsync-recovery-snap.reject	Tue Jun 15 22:13:44 2021
> [005] @@ -46,7 +46,7 @@
> [005]  -- Could hang if the limbo would incorrectly handle the snapshot end.
> [005]  box.space.sync:replace{11}
> [005]   | ---
> [005] - | - [11]
> [005] + | - error: Synchronous transaction limbo doesn't belong to any instance
> [005]   | ...
> [005]
> [005]  old_synchro_quorum = box.cfg.replication_synchro_quorum
> [005] @@ -64,7 +64,7 @@
> [005]   | ...
> [005]  box.space.sync:replace{12}
> [005]   | ---
> [005] - | - error: Quorum collection for a synchronous transaction is timed out
> [005] + | - error: Synchronous transaction limbo doesn't belong to any instance
> [005]   | ...
> [005]
> [005]  box.cfg{                                                                        \
> [005] @@ -75,18 +75,16 @@
> [005]   | ...
> [005]  box.space.sync:replace{13}
> [005]   | ---
> [005] - | - [13]
> [005] + | - error: Synchronous transaction limbo doesn't belong to any instance
> [005]   | ...
> [005]  box.space.sync:get({11})
> [005]   | ---
> [005] - | - [11]
> [005]   | ...
> [005]  box.space.sync:get({12})
> [005]   | ---
> [005]   | ...
> [005]  box.space.sync:get({13})
> [005]   | ---
> [005] - | - [13]
> [005]   | ...
> [005]
> [005]  box.cfg{                                                                        \
> [005]
>
> Then I stopped.
=========================================================
diff --git a/src/box/errcode.h b/src/box/errcode.h
index 3c42ae502..e3943c01d 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -278,7 +278,7 @@ struct errcode_record {
      /*223 */_(ER_INTERFERING_PROMOTE,    "Instance with replica id %u 
was promoted first") \
      /*224 */_(ER_RAFT_DISABLED,        "Elections were turned off 
while running box.ctl.promote()")\
      /*225 */_(ER_TXN_ROLLBACK,        "Transaction was rolled back") \
-    /*226 */_(ER_LIMBO_UNCLAIMED,        "Synchronous transaction limbo 
doesn't belong to any instance")\
+    /*226 */_(ER_SYNCHRO_QUEUE_UNCLAIMED,    "The synchronous 
transaction queue doesn't belong to any instance")\
  /*
   * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 7f1c5c017..ce701ead4 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -97,7 +97,7 @@ txn_limbo_append(struct txn_limbo *limbo, uint32_t id, 
struct txn *txn)
      if (id == 0)
          id = instance_id;
      if  (limbo->owner_id == REPLICA_ID_NIL) {
-        diag_set(ClientError, ER_LIMBO_UNCLAIMED);
+        diag_set(ClientError, ER_SYNCHRO_QUEUE_UNCLAIMED);
          return NULL;
      } else if (limbo->owner_id != id) {
          diag_set(ClientError, ER_UNCOMMITTED_FOREIGN_SYNC_TXNS,
diff --git a/test/replication/qsync_basic.result 
b/test/replication/qsync_basic.result
index 08474269e..f1689f875 100644
--- a/test/replication/qsync_basic.result
+++ b/test/replication/qsync_basic.result
@@ -640,6 +640,56 @@ box.space.sync:count()
   | - 0
   | ...
+--
+-- gh-5445: NOPs bypass the limbo for the sake of vclock bumps from foreign
+-- instances, but also works for local rows.
+--
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.ctl.demote()
+ | ---
+ | ...
+box.space.sync:replace{1}
+ | ---
+ | - error: Synchronous transaction limbo doesn't belong to any instance
+ | ...
+function skip_row() return nil end
+ | ---
+ | ...
+old_lsn = box.info.lsn
+ | ---
+ | ...
+_ = box.space.sync:before_replace(skip_row)
+ | ---
+ | ...
+box.space.sync:replace{2}
+ | ---
+ | ...
+box.space.sync:before_replace(nil, skip_row)
+ | ---
+ | ...
+assert(box.space.sync:get{1} == nil)
+ | ---
+ | - true
+ | ...
+assert(box.space.sync:get{2} == nil)
+ | ---
+ | - true
+ | ...
+assert(box.info.lsn == old_lsn + 1)
+ | ---
+ | - true
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+
+box.space.sync:truncate()
+ | ---
+ | ...
+
  --
  -- gh-5191: test box.info.synchro interface. For
  -- this sake we stop the replica and initiate data
diff --git a/test/replication/qsync_basic.test.lua 
b/test/replication/qsync_basic.test.lua
index 6a49e2b01..6f0810afc 100644
--- a/test/replication/qsync_basic.test.lua
+++ b/test/replication/qsync_basic.test.lua
@@ -249,6 +249,25 @@ for i = 1, 100 do box.space.sync:delete{i} end
  test_run:cmd('switch replica')
  box.space.sync:count()
+--
+-- gh-5445: NOPs bypass the limbo for the sake of vclock bumps from foreign
+-- instances, but also works for local rows.
+--
+test_run:switch('default')
+box.ctl.demote()
+box.space.sync:replace{1}
+function skip_row() return nil end
+old_lsn = box.info.lsn
+_ = box.space.sync:before_replace(skip_row)
+box.space.sync:replace{2}
+box.space.sync:before_replace(nil, skip_row)
+assert(box.space.sync:get{1} == nil)
+assert(box.space.sync:get{2} == nil)
+assert(box.info.lsn == old_lsn + 1)
+box.ctl.promote()
+
+box.space.sync:truncate()
+
  --
  -- gh-5191: test box.info.synchro interface. For
  -- this sake we stop the replica and initiate data
=========================================================
-- 
Serge Petrenko
^ permalink raw reply	[flat|nested] 34+ messages in thread
 
- * Re: [Tarantool-patches] [PATCH 2/7] replication: forbid implicit limbo owner transition
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 2/7] replication: forbid implicit limbo owner transition Serge Petrenko via Tarantool-patches
  2021-06-15 20:55   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-18 22:49   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-21 10:13     ` Serge Petrenko via Tarantool-patches
  1 sibling, 1 reply; 34+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-18 22:49 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Nice fixes!
See 3 comments below.
> diff --git a/src/box/errcode.h b/src/box/errcode.h
> index 49aec4bf6..e3943c01d 100644
> --- a/src/box/errcode.h
> +++ b/src/box/errcode.h
> @@ -278,6 +278,7 @@ struct errcode_record {
>  	/*223 */_(ER_INTERFERING_PROMOTE,	"Instance with replica id %u was promoted first") \
>  	/*224 */_(ER_RAFT_DISABLED,		"Elections were turned off while running box.ctl.promote()")\
>  	/*225 */_(ER_TXN_ROLLBACK,		"Transaction was rolled back") \
> +	/*226 */_(ER_SYNCHRO_QUEUE_UNCLAIMED,	"The synchronous transaction queue doesn't belong to any instance")\
1. Maybe SYNCHRO -> SYNC? The rationality is that we
already have a few ER_SYNC_* about synchro replication.
> diff --git a/test/box/error.result b/test/box/error.result
> index 062a90399..574521a14 100644
> --- a/test/box/error.result
> +++ b/test/box/error.result
> @@ -444,6 +444,7 @@ t;
>   |   223: box.error.INTERFERING_PROMOTE
>   |   224: box.error.RAFT_DISABLED
>   |   225: box.error.TXN_ROLLBACK
> + |   226: box.error.LIMBO_UNCLAIMED
2. Forgot to update the result file? There is no LIMBO_UNCLAIMED error
code. It has a new name.
> diff --git a/test/replication/gh-5440-qsync-ro.test.lua b/test/replication/gh-5440-qsync-ro.test.lua
> deleted file mode 100644
> index d63ec9c1e..000000000
> --- a/test/replication/gh-5440-qsync-ro.test.lua
3. Please, drop it from suite.cfg too.
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 2/7] replication: forbid implicit limbo owner transition
  2021-06-18 22:49   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-21 10:13     ` Serge Petrenko via Tarantool-patches
  0 siblings, 0 replies; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-21 10:13 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
19.06.2021 01:49, Vladislav Shpilevoy пишет:
> Nice fixes!
>
> See 3 comments below.
>
>> diff --git a/src/box/errcode.h b/src/box/errcode.h
>> index 49aec4bf6..e3943c01d 100644
>> --- a/src/box/errcode.h
>> +++ b/src/box/errcode.h
>> @@ -278,6 +278,7 @@ struct errcode_record {
>>   	/*223 */_(ER_INTERFERING_PROMOTE,	"Instance with replica id %u was promoted first") \
>>   	/*224 */_(ER_RAFT_DISABLED,		"Elections were turned off while running box.ctl.promote()")\
>>   	/*225 */_(ER_TXN_ROLLBACK,		"Transaction was rolled back") \
>> +	/*226 */_(ER_SYNCHRO_QUEUE_UNCLAIMED,	"The synchronous transaction queue doesn't belong to any instance")\
> 1. Maybe SYNCHRO -> SYNC? The rationality is that we
> already have a few ER_SYNC_* about synchro replication.
Ok.
>> diff --git a/test/box/error.result b/test/box/error.result
>> index 062a90399..574521a14 100644
>> --- a/test/box/error.result
>> +++ b/test/box/error.result
>> @@ -444,6 +444,7 @@ t;
>>    |   223: box.error.INTERFERING_PROMOTE
>>    |   224: box.error.RAFT_DISABLED
>>    |   225: box.error.TXN_ROLLBACK
>> + |   226: box.error.LIMBO_UNCLAIMED
> 2. Forgot to update the result file? There is no LIMBO_UNCLAIMED error
> code. It has a new name.
Yep, sorry.
>> diff --git a/test/replication/gh-5440-qsync-ro.test.lua b/test/replication/gh-5440-qsync-ro.test.lua
>> deleted file mode 100644
>> index d63ec9c1e..000000000
>> --- a/test/replication/gh-5440-qsync-ro.test.lua
> 3. Please, drop it from suite.cfg too.
>
Sure.
Here's the diff:
========================
diff --git a/src/box/errcode.h b/src/box/errcode.h
index e3943c01d..bb3cf5eb9 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -278,7 +278,7 @@ struct errcode_record {
         /*223 */_(ER_INTERFERING_PROMOTE,       "Instance with replica 
id %u was promoted first") \
         /*224 */_(ER_RAFT_DISABLED,             "Elections were turned 
off while running box.ctl.promote()")\
         /*225 */_(ER_TXN_ROLLBACK,              "Transaction was rolled 
back") \
-       /*226 */_(ER_SYNCHRO_QUEUE_UNCLAIMED,   "The synchronous 
transaction queue doesn't belong to any instance")\
+       /*226 */_(ER_SYNC_QUEUE_UNCLAIMED,      "The synchronous 
transaction queue doesn't belong to any instance")\
  /*
   * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index b6e844f32..1dc093400 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -97,7 +97,7 @@ txn_limbo_append(struct txn_limbo *limbo, uint32_t id, 
struct txn *txn)
         if (id == 0)
                 id = instance_id;
         if  (limbo->owner_id == REPLICA_ID_NIL) {
-               diag_set(ClientError, ER_SYNCHRO_QUEUE_UNCLAIMED);
+               diag_set(ClientError, ER_SYNC_QUEUE_UNCLAIMED);
                 return NULL;
         } else if (limbo->owner_id != id) {
                 diag_set(ClientError, ER_UNCOMMITTED_FOREIGN_SYNC_TXNS,
diff --git a/test/box/error.result b/test/box/error.result
index 574521a14..dfe593dc2 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -444,7 +444,7 @@ t;
   |   223: box.error.INTERFERING_PROMOTE
   |   224: box.error.RAFT_DISABLED
   |   225: box.error.TXN_ROLLBACK
- |   226: box.error.LIMBO_UNCLAIMED
+ |   226: box.error.SYNC_QUEUE_UNCLAIMED
   | ...
  test_run:cmd("setopt delimiter ''");
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 4fc6643e4..c0854f7fa 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -41,7 +41,6 @@
      "gh-4739-vclock-assert.test.lua": {},
      "gh-4730-applier-rollback.test.lua": {},
      "gh-4928-tx-boundaries.test.lua": {},
-    "gh-5440-qsync-ro.test.lua": {},
      "gh-5435-qsync-clear-synchro-queue-commit-all.test.lua": {},
      "gh-5536-wal-limit.test.lua": {},
      "gh-5566-final-join-synchro.test.lua": {},
========================
-- 
Serge Petrenko
^ permalink raw reply	[flat|nested] 34+ messages in thread
 
 
- * [Tarantool-patches] [PATCH 3/7] txn_limbo: fix promote term filtering
  2021-06-10 13:32 [Tarantool-patches] [PATCH 0/7] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 1/7] replication: always send raft state to subscribers Serge Petrenko via Tarantool-patches
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 2/7] replication: forbid implicit limbo owner transition Serge Petrenko via Tarantool-patches
@ 2021-06-10 13:32 ` Serge Petrenko via Tarantool-patches
  2021-06-15 20:57   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 4/7] txn_limbo: persist the latest effective promote in snapshot Serge Petrenko via Tarantool-patches
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-10 13:32 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
txn_limbo_process() used to filter out promote requests whose term was
equal to the greatest term seen. This wasn't correct for PROMOTE entries
with term 1.
Such entries appear after box.ctl.promote() is issued on an instance
with disabled elections. Every PROMOTE entry from such an instance has
term 1, but should still be applied. Fix this in the patch.
Also, when an outdated PROMOTE entry with term smaller than already
applied from some replica arrived, it wasn't filtered at all. Such a
situation shouldn't be possible, but fix it as well.
Part-of #6034
---
 src/box/txn_limbo.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 53233add3..33a6e5548 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -645,17 +645,21 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 {
 	uint64_t term = req->term;
 	uint32_t origin = req->origin_id;
-	if (txn_limbo_replica_term(limbo, origin) < term) {
+	if (txn_limbo_replica_term(limbo, origin) < term)
 		vclock_follow(&limbo->promote_term_map, origin, term);
-		if (term > limbo->promote_greatest_term) {
-			limbo->promote_greatest_term = term;
-		} else if (req->type == IPROTO_PROMOTE) {
-			/*
-			 * PROMOTE for outdated term. Ignore.
-			 */
-			return;
-		}
+
+	if (term > limbo->promote_greatest_term) {
+		limbo->promote_greatest_term = term;
+	} else if (req->type == IPROTO_PROMOTE &&
+		   limbo->promote_greatest_term > 1) {
+		/* PROMOTE for outdated term. Ignore. */
+		say_info("RAFT: ignoring PROMOTE request from instance "
+			 "id %"PRIu32" for term %"PRIu64". Greatest term seen "
+			 "before (%"PRIu64") is bigger.", origin, term,
+			 limbo->promote_greatest_term);
+		return;
 	}
+
 	int64_t lsn = req->lsn;
 	if (req->replica_id == REPLICA_ID_NIL) {
 		/*
-- 
2.30.1 (Apple Git-130)
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 3/7] txn_limbo: fix promote term filtering
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 3/7] txn_limbo: fix promote term filtering Serge Petrenko via Tarantool-patches
@ 2021-06-15 20:57   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 34+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-15 20:57 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for the patch!
On 10.06.2021 15:32, Serge Petrenko via Tarantool-patches wrote:
> txn_limbo_process() used to filter out promote requests whose term was
> equal to the greatest term seen. This wasn't correct for PROMOTE entries
> with term 1.
> 
> Such entries appear after box.ctl.promote() is issued on an instance
> with disabled elections. Every PROMOTE entry from such an instance has
> term 1, but should still be applied. Fix this in the patch.
Didn't we agree that PROMOTE should bump the term always? I see no purpose
for the PROMOTE which never bumps the term except for the tests, but for
such occasion it would be better to have internal.make_leader() or something
which bypasses everything. Although the best option would to bump the term
always.
I see you did something in the last commit about that, but the part
about `limbo->promote_greatest_term > 1` still remains. It looks very
illegal to point out some terms as "special".
^ permalink raw reply	[flat|nested] 34+ messages in thread 
- * Re: [Tarantool-patches] [PATCH 3/7] txn_limbo: fix promote term filtering
  2021-06-15 20:57   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
  2021-06-18 22:49       ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-17 21:00 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
15.06.2021 23:57, Vladislav Shpilevoy пишет:
> Thanks for the patch!
>
> On 10.06.2021 15:32, Serge Petrenko via Tarantool-patches wrote:
>> txn_limbo_process() used to filter out promote requests whose term was
>> equal to the greatest term seen. This wasn't correct for PROMOTE entries
>> with term 1.
>>
>> Such entries appear after box.ctl.promote() is issued on an instance
>> with disabled elections. Every PROMOTE entry from such an instance has
>> term 1, but should still be applied. Fix this in the patch.
> Didn't we agree that PROMOTE should bump the term always? I see no purpose
> for the PROMOTE which never bumps the term except for the tests, but for
> such occasion it would be better to have internal.make_leader() or something
> which bypasses everything. Although the best option would to bump the term
> always.
>
> I see you did something in the last commit about that, but the part
> about `limbo->promote_greatest_term > 1` still remains. It looks very
> illegal to point out some terms as "special".
What about old instances?
They may issue multiple promotes for the same term when elections are off.
Previous behaviour (wrong, because we forgot to filter out too old 
promotes at all)
made the instance apply all such promotes.
I thought we should allow multiple promotes, at least for the default 
term (1), which
happen only when elections are turned off and have never been turned on.
Otherwise, having a new instance together with the old one in the same 
cluster,
you'll have one of them apply all the promotes for one term, and have 
one dataset,
and the other ignore all but the first promote, and have possibly 
different data.
I did fix the issue in the last commit. All the new instances will issue 
promotes with
increasing term each time, and the term will be >= 2.
So all the promotes with term == 1 surely will be from the old instances.
-- 
Serge Petrenko
^ permalink raw reply	[flat|nested] 34+ messages in thread 
- * Re: [Tarantool-patches] [PATCH 3/7] txn_limbo: fix promote term filtering
  2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
@ 2021-06-18 22:49       ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-21  8:55         ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 34+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-18 22:49 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Hi! Thanks for the fixes!
> 15.06.2021 23:57, Vladislav Shpilevoy пишет:
>> Thanks for the patch!
>>
>> On 10.06.2021 15:32, Serge Petrenko via Tarantool-patches wrote:
>>> txn_limbo_process() used to filter out promote requests whose term was
>>> equal to the greatest term seen. This wasn't correct for PROMOTE entries
>>> with term 1.
>>>
>>> Such entries appear after box.ctl.promote() is issued on an instance
>>> with disabled elections. Every PROMOTE entry from such an instance has
>>> term 1, but should still be applied. Fix this in the patch.
>> Didn't we agree that PROMOTE should bump the term always? I see no purpose
>> for the PROMOTE which never bumps the term except for the tests, but for
>> such occasion it would be better to have internal.make_leader() or something
>> which bypasses everything. Although the best option would to bump the term
>> always.
>>
>> I see you did something in the last commit about that, but the part
>> about `limbo->promote_greatest_term > 1` still remains. It looks very
>> illegal to point out some terms as "special".
> 
> What about old instances?
> They may issue multiple promotes for the same term when elections are off.
What if they issue multiple promotes for the same term > 1? They could have
elections enabled, then the term was bumped a few times, then elections were
turned off again. Next promotes will use the current term which is > 1,
won't they?
> Previous behaviour (wrong, because we forgot to filter out too old promotes at all)
> made the instance apply all such promotes.
> 
> I thought we should allow multiple promotes, at least for the default term (1), which
> happen only when elections are turned off and have never been turned on.
> 
> Otherwise, having a new instance together with the old one in the same cluster,
> you'll have one of them apply all the promotes for one term, and have one dataset,
> and the other ignore all but the first promote, and have possibly different data.> I did fix the issue in the last commit. All the new instances will issue promotes with
> increasing term each time, and the term will be >= 2.
> 
> So all the promotes with term == 1 surely will be from the old instances.
^ permalink raw reply	[flat|nested] 34+ messages in thread 
- * Re: [Tarantool-patches] [PATCH 3/7] txn_limbo: fix promote term filtering
  2021-06-18 22:49       ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-21  8:55         ` Serge Petrenko via Tarantool-patches
  0 siblings, 0 replies; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-21  8:55 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
19.06.2021 01:49, Vladislav Shpilevoy пишет:
> Hi! Thanks for the fixes!
>
>> 15.06.2021 23:57, Vladislav Shpilevoy пишет:
>>> Thanks for the patch!
>>>
>>> On 10.06.2021 15:32, Serge Petrenko via Tarantool-patches wrote:
>>>> txn_limbo_process() used to filter out promote requests whose term was
>>>> equal to the greatest term seen. This wasn't correct for PROMOTE entries
>>>> with term 1.
>>>>
>>>> Such entries appear after box.ctl.promote() is issued on an instance
>>>> with disabled elections. Every PROMOTE entry from such an instance has
>>>> term 1, but should still be applied. Fix this in the patch.
>>> Didn't we agree that PROMOTE should bump the term always? I see no purpose
>>> for the PROMOTE which never bumps the term except for the tests, but for
>>> such occasion it would be better to have internal.make_leader() or something
>>> which bypasses everything. Although the best option would to bump the term
>>> always.
>>>
>>> I see you did something in the last commit about that, but the part
>>> about `limbo->promote_greatest_term > 1` still remains. It looks very
>>> illegal to point out some terms as "special".
>> What about old instances?
>> They may issue multiple promotes for the same term when elections are off.
> What if they issue multiple promotes for the same term > 1? They could have
> elections enabled, then the term was bumped a few times, then elections were
> turned off again. Next promotes will use the current term which is > 1,
> won't they?
Yes, they will.
But such promotes must be filtered out because we can't  be sure if they 
were
issued with or without elections.
I still think we need some compatibility with old instances. At least 
for the case
when no elections were used at all.
>> Previous behaviour (wrong, because we forgot to filter out too old promotes at all)
>> made the instance apply all such promotes.
>>
>> I thought we should allow multiple promotes, at least for the default term (1), which
>> happen only when elections are turned off and have never been turned on.
>>
>> Otherwise, having a new instance together with the old one in the same cluster,
>> you'll have one of them apply all the promotes for one term, and have one dataset,
>> and the other ignore all but the first promote, and have possibly different data.> I did fix the issue in the last commit. All the new instances will issue promotes with
>> increasing term each time, and the term will be >= 2.
>>
>> So all the promotes with term == 1 surely will be from the old instances.
-- 
Serge Petrenko
^ permalink raw reply	[flat|nested] 34+ messages in thread 
 
 
 
 
- * [Tarantool-patches] [PATCH 4/7] txn_limbo: persist the latest effective promote in snapshot
  2021-06-10 13:32 [Tarantool-patches] [PATCH 0/7] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (2 preceding siblings ...)
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 3/7] txn_limbo: fix promote term filtering Serge Petrenko via Tarantool-patches
@ 2021-06-10 13:32 ` Serge Petrenko via Tarantool-patches
  2021-06-15 20:59   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 5/7] replication: send latest effective promote in initial join Serge Petrenko via Tarantool-patches
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-10 13:32 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
Previously PROMOTE entries, just like CONFIRM and ROLLBACK were only
stored in WALs. This is because snapshots consist solely of confirmed
transactions, so there's nothing to CONFIRM or ROLLBACK.
PROMOTE has gained additional meaning recently: it pins limbo ownership
to a specific instance, rendering everyone else read-only. So now
PROMOTE information must be stored in snapshots as well.
Save the latest limbo state (owner id and latest confirmed lsn) to the
snapshot as a PROMOTE request.
Part-of #6034
---
 src/box/memtx_engine.c | 34 ++++++++++++++++++++++++++++++++++
 src/box/txn_limbo.c    |  9 +++++++++
 src/box/txn_limbo.h    |  6 ++++++
 3 files changed, 49 insertions(+)
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 6c4982b9f..3713d39d0 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -50,6 +50,7 @@
 #include "schema.h"
 #include "gc.h"
 #include "raft.h"
+#include "txn_limbo.h"
 
 /* sync snapshot every 16MB */
 #define SNAP_SYNC_INTERVAL	(1 << 24)
@@ -225,6 +226,22 @@ memtx_engine_recover_raft(const struct xrow_header *row)
 	return 0;
 }
 
+static int
+memtx_engine_recover_promote(const struct xrow_header *row)
+{
+	assert(row->type == IPROTO_PROMOTE);
+	struct synchro_request req;
+	if (xrow_decode_synchro(row, &req) != 0)
+		return -1;
+	/*
+	 * Origin id cannot be deduced from row.replica_id in a checkpoint,
+	 * because all it's rows have a zero replica_id.
+	 */
+	req.origin_id = req.replica_id;
+	txn_limbo_process(&txn_limbo, &req);
+	return 0;
+}
+
 static int
 memtx_engine_recover_snapshot_row(struct memtx_engine *memtx,
 				  struct xrow_header *row, int *is_space_system)
@@ -233,6 +250,8 @@ memtx_engine_recover_snapshot_row(struct memtx_engine *memtx,
 	if (row->type != IPROTO_INSERT) {
 		if (row->type == IPROTO_RAFT)
 			return memtx_engine_recover_raft(row);
+		if (row->type == IPROTO_PROMOTE)
+			return memtx_engine_recover_promote(row);
 		diag_set(ClientError, ER_UNKNOWN_REQUEST_TYPE,
 			 (uint32_t) row->type);
 		return -1;
@@ -542,6 +561,7 @@ struct checkpoint {
 	struct vclock vclock;
 	struct xdir dir;
 	struct raft_request raft;
+	struct synchro_request promote;
 	/**
 	 * Do nothing, just touch the snapshot file - the
 	 * checkpoint already exists.
@@ -567,6 +587,7 @@ checkpoint_new(const char *snap_dirname, uint64_t snap_io_rate_limit)
 	xdir_create(&ckpt->dir, snap_dirname, SNAP, &INSTANCE_UUID, &opts);
 	vclock_create(&ckpt->vclock);
 	box_raft_checkpoint_local(&ckpt->raft);
+	txn_limbo_checkpoint(&txn_limbo, &ckpt->promote);
 	ckpt->touch = false;
 	return ckpt;
 }
@@ -655,6 +676,17 @@ finish:
 	return rc;
 }
 
+static int
+checkpoint_write_promote(struct xlog *l, const struct synchro_request *req)
+{
+	struct xrow_header row;
+	char body[XROW_SYNCHRO_BODY_LEN_MAX];
+	xrow_encode_synchro(&row, body, req);
+	if (checkpoint_write_row(l, &row) != 0)
+		return -1;
+	return 0;
+}
+
 static int
 checkpoint_f(va_list ap)
 {
@@ -692,6 +724,8 @@ checkpoint_f(va_list ap)
 	}
 	if (checkpoint_write_raft(&snap, &ckpt->raft) != 0)
 		goto fail;
+	if (checkpoint_write_promote(&snap, &ckpt->promote) != 0)
+		goto fail;
 	if (xlog_flush(&snap) < 0)
 		goto fail;
 
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 33a6e5548..40c4a41bb 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -295,6 +295,15 @@ complete:
 	return 0;
 }
 
+void
+txn_limbo_checkpoint(struct txn_limbo *limbo, struct synchro_request *req)
+{
+	req->type = IPROTO_PROMOTE;
+	req->replica_id = limbo->owner_id;
+	req->lsn = limbo->confirmed_lsn;
+	req->term = limbo->promote_greatest_term;
+}
+
 static void
 txn_limbo_write_synchro(struct txn_limbo *limbo, uint16_t type, int64_t lsn,
 			uint64_t term)
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index e409ac657..84a19bb40 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -311,6 +311,12 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req);
 int
 txn_limbo_wait_confirm(struct txn_limbo *limbo);
 
+/**
+ * Persist limbo state to a given synchro request.
+ */
+void
+txn_limbo_checkpoint(struct txn_limbo *limbo, struct synchro_request *req);
+
 /**
  * Write a PROMOTE request, which has the same effect as CONFIRM(@a lsn) and
  * ROLLBACK(@a lsn + 1) combined.
-- 
2.30.1 (Apple Git-130)
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 4/7] txn_limbo: persist the latest effective promote in snapshot
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 4/7] txn_limbo: persist the latest effective promote in snapshot Serge Petrenko via Tarantool-patches
@ 2021-06-15 20:59   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 34+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-15 20:59 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Danke schön für der Patch!
See 3 comments below.
> diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
> index 6c4982b9f..3713d39d0 100644
> --- a/src/box/memtx_engine.c
> +++ b/src/box/memtx_engine.c
> @@ -225,6 +226,22 @@ memtx_engine_recover_raft(const struct xrow_header *row)
>  	return 0;
>  }
>  
> +static int
> +memtx_engine_recover_promote(const struct xrow_header *row)
1. Maybe call it recover_synchro instead of promote? Even in the
body PROMOTE is not mentioned anyhow except for the assertion.
What if we ever add more members to the synchro state?
> +{
> +	assert(row->type == IPROTO_PROMOTE);
> +	struct synchro_request req;
> +	if (xrow_decode_synchro(row, &req) != 0)
> +		return -1;
> +	/*
> +	 * Origin id cannot be deduced from row.replica_id in a checkpoint,
> +	 * because all it's rows have a zero replica_id.
> +	 */
> +	req.origin_id = req.replica_id;
> +	txn_limbo_process(&txn_limbo, &req);
> +	return 0;
> +}
> @@ -655,6 +676,17 @@ finish:
>  	return rc;
>  }
>  
> +static int
> +checkpoint_write_promote(struct xlog *l, const struct synchro_request *req)
> +{
> +	struct xrow_header row;
> +	char body[XROW_SYNCHRO_BODY_LEN_MAX];
> +	xrow_encode_synchro(&row, body, req);
> +	if (checkpoint_write_row(l, &row) != 0)
> +		return -1;
> +	return 0;
2. Could also make 'return checkpoint_write_row(...);'
without branching + that might activate the tail call
optimization.
> diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
> index e409ac657..84a19bb40 100644
> --- a/src/box/txn_limbo.h
> +++ b/src/box/txn_limbo.h
> @@ -311,6 +311,12 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req);
>  int
>  txn_limbo_wait_confirm(struct txn_limbo *limbo);
>  
> +/**
> + * Persist limbo state to a given synchro request.
> + */
> +void
> +txn_limbo_checkpoint(struct txn_limbo *limbo, struct synchro_request *req);
3. Lets make the limbo a const pointer. It does not change, does it?
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 4/7] txn_limbo: persist the latest effective promote in snapshot
  2021-06-15 20:59   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
  0 siblings, 0 replies; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-17 21:00 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
15.06.2021 23:59, Vladislav Shpilevoy пишет:
> Danke schön für der Patch!
>
> See 3 comments below.
>
>> diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
>> index 6c4982b9f..3713d39d0 100644
>> --- a/src/box/memtx_engine.c
>> +++ b/src/box/memtx_engine.c
>> @@ -225,6 +226,22 @@ memtx_engine_recover_raft(const struct xrow_header *row)
>>   	return 0;
>>   }
>>   
>> +static int
>> +memtx_engine_recover_promote(const struct xrow_header *row)
> 1. Maybe call it recover_synchro instead of promote? Even in the
> body PROMOTE is not mentioned anyhow except for the assertion.
> What if we ever add more members to the synchro state?
Ok.
>> +{
>> +	assert(row->type == IPROTO_PROMOTE);
>> +	struct synchro_request req;
>> +	if (xrow_decode_synchro(row, &req) != 0)
>> +		return -1;
>> +	/*
>> +	 * Origin id cannot be deduced from row.replica_id in a checkpoint,
>> +	 * because all it's rows have a zero replica_id.
>> +	 */
>> +	req.origin_id = req.replica_id;
>> +	txn_limbo_process(&txn_limbo, &req);
>> +	return 0;
>> +}
>> @@ -655,6 +676,17 @@ finish:
>>   	return rc;
>>   }
>>   
>> +static int
>> +checkpoint_write_promote(struct xlog *l, const struct synchro_request *req)
>> +{
>> +	struct xrow_header row;
>> +	char body[XROW_SYNCHRO_BODY_LEN_MAX];
>> +	xrow_encode_synchro(&row, body, req);
>> +	if (checkpoint_write_row(l, &row) != 0)
>> +		return -1;
>> +	return 0;
> 2. Could also make 'return checkpoint_write_row(...);'
> without branching + that might activate the tail call
> optimization.
Sure, thanks for noticing!
>> diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
>> index e409ac657..84a19bb40 100644
>> --- a/src/box/txn_limbo.h
>> +++ b/src/box/txn_limbo.h
>> @@ -311,6 +311,12 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req);
>>   int
>>   txn_limbo_wait_confirm(struct txn_limbo *limbo);
>>   
>> +/**
>> + * Persist limbo state to a given synchro request.
>> + */
>> +void
>> +txn_limbo_checkpoint(struct txn_limbo *limbo, struct synchro_request *req);
> 3. Lets make the limbo a const pointer. It does not change, does it?
Ok. Please, find the incremental diff below.
==================================
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index a2ba17855..a2cfb2615 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -227,7 +227,7 @@ memtx_engine_recover_raft(const struct xrow_header *row)
  }
  static int
-memtx_engine_recover_promote(const struct xrow_header *row)
+memtx_engine_recover_synchro(const struct xrow_header *row)
  {
      assert(row->type == IPROTO_PROMOTE);
      struct synchro_request req;
@@ -235,7 +235,7 @@ memtx_engine_recover_promote(const struct 
xrow_header *row)
          return -1;
      /*
       * Origin id cannot be deduced from row.replica_id in a checkpoint,
-     * because all it's rows have a zero replica_id.
+     * because all its rows have a zero replica_id.
       */
      req.origin_id = req.replica_id;
      txn_limbo_process(&txn_limbo, &req);
@@ -251,7 +251,7 @@ memtx_engine_recover_snapshot_row(struct 
memtx_engine *memtx,
          if (row->type == IPROTO_RAFT)
              return memtx_engine_recover_raft(row);
          if (row->type == IPROTO_PROMOTE)
-            return memtx_engine_recover_promote(row);
+            return memtx_engine_recover_synchro(row);
          diag_set(ClientError, ER_UNKNOWN_REQUEST_TYPE,
               (uint32_t) row->type);
          return -1;
@@ -561,7 +561,7 @@ struct checkpoint {
      struct vclock vclock;
      struct xdir dir;
      struct raft_request raft;
-    struct synchro_request promote;
+    struct synchro_request synchro_state;
      /**
       * Do nothing, just touch the snapshot file - the
       * checkpoint already exists.
@@ -587,7 +587,7 @@ checkpoint_new(const char *snap_dirname, uint64_t 
snap_io_rate_limit)
      xdir_create(&ckpt->dir, snap_dirname, SNAP, &INSTANCE_UUID, &opts);
      vclock_create(&ckpt->vclock);
      box_raft_checkpoint_local(&ckpt->raft);
-    txn_limbo_checkpoint(&txn_limbo, &ckpt->promote);
+    txn_limbo_checkpoint(&txn_limbo, &ckpt->synchro_state);
      ckpt->touch = false;
      return ckpt;
  }
@@ -677,14 +677,12 @@ finish:
  }
  static int
-checkpoint_write_promote(struct xlog *l, const struct synchro_request *req)
+checkpoint_write_synchro(struct xlog *l, const struct synchro_request *req)
  {
      struct xrow_header row;
      char body[XROW_SYNCHRO_BODY_LEN_MAX];
      xrow_encode_synchro(&row, body, req);
-    if (checkpoint_write_row(l, &row) != 0)
-        return -1;
-    return 0;
+    return checkpoint_write_row(l, &row);
  }
  static int
@@ -724,7 +722,7 @@ checkpoint_f(va_list ap)
      }
      if (checkpoint_write_raft(&snap, &ckpt->raft) != 0)
          goto fail;
-    if (checkpoint_write_promote(&snap, &ckpt->promote) != 0)
+    if (checkpoint_write_synchro(&snap, &ckpt->synchro_state) != 0)
          goto fail;
      if (xlog_flush(&snap) < 0)
          goto fail;
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 57f105ee3..c4ff5cc34 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -297,7 +297,8 @@ complete:
  }
  void
-txn_limbo_checkpoint(struct txn_limbo *limbo, struct synchro_request *req)
+txn_limbo_checkpoint(const struct txn_limbo *limbo,
+             struct synchro_request *req)
  {
      req->type = IPROTO_PROMOTE;
      req->replica_id = limbo->owner_id;
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 84a19bb40..f5079d577 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -315,7 +315,8 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo);
   * Persist limbo state to a given synchro request.
   */
  void
-txn_limbo_checkpoint(struct txn_limbo *limbo, struct synchro_request *req);
+txn_limbo_checkpoint(const struct txn_limbo *limbo,
+             struct synchro_request *req);
  /**
   * Write a PROMOTE request, which has the same effect as CONFIRM(@a 
lsn) and
==================================
-- 
Serge Petrenko
^ permalink raw reply	[flat|nested] 34+ messages in thread
 
 
- * [Tarantool-patches] [PATCH 5/7] replication: send latest effective promote in initial join
  2021-06-10 13:32 [Tarantool-patches] [PATCH 0/7] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (3 preceding siblings ...)
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 4/7] txn_limbo: persist the latest effective promote in snapshot Serge Petrenko via Tarantool-patches
@ 2021-06-10 13:32 ` Serge Petrenko via Tarantool-patches
  2021-06-15 21:00   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 6/7] box: introduce `box.ctl.demote` Serge Petrenko via Tarantool-patches
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-10 13:32 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
A joining instance may never receive the latest PROMOTE request, which
is the only source of information about the limbo owner. Send out the
latest limbo state (e.g. the latest applied PROMOTE request) together
with the initial join snapshot.
Part-of #6034
---
 src/box/applier.cc |  5 +++++
 src/box/relay.cc   | 10 ++++++++++
 2 files changed, 15 insertions(+)
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 33181fdbf..dde045996 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -458,6 +458,11 @@ applier_wait_snapshot(struct applier *applier)
 				xrow_decode_vclock_xc(&row, &replicaset.vclock);
 			}
 			break; /* end of stream */
+		} else if (iproto_type_is_promote_request(row.type)) {
+			struct synchro_request req;
+			if (xrow_decode_synchro(&row, &req) != 0)
+				diag_raise();
+			txn_limbo_process(&txn_limbo, &req);
 		} else if (iproto_type_is_error(row.type)) {
 			xrow_decode_error_xc(&row);  /* rethrow error */
 		} else {
diff --git a/src/box/relay.cc b/src/box/relay.cc
index b1571b361..2781b2c22 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -399,12 +399,22 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock)
 	if (txn_limbo_wait_confirm(&txn_limbo) != 0)
 		diag_raise();
 
+	struct synchro_request req;
+	txn_limbo_checkpoint(&txn_limbo, &req);
+
 	/* Respond to the JOIN request with the current vclock. */
 	struct xrow_header row;
 	xrow_encode_vclock_xc(&row, vclock);
 	row.sync = sync;
 	coio_write_xrow(&relay->io, &row);
 
+	/* Send out the latest limbo state. */
+	char body[XROW_SYNCHRO_BODY_LEN_MAX];
+	xrow_encode_synchro(&row, body, &req);
+	row.replica_id = req.replica_id;
+	row.sync = sync;
+	coio_write_xrow(&relay->io, &row);
+
 	/* Send read view to the replica. */
 	engine_join_xc(&ctx, &relay->stream);
 }
-- 
2.30.1 (Apple Git-130)
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 5/7] replication: send latest effective promote in initial join
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 5/7] replication: send latest effective promote in initial join Serge Petrenko via Tarantool-patches
@ 2021-06-15 21:00   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 34+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-15 21:00 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for working on this!
Hm. The patch makes me think why don't we send the Raft
checkpoint on join too?
Otherwise it might happen that a replica joined, didn't
get the most actual Raft term yet, then the leader died,
and the replica would start elections with term 1. Even if
the latest term was 1000.
Nothing critical, Raft will probably work, but it could
be an "optimization"? Also it would be consistent with the
libmo state - send all the snapshot data on join.
Btw, the limbo state contains a term. And it means
that after join, but before subscribe, the limbo's term
is bigger than raft's term. Even though in the comments
of the limbo we say:
	* It means the limbo's term might be smaller than the raft term, while
	* there are ongoing elections, or the leader is already known and this
	* instance hasn't read its PROMOTE request yet. During other times the
	* limbo and raft are in sync and the terms are the same.
which means the limbo term should be always <= raft term.
Can this break something? Is it possible to make a test confirming
that we can't send the limbo state before the raft state?
^ permalink raw reply	[flat|nested] 34+ messages in thread 
- * Re: [Tarantool-patches] [PATCH 5/7] replication: send latest effective promote in initial join
  2021-06-15 21:00   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
  2021-06-18 22:52       ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-17 21:00 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
16.06.2021 00:00, Vladislav Shpilevoy пишет:
> Thanks for working on this!
>
> Hm. The patch makes me think why don't we send the Raft
> checkpoint on join too?
>
> Otherwise it might happen that a replica joined, didn't
> get the most actual Raft term yet, then the leader died,
> and the replica would start elections with term 1. Even if
> the latest term was 1000.
>
> Nothing critical, Raft will probably work, but it could
> be an "optimization"? Also it would be consistent with the
> libmo state - send all the snapshot data on join.
I tried to implement such a patch, but faced the following problem:
Unfortunately, we don't have information about replica's version
during join, so I can only send raft state based on term > 1.
Also, while writing a commit message I understood that this patch
doesn't help much. Even if a node joins, but doesn't subscribe to the
leader, it will still subscribe to someone else and receive the latest
Raft state.
After all, our doc states full-mesh is required for Raft to work, so we'll
have someone else to subscribe to and receive Raft state from for sure.
The patch's small and simple but I think it's not needed.
I've made a tiny amendment to this commit though, please find the diff 
below.
> Btw, the limbo state contains a term. And it means
> that after join, but before subscribe, the limbo's term
> is bigger than raft's term. Even though in the comments
> of the limbo we say:
>
> 	* It means the limbo's term might be smaller than the raft term, while
> 	* there are ongoing elections, or the leader is already known and this
> 	* instance hasn't read its PROMOTE request yet. During other times the
> 	* limbo and raft are in sync and the terms are the same.
>
> which means the limbo term should be always <= raft term.
> Can this break something? Is it possible to make a test confirming
> that we can't send the limbo state before the raft state?
I don't think this could break anything.
Limbo and Raft terms are not that interconnected now.
================================
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 289dea0f3..e05b53d5d 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -408,12 +408,17 @@ relay_initial_join(int fd, uint64_t sync, struct 
vclock *vclock)
         row.sync = sync;
         coio_write_xrow(&relay->io, &row);
-       /* Send out the latest limbo state. */
-       char body[XROW_SYNCHRO_BODY_LEN_MAX];
-       xrow_encode_synchro(&row, body, &req);
-       row.replica_id = req.replica_id;
-       row.sync = sync;
-       coio_write_xrow(&relay->io, &row);
+       /*
+        * Send out the latest limbo state. Don't do that when limbo is 
unused,
+        * let the old instances join without trouble.
+        */
+       if (req.replica_id != REPLICA_ID_NIL) {
+               char body[XROW_SYNCHRO_BODY_LEN_MAX];
+               xrow_encode_synchro(&row, body, &req);
+               row.replica_id = req.replica_id;
+               row.sync = sync;
+               coio_write_xrow(&relay->io, &row);
+       }
         /* Send read view to the replica. */
         engine_join_xc(&ctx, &relay->stream);
================================
-- 
Serge Petrenko
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 5/7] replication: send latest effective promote in initial join
  2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
@ 2021-06-18 22:52       ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-21 10:12         ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 34+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-18 22:52 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Good job on the fixes!
On 17.06.2021 23:00, Serge Petrenko via Tarantool-patches wrote:
> 
> 
> 16.06.2021 00:00, Vladislav Shpilevoy пишет:
>> Thanks for working on this!
>>
>> Hm. The patch makes me think why don't we send the Raft
>> checkpoint on join too?
>>
>> Otherwise it might happen that a replica joined, didn't
>> get the most actual Raft term yet, then the leader died,
>> and the replica would start elections with term 1. Even if
>> the latest term was 1000.
>>
>> Nothing critical, Raft will probably work, but it could
>> be an "optimization"? Also it would be consistent with the
>> libmo state - send all the snapshot data on join.
> I tried to implement such a patch, but faced the following problem:
> 
> Unfortunately, we don't have information about replica's version
> during join, so I can only send raft state based on term > 1.
But you do send the limbo state. Won't it break the old versions
like 1.10?
I tried to replicate 1.10 from the new version and got UNKNOWN_REQUEST_TYPE
error.
2021-06-19 00:50:49.781 [85883] main/105/applier/ applier.cc:345 E> ER_UNKNOWN_REQUEST_TYPE: Unknown request type 31
2021-06-19 00:50:49.781 [85883] main/101/interactive applier.cc:345 E> ER_UNKNOWN_REQUEST_TYPE: Unknown request type 31
2021-06-19 00:50:49.781 [85883] main/101/interactive F> can't initialize storage: Unknown request type 31
And that is fine. Because the replication does not need to be
compatible "forward". You don't need to be able to replicate a
new instance into an old instance.
> Also, while writing a commit message I understood that this patch
> doesn't help much. Even if a node joins, but doesn't subscribe to the
> leader, it will still subscribe to someone else and receive the latest
> Raft state.
And what if it does not? What if there are just 2 nodes?
> After all, our doc states full-mesh is required for Raft to work, so we'll
> have someone else to subscribe to and receive Raft state from for sure.
> The patch's small and simple but I think it's not needed.
> 
> I've made a tiny amendment to this commit though, please find the diff below.
The term limbo affects the filtering. Assume you sent the limbo
state without raft to 2 new replicas. They now has a big limbo term,
say 1000, and raft term 1. Now the master dies.
The replicas start new election. One of them wins with some small
raft term (let it be 10). It starts doing synchro txns. The other node's
applier will turn them all to NOPs, because txn_limbo_is_replica_outdated()
will return true. Even though it is not outdated.
Will it happen?
^ permalink raw reply	[flat|nested] 34+ messages in thread 
- * Re: [Tarantool-patches] [PATCH 5/7] replication: send latest effective promote in initial join
  2021-06-18 22:52       ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-21 10:12         ` Serge Petrenko via Tarantool-patches
  0 siblings, 0 replies; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-21 10:12 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
19.06.2021 01:52, Vladislav Shpilevoy пишет:
> Good job on the fixes!
Thanks for the review!
>
> On 17.06.2021 23:00, Serge Petrenko via Tarantool-patches wrote:
>>
>> 16.06.2021 00:00, Vladislav Shpilevoy пишет:
>>> Thanks for working on this!
>>>
>>> Hm. The patch makes me think why don't we send the Raft
>>> checkpoint on join too?
>>>
>>> Otherwise it might happen that a replica joined, didn't
>>> get the most actual Raft term yet, then the leader died,
>>> and the replica would start elections with term 1. Even if
>>> the latest term was 1000.
>>>
>>> Nothing critical, Raft will probably work, but it could
>>> be an "optimization"? Also it would be consistent with the
>>> libmo state - send all the snapshot data on join.
>> I tried to implement such a patch, but faced the following problem:
>>
>> Unfortunately, we don't have information about replica's version
>> during join, so I can only send raft state based on term > 1.
> But you do send the limbo state. Won't it break the old versions
> like 1.10?
I only send it if limbo is used. I.e. when limbo->owner_id != 0.
I've just realized, the problem is not only with versions like 1.10. 
It's the same for
versions like 2.8.1, 2.7.2 and so on, which will error with
UNKNOWN_REQUEST_TYPE, when receive limbo or raft state in initial join 
stream.
What can we do about that?
>
> I tried to replicate 1.10 from the new version and got UNKNOWN_REQUEST_TYPE
> error.
>
> 2021-06-19 00:50:49.781 [85883] main/105/applier/ applier.cc:345 E> ER_UNKNOWN_REQUEST_TYPE: Unknown request type 31
> 2021-06-19 00:50:49.781 [85883] main/101/interactive applier.cc:345 E> ER_UNKNOWN_REQUEST_TYPE: Unknown request type 31
> 2021-06-19 00:50:49.781 [85883] main/101/interactive F> can't initialize storage: Unknown request type 31
>
> And that is fine. Because the replication does not need to be
> compatible "forward". You don't need to be able to replicate a
> new instance into an old instance.
>
>> Also, while writing a commit message I understood that this patch
>> doesn't help much. Even if a node joins, but doesn't subscribe to the
>> leader, it will still subscribe to someone else and receive the latest
>> Raft state.
> And what if it does not? What if there are just 2 nodes?
>
>> After all, our doc states full-mesh is required for Raft to work, so we'll
>> have someone else to subscribe to and receive Raft state from for sure.
>> The patch's small and simple but I think it's not needed.
>>
>> I've made a tiny amendment to this commit though, please find the diff below.
> The term limbo affects the filtering. Assume you sent the limbo
> state without raft to 2 new replicas. They now has a big limbo term,
> say 1000, and raft term 1. Now the master dies.
>
> The replicas start new election. One of them wins with some small
> raft term (let it be 10). It starts doing synchro txns. The other node's
> applier will turn them all to NOPs, because txn_limbo_is_replica_outdated()
> will return true. Even though it is not outdated.
>
> Will it happen?
Looks like this may happen, indeed.
-- 
Serge Petrenko
^ permalink raw reply	[flat|nested] 34+ messages in thread 
 
 
 
 
- * [Tarantool-patches] [PATCH 6/7] box: introduce `box.ctl.demote`
  2021-06-10 13:32 [Tarantool-patches] [PATCH 0/7] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (4 preceding siblings ...)
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 5/7] replication: send latest effective promote in initial join Serge Petrenko via Tarantool-patches
@ 2021-06-10 13:32 ` Serge Petrenko via Tarantool-patches
  2021-06-18 22:52   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 7/7] box: make promote/demote always bump the term Serge Petrenko via Tarantool-patches
  2021-06-15 20:53 ` [Tarantool-patches] [PATCH 0/7] forbid implicit limbo ownership transition Vladislav Shpilevoy via Tarantool-patches
  7 siblings, 1 reply; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-10 13:32 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
Introduce a new journal entry, DEMOTE. The entry has the same meaning as
PROMOTE, with the only difference that it clears limbo ownership instead
of transferring it to the issuer.
Introduce `box.ctl.demote`, which works exactly like `box.ctl.promote`,
but results in writing DEMOTE instead of PROMOTE.
A new request was necessary instead of simply writing PROMOTE(origin_id
= 0), because origin_id is deduced from row.replica_id, which cannot be
0 for replicated rows (it's always equal to instance_id of the row
originator).
Closes #6034
@TarantoolBod document
Title: box.ctl.demote
`box.ctl.demote()` is a new function, which works exactly like
`box.ctl.promote()`, with one exception that it results in the instance
writing DEMOTE request to WAL instead of a PROMOTE request.
A DEMOTE request (DEMOTE = 32) copies PROMOTE behaviour (it clears the
limbo as well), but clears limbo ownership instead of assigning it to a
new instance.
---
 src/box/box.cc                                | 28 +++++++++++---
 src/box/box.h                                 |  3 ++
 src/box/iproto_constants.h                    | 10 +++--
 src/box/lua/ctl.c                             |  9 +++++
 src/box/txn_limbo.c                           | 37 ++++++++++++++++---
 src/box/txn_limbo.h                           |  7 ++++
 test/replication/election_basic.result        |  3 ++
 test/replication/election_basic.test.lua      |  1 +
 test/replication/election_qsync.result        |  3 ++
 test/replication/election_qsync.test.lua      |  1 +
 .../gh-5140-qsync-casc-rollback.result        |  3 ++
 .../gh-5140-qsync-casc-rollback.test.lua      |  1 +
 .../gh-5144-qsync-dup-confirm.result          |  3 ++
 .../gh-5144-qsync-dup-confirm.test.lua        |  1 +
 .../gh-5163-qsync-restart-crash.result        |  3 ++
 .../gh-5163-qsync-restart-crash.test.lua      |  1 +
 .../gh-5167-qsync-rollback-snap.result        |  3 ++
 .../gh-5167-qsync-rollback-snap.test.lua      |  1 +
 .../gh-5195-qsync-replica-write.result        |  3 ++
 .../gh-5195-qsync-replica-write.test.lua      |  1 +
 .../gh-5213-qsync-applier-order-3.result      |  3 ++
 .../gh-5213-qsync-applier-order-3.test.lua    |  1 +
 .../gh-5213-qsync-applier-order.result        |  3 ++
 .../gh-5213-qsync-applier-order.test.lua      |  1 +
 .../replication/gh-5288-qsync-recovery.result |  3 ++
 .../gh-5288-qsync-recovery.test.lua           |  1 +
 .../gh-5298-qsync-recovery-snap.result        |  3 ++
 .../gh-5298-qsync-recovery-snap.test.lua      |  1 +
 .../gh-5426-election-on-off.result            |  3 ++
 .../gh-5426-election-on-off.test.lua          |  1 +
 .../gh-5433-election-restart-recovery.result  |  3 ++
 ...gh-5433-election-restart-recovery.test.lua |  1 +
 .../gh-5446-qsync-eval-quorum.result          |  4 ++
 .../gh-5446-qsync-eval-quorum.test.lua        |  2 +
 .../gh-5506-election-on-off.result            |  3 ++
 .../gh-5506-election-on-off.test.lua          |  1 +
 .../gh-5566-final-join-synchro.result         |  3 ++
 .../gh-5566-final-join-synchro.test.lua       |  1 +
 .../gh-5874-qsync-txn-recovery.result         |  3 ++
 .../gh-5874-qsync-txn-recovery.test.lua       |  1 +
 .../gh-6032-promote-wal-write.result          |  3 ++
 .../gh-6032-promote-wal-write.test.lua        |  1 +
 .../gh-6057-qsync-confirm-async-no-wal.result |  3 ++
 ...h-6057-qsync-confirm-async-no-wal.test.lua |  1 +
 test/replication/hang_on_synchro_fail.result  |  3 ++
 .../replication/hang_on_synchro_fail.test.lua |  1 +
 test/replication/qsync_advanced.result        |  3 ++
 test/replication/qsync_advanced.test.lua      |  1 +
 test/replication/qsync_basic.result           |  3 ++
 test/replication/qsync_basic.test.lua         |  1 +
 test/replication/qsync_errinj.result          |  3 ++
 test/replication/qsync_errinj.test.lua        |  1 +
 test/replication/qsync_snapshots.result       |  3 ++
 test/replication/qsync_snapshots.test.lua     |  1 +
 test/replication/qsync_with_anon.result       |  3 ++
 test/replication/qsync_with_anon.test.lua     |  1 +
 56 files changed, 183 insertions(+), 13 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index b9f3fab32..d3908b4cd 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1503,8 +1503,8 @@ box_wait_quorum(uint32_t lead_id, int64_t target_lsn, int quorum,
 	return 0;
 }
 
-int
-box_promote(void)
+static int
+box_clear_synchro_queue(bool demote)
 {
 	/* A guard to block multiple simultaneous function invocations. */
 	static bool in_promote = false;
@@ -1665,10 +1665,16 @@ box_promote(void)
 promote:
 			/* We cannot possibly get here in a volatile state. */
 			assert(box_raft()->volatile_term == box_raft()->term);
-			txn_limbo_write_promote(&txn_limbo, wait_lsn,
-						box_raft()->term);
+			if (demote) {
+				txn_limbo_write_demote(&txn_limbo, wait_lsn,
+						       box_raft()->term);
+			} else {
+				txn_limbo_write_promote(&txn_limbo, wait_lsn,
+							box_raft()->term);
+			}
+			uint16_t type = demote ? IPROTO_DEMOTE : IPROTO_PROMOTE;
 			struct synchro_request req = {
-				.type = IPROTO_PROMOTE,
+				.type = type,
 				.replica_id = former_leader_id,
 				.origin_id = instance_id,
 				.lsn = wait_lsn,
@@ -1681,6 +1687,18 @@ promote:
 	return rc;
 }
 
+int
+box_promote(void)
+{
+	return box_clear_synchro_queue(false);
+}
+
+int
+box_demote(void)
+{
+	return box_clear_synchro_queue(true);
+}
+
 void
 box_listen(void)
 {
diff --git a/src/box/box.h b/src/box/box.h
index 04bdd397d..a05f72bb6 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -276,6 +276,9 @@ typedef struct tuple box_tuple_t;
 int
 box_promote(void);
 
+int
+box_demote(void);
+
 /* box_select is private and used only by FFI */
 API_EXPORT int
 box_select(uint32_t space_id, uint32_t index_id,
diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
index 7362ddaf1..92d87016a 100644
--- a/src/box/iproto_constants.h
+++ b/src/box/iproto_constants.h
@@ -240,6 +240,8 @@ enum iproto_type {
 	IPROTO_RAFT = 30,
 	/** PROMOTE request. */
 	IPROTO_PROMOTE = 31,
+	/** DEMOTE request. */
+	IPROTO_DEMOTE = 32,
 
 	/** A confirmation message for synchronous transactions. */
 	IPROTO_CONFIRM = 40,
@@ -309,6 +311,8 @@ iproto_type_name(uint16_t type)
 		return "RAFT";
 	case IPROTO_PROMOTE:
 		return "PROMOTE";
+	case IPROTO_DEMOTE:
+		return "DEMOTE";
 	case IPROTO_CONFIRM:
 		return "CONFIRM";
 	case IPROTO_ROLLBACK:
@@ -363,14 +367,14 @@ static inline bool
 iproto_type_is_synchro_request(uint16_t type)
 {
 	return type == IPROTO_CONFIRM || type == IPROTO_ROLLBACK ||
-	       type == IPROTO_PROMOTE;
+	       type == IPROTO_PROMOTE || type == IPROTO_DEMOTE;
 }
 
-/** PROMOTE entry (synchronous replication and leader elections). */
+/** PROMOTE/DEMOTE entry (synchronous replication and leader elections). */
 static inline bool
 iproto_type_is_promote_request(uint32_t type)
 {
-       return type == IPROTO_PROMOTE;
+       return type == IPROTO_PROMOTE || type == IPROTO_DEMOTE;
 }
 
 static inline bool
diff --git a/src/box/lua/ctl.c b/src/box/lua/ctl.c
index 368b9ab60..a613c4111 100644
--- a/src/box/lua/ctl.c
+++ b/src/box/lua/ctl.c
@@ -89,6 +89,14 @@ lbox_ctl_promote(struct lua_State *L)
 	return 0;
 }
 
+static int
+lbox_ctl_demote(struct lua_State *L)
+{
+	if (box_demote() != 0)
+		return luaT_error(L);
+	return 0;
+}
+
 static int
 lbox_ctl_is_recovery_finished(struct lua_State *L)
 {
@@ -127,6 +135,7 @@ static const struct luaL_Reg lbox_ctl_lib[] = {
 	{"promote", lbox_ctl_promote},
 	/* An old alias. */
 	{"clear_synchro_queue", lbox_ctl_promote},
+	{"demote", lbox_ctl_demote},
 	{"is_recovery_finished", lbox_ctl_is_recovery_finished},
 	{"set_on_shutdown_timeout", lbox_ctl_set_on_shutdown_timeout},
 	{NULL, NULL}
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 40c4a41bb..f51c32030 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -507,6 +507,29 @@ txn_limbo_read_promote(struct txn_limbo *limbo, uint32_t replica_id,
 	limbo->confirmed_lsn = 0;
 }
 
+void
+txn_limbo_write_demote(struct txn_limbo *limbo, int64_t lsn, uint64_t term)
+{
+	limbo->confirmed_lsn = lsn;
+	limbo->is_in_rollback = true;
+	struct txn_limbo_entry *e = txn_limbo_last_synchro_entry(limbo);
+	assert(e == NULL || e->lsn <= lsn);
+	(void)e;
+	txn_limbo_write_synchro(limbo, IPROTO_DEMOTE, lsn, term);
+	limbo->is_in_rollback = false;
+}
+
+/**
+ * Process a DEMOTE request, which's like PROMOTE, but clears the limbo
+ * ownership.
+ * @sa txn_limbo_read_promote.
+ */
+static void
+txn_limbo_read_demote(struct txn_limbo *limbo, int64_t lsn)
+{
+	return txn_limbo_read_promote(limbo, REPLICA_ID_NIL, lsn);
+}
+
 void
 txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
 {
@@ -659,12 +682,13 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 
 	if (term > limbo->promote_greatest_term) {
 		limbo->promote_greatest_term = term;
-	} else if (req->type == IPROTO_PROMOTE &&
+	} else if (iproto_type_is_promote_request(req->type) &&
 		   limbo->promote_greatest_term > 1) {
 		/* PROMOTE for outdated term. Ignore. */
-		say_info("RAFT: ignoring PROMOTE request from instance "
+		say_info("RAFT: ignoring %s request from instance "
 			 "id %"PRIu32" for term %"PRIu64". Greatest term seen "
-			 "before (%"PRIu64") is bigger.", origin, term,
+			 "before (%"PRIu64") is bigger.",
+			 iproto_type_name(req->type), origin, term,
 			 limbo->promote_greatest_term);
 		return;
 	}
@@ -675,7 +699,7 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 		 * The limbo was empty on the instance issuing the request.
 		 * This means this instance must empty its limbo as well.
 		 */
-		assert(lsn == 0 && req->type == IPROTO_PROMOTE);
+		assert(lsn == 0 && iproto_type_is_promote_request(req->type));
 	} else if (req->replica_id != limbo->owner_id) {
 		/*
 		 * Ignore CONFIRM/ROLLBACK messages for a foreign master.
@@ -683,7 +707,7 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 		 * data from an old leader, who has just started and written
 		 * confirm right on synchronous transaction recovery.
 		 */
-		if (req->type != IPROTO_PROMOTE)
+		if (!iproto_type_is_promote_request(req->type))
 			return;
 		/*
 		 * Promote has a bigger term, and tries to steal the limbo. It
@@ -703,6 +727,9 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 	case IPROTO_PROMOTE:
 		txn_limbo_read_promote(limbo, req->origin_id, lsn);
 		break;
+	case IPROTO_DEMOTE:
+		txn_limbo_read_demote(limbo, lsn);
+		break;
 	default:
 		unreachable();
 	}
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 84a19bb40..c876c31fa 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -324,6 +324,13 @@ txn_limbo_checkpoint(struct txn_limbo *limbo, struct synchro_request *req);
 void
 txn_limbo_write_promote(struct txn_limbo *limbo, int64_t lsn, uint64_t term);
 
+/**
+ * Write a DEMOTE request.
+ * It has the same effect as PROMOTE and additionally clears limbo ownership.
+ */
+void
+txn_limbo_write_demote(struct txn_limbo *limbo, int64_t lsn, uint64_t term);
+
 /**
  * Update qsync parameters dynamically.
  */
diff --git a/test/replication/election_basic.result b/test/replication/election_basic.result
index d5320b3ff..a62d32deb 100644
--- a/test/replication/election_basic.result
+++ b/test/replication/election_basic.result
@@ -114,6 +114,9 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
 
 --
 -- See if bootstrap with election enabled works.
diff --git a/test/replication/election_basic.test.lua b/test/replication/election_basic.test.lua
index 821f73cea..2143a6000 100644
--- a/test/replication/election_basic.test.lua
+++ b/test/replication/election_basic.test.lua
@@ -43,6 +43,7 @@ box.cfg{
     election_mode = 'off',                                                      \
     election_timeout = old_election_timeout                                     \
 }
+box.ctl.demote()
 
 --
 -- See if bootstrap with election enabled works.
diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
index c06400b38..2402c8578 100644
--- a/test/replication/election_qsync.result
+++ b/test/replication/election_qsync.result
@@ -165,6 +165,9 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
 box.schema.user.revoke('guest', 'super')
  | ---
  | ...
diff --git a/test/replication/election_qsync.test.lua b/test/replication/election_qsync.test.lua
index ea6fc4a61..e1aca8351 100644
--- a/test/replication/election_qsync.test.lua
+++ b/test/replication/election_qsync.test.lua
@@ -84,4 +84,5 @@ box.cfg{
     replication = old_replication,                                              \
     replication_synchro_timeout = old_replication_synchro_timeout,              \
 }
+box.ctl.demote()
 box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/gh-5140-qsync-casc-rollback.result b/test/replication/gh-5140-qsync-casc-rollback.result
index 0eec3f10a..d3208e1a4 100644
--- a/test/replication/gh-5140-qsync-casc-rollback.result
+++ b/test/replication/gh-5140-qsync-casc-rollback.result
@@ -225,3 +225,6 @@ test_run:cmd('delete server replica')
 box.schema.user.revoke('guest', 'super')
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5140-qsync-casc-rollback.test.lua b/test/replication/gh-5140-qsync-casc-rollback.test.lua
index d45390aa6..96ddfd260 100644
--- a/test/replication/gh-5140-qsync-casc-rollback.test.lua
+++ b/test/replication/gh-5140-qsync-casc-rollback.test.lua
@@ -104,3 +104,4 @@ test_run:cmd('stop server replica')
 test_run:cmd('delete server replica')
 
 box.schema.user.revoke('guest', 'super')
+box.ctl.demote()
diff --git a/test/replication/gh-5144-qsync-dup-confirm.result b/test/replication/gh-5144-qsync-dup-confirm.result
index 01c7501f1..217e44412 100644
--- a/test/replication/gh-5144-qsync-dup-confirm.result
+++ b/test/replication/gh-5144-qsync-dup-confirm.result
@@ -151,6 +151,9 @@ test_run:cmd('delete server replica2')
  | - true
  | ...
 
+box.ctl.demote()
+ | ---
+ | ...
 box.schema.user.revoke('guest', 'super')
  | ---
  | ...
diff --git a/test/replication/gh-5144-qsync-dup-confirm.test.lua b/test/replication/gh-5144-qsync-dup-confirm.test.lua
index 9a295192a..1d6af2c62 100644
--- a/test/replication/gh-5144-qsync-dup-confirm.test.lua
+++ b/test/replication/gh-5144-qsync-dup-confirm.test.lua
@@ -70,4 +70,5 @@ test_run:cmd('delete server replica1')
 test_run:cmd('stop server replica2')
 test_run:cmd('delete server replica2')
 
+box.ctl.demote()
 box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/gh-5163-qsync-restart-crash.result b/test/replication/gh-5163-qsync-restart-crash.result
index 8c9d43a9b..1b4d3d9b5 100644
--- a/test/replication/gh-5163-qsync-restart-crash.result
+++ b/test/replication/gh-5163-qsync-restart-crash.result
@@ -33,3 +33,6 @@ box.space.sync:select{}
 box.space.sync:drop()
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5163-qsync-restart-crash.test.lua b/test/replication/gh-5163-qsync-restart-crash.test.lua
index a0c2533c6..c8d54aad2 100644
--- a/test/replication/gh-5163-qsync-restart-crash.test.lua
+++ b/test/replication/gh-5163-qsync-restart-crash.test.lua
@@ -13,3 +13,4 @@ box.space.sync:replace{1}
 test_run:cmd('restart server default')
 box.space.sync:select{}
 box.space.sync:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-5167-qsync-rollback-snap.result b/test/replication/gh-5167-qsync-rollback-snap.result
index ddb3212ad..13166720f 100644
--- a/test/replication/gh-5167-qsync-rollback-snap.result
+++ b/test/replication/gh-5167-qsync-rollback-snap.result
@@ -166,3 +166,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5167-qsync-rollback-snap.test.lua b/test/replication/gh-5167-qsync-rollback-snap.test.lua
index 1590a515a..1a2a31b7c 100644
--- a/test/replication/gh-5167-qsync-rollback-snap.test.lua
+++ b/test/replication/gh-5167-qsync-rollback-snap.test.lua
@@ -66,3 +66,4 @@ box.cfg{
     replication_synchro_quorum = orig_synchro_quorum,                           \
     replication_synchro_timeout = orig_synchro_timeout,                         \
 }
+box.ctl.demote()
diff --git a/test/replication/gh-5195-qsync-replica-write.result b/test/replication/gh-5195-qsync-replica-write.result
index fcf96cd1a..99ec9663e 100644
--- a/test/replication/gh-5195-qsync-replica-write.result
+++ b/test/replication/gh-5195-qsync-replica-write.result
@@ -160,3 +160,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5195-qsync-replica-write.test.lua b/test/replication/gh-5195-qsync-replica-write.test.lua
index 751ceee69..8b5d78357 100644
--- a/test/replication/gh-5195-qsync-replica-write.test.lua
+++ b/test/replication/gh-5195-qsync-replica-write.test.lua
@@ -67,3 +67,4 @@ box.cfg{
     replication_synchro_quorum = old_synchro_quorum,                            \
     replication_synchro_timeout = old_synchro_timeout,                          \
 }
+box.ctl.demote()
diff --git a/test/replication/gh-5213-qsync-applier-order-3.result b/test/replication/gh-5213-qsync-applier-order-3.result
index 90608dcdc..e788eec77 100644
--- a/test/replication/gh-5213-qsync-applier-order-3.result
+++ b/test/replication/gh-5213-qsync-applier-order-3.result
@@ -267,3 +267,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5213-qsync-applier-order-3.test.lua b/test/replication/gh-5213-qsync-applier-order-3.test.lua
index 669ab3677..304656de0 100644
--- a/test/replication/gh-5213-qsync-applier-order-3.test.lua
+++ b/test/replication/gh-5213-qsync-applier-order-3.test.lua
@@ -125,3 +125,4 @@ box.cfg{
     replication_synchro_quorum = old_synchro_quorum,                            \
     replication_synchro_timeout = old_synchro_timeout,                          \
 }
+box.ctl.demote()
diff --git a/test/replication/gh-5213-qsync-applier-order.result b/test/replication/gh-5213-qsync-applier-order.result
index 3e1f2871b..ba6cdab06 100644
--- a/test/replication/gh-5213-qsync-applier-order.result
+++ b/test/replication/gh-5213-qsync-applier-order.result
@@ -303,3 +303,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5213-qsync-applier-order.test.lua b/test/replication/gh-5213-qsync-applier-order.test.lua
index 010c08ca7..39b1912e8 100644
--- a/test/replication/gh-5213-qsync-applier-order.test.lua
+++ b/test/replication/gh-5213-qsync-applier-order.test.lua
@@ -121,3 +121,4 @@ box.cfg{
     replication_synchro_quorum = old_synchro_quorum,                            \
     replication_synchro_timeout = old_synchro_timeout,                          \
 }
+box.ctl.demote()
diff --git a/test/replication/gh-5288-qsync-recovery.result b/test/replication/gh-5288-qsync-recovery.result
index 42662cd38..704b71d93 100644
--- a/test/replication/gh-5288-qsync-recovery.result
+++ b/test/replication/gh-5288-qsync-recovery.result
@@ -28,3 +28,6 @@ test_run:cmd('restart server default')
 box.space.sync:drop()
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5288-qsync-recovery.test.lua b/test/replication/gh-5288-qsync-recovery.test.lua
index 24727d59e..2455f7278 100644
--- a/test/replication/gh-5288-qsync-recovery.test.lua
+++ b/test/replication/gh-5288-qsync-recovery.test.lua
@@ -10,3 +10,4 @@ s:insert{1}
 box.snapshot()
 test_run:cmd('restart server default')
 box.space.sync:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-5298-qsync-recovery-snap.result b/test/replication/gh-5298-qsync-recovery-snap.result
index ac6ccbc36..0883fe5f5 100644
--- a/test/replication/gh-5298-qsync-recovery-snap.result
+++ b/test/replication/gh-5298-qsync-recovery-snap.result
@@ -101,3 +101,6 @@ box.space.sync:drop()
 box.space.loc:drop()
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5298-qsync-recovery-snap.test.lua b/test/replication/gh-5298-qsync-recovery-snap.test.lua
index 30f975c6c..084cde963 100644
--- a/test/replication/gh-5298-qsync-recovery-snap.test.lua
+++ b/test/replication/gh-5298-qsync-recovery-snap.test.lua
@@ -47,3 +47,4 @@ box.cfg{
 }
 box.space.sync:drop()
 box.space.loc:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-5426-election-on-off.result b/test/replication/gh-5426-election-on-off.result
index 7444ef7f2..2bdc17ec6 100644
--- a/test/replication/gh-5426-election-on-off.result
+++ b/test/replication/gh-5426-election-on-off.result
@@ -168,6 +168,9 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
 box.schema.user.revoke('guest', 'super')
  | ---
  | ...
diff --git a/test/replication/gh-5426-election-on-off.test.lua b/test/replication/gh-5426-election-on-off.test.lua
index bdf06903b..6277e9ef2 100644
--- a/test/replication/gh-5426-election-on-off.test.lua
+++ b/test/replication/gh-5426-election-on-off.test.lua
@@ -69,4 +69,5 @@ box.cfg{
     election_mode = old_election_mode,                                          \
     replication_timeout = old_replication_timeout,                              \
 }
+box.ctl.demote()
 box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/gh-5433-election-restart-recovery.result b/test/replication/gh-5433-election-restart-recovery.result
index f8f32416e..ed63ff409 100644
--- a/test/replication/gh-5433-election-restart-recovery.result
+++ b/test/replication/gh-5433-election-restart-recovery.result
@@ -169,6 +169,9 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
 box.schema.user.revoke('guest', 'super')
  | ---
  | ...
diff --git a/test/replication/gh-5433-election-restart-recovery.test.lua b/test/replication/gh-5433-election-restart-recovery.test.lua
index 4aff000bf..ae1f42c4d 100644
--- a/test/replication/gh-5433-election-restart-recovery.test.lua
+++ b/test/replication/gh-5433-election-restart-recovery.test.lua
@@ -84,4 +84,5 @@ box.cfg{
     election_mode = old_election_mode,                                          \
     replication_timeout = old_replication_timeout,                              \
 }
+box.ctl.demote()
 box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/gh-5446-qsync-eval-quorum.result b/test/replication/gh-5446-qsync-eval-quorum.result
index fe9868651..1173128a7 100644
--- a/test/replication/gh-5446-qsync-eval-quorum.result
+++ b/test/replication/gh-5446-qsync-eval-quorum.result
@@ -346,3 +346,7 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
+
diff --git a/test/replication/gh-5446-qsync-eval-quorum.test.lua b/test/replication/gh-5446-qsync-eval-quorum.test.lua
index d8fe6cccf..b969df836 100644
--- a/test/replication/gh-5446-qsync-eval-quorum.test.lua
+++ b/test/replication/gh-5446-qsync-eval-quorum.test.lua
@@ -136,3 +136,5 @@ box.cfg{
     replication_synchro_quorum = old_synchro_quorum,                            \
     replication_synchro_timeout = old_synchro_timeout,                          \
 }
+box.ctl.demote()
+
diff --git a/test/replication/gh-5506-election-on-off.result b/test/replication/gh-5506-election-on-off.result
index b8abd7ecd..a7f2b6a9c 100644
--- a/test/replication/gh-5506-election-on-off.result
+++ b/test/replication/gh-5506-election-on-off.result
@@ -138,3 +138,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5506-election-on-off.test.lua b/test/replication/gh-5506-election-on-off.test.lua
index 476b00ec0..f8915c333 100644
--- a/test/replication/gh-5506-election-on-off.test.lua
+++ b/test/replication/gh-5506-election-on-off.test.lua
@@ -66,3 +66,4 @@ box.cfg{
     election_mode = old_election_mode,                                          \
     replication_timeout = old_replication_timeout,                              \
 }
+box.ctl.demote()
diff --git a/test/replication/gh-5566-final-join-synchro.result b/test/replication/gh-5566-final-join-synchro.result
index e43e03b1c..c5ae2f283 100644
--- a/test/replication/gh-5566-final-join-synchro.result
+++ b/test/replication/gh-5566-final-join-synchro.result
@@ -140,3 +140,6 @@ test_run:cleanup_cluster()
 box.schema.user.revoke('guest', 'replication')
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5566-final-join-synchro.test.lua b/test/replication/gh-5566-final-join-synchro.test.lua
index 0f22a0321..25f411407 100644
--- a/test/replication/gh-5566-final-join-synchro.test.lua
+++ b/test/replication/gh-5566-final-join-synchro.test.lua
@@ -60,3 +60,4 @@ box.cfg{\
 box.space.sync:drop()
 test_run:cleanup_cluster()
 box.schema.user.revoke('guest', 'replication')
+box.ctl.demote()
diff --git a/test/replication/gh-5874-qsync-txn-recovery.result b/test/replication/gh-5874-qsync-txn-recovery.result
index 85eb89e04..01328a9e3 100644
--- a/test/replication/gh-5874-qsync-txn-recovery.result
+++ b/test/replication/gh-5874-qsync-txn-recovery.result
@@ -163,3 +163,6 @@ sync:drop()
 loc:drop()
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5874-qsync-txn-recovery.test.lua b/test/replication/gh-5874-qsync-txn-recovery.test.lua
index cf9753851..6ddf164ac 100644
--- a/test/replication/gh-5874-qsync-txn-recovery.test.lua
+++ b/test/replication/gh-5874-qsync-txn-recovery.test.lua
@@ -83,3 +83,4 @@ loc:select()
 async:drop()
 sync:drop()
 loc:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-6032-promote-wal-write.result b/test/replication/gh-6032-promote-wal-write.result
index 246c7974f..03112fb8d 100644
--- a/test/replication/gh-6032-promote-wal-write.result
+++ b/test/replication/gh-6032-promote-wal-write.result
@@ -67,3 +67,6 @@ box.cfg{\
 box.space.sync:drop()
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-6032-promote-wal-write.test.lua b/test/replication/gh-6032-promote-wal-write.test.lua
index 8c1859083..9a036a8b4 100644
--- a/test/replication/gh-6032-promote-wal-write.test.lua
+++ b/test/replication/gh-6032-promote-wal-write.test.lua
@@ -26,3 +26,4 @@ box.cfg{\
     replication_synchro_timeout = replication_synchro_timeout,\
 }
 box.space.sync:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-6057-qsync-confirm-async-no-wal.result b/test/replication/gh-6057-qsync-confirm-async-no-wal.result
index 4e4fc7576..e7beefb2a 100644
--- a/test/replication/gh-6057-qsync-confirm-async-no-wal.result
+++ b/test/replication/gh-6057-qsync-confirm-async-no-wal.result
@@ -165,3 +165,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua b/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua
index c2d9d290b..bb459ea02 100644
--- a/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua
+++ b/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua
@@ -88,3 +88,4 @@ box.cfg{
     replication_synchro_quorum = old_synchro_quorum,                            \
     replication_synchro_timeout = old_synchro_timeout,                          \
 }
+box.ctl.demote()
diff --git a/test/replication/hang_on_synchro_fail.result b/test/replication/hang_on_synchro_fail.result
index b73406368..dda15af20 100644
--- a/test/replication/hang_on_synchro_fail.result
+++ b/test/replication/hang_on_synchro_fail.result
@@ -130,4 +130,7 @@ box.space.sync:drop()
 box.schema.user.revoke('guest', 'replication')
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
 
diff --git a/test/replication/hang_on_synchro_fail.test.lua b/test/replication/hang_on_synchro_fail.test.lua
index 549ae17f6..f0d494eae 100644
--- a/test/replication/hang_on_synchro_fail.test.lua
+++ b/test/replication/hang_on_synchro_fail.test.lua
@@ -55,4 +55,5 @@ box.cfg{replication_synchro_quorum=old_synchro_quorum,\
         replication_synchro_timeout=old_synchro_timeout}
 box.space.sync:drop()
 box.schema.user.revoke('guest', 'replication')
+box.ctl.demote()
 
diff --git a/test/replication/qsync_advanced.result b/test/replication/qsync_advanced.result
index 509e5c2a7..72ac0c326 100644
--- a/test/replication/qsync_advanced.result
+++ b/test/replication/qsync_advanced.result
@@ -790,3 +790,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/qsync_advanced.test.lua b/test/replication/qsync_advanced.test.lua
index 5bc54794b..37c285b8d 100644
--- a/test/replication/qsync_advanced.test.lua
+++ b/test/replication/qsync_advanced.test.lua
@@ -282,3 +282,4 @@ box.cfg{
     replication_synchro_quorum = orig_synchro_quorum,                           \
     replication_synchro_timeout = orig_synchro_timeout,                         \
 }
+box.ctl.demote()
diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
index 08474269e..f2e207943 100644
--- a/test/replication/qsync_basic.result
+++ b/test/replication/qsync_basic.result
@@ -700,3 +700,6 @@ box.space.sync:drop()
 box.schema.user.revoke('guest', 'replication')
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
index 6a49e2b01..b7ba905a6 100644
--- a/test/replication/qsync_basic.test.lua
+++ b/test/replication/qsync_basic.test.lua
@@ -279,3 +279,4 @@ test_run:cmd('delete server replica')
 box.space.test:drop()
 box.space.sync:drop()
 box.schema.user.revoke('guest', 'replication')
+box.ctl.demote()
diff --git a/test/replication/qsync_errinj.result b/test/replication/qsync_errinj.result
index 837802847..cf1e30a90 100644
--- a/test/replication/qsync_errinj.result
+++ b/test/replication/qsync_errinj.result
@@ -545,3 +545,6 @@ box.space.sync:drop()
 box.schema.user.revoke('guest', 'super')
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/qsync_errinj.test.lua b/test/replication/qsync_errinj.test.lua
index 556c897c4..e7c85c58c 100644
--- a/test/replication/qsync_errinj.test.lua
+++ b/test/replication/qsync_errinj.test.lua
@@ -223,3 +223,4 @@ test_run:cmd('delete server replica')
 
 box.space.sync:drop()
 box.schema.user.revoke('guest', 'super')
+box.ctl.demote()
diff --git a/test/replication/qsync_snapshots.result b/test/replication/qsync_snapshots.result
index f6b86ed70..ca418b168 100644
--- a/test/replication/qsync_snapshots.result
+++ b/test/replication/qsync_snapshots.result
@@ -302,3 +302,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/qsync_snapshots.test.lua b/test/replication/qsync_snapshots.test.lua
index e5f70e46a..82c2e3f7c 100644
--- a/test/replication/qsync_snapshots.test.lua
+++ b/test/replication/qsync_snapshots.test.lua
@@ -131,3 +131,4 @@ box.cfg{
     replication_synchro_quorum = orig_synchro_quorum,                           \
     replication_synchro_timeout = orig_synchro_timeout,                         \
 }
+box.ctl.demote()
diff --git a/test/replication/qsync_with_anon.result b/test/replication/qsync_with_anon.result
index 5ec99c1ef..99c6fb902 100644
--- a/test/replication/qsync_with_anon.result
+++ b/test/replication/qsync_with_anon.result
@@ -223,6 +223,9 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
 test_run:cleanup_cluster()
  | ---
  | ...
diff --git a/test/replication/qsync_with_anon.test.lua b/test/replication/qsync_with_anon.test.lua
index 28e08697e..e73880ec7 100644
--- a/test/replication/qsync_with_anon.test.lua
+++ b/test/replication/qsync_with_anon.test.lua
@@ -82,4 +82,5 @@ box.cfg{
     replication_synchro_quorum = orig_synchro_quorum,                           \
     replication_synchro_timeout = orig_synchro_timeout,                         \
 }
+box.ctl.demote()
 test_run:cleanup_cluster()
-- 
2.30.1 (Apple Git-130)
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 6/7] box: introduce `box.ctl.demote`
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 6/7] box: introduce `box.ctl.demote` Serge Petrenko via Tarantool-patches
@ 2021-06-18 22:52   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-21 14:56     ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 34+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-18 22:52 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for the patch!
See 3 comments below.
>     box: introduce `box.ctl.demote`
>     
>     Introduce a new journal entry, DEMOTE. The entry has the same meaning as
>     PROMOTE, with the only difference that it clears limbo ownership instead
>     of transferring it to the issuer.
>     
>     Introduce `box.ctl.demote`, which works exactly like `box.ctl.promote`,
>     but results in writing DEMOTE instead of PROMOTE.
>     
>     A new request was necessary instead of simply writing PROMOTE(origin_id
>     = 0), because origin_id is deduced from row.replica_id, which cannot be
>     0 for replicated rows (it's always equal to instance_id of the row
>     originator).
>     
>     Closes #6034
>     
>     @TarantoolBod document
1. TarantoolBod -> TarantoolBot
>     Title: box.ctl.demote
>     
>     `box.ctl.demote()` is a new function, which works exactly like
>     `box.ctl.promote()`, with one exception that it results in the instance
>     writing DEMOTE request to WAL instead of a PROMOTE request.
>     
>     A DEMOTE request (DEMOTE = 32) copies PROMOTE behaviour (it clears the
>     limbo as well), but clears limbo ownership instead of assigning it to a
>     new instance.
> 
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 53a8f80e5..f2bde910c 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1527,8 +1527,8 @@ box_wait_quorum(uint32_t lead_id, int64_t target_lsn, int quorum,
>  	return 0;
>  }
>  
> -int
> -box_promote(void)
> +static int
> +box_clear_synchro_queue(bool demote)
>  {
>  	/* A guard to block multiple simultaneous function invocations. */
>  	static bool in_promote = false;
2. A few lines below there is an error message about simultaneous invocations.
It still talks only about promote(), but I think it should mention demote() too.
> @@ -1691,10 +1691,16 @@ promote:
3. That looks strange.
tarantool> box.cfg{election_mode = 'candidate'}
tarantool> box.info.election
---
- state: leader
  vote: 1
  leader: 1
  term: 2
...
tarantool> box.ctl.demote()
---
...
tarantool> box.info.election
---
- state: leader
  vote: 1
  leader: 1
  term: 2
...
So demote() didn't demote the leader nor raised an error. I
would rather expect from demote() that it can be called only
on the leader, and always makes it a follower. Even if the
node is mode='candidate' and would elect itself again, still
I would think it should step down in the current term.
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 6/7] box: introduce `box.ctl.demote`
  2021-06-18 22:52   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-21 14:56     ` Serge Petrenko via Tarantool-patches
  0 siblings, 0 replies; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-21 14:56 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
19.06.2021 01:52, Vladislav Shpilevoy пишет:
> Thanks for the patch!
>
> See 3 comments below.
Thanks for the review!
>>      box: introduce `box.ctl.demote`
>>      
>>      Introduce a new journal entry, DEMOTE. The entry has the same meaning as
>>      PROMOTE, with the only difference that it clears limbo ownership instead
>>      of transferring it to the issuer.
>>      
>>      Introduce `box.ctl.demote`, which works exactly like `box.ctl.promote`,
>>      but results in writing DEMOTE instead of PROMOTE.
>>      
>>      A new request was necessary instead of simply writing PROMOTE(origin_id
>>      = 0), because origin_id is deduced from row.replica_id, which cannot be
>>      0 for replicated rows (it's always equal to instance_id of the row
>>      originator).
>>      
>>      Closes #6034
>>      
>>      @TarantoolBod document
> 1. TarantoolBod -> TarantoolBot
Fixed.
>
>>      Title: box.ctl.demote
>>      
>>      `box.ctl.demote()` is a new function, which works exactly like
>>      `box.ctl.promote()`, with one exception that it results in the instance
>>      writing DEMOTE request to WAL instead of a PROMOTE request.
>>      
>>      A DEMOTE request (DEMOTE = 32) copies PROMOTE behaviour (it clears the
>>      limbo as well), but clears limbo ownership instead of assigning it to a
>>      new instance.
>>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 53a8f80e5..f2bde910c 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1527,8 +1527,8 @@ box_wait_quorum(uint32_t lead_id, int64_t target_lsn, int quorum,
>>   	return 0;
>>   }
>>   
>> -int
>> -box_promote(void)
>> +static int
>> +box_clear_synchro_queue(bool demote)
>>   {
>>   	/* A guard to block multiple simultaneous function invocations. */
>>   	static bool in_promote = false;
> 2. A few lines below there is an error message about simultaneous invocations.
> It still talks only about promote(), but I think it should mention demote() too.
Sure, fixed. Please find the diff below.
>
>> @@ -1691,10 +1691,16 @@ promote:
> 3. That looks strange.
>
> tarantool> box.cfg{election_mode = 'candidate'}
>
> tarantool> box.info.election
> ---
> - state: leader
>    vote: 1
>    leader: 1
>    term: 2
> ...
>
> tarantool> box.ctl.demote()
> ---
> ...
>
> tarantool> box.info.election
> ---
> - state: leader
>    vote: 1
>    leader: 1
>    term: 2
> ...
>
> So demote() didn't demote the leader nor raised an error. I
> would rather expect from demote() that it can be called only
> on the leader, and always makes it a follower. Even if the
> node is mode='candidate' and would elect itself again, still
> I would think it should step down in the current term.
>
Ok, I see.
Maybe rename `box.ctl.demote()` to something else?
The notorious `clear_synchro_queue`, for example.
This way the user won't expect the function to demote a leader.
I intended the function to be used as PROMOTE for replica id 0,
i.e. someone has to become the leader before writing PROMOTE/DEMOTE.
This means both functions result in electing a leader. But for a different
purpose: either to pin the limbo to it, or free the limbo from any other 
owner.
=================================================
diff --git a/src/box/box.cc b/src/box/box.cc
index f2bde910c..44ee327e9 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1533,7 +1533,8 @@ box_clear_synchro_queue(bool demote)
         /* A guard to block multiple simultaneous function invocations. */
         static bool in_promote = false;
         if (in_promote) {
-               diag_set(ClientError, ER_UNSUPPORTED, "box.ctl.promote",
+               diag_set(ClientError, ER_UNSUPPORTED,
+                        demote ? "box.ctl.demote" : "box.ctl.promote",
                          "simultaneous invocations");
                 return -1;
         }
=================================================
-- 
Serge Petrenko
^ permalink raw reply	[flat|nested] 34+ messages in thread
 
 
- * [Tarantool-patches] [PATCH 7/7] box: make promote/demote always bump the term
  2021-06-10 13:32 [Tarantool-patches] [PATCH 0/7] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (5 preceding siblings ...)
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 6/7] box: introduce `box.ctl.demote` Serge Petrenko via Tarantool-patches
@ 2021-06-10 13:32 ` Serge Petrenko via Tarantool-patches
  2021-06-15 21:00   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-18 22:53   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-15 20:53 ` [Tarantool-patches] [PATCH 0/7] forbid implicit limbo ownership transition Vladislav Shpilevoy via Tarantool-patches
  7 siblings, 2 replies; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-10 13:32 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
When called without elections, promote and demote resulted in multiple
PROMOTE/DEMOTE entries for the same term. This is not right, because all
the promotions for the same term except the first one would be ignored as
already seen.
Follow-up #6034
---
 src/box/box.cc                                |  14 +-
 src/lib/raft/raft.c                           |   3 +-
 .../gh-6034-limbo-ownership.result            | 186 ++++++++++++++++++
 .../gh-6034-limbo-ownership.test.lua          |  68 +++++++
 .../gh-6034-promote-bump-term.result          |  51 +++++
 .../gh-6034-promote-bump-term.test.lua        |  20 ++
 test/replication/suite.cfg                    |   2 +
 7 files changed, 336 insertions(+), 8 deletions(-)
 create mode 100644 test/replication/gh-6034-limbo-ownership.result
 create mode 100644 test/replication/gh-6034-limbo-ownership.test.lua
 create mode 100644 test/replication/gh-6034-promote-bump-term.result
 create mode 100644 test/replication/gh-6034-promote-bump-term.test.lua
diff --git a/src/box/box.cc b/src/box/box.cc
index d3908b4cd..59f8dddae 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1663,14 +1663,16 @@ box_clear_synchro_queue(bool demote)
 			rc = -1;
 		} else {
 promote:
-			/* We cannot possibly get here in a volatile state. */
-			assert(box_raft()->volatile_term == box_raft()->term);
+			if (try_wait)
+				raft_new_term(box_raft());
+
+			uint64_t term = box_raft()->volatile_term;
 			if (demote) {
 				txn_limbo_write_demote(&txn_limbo, wait_lsn,
-						       box_raft()->term);
+						       term);
 			} else {
 				txn_limbo_write_promote(&txn_limbo, wait_lsn,
-							box_raft()->term);
+							term);
 			}
 			uint16_t type = demote ? IPROTO_DEMOTE : IPROTO_PROMOTE;
 			struct synchro_request req = {
@@ -1678,7 +1680,7 @@ promote:
 				.replica_id = former_leader_id,
 				.origin_id = instance_id,
 				.lsn = wait_lsn,
-				.term = box_raft()->term,
+				.term = term,
 			};
 			txn_limbo_process(&txn_limbo, &req);
 			assert(txn_limbo_is_empty(&txn_limbo));
@@ -3511,7 +3513,7 @@ box_cfg_xc(void)
 
 	if (!is_bootstrap_leader) {
 		replicaset_sync();
-	} else {
+	} else if (raft_is_enabled(box_raft())) {
 		/*
 		 * When the cluster is just bootstrapped and this instance is a
 		 * leader, it makes no sense to wait for a leader appearance.
diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
index 247c7a71a..1de05727f 100644
--- a/src/lib/raft/raft.c
+++ b/src/lib/raft/raft.c
@@ -983,8 +983,7 @@ raft_cfg_vclock(struct raft *raft, const struct vclock *vclock)
 void
 raft_new_term(struct raft *raft)
 {
-	if (raft->is_enabled)
-		raft_sm_schedule_new_term(raft, raft->volatile_term + 1);
+	raft_sm_schedule_new_term(raft, raft->volatile_term + 1);
 }
 
 static void
diff --git a/test/replication/gh-6034-limbo-ownership.result b/test/replication/gh-6034-limbo-ownership.result
new file mode 100644
index 000000000..ebedc2b17
--- /dev/null
+++ b/test/replication/gh-6034-limbo-ownership.result
@@ -0,0 +1,186 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+
+--
+-- gh-6034: test that transactional limbo isn't accessible without a promotion.
+--
+synchro_quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+election_mode = box.cfg.election_mode
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum = 1, election_mode='off'}
+ | ---
+ | ...
+
+_ = box.schema.space.create('async'):create_index('pk')
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true}):create_index('pk')
+ | ---
+ | ...
+
+-- Limbo is initially unclaimed, everyone is writeable.
+assert(not box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == 0)
+ | ---
+ | - true
+ | ...
+box.space.async:insert{1} -- success.
+ | ---
+ | - [1]
+ | ...
+-- Synchro spaces aren't writeable
+box.space.sync:insert{1} -- error.
+ | ---
+ | - error: Synchronous transaction limbo doesn't belong to any instance
+ | ...
+
+box.ctl.promote()
+ | ---
+ | ...
+assert(not box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == box.info.id)
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{1} -- success.
+ | ---
+ | - [1]
+ | ...
+
+-- Everyone but the limbo owner is read-only.
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+test_run:cmd('set variable rpl_listen to "replica.listen"')
+ | ---
+ | - true
+ | ...
+orig_replication = box.cfg.replication
+ | ---
+ | ...
+box.cfg{replication={box.info.listen, rpl_listen}}
+ | ---
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+assert(box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == test_run:eval('default', 'return box.info.id')[1])
+ | ---
+ | - true
+ | ...
+box.space.async:insert{2} -- failure.
+ | ---
+ | - error: Can't modify data because this instance is in read-only mode.
+ | ...
+
+-- Promotion on the other node. Default should become ro.
+box.ctl.promote()
+ | ---
+ | ...
+assert(not box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == box.info.id)
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{2} -- success.
+ | ---
+ | - [2]
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+assert(box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == test_run:eval('replica', 'return box.info.id')[1])
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{3} -- failure.
+ | ---
+ | - error: Can't modify data because this instance is in read-only mode.
+ | ...
+
+box.ctl.demote()
+ | ---
+ | ...
+assert(not box.info.ro)
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{3} -- still fails.
+ | ---
+ | - error: Synchronous transaction limbo doesn't belong to any instance
+ | ...
+assert(box.info.synchro.queue.owner == 0)
+ | ---
+ | - true
+ | ...
+box.space.async:insert{3} -- success.
+ | ---
+ | - [3]
+ | ...
+
+-- Cleanup.
+box.ctl.demote()
+ | ---
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+box.space.async:drop()
+ | ---
+ | ...
+box.cfg{\
+    replication_synchro_quorum = synchro_quorum,\
+    election_mode = election_mode,\
+    replication = orig_replication,\
+}
+ | ---
+ | ...
diff --git a/test/replication/gh-6034-limbo-ownership.test.lua b/test/replication/gh-6034-limbo-ownership.test.lua
new file mode 100644
index 000000000..0e1586566
--- /dev/null
+++ b/test/replication/gh-6034-limbo-ownership.test.lua
@@ -0,0 +1,68 @@
+test_run = require('test_run').new()
+fiber = require('fiber')
+
+--
+-- gh-6034: test that transactional limbo isn't accessible without a promotion.
+--
+synchro_quorum = box.cfg.replication_synchro_quorum
+election_mode = box.cfg.election_mode
+box.cfg{replication_synchro_quorum = 1, election_mode='off'}
+
+_ = box.schema.space.create('async'):create_index('pk')
+_ = box.schema.space.create('sync', {is_sync=true}):create_index('pk')
+
+-- Limbo is initially unclaimed, everyone is writeable.
+assert(not box.info.ro)
+assert(box.info.synchro.queue.owner == 0)
+box.space.async:insert{1} -- success.
+-- Synchro spaces aren't writeable
+box.space.sync:insert{1} -- error.
+
+box.ctl.promote()
+assert(not box.info.ro)
+assert(box.info.synchro.queue.owner == box.info.id)
+box.space.sync:insert{1} -- success.
+
+-- Everyone but the limbo owner is read-only.
+box.schema.user.grant('guest', 'replication')
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+test_run:cmd('set variable rpl_listen to "replica.listen"')
+orig_replication = box.cfg.replication
+box.cfg{replication={box.info.listen, rpl_listen}}
+
+test_run:switch('replica')
+assert(box.info.ro)
+assert(box.info.synchro.queue.owner == test_run:eval('default', 'return box.info.id')[1])
+box.space.async:insert{2} -- failure.
+
+-- Promotion on the other node. Default should become ro.
+box.ctl.promote()
+assert(not box.info.ro)
+assert(box.info.synchro.queue.owner == box.info.id)
+box.space.sync:insert{2} -- success.
+
+test_run:switch('default')
+assert(box.info.ro)
+assert(box.info.synchro.queue.owner == test_run:eval('replica', 'return box.info.id')[1])
+box.space.sync:insert{3} -- failure.
+
+box.ctl.demote()
+assert(not box.info.ro)
+box.space.sync:insert{3} -- still fails.
+assert(box.info.synchro.queue.owner == 0)
+box.space.async:insert{3} -- success.
+
+-- Cleanup.
+box.ctl.demote()
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.schema.user.revoke('guest', 'replication')
+box.space.sync:drop()
+box.space.async:drop()
+box.cfg{\
+    replication_synchro_quorum = synchro_quorum,\
+    election_mode = election_mode,\
+    replication = orig_replication,\
+}
diff --git a/test/replication/gh-6034-promote-bump-term.result b/test/replication/gh-6034-promote-bump-term.result
new file mode 100644
index 000000000..d3cbbc1c3
--- /dev/null
+++ b/test/replication/gh-6034-promote-bump-term.result
@@ -0,0 +1,51 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+-- gh-6034 follow-up: test that every box.ctl.promote()/box.ctl.demote() bumps
+-- the instance's term. Even when elections are disabled. Even for consequent
+-- promotes/demotes on the same instance.
+election_mode = box.cfg.election_mode
+ | ---
+ | ...
+box.cfg{election_mode='off'}
+ | ---
+ | ...
+
+term = box.info.election.term
+ | ---
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+assert(box.info.election.term == term + 1)
+ | ---
+ | - true
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+assert(box.info.election.term == term + 2)
+ | ---
+ | - true
+ | ...
+box.ctl.demote()
+ | ---
+ | ...
+assert(box.info.election.term == term + 3)
+ | ---
+ | - true
+ | ...
+box.ctl.demote()
+ | ---
+ | ...
+assert(box.info.election.term == term + 4)
+ | ---
+ | - true
+ | ...
+
+-- Cleanup.
+box.cfg{election_mode=election_mode}
+ | ---
+ | ...
diff --git a/test/replication/gh-6034-promote-bump-term.test.lua b/test/replication/gh-6034-promote-bump-term.test.lua
new file mode 100644
index 000000000..a6a082ed5
--- /dev/null
+++ b/test/replication/gh-6034-promote-bump-term.test.lua
@@ -0,0 +1,20 @@
+test_run = require('test_run').new()
+
+-- gh-6034 follow-up: test that every box.ctl.promote()/box.ctl.demote() bumps
+-- the instance's term. Even when elections are disabled. Even for consequent
+-- promotes/demotes on the same instance.
+election_mode = box.cfg.election_mode
+box.cfg{election_mode='off'}
+
+term = box.info.election.term
+box.ctl.promote()
+assert(box.info.election.term == term + 1)
+box.ctl.promote()
+assert(box.info.election.term == term + 2)
+box.ctl.demote()
+assert(box.info.election.term == term + 3)
+box.ctl.demote()
+assert(box.info.election.term == term + 4)
+
+-- Cleanup.
+box.cfg{election_mode=election_mode}
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 46de2e6c4..9444ec49a 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -46,6 +46,8 @@
     "gh-5536-wal-limit.test.lua": {},
     "gh-5566-final-join-synchro.test.lua": {},
     "gh-6032-promote-wal-write.test.lua": {},
+    "gh-6034-limbo-ownership.test.lua": {},
+    "gh-6034-promote-bump-term.test.lua": {},
     "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
     "gh-6094-rs-uuid-mismatch.test.lua": {},
     "*": {
-- 
2.30.1 (Apple Git-130)
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 7/7] box: make promote/demote always bump the term
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 7/7] box: make promote/demote always bump the term Serge Petrenko via Tarantool-patches
@ 2021-06-15 21:00   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
  2021-06-18 22:53   ` Vladislav Shpilevoy via Tarantool-patches
  1 sibling, 1 reply; 34+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-15 21:00 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for the patch!
> diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
> index 247c7a71a..1de05727f 100644
> --- a/src/lib/raft/raft.c
> +++ b/src/lib/raft/raft.c
> @@ -983,8 +983,7 @@ raft_cfg_vclock(struct raft *raft, const struct vclock *vclock)
>  void
>  raft_new_term(struct raft *raft)
>  {
> -	if (raft->is_enabled)
> -		raft_sm_schedule_new_term(raft, raft->volatile_term + 1);
> +	raft_sm_schedule_new_term(raft, raft->volatile_term + 1);
>  }
Could you please make it a separate commit + add a unit test?
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 7/7] box: make promote/demote always bump the term
  2021-06-15 21:00   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-17 21:00     ` Serge Petrenko via Tarantool-patches
  0 siblings, 0 replies; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-17 21:00 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
16.06.2021 00:00, Vladislav Shpilevoy пишет:
> Thanks for the patch!
>
>> diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
>> index 247c7a71a..1de05727f 100644
>> --- a/src/lib/raft/raft.c
>> +++ b/src/lib/raft/raft.c
>> @@ -983,8 +983,7 @@ raft_cfg_vclock(struct raft *raft, const struct vclock *vclock)
>>   void
>>   raft_new_term(struct raft *raft)
>>   {
>> -	if (raft->is_enabled)
>> -		raft_sm_schedule_new_term(raft, raft->volatile_term + 1);
>> +	raft_sm_schedule_new_term(raft, raft->volatile_term + 1);
>>   }
> Could you please make it a separate commit + add a unit test?
Thanks for the reivew!
Sure. Please see v2 of the patchset.
-- 
Serge Petrenko
^ permalink raw reply	[flat|nested] 34+ messages in thread
 
- * Re: [Tarantool-patches] [PATCH 7/7] box: make promote/demote always bump the term
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 7/7] box: make promote/demote always bump the term Serge Petrenko via Tarantool-patches
  2021-06-15 21:00   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-18 22:53   ` Vladislav Shpilevoy via Tarantool-patches
  2021-06-21 15:02     ` Serge Petrenko via Tarantool-patches
  1 sibling, 1 reply; 34+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-18 22:53 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for the patch!
See 4 comments below.
>     box: make promote always bump the term
>     
>     When called without elections, promote and resulted in multiple
1. 'and' is excess.
>     PROMOTE entries for the same term. This is not right, because all
>     the promotions for the same term except the first one would be ignored
>     as already seen.
>     
>     Part-of #6034
> 
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 6a0950f44..53a8f80e5 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1687,16 +1687,18 @@ box_promote(void)
>  			rc = -1;
>  		} else {
>  promote:
> -			/* We cannot possibly get here in a volatile state. */
> -			assert(box_raft()->volatile_term == box_raft()->term);
> +			if (try_wait)
> +				raft_new_term(box_raft());
2. It starts to bother me, that we use try_wait flag as a kind of a
state of the promote process rather just just a command 'you need to wait'.
But I can't propose anything better right away. Just noticing.
> diff --git a/test/replication/gh-4114-local-space-replication.result b/test/replication/gh-4114-local-space-replication.result
> index 9b63a4b99..676400cef 100644
> --- a/test/replication/gh-4114-local-space-replication.result
> +++ b/test/replication/gh-4114-local-space-replication.result> @@ -77,9 +76,9 @@ box.space.test:insert{3}
>   | - [3]
>   | ...
>  
> -box.info.vclock[0]
> +box.info.vclock[0] == a + 3 or box.info.vclock[0] - a
3. Maybe use an assertion? They really do look easier to read
when you try to understand a test.
>   | ---
> - | - 3
> + | - true
>   | ...
> diff --git a/test/replication/gh-6034-promote-bump-term.result b/test/replication/gh-6034-promote-bump-term.result
> new file mode 100644
> index 000000000..20e352922
> --- /dev/null
> +++ b/test/replication/gh-6034-promote-bump-term.result
> @@ -0,0 +1,37 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +-- gh-6034: test that every box.ctl.promote() bumps
> +-- the instance's term. Even when elections are disabled. Even for consequent
> +-- promotes on the same instance.
> +election_mode = box.cfg.election_mode
> + | ---
> + | ...
> +box.cfg{election_mode='off'}
> + | ---
> + | ...
> +
> +term = box.info.election.term
> + | ---
> + | ...
> +box.ctl.promote()
> + | ---
> + | ...
> +assert(box.info.election.term == term + 1)
> + | ---
> + | - true
> + | ...
> +box.ctl.promote()
> + | ---
> + | ...
> +assert(box.info.election.term == term + 2)
> + | ---
> + | - true
4. Could you please remind why do we issue a new promote even
if we own the limbo already?
Especially if its ownership is going to get ever more strict
after this series.
^ permalink raw reply	[flat|nested] 34+ messages in thread
- * Re: [Tarantool-patches] [PATCH 7/7] box: make promote/demote always bump the term
  2021-06-18 22:53   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-21 15:02     ` Serge Petrenko via Tarantool-patches
  0 siblings, 0 replies; 34+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-21 15:02 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
19.06.2021 01:53, Vladislav Shpilevoy пишет:
> Thanks for the patch!
>
> See 4 comments below.
>
>>      box: make promote always bump the term
>>      
>>      When called without elections, promote and resulted in multiple
> 1. 'and' is excess.
Thanks, fixed.
>
>>      PROMOTE entries for the same term. This is not right, because all
>>      the promotions for the same term except the first one would be ignored
>>      as already seen.
>>      
>>      Part-of #6034
>>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 6a0950f44..53a8f80e5 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1687,16 +1687,18 @@ box_promote(void)
>>   			rc = -1;
>>   		} else {
>>   promote:
>> -			/* We cannot possibly get here in a volatile state. */
>> -			assert(box_raft()->volatile_term == box_raft()->term);
>> +			if (try_wait)
>> +				raft_new_term(box_raft());
> 2. It starts to bother me, that we use try_wait flag as a kind of a
> state of the promote process rather just just a command 'you need to wait'.
> But I can't propose anything better right away. Just noticing.
I agree. I'd like to refactor box_clear_synchro_queue()/box_ctl_promote()
sometime soon. It starts to look really bad in my opinion.
>
>> diff --git a/test/replication/gh-4114-local-space-replication.result b/test/replication/gh-4114-local-space-replication.result
>> index 9b63a4b99..676400cef 100644
>> --- a/test/replication/gh-4114-local-space-replication.result
>> +++ b/test/replication/gh-4114-local-space-replication.result> @@ -77,9 +76,9 @@ box.space.test:insert{3}
>>    | - [3]
>>    | ...
>>   
>> -box.info.vclock[0]
>> +box.info.vclock[0] == a + 3 or box.info.vclock[0] - a
> 3. Maybe use an assertion? They really do look easier to read
> when you try to understand a test.
Ok, sure.
>
>>    | ---
>> - | - 3
>> + | - true
>>    | ...
>> diff --git a/test/replication/gh-6034-promote-bump-term.result b/test/replication/gh-6034-promote-bump-term.result
>> new file mode 100644
>> index 000000000..20e352922
>> --- /dev/null
>> +++ b/test/replication/gh-6034-promote-bump-term.result
>> @@ -0,0 +1,37 @@
>> +-- test-run result file version 2
>> +test_run = require('test_run').new()
>> + | ---
>> + | ...
>> +
>> +-- gh-6034: test that every box.ctl.promote() bumps
>> +-- the instance's term. Even when elections are disabled. Even for consequent
>> +-- promotes on the same instance.
>> +election_mode = box.cfg.election_mode
>> + | ---
>> + | ...
>> +box.cfg{election_mode='off'}
>> + | ---
>> + | ...
>> +
>> +term = box.info.election.term
>> + | ---
>> + | ...
>> +box.ctl.promote()
>> + | ---
>> + | ...
>> +assert(box.info.election.term == term + 1)
>> + | ---
>> + | - true
>> + | ...
>> +box.ctl.promote()
>> + | ---
>> + | ...
>> +assert(box.info.election.term == term + 2)
>> + | ---
>> + | - true
> 4. Could you please remind why do we issue a new promote even
> if we own the limbo already?
>
> Especially if its ownership is going to get ever more strict
> after this series.
>
Hmm. I just didn't change that behaviour.
Why not though? Since it bumps the term as well.
Say, when promote is issued in 'manual' election mode,
it results in a new election every time.
So, why not make it bump the term for each consequent call
even when elections are off?
Here's the diff:
=======================================
diff --git a/test/replication/gh-4114-local-space-replication.result 
b/test/replication/gh-4114-local-space-replication.result
index 676400cef..e71eb60a8 100644
--- a/test/replication/gh-4114-local-space-replication.result
+++ b/test/replication/gh-4114-local-space-replication.result
@@ -76,7 +76,7 @@ box.space.test:insert{3}
   | - [3]
   | ...
-box.info.vclock[0] == a + 3 or box.info.vclock[0] - a
+assert(box.info.vclock[0] == a + 3)
   | ---
   | - true
   | ...
diff --git a/test/replication/gh-4114-local-space-replication.test.lua 
b/test/replication/gh-4114-local-space-replication.test.lua
index 8f787db84..65fef3bf6 100644
--- a/test/replication/gh-4114-local-space-replication.test.lua
+++ b/test/replication/gh-4114-local-space-replication.test.lua
@@ -27,7 +27,7 @@ box.space.test:insert{2}
  box.snapshot()
  box.space.test:insert{3}
-box.info.vclock[0] == a + 3 or box.info.vclock[0] - a
+assert(box.info.vclock[0] == a + 3)
  test_run:cmd('switch default')
=======================================
-- 
Serge Petrenko
^ permalink raw reply	[flat|nested] 34+ messages in thread
 
 
- * Re: [Tarantool-patches] [PATCH 0/7] forbid implicit limbo ownership transition
  2021-06-10 13:32 [Tarantool-patches] [PATCH 0/7] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (6 preceding siblings ...)
  2021-06-10 13:32 ` [Tarantool-patches] [PATCH 7/7] box: make promote/demote always bump the term Serge Petrenko via Tarantool-patches
@ 2021-06-15 20:53 ` Vladislav Shpilevoy via Tarantool-patches
  7 siblings, 0 replies; 34+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-15 20:53 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for the patchset! I looked at most parts of it,
and will continue when we resolve the present comments.
I see a lot of jobs failed in CI, such as this:
https://github.com/tarantool/tarantool/runs/2793918607
Something might be broken.
^ permalink raw reply	[flat|nested] 34+ messages in thread