Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition
@ 2021-07-14 18:25 Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 01/16] replication: always send raft state to subscribers Serge Petrenko via Tarantool-patches
                   ` (15 more replies)
  0 siblings, 16 replies; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

Changes in v4:
  - various cleanups and fixes as per review from Vlad:
    * refactor box_promote()/box_demote()
  - reorder the patches for a more logical progression

Changes in v3:
  - change demote() behaviour as discussed with Vlad:
      * make it work only on the current leader
      * make it demote the current leader and always
        bump the term
  - change how limbo and raft snapshots are sent in response
    to JOIN:
      * encode replica's version in JOIN request
      * introduce a special stage: JOIN_META with raft and limbo
        snapshots. Send it based on replica's version.\

https://github.com/tarantool/tarantool/issues/5438
https://github.com/tarantool/tarantool/issues/6034
https://github.com/tarantool/tarantool/tree/sp/gh-6034-empty-limbo-transition

Serge Petrenko (16):
  replication: always send raft state to subscribers
  txn_limbo: fix promote term filtering
  txn_limbo: persist the latest effective promote in snapshot
  replication: encode version in JOIN request
  replication: add META stage to JOIN
  replication: send latest effective promote in initial join
  replication: send current Raft term in join response
  raft: refactor raft_new_term()
  box: split promote() into reasonable parts
  box: make promote always bump the term
  box: make promote on the current leader a no-op
  box: fix an assertion failure after a spurious wakeup in promote
  box: allow calling promote on a candidate
  box: extract promote() settings to a separate method
  replication: forbid implicit limbo owner transition
  box: introduce `box.ctl.demote`

 src/box/applier.cc                            |  27 +-
 src/box/box.cc                                | 435 ++++++++++++------
 src/box/box.h                                 |   3 +
 src/box/errcode.h                             |   2 +
 src/box/iproto_constants.h                    |  12 +-
 src/box/lua/ctl.c                             |   9 +
 src/box/lua/info.c                            |   4 +-
 src/box/memtx_engine.c                        |  32 ++
 src/box/raft.c                                |  48 +-
 src/box/raft.h                                |   4 +
 src/box/relay.cc                              |  36 +-
 src/box/relay.h                               |   4 +-
 src/box/txn_limbo.c                           |  90 ++--
 src/box/txn_limbo.h                           |  14 +
 src/box/xrow.c                                |   4 +-
 src/box/xrow.h                                |  25 +-
 src/lib/raft/raft.c                           |   3 +-
 test/box/alter.result                         |   2 +-
 test/box/error.result                         |   2 +
 test/replication/election_basic.result        |   3 +
 test/replication/election_basic.test.lua      |   1 +
 test/replication/election_qsync.result        |   3 +
 test/replication/election_qsync.test.lua      |   1 +
 .../gh-3055-promote-wakeup-crash.result       |  43 ++
 .../gh-3055-promote-wakeup-crash.test.lua     |  20 +
 .../gh-4114-local-space-replication.result    |   7 +-
 .../gh-4114-local-space-replication.test.lua  |   4 +-
 .../gh-5140-qsync-casc-rollback.result        |   6 +
 .../gh-5140-qsync-casc-rollback.test.lua      |   2 +
 .../gh-5144-qsync-dup-confirm.result          |   6 +
 .../gh-5144-qsync-dup-confirm.test.lua        |   2 +
 .../gh-5163-qsync-restart-crash.result        |   6 +
 .../gh-5163-qsync-restart-crash.test.lua      |   2 +
 .../gh-5167-qsync-rollback-snap.result        |   6 +
 .../gh-5167-qsync-rollback-snap.test.lua      |   2 +
 .../gh-5195-qsync-replica-write.result        |  10 +-
 .../gh-5195-qsync-replica-write.test.lua      |   6 +-
 .../gh-5213-qsync-applier-order-3.result      |   9 +
 .../gh-5213-qsync-applier-order-3.test.lua    |   3 +
 .../gh-5213-qsync-applier-order.result        |   6 +
 .../gh-5213-qsync-applier-order.test.lua      |   2 +
 .../replication/gh-5288-qsync-recovery.result |   6 +
 .../gh-5288-qsync-recovery.test.lua           |   2 +
 .../gh-5298-qsync-recovery-snap.result        |   6 +
 .../gh-5298-qsync-recovery-snap.test.lua      |   2 +
 .../gh-5426-election-on-off.result            |   3 +
 .../gh-5426-election-on-off.test.lua          |   1 +
 .../gh-5433-election-restart-recovery.result  |   3 +
 ...gh-5433-election-restart-recovery.test.lua |   1 +
 ...sync-clear-synchro-queue-commit-all.result |   3 +
 ...nc-clear-synchro-queue-commit-all.test.lua |   1 +
 .../replication/gh-5438-election-state.result |  66 +++
 .../gh-5438-election-state.test.lua           |  29 ++
 test/replication/gh-5440-qsync-ro.result      | 133 ------
 test/replication/gh-5440-qsync-ro.test.lua    |  53 ---
 .../gh-5446-qsync-eval-quorum.result          |   6 +
 .../gh-5446-qsync-eval-quorum.test.lua        |   2 +
 .../gh-5506-election-on-off.result            |   3 +
 .../gh-5506-election-on-off.test.lua          |   1 +
 .../gh-5566-final-join-synchro.result         |   6 +
 .../gh-5566-final-join-synchro.test.lua       |   2 +
 .../gh-5874-qsync-txn-recovery.result         |   6 +
 .../gh-5874-qsync-txn-recovery.test.lua       |   2 +
 .../gh-6032-promote-wal-write.result          |   3 +
 .../gh-6032-promote-wal-write.test.lua        |   1 +
 .../gh-6034-election-candidate-promote.result |  84 ++++
 ...h-6034-election-candidate-promote.test.lua |  42 ++
 .../gh-6034-election-promote-bump-term.result |  26 ++
 ...h-6034-election-promote-bump-term.test.lua |  12 +
 .../gh-6034-qsync-limbo-ownership.result      | 186 ++++++++
 .../gh-6034-qsync-limbo-ownership.test.lua    |  68 +++
 .../gh-6057-qsync-confirm-async-no-wal.result |   7 +
 ...h-6057-qsync-confirm-async-no-wal.test.lua |   3 +
 test/replication/hang_on_synchro_fail.result  |   6 +
 .../replication/hang_on_synchro_fail.test.lua |   2 +
 test/replication/qsync_advanced.result        |  12 +
 test/replication/qsync_advanced.test.lua      |   4 +
 test/replication/qsync_basic.result           |  33 +-
 test/replication/qsync_basic.test.lua         |  16 +-
 test/replication/qsync_errinj.result          |   6 +
 test/replication/qsync_errinj.test.lua        |   2 +
 test/replication/qsync_snapshots.result       |   6 +
 test/replication/qsync_snapshots.test.lua     |   2 +
 test/replication/qsync_with_anon.result       |   6 +
 test/replication/qsync_with_anon.test.lua     |   2 +
 test/replication/replica_rejoin.result        |  77 ++--
 test/replication/replica_rejoin.test.lua      |  50 +-
 test/replication/suite.cfg                    |   6 +-
 test/unit/raft.c                              |  15 +-
 test/unit/raft.result                         |   3 +-
 90 files changed, 1436 insertions(+), 487 deletions(-)
 create mode 100644 test/replication/gh-3055-promote-wakeup-crash.result
 create mode 100644 test/replication/gh-3055-promote-wakeup-crash.test.lua
 create mode 100644 test/replication/gh-5438-election-state.result
 create mode 100644 test/replication/gh-5438-election-state.test.lua
 delete mode 100644 test/replication/gh-5440-qsync-ro.result
 delete mode 100644 test/replication/gh-5440-qsync-ro.test.lua
 create mode 100644 test/replication/gh-6034-election-candidate-promote.result
 create mode 100644 test/replication/gh-6034-election-candidate-promote.test.lua
 create mode 100644 test/replication/gh-6034-election-promote-bump-term.result
 create mode 100644 test/replication/gh-6034-election-promote-bump-term.test.lua
 create mode 100644 test/replication/gh-6034-qsync-limbo-ownership.result
 create mode 100644 test/replication/gh-6034-qsync-limbo-ownership.test.lua

-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 01/16] replication: always send raft state to subscribers
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 02/16] txn_limbo: fix promote term filtering Serge Petrenko via Tarantool-patches
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

Tarantool used to send out raft state on subscribe only when raft was
enabled. This was a safeguard against partially-upgraded clusters, where
some nodes had no clue about Raft messages and couldn't handle them
properly.

Actually, Raft state should be sent out always. For example, promote
will be changed to bump Raft term even when Raft is disabled, and it's
important that everyone in cluster has the same term for the sake of promote
at least.

So, send out Raft state to every subscriber with version >= 2.6.0
(that's when Raft was introduced).
Do the same for Raft broadcasts. They should be sent only to replicas
with version >= 2.6.0

Closes #5438
---
 src/box/box.cc                                | 11 ++--
 src/box/relay.cc                              |  4 +-
 .../replication/gh-5438-election-state.result | 63 +++++++++++++++++++
 .../gh-5438-election-state.test.lua           | 28 +++++++++
 test/replication/suite.cfg                    |  1 +
 5 files changed, 100 insertions(+), 7 deletions(-)
 create mode 100644 test/replication/gh-5438-election-state.result
 create mode 100644 test/replication/gh-5438-election-state.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index eeb57b04e..5dcf5b460 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -82,6 +82,7 @@
 #include "msgpack.h"
 #include "raft.h"
 #include "trivia/util.h"
+#include "version.h"
 
 enum {
 	IPROTO_THREADS_MAX = 1000,
@@ -2831,13 +2832,13 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
 		 tt_uuid_str(&replica_uuid), sio_socketname(io->fd));
 	say_info("remote vclock %s local vclock %s",
 		 vclock_to_string(&replica_clock), vclock_to_string(&vclock));
-	if (raft_is_enabled(box_raft())) {
+	if (replica_version_id >= version_id(2, 6, 0) && !anon) {
 		/*
 		 * Send out the current raft state of the instance. Don't do
-		 * that if Raft is disabled. It can be that a part of the
-		 * cluster still contains old versions, which can't handle Raft
-		 * messages. So when it is disabled, its network footprint
-		 * should be 0.
+		 * that if the remote instance is old. It can be that a part of
+		 * the cluster still contains old versions, which can't handle
+		 * Raft messages. Raft's network footprint should be 0 as seen
+		 * by such instances.
 		 */
 		struct raft_request req;
 		box_raft_checkpoint_remote(&req);
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 115037fc3..60f527b7f 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -800,7 +800,7 @@ relay_subscribe_f(va_list ap)
 		  &relay->relay_pipe, NULL, NULL, cbus_process);
 
 	struct relay_is_raft_enabled_msg raft_enabler;
-	if (!relay->replica->anon)
+	if (!relay->replica->anon && relay->version_id >= version_id(2, 6, 0))
 		relay_send_is_raft_enabled(relay, &raft_enabler, true);
 
 	/*
@@ -883,7 +883,7 @@ relay_subscribe_f(va_list ap)
 		cpipe_push(&relay->tx_pipe, &relay->status_msg.msg);
 	}
 
-	if (!relay->replica->anon)
+	if (!relay->replica->anon && relay->version_id >= version_id(2, 6, 0))
 		relay_send_is_raft_enabled(relay, &raft_enabler, false);
 
 	/*
diff --git a/test/replication/gh-5438-election-state.result b/test/replication/gh-5438-election-state.result
new file mode 100644
index 000000000..6985f026a
--- /dev/null
+++ b/test/replication/gh-5438-election-state.result
@@ -0,0 +1,63 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+--
+-- gh-5428 send out Raft state to subscribers, even when Raft is disabled.
+--
+-- Bump Raft term while the replica's offline.
+term = box.info.election.term
+ | ---
+ | ...
+old_election_mode = box.cfg.election_mode
+ | ---
+ | ...
+box.cfg{election_mode = 'candidate'}
+ | ---
+ | ...
+test_run:wait_cond(function() return box.info.election.term > term end)
+ | ---
+ | - true
+ | ...
+
+-- Make sure the replica receives new term on subscribe.
+box.cfg{election_mode = 'off'}
+ | ---
+ | ...
+
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function()\
+    return test_run:eval('replica', 'return box.info.election.term')[1] ==\
+           box.info.election.term\
+end)
+ | ---
+ | - true
+ | ...
+
+-- Cleanup.
+box.cfg{election_mode = old_election_mode}
+ | ---
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
diff --git a/test/replication/gh-5438-election-state.test.lua b/test/replication/gh-5438-election-state.test.lua
new file mode 100644
index 000000000..60c3366c1
--- /dev/null
+++ b/test/replication/gh-5438-election-state.test.lua
@@ -0,0 +1,28 @@
+test_run = require('test_run').new()
+
+--
+-- gh-5428 send out Raft state to subscribers, even when Raft is disabled.
+--
+-- Bump Raft term while the replica's offline.
+term = box.info.election.term
+old_election_mode = box.cfg.election_mode
+box.cfg{election_mode = 'candidate'}
+test_run:wait_cond(function() return box.info.election.term > term end)
+
+-- Make sure the replica receives new term on subscribe.
+box.cfg{election_mode = 'off'}
+
+box.schema.user.grant('guest', 'replication')
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+test_run:cmd('start server replica')
+test_run:wait_cond(function()\
+    return test_run:eval('replica', 'return box.info.election.term')[1] ==\
+           box.info.election.term\
+end)
+
+-- Cleanup.
+box.cfg{election_mode = old_election_mode}
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 69f2f3511..ae146c366 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -19,6 +19,7 @@
     "gh-5213-qsync-applier-order-3.test.lua": {},
     "gh-5426-election-on-off.test.lua": {},
     "gh-5433-election-restart-recovery.test.lua": {},
+    "gh-5438-election-state.test.lua": {},
     "gh-5445-leader-inconsistency.test.lua": {},
     "gh-5506-election-on-off.test.lua": {},
     "once.test.lua": {},
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 02/16] txn_limbo: fix promote term filtering
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 01/16] replication: always send raft state to subscribers Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 03/16] txn_limbo: persist the latest effective promote in snapshot Serge Petrenko via Tarantool-patches
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

txn_limbo_process() used to filter out promote requests whose term was
equal to the greatest term seen. This wasn't correct for PROMOTE entries
with term 1.

Such entries appear after box.ctl.promote() is issued on an instance
with disabled elections. Every PROMOTE entry from such an instance has
term 1, but should still be applied. Fix this in the patch.

Also, when an outdated PROMOTE entry with term smaller than already
applied from some replica arrived, it wasn't filtered at all. Such a
situation shouldn't be possible, but fix it as well.

Part-of #6034
---
 src/box/txn_limbo.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index fdea287c7..6e5d6d04e 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -707,15 +707,18 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 	uint32_t origin = req->origin_id;
 	if (txn_limbo_replica_term(limbo, origin) < term) {
 		vclock_follow(&limbo->promote_term_map, origin, term);
-		if (term > limbo->promote_greatest_term) {
+		if (term > limbo->promote_greatest_term)
 			limbo->promote_greatest_term = term;
-		} else if (req->type == IPROTO_PROMOTE) {
-			/*
-			 * PROMOTE for outdated term. Ignore.
-			 */
-			return;
-		}
+	} else if (req->type == IPROTO_PROMOTE &&
+		   limbo->promote_greatest_term > 1) {
+		/* PROMOTE for outdated term. Ignore. */
+		say_info("RAFT: ignoring PROMOTE request from instance "
+			 "id %u for term %llu. Greatest term seen "
+			 "before (%llu) is bigger.", origin, (long long)term,
+			 (long long)limbo->promote_greatest_term);
+		return;
 	}
+
 	int64_t lsn = req->lsn;
 	if (req->replica_id == REPLICA_ID_NIL) {
 		/*
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 03/16] txn_limbo: persist the latest effective promote in snapshot
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 01/16] replication: always send raft state to subscribers Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 02/16] txn_limbo: fix promote term filtering Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 04/16] replication: encode version in JOIN request Serge Petrenko via Tarantool-patches
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

Previously PROMOTE entries, just like CONFIRM and ROLLBACK were only
stored in WALs. This is because snapshots consist solely of confirmed
transactions, so there's nothing to CONFIRM or ROLLBACK.

PROMOTE has gained additional meaning recently: it pins limbo ownership
to a specific instance, rendering everyone else read-only. So now
PROMOTE information must be stored in snapshots as well.

Save the latest limbo state (owner id and latest confirmed lsn) to the
snapshot as a PROMOTE request.

Prerequisite #6034
---
 src/box/memtx_engine.c | 32 ++++++++++++++++++++++++++++++++
 src/box/txn_limbo.c    | 10 ++++++++++
 src/box/txn_limbo.h    |  7 +++++++
 3 files changed, 49 insertions(+)

diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index c85dc6af3..0b06e5e63 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -50,6 +50,7 @@
 #include "schema.h"
 #include "gc.h"
 #include "raft.h"
+#include "txn_limbo.h"
 
 /* sync snapshot every 16MB */
 #define SNAP_SYNC_INTERVAL	(1 << 24)
@@ -225,6 +226,22 @@ memtx_engine_recover_raft(const struct xrow_header *row)
 	return 0;
 }
 
+static int
+memtx_engine_recover_synchro(const struct xrow_header *row)
+{
+	assert(row->type == IPROTO_PROMOTE);
+	struct synchro_request req;
+	if (xrow_decode_synchro(row, &req) != 0)
+		return -1;
+	/*
+	 * Origin id cannot be deduced from row.replica_id in a checkpoint,
+	 * because all its rows have a zero replica_id.
+	 */
+	req.origin_id = req.replica_id;
+	txn_limbo_process(&txn_limbo, &req);
+	return 0;
+}
+
 static int
 memtx_engine_recover_snapshot_row(struct memtx_engine *memtx,
 				  struct xrow_header *row, int *is_space_system)
@@ -233,6 +250,8 @@ memtx_engine_recover_snapshot_row(struct memtx_engine *memtx,
 	if (row->type != IPROTO_INSERT) {
 		if (row->type == IPROTO_RAFT)
 			return memtx_engine_recover_raft(row);
+		if (row->type == IPROTO_PROMOTE)
+			return memtx_engine_recover_synchro(row);
 		diag_set(ClientError, ER_UNKNOWN_REQUEST_TYPE,
 			 (uint32_t) row->type);
 		return -1;
@@ -546,6 +565,7 @@ struct checkpoint {
 	struct vclock vclock;
 	struct xdir dir;
 	struct raft_request raft;
+	struct synchro_request synchro_state;
 	/**
 	 * Do nothing, just touch the snapshot file - the
 	 * checkpoint already exists.
@@ -571,6 +591,7 @@ checkpoint_new(const char *snap_dirname, uint64_t snap_io_rate_limit)
 	xdir_create(&ckpt->dir, snap_dirname, SNAP, &INSTANCE_UUID, &opts);
 	vclock_create(&ckpt->vclock);
 	box_raft_checkpoint_local(&ckpt->raft);
+	txn_limbo_checkpoint(&txn_limbo, &ckpt->synchro_state);
 	ckpt->touch = false;
 	return ckpt;
 }
@@ -659,6 +680,15 @@ finish:
 	return rc;
 }
 
+static int
+checkpoint_write_synchro(struct xlog *l, const struct synchro_request *req)
+{
+	struct xrow_header row;
+	char body[XROW_SYNCHRO_BODY_LEN_MAX];
+	xrow_encode_synchro(&row, body, req);
+	return checkpoint_write_row(l, &row);
+}
+
 static int
 checkpoint_f(va_list ap)
 {
@@ -696,6 +726,8 @@ checkpoint_f(va_list ap)
 	}
 	if (checkpoint_write_raft(&snap, &ckpt->raft) != 0)
 		goto fail;
+	if (checkpoint_write_synchro(&snap, &ckpt->synchro_state) != 0)
+		goto fail;
 	if (xlog_flush(&snap) < 0)
 		goto fail;
 
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 6e5d6d04e..991c47698 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -306,6 +306,16 @@ complete:
 	return 0;
 }
 
+void
+txn_limbo_checkpoint(const struct txn_limbo *limbo,
+		     struct synchro_request *req)
+{
+	req->type = IPROTO_PROMOTE;
+	req->replica_id = limbo->owner_id;
+	req->lsn = limbo->confirmed_lsn;
+	req->term = limbo->promote_greatest_term;
+}
+
 static void
 txn_limbo_write_synchro(struct txn_limbo *limbo, uint16_t type, int64_t lsn,
 			uint64_t term)
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 7debbc0b9..7151843f4 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -315,6 +315,13 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo);
 int
 txn_limbo_wait_empty(struct txn_limbo *limbo, double timeout);
 
+/**
+ * Persist limbo state to a given synchro request.
+ */
+void
+txn_limbo_checkpoint(const struct txn_limbo *limbo,
+		     struct synchro_request *req);
+
 /**
  * Write a PROMOTE request, which has the same effect as CONFIRM(@a lsn) and
  * ROLLBACK(@a lsn + 1) combined.
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 04/16] replication: encode version in JOIN request
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (2 preceding siblings ...)
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 03/16] txn_limbo: persist the latest effective promote in snapshot Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 05/16] replication: add META stage to JOIN Serge Petrenko via Tarantool-patches
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

The replica's version will be needed once sending limbo and election
state snapshot is implemented.

Prerequisite #6034

@TarantoolBot document
Title: New field in JOIN request

JOIN request now comes with a new field: replica's version.
The field uses IPROTO_SERVER_VERSION key.
---
 src/box/box.cc |  7 +++++--
 src/box/xrow.c |  4 +++-
 src/box/xrow.h | 25 +++++++++++++++----------
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 5dcf5b460..6d5516682 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -2495,7 +2495,9 @@ box_process_register(struct ev_io *io, struct xrow_header *header)
 
 	struct tt_uuid instance_uuid = uuid_nil;
 	struct vclock replica_vclock;
-	xrow_decode_register_xc(header, &instance_uuid, &replica_vclock);
+	uint32_t replica_version_id;
+	xrow_decode_register_xc(header, &instance_uuid, &replica_vclock,
+				&replica_version_id);
 
 	if (!is_box_configured)
 		tnt_raise(ClientError, ER_LOADING);
@@ -2620,7 +2622,8 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
 
 	/* Decode JOIN request */
 	struct tt_uuid instance_uuid = uuid_nil;
-	xrow_decode_join_xc(header, &instance_uuid);
+	uint32_t replica_version_id;
+	xrow_decode_join_xc(header, &instance_uuid, &replica_version_id);
 
 	/* Check that bootstrap has been finished */
 	if (!is_box_configured)
diff --git a/src/box/xrow.c b/src/box/xrow.c
index 16cb2484c..400e0456f 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -1639,10 +1639,12 @@ xrow_encode_join(struct xrow_header *row, const struct tt_uuid *instance_uuid)
 		return -1;
 	}
 	char *data = buf;
-	data = mp_encode_map(data, 1);
+	data = mp_encode_map(data, 2);
 	data = mp_encode_uint(data, IPROTO_INSTANCE_UUID);
 	/* Greet the remote replica with our replica UUID */
 	data = xrow_encode_uuid(data, instance_uuid);
+	data = mp_encode_uint(data, IPROTO_SERVER_VERSION);
+	data = mp_encode_uint(data, tarantool_version_id());
 	assert(data <= buf + size);
 
 	row->body[0].iov_base = buf;
diff --git a/src/box/xrow.h b/src/box/xrow.h
index 9f61bfd47..6b06db5c1 100644
--- a/src/box/xrow.h
+++ b/src/box/xrow.h
@@ -471,15 +471,17 @@ xrow_encode_join(struct xrow_header *row, const struct tt_uuid *instance_uuid);
  * Decode JOIN command.
  * @param row Row to decode.
  * @param[out] instance_uuid.
+ * @param[out] version_id.
  *
  * @retval  0 Success.
  * @retval -1 Memory or format error.
  */
 static inline int
-xrow_decode_join(struct xrow_header *row, struct tt_uuid *instance_uuid)
+xrow_decode_join(struct xrow_header *row, struct tt_uuid *instance_uuid,
+		 uint32_t *version_id)
 {
-	return xrow_decode_subscribe(row, NULL, instance_uuid, NULL, NULL, NULL,
-				     NULL);
+	return xrow_decode_subscribe(row, NULL, instance_uuid, NULL, version_id,
+				     NULL, NULL);
 }
 
 /**
@@ -487,15 +489,17 @@ xrow_decode_join(struct xrow_header *row, struct tt_uuid *instance_uuid)
  * @param row Row to decode.
  * @param[out] instance_uuid Instance uuid.
  * @param[out] vclock Instance vclock.
+ * @param[out] version_id Replica version id.
+ *
  * @retval 0 Success.
  * @retval -1 Memory or format error.
  */
 static inline int
 xrow_decode_register(struct xrow_header *row, struct tt_uuid *instance_uuid,
-		     struct vclock *vclock)
+		     struct vclock *vclock, uint32_t *version_id)
 {
-	return xrow_decode_subscribe(row, NULL, instance_uuid, vclock, NULL,
-				     NULL, NULL);
+	return xrow_decode_subscribe(row, NULL, instance_uuid, vclock,
+				     version_id, NULL, NULL);
 }
 
 /**
@@ -960,18 +964,19 @@ xrow_encode_join_xc(struct xrow_header *row,
 
 /** @copydoc xrow_decode_join. */
 static inline void
-xrow_decode_join_xc(struct xrow_header *row, struct tt_uuid *instance_uuid)
+xrow_decode_join_xc(struct xrow_header *row, struct tt_uuid *instance_uuid,
+		    uint32_t *version_id)
 {
-	if (xrow_decode_join(row, instance_uuid) != 0)
+	if (xrow_decode_join(row, instance_uuid, version_id) != 0)
 		diag_raise();
 }
 
 /** @copydoc xrow_decode_register. */
 static inline void
 xrow_decode_register_xc(struct xrow_header *row, struct tt_uuid *instance_uuid,
-			struct vclock *vclock)
+			struct vclock *vclock, uint32_t *version_id)
 {
-	if (xrow_decode_register(row, instance_uuid, vclock) != 0)
+	if (xrow_decode_register(row, instance_uuid, vclock, version_id) != 0)
 		diag_raise();
 }
 
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 05/16] replication: add META stage to JOIN
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (3 preceding siblings ...)
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 04/16] replication: encode version in JOIN request Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 06/16] replication: send latest effective promote in initial join Serge Petrenko via Tarantool-patches
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

The new META stage is part of server's response to a join request.
It's marked by IPROTO_JOIN_META and IPROTO_JOIN_SNAPSHOT requests and goes
before the actual snapshot data.

Prerequisite #6034

@TarantoolBot document
Title: new protocol stage during JOIN

A new stage is added to the stream of JOIN rows coming from master.
The stage is marked with a bodyless row with type
IPROTO_JOIN_META = 71
Once all the rows from the stage are sent out, the JOIN continues as
before (as a stream of snapshot rows). The end of META stage is marked
with a row of type IPROTO_JOIN_SNAPSHOT = 72

The stage contains the rows that are necessary for instance
initialization (current Raft term, current state of synchronous
transaction queue), but do not belong to any system space.
---
 src/box/applier.cc         | 17 ++++++++++++++++-
 src/box/box.cc             |  5 +++--
 src/box/iproto_constants.h |  2 ++
 src/box/relay.cc           | 19 ++++++++++++++++++-
 src/box/relay.h            |  4 +++-
 5 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 07fe7f5c7..0f81b7cc4 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -447,12 +447,26 @@ applier_wait_snapshot(struct applier *applier)
 		xrow_decode_vclock_xc(&row, &replicaset.vclock);
 	}
 
+	coio_read_xrow(coio, ibuf, &row);
+	if (row.type == IPROTO_JOIN_META) {
+		/* Read additional metadata. Empty at the moment. */
+		do {
+			coio_read_xrow(coio, ibuf, &row);
+			if (iproto_type_is_error(row.type)) {
+				xrow_decode_error_xc(&row);
+			} else if (row.type != IPROTO_JOIN_SNAPSHOT) {
+				tnt_raise(ClientError, ER_UNKNOWN_REQUEST_TYPE,
+					  (uint32_t)row.type);
+			}
+		} while (row.type != IPROTO_JOIN_SNAPSHOT);
+		coio_read_xrow(coio, ibuf, &row);
+	}
+
 	/*
 	 * Receive initial data.
 	 */
 	uint64_t row_count = 0;
 	while (true) {
-		coio_read_xrow(coio, ibuf, &row);
 		applier->last_row_time = ev_monotonic_now(loop());
 		if (iproto_type_is_dml(row.type)) {
 			if (apply_snapshot_row(&row) != 0)
@@ -477,6 +491,7 @@ applier_wait_snapshot(struct applier *applier)
 			tnt_raise(ClientError, ER_UNKNOWN_REQUEST_TYPE,
 				  (uint32_t) row.type);
 		}
+		coio_read_xrow(coio, ibuf, &row);
 	}
 
 	return row_count;
diff --git a/src/box/box.cc b/src/box/box.cc
index 6d5516682..8c695686e 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -2474,7 +2474,7 @@ box_process_fetch_snapshot(struct ev_io *io, struct xrow_header *header)
 
 	/* Send the snapshot data to the instance. */
 	struct vclock start_vclock;
-	relay_initial_join(io->fd, header->sync, &start_vclock);
+	relay_initial_join(io->fd, header->sync, &start_vclock, 0);
 	say_info("read-view sent.");
 
 	/* Remember master's vclock after the last request */
@@ -2672,7 +2672,8 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
 	 * Initial stream: feed replica with dirty data from engines.
 	 */
 	struct vclock start_vclock;
-	relay_initial_join(io->fd, header->sync, &start_vclock);
+	relay_initial_join(io->fd, header->sync, &start_vclock,
+			   replica_version_id);
 	say_info("initial data sent.");
 
 	/**
diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
index 137bee9da..e913801a8 100644
--- a/src/box/iproto_constants.h
+++ b/src/box/iproto_constants.h
@@ -261,6 +261,8 @@ enum iproto_type {
 	IPROTO_FETCH_SNAPSHOT = 69,
 	/** REGISTER request to leave anonymous replication. */
 	IPROTO_REGISTER = 70,
+	IPROTO_JOIN_META = 71,
+	IPROTO_JOIN_SNAPSHOT = 72,
 
 	/** Vinyl run info stored in .index file */
 	VY_INDEX_RUN_INFO = 100,
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 60f527b7f..4ebe0fb06 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -392,7 +392,8 @@ relay_set_cord_name(int fd)
 }
 
 void
-relay_initial_join(int fd, uint64_t sync, struct vclock *vclock)
+relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
+		   uint32_t replica_version_id)
 {
 	struct relay *relay = relay_new(NULL);
 	if (relay == NULL)
@@ -432,6 +433,22 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock)
 	row.sync = sync;
 	coio_write_xrow(&relay->io, &row);
 
+	/*
+	 * Version is present starting with 2.7.3, 2.8.2, 2.9.1
+	 * All these versions know of additional META stage of initial join.
+	 */
+	if (replica_version_id > 0) {
+		/* Mark the beginning of the metadata stream. */
+		row.type = IPROTO_JOIN_META;
+		coio_write_xrow(&relay->io, &row);
+
+		/* Empty at the moment. */
+
+		/* Mark the end of the metadata stream. */
+		row.type = IPROTO_JOIN_SNAPSHOT;
+		coio_write_xrow(&relay->io, &row);
+	}
+
 	/* Send read view to the replica. */
 	engine_join_xc(&ctx, &relay->stream);
 }
diff --git a/src/box/relay.h b/src/box/relay.h
index 615ffb75d..112428ae8 100644
--- a/src/box/relay.h
+++ b/src/box/relay.h
@@ -116,9 +116,11 @@ relay_push_raft(struct relay *relay, const struct raft_request *req);
  * @param fd        client connection
  * @param sync      sync from incoming JOIN request
  * @param vclock[out] vclock of the read view sent to the replica
+ * @param replica_version_id peer's version
  */
 void
-relay_initial_join(int fd, uint64_t sync, struct vclock *vclock);
+relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
+		   uint32_t replica_version_id);
 
 /**
  * Send final JOIN rows to the replica.
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 06/16] replication: send latest effective promote in initial join
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (4 preceding siblings ...)
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 05/16] replication: add META stage to JOIN Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-21 23:24   ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 07/16] replication: send current Raft term in join response Serge Petrenko via Tarantool-patches
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

A joining instance may never receive the latest PROMOTE request, which
is the only source of information about the limbo owner. Send out the
latest limbo state (e.g. the latest applied PROMOTE request) together
with the initial join snapshot.

Prerequisite #6034
---
 src/box/applier.cc                       |  5 ++
 src/box/relay.cc                         |  9 ++-
 test/replication/replica_rejoin.result   | 77 ++++++++++++++----------
 test/replication/replica_rejoin.test.lua | 50 +++++++--------
 4 files changed, 85 insertions(+), 56 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 0f81b7cc4..4088fcc21 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -454,6 +454,11 @@ applier_wait_snapshot(struct applier *applier)
 			coio_read_xrow(coio, ibuf, &row);
 			if (iproto_type_is_error(row.type)) {
 				xrow_decode_error_xc(&row);
+			} else if (iproto_type_is_promote_request(row.type)) {
+				struct synchro_request req;
+				if (xrow_decode_synchro(&row, &req) != 0)
+					diag_raise();
+				txn_limbo_process(&txn_limbo, &req);
 			} else if (row.type != IPROTO_JOIN_SNAPSHOT) {
 				tnt_raise(ClientError, ER_UNKNOWN_REQUEST_TYPE,
 					  (uint32_t)row.type);
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 4ebe0fb06..4b102a777 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -427,6 +427,9 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
 	if (txn_limbo_wait_confirm(&txn_limbo) != 0)
 		diag_raise();
 
+	struct synchro_request req;
+	txn_limbo_checkpoint(&txn_limbo, &req);
+
 	/* Respond to the JOIN request with the current vclock. */
 	struct xrow_header row;
 	xrow_encode_vclock_xc(&row, vclock);
@@ -442,7 +445,11 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
 		row.type = IPROTO_JOIN_META;
 		coio_write_xrow(&relay->io, &row);
 
-		/* Empty at the moment. */
+		char body[XROW_SYNCHRO_BODY_LEN_MAX];
+		xrow_encode_synchro(&row, body, &req);
+		row.replica_id = req.replica_id;
+		row.sync = sync;
+		coio_write_xrow(&relay->io, &row);
 
 		/* Mark the end of the metadata stream. */
 		row.type = IPROTO_JOIN_SNAPSHOT;
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index 843333a19..e489c150a 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -7,10 +7,19 @@ test_run = env.new()
 log = require('log')
 ---
 ...
-engine = test_run:get_cfg('engine')
+test_run:cmd("create server master with script='replication/master1.lua'")
 ---
+- true
 ...
-test_run:cleanup_cluster()
+test_run:cmd("start server master")
+---
+- true
+...
+test_run:switch("master")
+---
+- true
+...
+engine = test_run:get_cfg('engine')
 ---
 ...
 --
@@ -43,7 +52,7 @@ _ = box.space.test:insert{3}
 ---
 ...
 -- Join a replica, then stop it.
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica_rejoin.lua'")
+test_run:cmd("create server replica with rpl_master=master, script='replication/replica_rejoin.lua'")
 ---
 - true
 ...
@@ -65,7 +74,7 @@ box.space.test:select()
   - [2]
   - [3]
 ...
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 ---
 - true
 ...
@@ -75,7 +84,7 @@ test_run:cmd("stop server replica")
 ...
 -- Restart the server to purge the replica from
 -- the garbage collection state.
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
 box.cfg{wal_cleanup_delay = 0}
 ---
 ...
@@ -146,7 +155,7 @@ box.space.test:select()
   - [20]
   - [30]
 ...
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 ---
 - true
 ...
@@ -154,7 +163,7 @@ test_run:cmd("switch default")
 for i = 10, 30, 10 do box.space.test:update(i, {{'!', 1, i}}) end
 ---
 ...
-vclock = test_run:get_vclock('default')
+vclock = test_run:get_vclock('master')
 ---
 ...
 vclock[0] = nil
@@ -191,7 +200,7 @@ box.space.test:replace{1, 2, 3} -- bumps LSN on the replica
 ---
 - [1, 2, 3]
 ...
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 ---
 - true
 ...
@@ -199,7 +208,7 @@ test_run:cmd("stop server replica")
 ---
 - true
 ...
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
 box.cfg{wal_cleanup_delay = 0}
 ---
 ...
@@ -253,7 +262,7 @@ box.space.test:select()
 -- from the replica.
 --
 -- Bootstrap a new replica.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 ---
 - true
 ...
@@ -295,7 +304,7 @@ box.cfg{replication = ''}
 ---
 ...
 -- Bump vclock on the master.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 ---
 - true
 ...
@@ -317,15 +326,15 @@ vclock = test_run:get_vclock('replica')
 vclock[0] = nil
 ---
 ...
-_ = test_run:wait_vclock('default', vclock)
+_ = test_run:wait_vclock('master', vclock)
 ---
 ...
 -- Restart the master and force garbage collection.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 ---
 - true
 ...
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
 box.cfg{wal_cleanup_delay = 0}
 ---
 ...
@@ -373,7 +382,7 @@ vclock = test_run:get_vclock('replica')
 vclock[0] = nil
 ---
 ...
-_ = test_run:wait_vclock('default', vclock)
+_ = test_run:wait_vclock('master', vclock)
 ---
 ...
 -- Restart the replica. It should successfully rebootstrap.
@@ -396,38 +405,42 @@ test_run:cmd("switch default")
 ---
 - true
 ...
-box.cfg{replication = ''}
+test_run:cmd("stop server replica")
 ---
+- true
 ...
-test_run:cmd("stop server replica")
+test_run:cmd("delete server replica")
 ---
 - true
 ...
-test_run:cmd("cleanup server replica")
+test_run:cmd("stop server master")
 ---
 - true
 ...
-test_run:cmd("delete server replica")
+test_run:cmd("delete server master")
 ---
 - true
 ...
-test_run:cleanup_cluster()
+--
+-- gh-4107: rebootstrap fails if the replica was deleted from
+-- the cluster on the master.
+--
+test_run:cmd("create server master with script='replication/master1.lua'")
 ---
+- true
 ...
-box.space.test:drop()
+test_run:cmd("start server master")
 ---
+- true
 ...
-box.schema.user.revoke('guest', 'replication')
+test_run:switch("master")
 ---
+- true
 ...
---
--- gh-4107: rebootstrap fails if the replica was deleted from
--- the cluster on the master.
---
 box.schema.user.grant('guest', 'replication')
 ---
 ...
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica_uuid.lua'")
+test_run:cmd("create server replica with rpl_master=master, script='replication/replica_uuid.lua'")
 ---
 - true
 ...
@@ -462,11 +475,11 @@ box.space._cluster:get(2) ~= nil
 ---
 - true
 ...
-test_run:cmd("stop server replica")
+test_run:switch("default")
 ---
 - true
 ...
-test_run:cmd("cleanup server replica")
+test_run:cmd("stop server replica")
 ---
 - true
 ...
@@ -474,9 +487,11 @@ test_run:cmd("delete server replica")
 ---
 - true
 ...
-box.schema.user.revoke('guest', 'replication')
+test_run:cmd("stop server master")
 ---
+- true
 ...
-test_run:cleanup_cluster()
+test_run:cmd("delete server master")
 ---
+- true
 ...
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index c3ba9bf3f..2563177cf 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -1,9 +1,11 @@
 env = require('test_run')
 test_run = env.new()
 log = require('log')
-engine = test_run:get_cfg('engine')
 
-test_run:cleanup_cluster()
+test_run:cmd("create server master with script='replication/master1.lua'")
+test_run:cmd("start server master")
+test_run:switch("master")
+engine = test_run:get_cfg('engine')
 
 --
 -- gh-5806: this replica_rejoin test relies on the wal cleanup fiber
@@ -23,17 +25,17 @@ _ = box.space.test:insert{2}
 _ = box.space.test:insert{3}
 
 -- Join a replica, then stop it.
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica_rejoin.lua'")
+test_run:cmd("create server replica with rpl_master=master, script='replication/replica_rejoin.lua'")
 test_run:cmd("start server replica")
 test_run:cmd("switch replica")
 box.info.replication[1].upstream.status == 'follow' or log.error(box.info)
 box.space.test:select()
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 test_run:cmd("stop server replica")
 
 -- Restart the server to purge the replica from
 -- the garbage collection state.
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
 box.cfg{wal_cleanup_delay = 0}
 
 -- Make some checkpoints to remove old xlogs.
@@ -58,11 +60,11 @@ box.info.replication[2].downstream.vclock ~= nil or log.error(box.info)
 test_run:cmd("switch replica")
 box.info.replication[1].upstream.status == 'follow' or log.error(box.info)
 box.space.test:select()
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 
 -- Make sure the replica follows new changes.
 for i = 10, 30, 10 do box.space.test:update(i, {{'!', 1, i}}) end
-vclock = test_run:get_vclock('default')
+vclock = test_run:get_vclock('master')
 vclock[0] = nil
 _ = test_run:wait_vclock('replica', vclock)
 test_run:cmd("switch replica")
@@ -76,9 +78,9 @@ box.space.test:select()
 -- Check that rebootstrap is NOT initiated unless the replica
 -- is strictly behind the master.
 box.space.test:replace{1, 2, 3} -- bumps LSN on the replica
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 test_run:cmd("stop server replica")
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
 box.cfg{wal_cleanup_delay = 0}
 checkpoint_count = box.cfg.checkpoint_count
 box.cfg{checkpoint_count = 1}
@@ -99,7 +101,7 @@ box.space.test:select()
 --
 
 -- Bootstrap a new replica.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
 test_run:cleanup_cluster()
@@ -113,17 +115,17 @@ box.cfg{replication = replica_listen}
 test_run:cmd("switch replica")
 box.cfg{replication = ''}
 -- Bump vclock on the master.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
 box.space.test:replace{1}
 -- Bump vclock on the replica.
 test_run:cmd("switch replica")
 for i = 1, 10 do box.space.test:replace{2} end
 vclock = test_run:get_vclock('replica')
 vclock[0] = nil
-_ = test_run:wait_vclock('default', vclock)
+_ = test_run:wait_vclock('master', vclock)
 -- Restart the master and force garbage collection.
-test_run:cmd("switch default")
-test_run:cmd("restart server default")
+test_run:cmd("switch master")
+test_run:cmd("restart server master")
 box.cfg{wal_cleanup_delay = 0}
 replica_listen = test_run:cmd("eval replica 'return box.cfg.listen'")
 replica_listen ~= nil
@@ -139,7 +141,7 @@ test_run:cmd("switch replica")
 for i = 1, 10 do box.space.test:replace{2} end
 vclock = test_run:get_vclock('replica')
 vclock[0] = nil
-_ = test_run:wait_vclock('default', vclock)
+_ = test_run:wait_vclock('master', vclock)
 -- Restart the replica. It should successfully rebootstrap.
 test_run:cmd("restart server replica with args='true'")
 box.space.test:select()
@@ -148,20 +150,20 @@ box.space.test:replace{2}
 
 -- Cleanup.
 test_run:cmd("switch default")
-box.cfg{replication = ''}
 test_run:cmd("stop server replica")
-test_run:cmd("cleanup server replica")
 test_run:cmd("delete server replica")
-test_run:cleanup_cluster()
-box.space.test:drop()
-box.schema.user.revoke('guest', 'replication')
+test_run:cmd("stop server master")
+test_run:cmd("delete server master")
 
 --
 -- gh-4107: rebootstrap fails if the replica was deleted from
 -- the cluster on the master.
 --
+test_run:cmd("create server master with script='replication/master1.lua'")
+test_run:cmd("start server master")
+test_run:switch("master")
 box.schema.user.grant('guest', 'replication')
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica_uuid.lua'")
+test_run:cmd("create server replica with rpl_master=master, script='replication/replica_uuid.lua'")
 start_cmd = string.format("start server replica with args='%s'", require('uuid').new())
 box.space._cluster:get(2) == nil
 test_run:cmd(start_cmd)
@@ -170,8 +172,8 @@ test_run:cmd("cleanup server replica")
 box.space._cluster:delete(2) ~= nil
 test_run:cmd(start_cmd)
 box.space._cluster:get(2) ~= nil
+test_run:switch("default")
 test_run:cmd("stop server replica")
-test_run:cmd("cleanup server replica")
 test_run:cmd("delete server replica")
-box.schema.user.revoke('guest', 'replication')
-test_run:cleanup_cluster()
+test_run:cmd("stop server master")
+test_run:cmd("delete server master")
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 07/16] replication: send current Raft term in join response
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (5 preceding siblings ...)
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 06/16] replication: send latest effective promote in initial join Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-21 23:24   ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 08/16] raft: refactor raft_new_term() Serge Petrenko via Tarantool-patches
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

Make Raft nodes send out their latest persisted term to joining
replicas.

This is needed to avoid the situation when txn_limbo-managed 'promote
greatest term' is greater than current Raft term. Otherwise the
following may happen: replica joins off some instance and receives its
latest limbo state. The state includes "greatest term seen" and makes
limbo filter out any data coming from instances with smaller terms.
Imagine that master this replica has joined from dies before replica has
a chance to subscribe to it. Then it doesn't receive its current Raft
term and start elections at smallest term possible, 2 (when there are no
suitable Raft nodes besides the replica).

Once the elections in a small term number are won, a ton of problems
arises: starting with filtering out PROMOTE requests for "old" term and
nop-ifying any data coming from terms smaller than "greatest term seen".

Prerequisite #6034
---
 src/box/applier.cc | 5 +++++
 src/box/relay.cc   | 6 ++++++
 2 files changed, 11 insertions(+)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 4088fcc21..92ec088ea 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -459,6 +459,11 @@ applier_wait_snapshot(struct applier *applier)
 				if (xrow_decode_synchro(&row, &req) != 0)
 					diag_raise();
 				txn_limbo_process(&txn_limbo, &req);
+			} else if (iproto_type_is_raft_request(row.type)) {
+				struct raft_request req;
+				if (xrow_decode_raft(&row, &req, NULL) != 0)
+					diag_raise();
+				box_raft_recover(&req);
 			} else if (row.type != IPROTO_JOIN_SNAPSHOT) {
 				tnt_raise(ClientError, ER_UNKNOWN_REQUEST_TYPE,
 					  (uint32_t)row.type);
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 4b102a777..70f1a045b 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -428,7 +428,9 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
 		diag_raise();
 
 	struct synchro_request req;
+	struct raft_request raft_req;
 	txn_limbo_checkpoint(&txn_limbo, &req);
+	box_raft_checkpoint_local(&raft_req);
 
 	/* Respond to the JOIN request with the current vclock. */
 	struct xrow_header row;
@@ -451,6 +453,10 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
 		row.sync = sync;
 		coio_write_xrow(&relay->io, &row);
 
+		xrow_encode_raft(&row, &fiber()->gc, &raft_req);
+		row.sync = sync;
+		coio_write_xrow(&relay->io, &row);
+
 		/* Mark the end of the metadata stream. */
 		row.type = IPROTO_JOIN_SNAPSHOT;
 		coio_write_xrow(&relay->io, &row);
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 08/16] raft: refactor raft_new_term()
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (6 preceding siblings ...)
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 07/16] replication: send current Raft term in join response Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 09/16] box: split promote() into reasonable parts Serge Petrenko via Tarantool-patches
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

Make raft_new_term() always bump the current term, even when Raft is
disabled.

Part-of #6034
---
 src/box/box.cc        |  2 +-
 src/lib/raft/raft.c   |  3 +--
 test/unit/raft.c      | 15 ++++++++++++++-
 test/unit/raft.result |  3 ++-
 4 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 8c695686e..86370514a 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -3513,7 +3513,7 @@ box_cfg_xc(void)
 
 	if (!is_bootstrap_leader) {
 		replicaset_sync();
-	} else {
+	} else if (raft_is_enabled(box_raft())) {
 		/*
 		 * When the cluster is just bootstrapped and this instance is a
 		 * leader, it makes no sense to wait for a leader appearance.
diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
index ef11ef89f..22f38b330 100644
--- a/src/lib/raft/raft.c
+++ b/src/lib/raft/raft.c
@@ -988,8 +988,7 @@ raft_cfg_vclock(struct raft *raft, const struct vclock *vclock)
 void
 raft_new_term(struct raft *raft)
 {
-	if (raft->is_enabled)
-		raft_sm_schedule_new_term(raft, raft->volatile_term + 1);
+	raft_sm_schedule_new_term(raft, raft->volatile_term + 1);
 }
 
 static void
diff --git a/test/unit/raft.c b/test/unit/raft.c
index b9bc49b78..9637abf43 100644
--- a/test/unit/raft.c
+++ b/test/unit/raft.c
@@ -1184,7 +1184,7 @@ raft_test_death_timeout(void)
 static void
 raft_test_enable_disable(void)
 {
-	raft_start_test(10);
+	raft_start_test(11);
 	struct raft_node node;
 	raft_node_create(&node);
 
@@ -1276,7 +1276,20 @@ raft_test_enable_disable(void)
 		"{0: 2}" /* Vclock. */
 	), "nothing changed");
 
+	/* Disabled node still bumps the term when needed. */
+	raft_node_new_term(&node);
+
+	ok(raft_node_check_full_state(&node,
+		RAFT_STATE_FOLLOWER /* State. */,
+		0 /* Leader. */,
+		4 /* Term. */,
+		0 /* Vote. */,
+		4 /* Volatile term. */,
+		0 /* Volatile vote. */,
+		"{0: 3}" /* Vclock. */
+	), "term bump when disabled");
 	raft_node_destroy(&node);
+
 	raft_finish_test();
 }
 
diff --git a/test/unit/raft.result b/test/unit/raft.result
index f89cd1658..41463db69 100644
--- a/test/unit/raft.result
+++ b/test/unit/raft.result
@@ -205,7 +205,7 @@ ok 10 - subtests
 ok 11 - subtests
 	*** raft_test_death_timeout: done ***
 	*** raft_test_enable_disable ***
-    1..10
+    1..11
     ok 1 - accepted a leader notification
     ok 2 - leader is seen
     ok 3 - death timeout passed
@@ -216,6 +216,7 @@ ok 11 - subtests
     ok 8 - became leader
     ok 9 - resigned from leader state
     ok 10 - nothing changed
+    ok 11 - term bump when disabled
 ok 12 - subtests
 	*** raft_test_enable_disable: done ***
 	*** raft_test_too_long_wal_write ***
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 09/16] box: split promote() into reasonable parts
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (7 preceding siblings ...)
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 08/16] raft: refactor raft_new_term() Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-21 23:26   ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 10/16] box: make promote always bump the term Serge Petrenko via Tarantool-patches
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

box_promote() is a monster. It does a lot of different things based on
flags: try_wait and run_elections. The flags themselves depend on the
node's Raft state and the lunar calendar.

Moreover, there are multiple cancellation points and places where
external state may have changed and needs a re-check.

Things are going to get even worse with the introduction of box.ctl.demote().

So it's time to split up box_promote() into reasonable parts, each doing
exactly one thing.

This commit mostly addresses the multiple cancellation points issue,
so that promote() doesn't look like a huge pile of if(something_changed)
blocks. Some other functions will look like that instead.

Part of #6034
---
 src/box/box.cc | 269 ++++++++++++++++++++++++++++---------------------
 1 file changed, 155 insertions(+), 114 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 86370514a..445875f8f 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1527,6 +1527,147 @@ box_wait_quorum(uint32_t lead_id, int64_t target_lsn, int quorum,
 	return 0;
 }
 
+/**
+ * A helper to start new Raft election round and wait until the election results
+ * are known.
+ * Returns 0 in case this instance has won the elections, -1 otherwise.
+ */
+static int
+box_run_elections(void)
+{
+	assert(box_raft()->is_enabled);
+	assert(box_election_mode != ELECTION_MODE_VOTER);
+	/*
+	 * Make this instance a candidate and run until some leader, not
+	 * necessarily this instance, emerges.
+	 */
+	raft_start_candidate(box_raft());
+	/*
+	 * Trigger new elections without waiting for an old leader to
+	 * disappear.
+	 */
+	raft_new_term(box_raft());
+	int rc = box_raft_wait_leader_found();
+
+	if (box_election_mode == ELECTION_MODE_MANUAL)
+		raft_stop_candidate(box_raft(), false);
+	if (rc != 0)
+		return -1;
+	if (!box_raft()->is_enabled) {
+		diag_set(ClientError, ER_RAFT_DISABLED);
+		return -1;
+	}
+	if (box_raft()->state != RAFT_STATE_LEADER) {
+		diag_set(ClientError, ER_INTERFERING_PROMOTE,
+			 box_raft()->leader);
+		return -1;
+	}
+
+	return 0;
+}
+
+/**
+ * Check whether the greatest promote term has changed since it was last read.
+ * IOW check that a foreign PROMOTE arrived while we were sleeping.
+ */
+static int
+box_check_promote_term_changed(uint64_t promote_term)
+{
+	if (txn_limbo.promote_greatest_term != promote_term) {
+		diag_set(ClientError, ER_INTERFERING_PROMOTE,
+			 txn_limbo.owner_id);
+		return -1;
+	}
+	return 0;
+}
+
+/** Try waiting until limbo is emptied up to given timeout. */
+static int
+box_try_wait_confirm(double timeout)
+{
+	uint64_t promote_term = txn_limbo.promote_greatest_term;
+	txn_limbo_wait_empty(&txn_limbo, timeout);
+	return box_check_promote_term_changed(promote_term);
+}
+
+/**
+ * A helper to wait until all limbo entries are ready to be confirmed, i.e.
+ * written to WAL and have gathered a quorum of ACKs from replicas.
+ * Return lsn of the last limbo entry on success, -1 on error.
+ */
+static int64_t
+box_wait_limbo_acked(void)
+{
+	if (txn_limbo_is_empty(&txn_limbo))
+		return txn_limbo.confirmed_lsn;
+
+	uint64_t promote_term = txn_limbo.promote_greatest_term;
+	int quorum = replication_synchro_quorum;
+	struct txn_limbo_entry *last_entry;
+	last_entry = txn_limbo_last_synchro_entry(&txn_limbo);
+	/* Wait for the last entries WAL write. */
+	if (last_entry->lsn < 0) {
+		int64_t tid = last_entry->txn->id;
+
+		if (wal_sync(NULL) < 0)
+			return -1;
+
+		if (box_check_promote_term_changed(promote_term) < 0)
+			return -1;
+		if (txn_limbo_is_empty(&txn_limbo))
+			return txn_limbo.confirmed_lsn;
+		if (tid != txn_limbo_last_synchro_entry(&txn_limbo)->txn->id) {
+			diag_set(ClientError, ER_QUORUM_WAIT, quorum,
+				 "new synchronous transactions appeared");
+			return -1;
+		}
+	}
+	assert(last_entry->lsn > 0);
+	int64_t wait_lsn = last_entry->lsn;
+
+	if (box_wait_quorum(txn_limbo.owner_id, wait_lsn, quorum,
+			    replication_synchro_timeout) < 0)
+		return -1;
+
+	if (box_check_promote_term_changed(promote_term) < 0)
+		return -1;
+
+	if (txn_limbo_is_empty(&txn_limbo))
+		return txn_limbo.confirmed_lsn;
+
+	if (quorum < replication_synchro_quorum) {
+		diag_set(ClientError, ER_QUORUM_WAIT, quorum,
+			 "quorum was increased while waiting");
+		return -1;
+	}
+	if (wait_lsn < txn_limbo_last_synchro_entry(&txn_limbo)->lsn) {
+		diag_set(ClientError, ER_QUORUM_WAIT, quorum,
+			 "new synchronous transactions appeared");
+		return -1;
+	}
+
+	return wait_lsn;
+}
+
+/** Write and process a PROMOTE request. */
+static void
+box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
+{
+	assert(box_raft()->volatile_term == box_raft()->term);
+	assert(promote_lsn >= 0);
+	txn_limbo_write_promote(&txn_limbo, promote_lsn,
+				box_raft()->term);
+	struct synchro_request req = {
+		.type = IPROTO_PROMOTE,
+		.replica_id = prev_leader_id,
+		.origin_id = instance_id,
+		.lsn = promote_lsn,
+		.term = box_raft()->term,
+	};
+	txn_limbo_process(&txn_limbo, &req);
+	assert(txn_limbo_is_empty(&txn_limbo));
+}
+
 int
 box_promote(void)
 {
@@ -1537,6 +1678,10 @@ box_promote(void)
 			 "simultaneous invocations");
 		return -1;
 	}
+	in_promote = true;
+	auto promote_guard = make_scoped_guard([&] {
+		in_promote = false;
+	});
 
 	/*
 	 * Do nothing when box isn't configured and when PROMOTE was already
@@ -1582,122 +1727,18 @@ box_promote(void)
 		unreachable();
 	}
 
-	uint32_t former_leader_id = txn_limbo.owner_id;
-	int64_t wait_lsn = txn_limbo.confirmed_lsn;
-	int rc = 0;
-	int quorum = replication_synchro_quorum;
-	in_promote = true;
-	auto promote_guard = make_scoped_guard([&] {
-		in_promote = false;
-	});
-
-	if (run_elections) {
-		/*
-		 * Make this instance a candidate and run until some leader, not
-		 * necessarily this instance, emerges.
-		 */
-		raft_start_candidate(box_raft());
-		/*
-		 * Trigger new elections without waiting for an old leader to
-		 * disappear.
-		 */
-		raft_new_term(box_raft());
-		rc = box_raft_wait_leader_found();
-		/*
-		 * Do not reset raft mode if it was changed while running the
-		 * elections.
-		 */
-		if (box_election_mode == ELECTION_MODE_MANUAL)
-			raft_stop_candidate(box_raft(), false);
-		if (rc != 0)
-			return -1;
-		if (!box_raft()->is_enabled) {
-			diag_set(ClientError, ER_RAFT_DISABLED);
-			return -1;
-		}
-		if (box_raft()->state != RAFT_STATE_LEADER) {
-			diag_set(ClientError, ER_INTERFERING_PROMOTE,
-				 box_raft()->leader);
-			return -1;
-		}
-	}
-
-	if (txn_limbo_is_empty(&txn_limbo))
-		goto promote;
+	int64_t wait_lsn = -1;
 
-	if (try_wait) {
-		/* Wait until pending confirmations/rollbacks reach us. */
-		double timeout = 2 * replication_synchro_timeout;
-		txn_limbo_wait_empty(&txn_limbo, timeout);
-		/*
-		 * Our mission was to clear the limbo from former leader's
-		 * transactions. Exit in case someone did that for us.
-		 */
-		if (former_leader_id != txn_limbo.owner_id) {
-			diag_set(ClientError, ER_INTERFERING_PROMOTE,
-				 txn_limbo.owner_id);
-			return -1;
-		}
-		if (txn_limbo_is_empty(&txn_limbo)) {
-			wait_lsn = txn_limbo.confirmed_lsn;
-			goto promote;
-		}
-	}
-
-	struct txn_limbo_entry *last_entry;
-	last_entry = txn_limbo_last_synchro_entry(&txn_limbo);
-	/* Wait for the last entries WAL write. */
-	if (last_entry->lsn < 0) {
-		int64_t tid = last_entry->txn->id;
-		if (wal_sync(NULL) < 0)
-			return -1;
-		if (former_leader_id != txn_limbo.owner_id) {
-			diag_set(ClientError, ER_INTERFERING_PROMOTE,
-				 txn_limbo.owner_id);
-			return -1;
-		}
-		if (txn_limbo_is_empty(&txn_limbo)) {
-			wait_lsn = txn_limbo.confirmed_lsn;
-			goto promote;
-		}
-		if (tid != txn_limbo_last_synchro_entry(&txn_limbo)->txn->id) {
-			diag_set(ClientError, ER_QUORUM_WAIT, quorum,
-				 "new synchronous transactions appeared");
-			return -1;
-		}
-	}
-	wait_lsn = last_entry->lsn;
-	assert(wait_lsn > 0);
+	if (run_elections && box_run_elections() < 0)
+		return -1;
+	if (try_wait &&
+	    box_try_wait_confirm(2 * replication_synchro_timeout) < 0)
+		return -1;
+	if ((wait_lsn = box_wait_limbo_acked()) < 0)
+		return -1;
 
-	rc = box_wait_quorum(former_leader_id, wait_lsn, quorum,
-			     replication_synchro_timeout);
-	if (rc == 0) {
-		if (quorum < replication_synchro_quorum) {
-			diag_set(ClientError, ER_QUORUM_WAIT, quorum,
-				 "quorum was increased while waiting");
-			rc = -1;
-		} else if (wait_lsn < txn_limbo_last_synchro_entry(&txn_limbo)->lsn) {
-			diag_set(ClientError, ER_QUORUM_WAIT, quorum,
-				 "new synchronous transactions appeared");
-			rc = -1;
-		} else {
-promote:
-			/* We cannot possibly get here in a volatile state. */
-			assert(box_raft()->volatile_term == box_raft()->term);
-			txn_limbo_write_promote(&txn_limbo, wait_lsn,
-						box_raft()->term);
-			struct synchro_request req = {
-				.type = IPROTO_PROMOTE,
-				.replica_id = former_leader_id,
-				.origin_id = instance_id,
-				.lsn = wait_lsn,
-				.term = box_raft()->term,
-			};
-			txn_limbo_process(&txn_limbo, &req);
-			assert(txn_limbo_is_empty(&txn_limbo));
-		}
-	}
-	return rc;
+	box_issue_promote(txn_limbo.owner_id, wait_lsn);
+	return 0;
 }
 
 int
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 10/16] box: make promote always bump the term
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (8 preceding siblings ...)
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 09/16] box: split promote() into reasonable parts Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 11/16] box: make promote on the current leader a no-op Serge Petrenko via Tarantool-patches
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

When called without elections, promote resulted in multiple
PROMOTE entries for the same term. This is not correct, because all
the promotions for the same term except the first one would be ignored
as already seen.

Part-of #6034
---
 src/box/box.cc                                | 20 ++++++++--
 src/box/raft.c                                | 40 +++++++++++++++++++
 src/box/raft.h                                |  4 ++
 .../gh-4114-local-space-replication.result    |  7 ++--
 .../gh-4114-local-space-replication.test.lua  |  4 +-
 .../gh-6034-election-promote-bump-term.result | 21 ++++++++++
 ...h-6034-election-promote-bump-term.test.lua |  9 +++++
 test/replication/suite.cfg                    |  1 +
 8 files changed, 97 insertions(+), 9 deletions(-)
 create mode 100644 test/replication/gh-6034-election-promote-bump-term.result
 create mode 100644 test/replication/gh-6034-election-promote-bump-term.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index 445875f8f..ac6c487e4 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1581,6 +1581,17 @@ box_check_promote_term_changed(uint64_t promote_term)
 	return 0;
 }
 
+/** Trigger a new election round but don't wait for its result. */
+static int
+box_trigger_elections(void)
+{
+	uint64_t promote_term = txn_limbo.promote_greatest_term;
+	raft_new_term(box_raft());
+	if (box_raft_wait_term_persisted() < 0)
+		return -1;
+	return box_check_promote_term_changed(promote_term);
+}
+
 /** Try waiting until limbo is emptied up to given timeout. */
 static int
 box_try_wait_confirm(double timeout)
@@ -1731,9 +1742,12 @@ box_promote(void)
 
 	if (run_elections && box_run_elections() < 0)
 		return -1;
-	if (try_wait &&
-	    box_try_wait_confirm(2 * replication_synchro_timeout) < 0)
-		return -1;
+	if (try_wait) {
+		if (box_try_wait_confirm(2 * replication_synchro_timeout) < 0)
+			return -1;
+		if (box_trigger_elections() < 0)
+			return -1;
+	}
 	if ((wait_lsn = box_wait_limbo_acked()) < 0)
 		return -1;
 
diff --git a/src/box/raft.c b/src/box/raft.c
index 7f787c0c5..b04932cd9 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -354,6 +354,46 @@ box_raft_wait_leader_found(void)
 	return 0;
 }
 
+struct raft_wait_persisted_data {
+	struct fiber *waiter;
+	uint64_t term;
+};
+
+static int
+box_raft_wait_term_persisted_f(struct trigger *trig, void *event)
+{
+	struct raft *raft = event;
+	struct raft_wait_persisted_data *data = trig->data;
+	if (raft->term >= data->term)
+		fiber_wakeup(data->waiter);
+	return 0;
+}
+
+int
+box_raft_wait_term_persisted(void)
+{
+	if (box_raft()->term == box_raft()->volatile_term)
+		return 0;
+	struct raft_wait_persisted_data data = {
+		.waiter = fiber(),
+		.term = box_raft()->volatile_term,
+	};
+	struct trigger trig;
+	trigger_create(&trig, box_raft_wait_term_persisted_f, &data, NULL);
+	raft_on_update(box_raft(), &trig);
+
+	do {
+		fiber_yield();
+	} while (box_raft()->term < data.term && !fiber_is_cancelled());
+
+	trigger_clear(&trig);
+	if (fiber_is_cancelled()) {
+		diag_set(FiberIsCancelled);
+		return -1;
+	}
+	return 0;
+}
+
 void
 box_raft_init(void)
 {
diff --git a/src/box/raft.h b/src/box/raft.h
index 6b6136510..4c9c7306d 100644
--- a/src/box/raft.h
+++ b/src/box/raft.h
@@ -101,6 +101,10 @@ box_raft_process(struct raft_request *req, uint32_t source);
 int
 box_raft_wait_leader_found();
 
+/** Block this fiber until the current volatile term is persisted. */
+int
+box_raft_wait_term_persisted(void);
+
 void
 box_raft_init(void);
 
diff --git a/test/replication/gh-4114-local-space-replication.result b/test/replication/gh-4114-local-space-replication.result
index 9b63a4b99..e71eb60a8 100644
--- a/test/replication/gh-4114-local-space-replication.result
+++ b/test/replication/gh-4114-local-space-replication.result
@@ -45,9 +45,8 @@ test_run:cmd('switch replica')
  | ---
  | - true
  | ...
-box.info.vclock[0]
+a = box.info.vclock[0] or 0
  | ---
- | - null
  | ...
 box.cfg{checkpoint_count=1}
  | ---
@@ -77,9 +76,9 @@ box.space.test:insert{3}
  | - [3]
  | ...
 
-box.info.vclock[0]
+assert(box.info.vclock[0] == a + 3)
  | ---
- | - 3
+ | - true
  | ...
 
 test_run:cmd('switch default')
diff --git a/test/replication/gh-4114-local-space-replication.test.lua b/test/replication/gh-4114-local-space-replication.test.lua
index c18fb3b10..65fef3bf6 100644
--- a/test/replication/gh-4114-local-space-replication.test.lua
+++ b/test/replication/gh-4114-local-space-replication.test.lua
@@ -18,7 +18,7 @@ for i = 1,10 do box.space.test:insert{i} end
 box.info.vclock[0] == a + 10 or box.info.vclock[0] - a
 
 test_run:cmd('switch replica')
-box.info.vclock[0]
+a = box.info.vclock[0] or 0
 box.cfg{checkpoint_count=1}
 box.space.test:select{}
 box.space.test:insert{1}
@@ -27,7 +27,7 @@ box.space.test:insert{2}
 box.snapshot()
 box.space.test:insert{3}
 
-box.info.vclock[0]
+assert(box.info.vclock[0] == a + 3)
 
 test_run:cmd('switch default')
 
diff --git a/test/replication/gh-6034-election-promote-bump-term.result b/test/replication/gh-6034-election-promote-bump-term.result
new file mode 100644
index 000000000..8be4e8243
--- /dev/null
+++ b/test/replication/gh-6034-election-promote-bump-term.result
@@ -0,0 +1,21 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+-- gh-6034: test that every box.ctl.promote() bumps
+-- the instance's term. Even when elections are disabled.
+box.cfg{election_mode='off'}
+ | ---
+ | ...
+
+term = box.info.election.term
+ | ---
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+assert(box.info.election.term == term + 1)
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-6034-election-promote-bump-term.test.lua b/test/replication/gh-6034-election-promote-bump-term.test.lua
new file mode 100644
index 000000000..1e814bf5d
--- /dev/null
+++ b/test/replication/gh-6034-election-promote-bump-term.test.lua
@@ -0,0 +1,9 @@
+test_run = require('test_run').new()
+
+-- gh-6034: test that every box.ctl.promote() bumps
+-- the instance's term. Even when elections are disabled.
+box.cfg{election_mode='off'}
+
+term = box.info.election.term
+box.ctl.promote()
+assert(box.info.election.term == term + 1)
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index ae146c366..7f9014b22 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -48,6 +48,7 @@
     "gh-5613-bootstrap-prefer-booted.test.lua": {},
     "gh-6027-applier-error-show.test.lua": {},
     "gh-6032-promote-wal-write.test.lua": {},
+    "gh-6034-election-promote-bump-term.test.lua": {},
     "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
     "gh-6094-rs-uuid-mismatch.test.lua": {},
     "gh-6127-election-join-new.test.lua": {},
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 11/16] box: make promote on the current leader a no-op
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (9 preceding siblings ...)
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 10/16] box: make promote always bump the term Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-21 23:26   ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 12/16] box: fix an assertion failure after a spurious wakeup in promote Serge Petrenko via Tarantool-patches
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

It was allowed to call promote on any instance, even when it's already
the limbo owner (iow Raft leader, when elections are enabled).

This doesn't break anything, when elections are disabled, but is rather
strange.

When elections are enabled, in contrary, calling promote() should be a
no-op on the leader. Otherwise it would make the leader read-only until
it wins the next election round, which's quite inconvenient.

Part-of #6034
---
 src/box/box.cc | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index ac6c487e4..6a534952f 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1701,6 +1701,9 @@ box_promote(void)
 	 */
 	if (!is_box_configured)
 		return 0;
+	if (txn_limbo_replica_term(&txn_limbo, instance_id) ==
+	    box_raft()->term)
+		return 0;
 	bool run_elections = false;
 	bool try_wait = false;
 
@@ -1729,10 +1732,6 @@ box_promote(void)
 				 "'candidate'", "manual elections");
 			return -1;
 		}
-		if (txn_limbo_replica_term(&txn_limbo, instance_id) ==
-		    box_raft()->term)
-			return 0;
-
 		break;
 	default:
 		unreachable();
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 12/16] box: fix an assertion failure after a spurious wakeup in promote
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (10 preceding siblings ...)
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 11/16] box: make promote on the current leader a no-op Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-21 23:29   ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 13/16] box: allow calling promote on a candidate Serge Petrenko via Tarantool-patches
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

Follow-up #3055
---
 src/box/raft.c                                |  8 +++-
 .../gh-3055-promote-wakeup-crash.result       | 43 +++++++++++++++++++
 .../gh-3055-promote-wakeup-crash.test.lua     | 20 +++++++++
 test/replication/suite.cfg                    |  1 +
 4 files changed, 70 insertions(+), 2 deletions(-)
 create mode 100644 test/replication/gh-3055-promote-wakeup-crash.result
 create mode 100644 test/replication/gh-3055-promote-wakeup-crash.test.lua

diff --git a/src/box/raft.c b/src/box/raft.c
index b04932cd9..d16ec952a 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -344,13 +344,17 @@ box_raft_wait_leader_found(void)
 	struct trigger trig;
 	trigger_create(&trig, box_raft_wait_leader_found_f, fiber(), NULL);
 	raft_on_update(box_raft(), &trig);
-	fiber_yield();
+
+	do {
+		fiber_yield();
+	} while (box_raft()->is_enabled && !fiber_is_cancelled() &&
+		 box_raft()->leader == REPLICA_ID_NIL);
+
 	trigger_clear(&trig);
 	if (fiber_is_cancelled()) {
 		diag_set(FiberIsCancelled);
 		return -1;
 	}
-	assert(box_raft()->leader != REPLICA_ID_NIL || !box_raft()->is_enabled);
 	return 0;
 }
 
diff --git a/test/replication/gh-3055-promote-wakeup-crash.result b/test/replication/gh-3055-promote-wakeup-crash.result
new file mode 100644
index 000000000..e508611e5
--- /dev/null
+++ b/test/replication/gh-3055-promote-wakeup-crash.result
@@ -0,0 +1,43 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+--
+-- gh-3055 follow-up: box.ctl.promote() could crash on an assertion after a
+-- spurious wakeup.
+--
+_ = box.space._cluster:insert{2, require('uuid').str()}
+ | ---
+ | ...
+box.cfg{election_mode='manual',\
+        replication_synchro_quorum=2,\
+        election_timeout=1000}
+ | ---
+ | ...
+
+fiber = require('fiber')
+ | ---
+ | ...
+f = fiber.create(function() box.ctl.promote() end)
+ | ---
+ | ...
+f:set_joinable(true)
+ | ---
+ | ...
+f:wakeup()
+ | ---
+ | ...
+fiber.yield()
+ | ---
+ | ...
+
+-- Cleanup.
+f:cancel()
+ | ---
+ | ...
+box.cfg{election_mode='off'}
+ | ---
+ | ...
+test_run:cleanup_cluster()
+ | ---
+ | ...
diff --git a/test/replication/gh-3055-promote-wakeup-crash.test.lua b/test/replication/gh-3055-promote-wakeup-crash.test.lua
new file mode 100644
index 000000000..2ac901b08
--- /dev/null
+++ b/test/replication/gh-3055-promote-wakeup-crash.test.lua
@@ -0,0 +1,20 @@
+test_run = require('test_run').new()
+--
+-- gh-3055 follow-up: box.ctl.promote() could crash on an assertion after a
+-- spurious wakeup.
+--
+_ = box.space._cluster:insert{2, require('uuid').str()}
+box.cfg{election_mode='manual',\
+        replication_synchro_quorum=2,\
+        election_timeout=1000}
+
+fiber = require('fiber')
+f = fiber.create(function() box.ctl.promote() end)
+f:set_joinable(true)
+f:wakeup()
+fiber.yield()
+
+-- Cleanup.
+f:cancel()
+box.cfg{election_mode='off'}
+test_run:cleanup_cluster()
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 7f9014b22..8b2204e2a 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -3,6 +3,7 @@
     "anon_register_gap.test.lua": {},
     "gh-2991-misc-asserts-on-update.test.lua": {},
     "gh-3055-election-promote.test.lua": {},
+    "gh-3055-promote-wakeup-crash.test.lua": {},
     "gh-3111-misc-rebootstrap-from-ro-master.test.lua": {},
     "gh-3160-misc-heartbeats-on-master-changes.test.lua": {},
     "gh-3247-misc-iproto-sequence-value-not-replicated.test.lua": {},
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 13/16] box: allow calling promote on a candidate
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (11 preceding siblings ...)
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 12/16] box: fix an assertion failure after a spurious wakeup in promote Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-15 14:06   ` Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 14/16] box: extract promote() settings to a separate method Serge Petrenko via Tarantool-patches
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

Part of #6034
---
 src/box/box.cc                                | 15 +---
 .../gh-6034-election-candidate-promote.result | 84 +++++++++++++++++++
 ...h-6034-election-candidate-promote.test.lua | 42 ++++++++++
 test/replication/suite.cfg                    |  1 +
 4 files changed, 128 insertions(+), 14 deletions(-)
 create mode 100644 test/replication/gh-6034-election-candidate-promote.result
 create mode 100644 test/replication/gh-6034-election-candidate-promote.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index 6a534952f..9130fc322 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1717,21 +1717,8 @@ box_promote(void)
 			 "manual elections");
 		return -1;
 	case ELECTION_MODE_MANUAL:
-		if (box_raft()->state == RAFT_STATE_LEADER)
-			return 0;
-		run_elections = true;
-		break;
 	case ELECTION_MODE_CANDIDATE:
-		/*
-		 * Leader elections are enabled, and this instance is allowed to
-		 * promote only if it's already an elected leader. No manual
-		 * elections.
-		 */
-		if (box_raft()->state != RAFT_STATE_LEADER) {
-			diag_set(ClientError, ER_UNSUPPORTED, "election_mode="
-				 "'candidate'", "manual elections");
-			return -1;
-		}
+		run_elections = box_raft()->state != RAFT_STATE_LEADER;
 		break;
 	default:
 		unreachable();
diff --git a/test/replication/gh-6034-election-candidate-promote.result b/test/replication/gh-6034-election-candidate-promote.result
new file mode 100644
index 000000000..76a54d5a6
--- /dev/null
+++ b/test/replication/gh-6034-election-candidate-promote.result
@@ -0,0 +1,84 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+SERVERS = {'election_replica1', 'election_replica2', 'election_replica3'}
+ | ---
+ | ...
+test_run:create_cluster(SERVERS, 'replication')
+ | ---
+ | ...
+test_run:wait_fullmesh(SERVERS)
+ | ---
+ | ...
+is_leader_cmd = 'return box.info.election.state == \'leader\''
+ | ---
+ | ...
+
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function get_leader_nr()
+    local leader_nr = 0
+    test_run:wait_cond(function()
+        for nr = 1,3 do
+            local is_leader = test_run:eval('election_replica'..nr, is_leader_cmd)[1]
+            if is_leader then
+                leader_nr = nr
+                return true
+            end
+        end
+        return false
+    end)
+    return leader_nr
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+
+leader_nr = get_leader_nr()
+ | ---
+ | ...
+
+assert(leader_nr ~= 0)
+ | ---
+ | - true
+ | ...
+
+term = test_run:eval('election_replica'..leader_nr,\
+                     'return box.info.election.term')[1]
+ | ---
+ | ...
+
+next_nr = leader_nr % 3 + 1
+ | ---
+ | ...
+-- Someone else may become a leader, thus promote may fail. But we're testing
+-- that it takes effect at all, so that's fine.
+_ = pcall(test_run:eval('election_replica'..next_nr, 'box.ctl.promote()'))
+ | ---
+ | ...
+new_term = test_run:eval('election_replica'..next_nr,\
+                         'return box.info.election.term')[1]
+ | ---
+ | ...
+assert(new_term > term)
+ | ---
+ | - true
+ | ...
+leader_nr = get_leader_nr()
+ | ---
+ | ...
+assert(leader_nr ~= 0)
+ | ---
+ | - true
+ | ...
+
+test_run:drop_cluster(SERVERS)
+ | ---
+ | ...
diff --git a/test/replication/gh-6034-election-candidate-promote.test.lua b/test/replication/gh-6034-election-candidate-promote.test.lua
new file mode 100644
index 000000000..24c57f4cb
--- /dev/null
+++ b/test/replication/gh-6034-election-candidate-promote.test.lua
@@ -0,0 +1,42 @@
+test_run = require('test_run').new()
+
+SERVERS = {'election_replica1', 'election_replica2', 'election_replica3'}
+test_run:create_cluster(SERVERS, 'replication')
+test_run:wait_fullmesh(SERVERS)
+is_leader_cmd = 'return box.info.election.state == \'leader\''
+
+test_run:cmd("setopt delimiter ';'")
+function get_leader_nr()
+    local leader_nr = 0
+    test_run:wait_cond(function()
+        for nr = 1,3 do
+            local is_leader = test_run:eval('election_replica'..nr, is_leader_cmd)[1]
+            if is_leader then
+                leader_nr = nr
+                return true
+            end
+        end
+        return false
+    end)
+    return leader_nr
+end;
+test_run:cmd("setopt delimiter ''");
+
+leader_nr = get_leader_nr()
+
+assert(leader_nr ~= 0)
+
+term = test_run:eval('election_replica'..leader_nr,\
+                     'return box.info.election.term')[1]
+
+next_nr = leader_nr % 3 + 1
+-- Someone else may become a leader, thus promote may fail. But we're testing
+-- that it takes effect at all, so that's fine.
+_ = pcall(test_run:eval('election_replica'..next_nr, 'box.ctl.promote()'))
+new_term = test_run:eval('election_replica'..next_nr,\
+                         'return box.info.election.term')[1]
+assert(new_term > term)
+leader_nr = get_leader_nr()
+assert(leader_nr ~= 0)
+
+test_run:drop_cluster(SERVERS)
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 8b2204e2a..6f42db081 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -53,6 +53,7 @@
     "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
     "gh-6094-rs-uuid-mismatch.test.lua": {},
     "gh-6127-election-join-new.test.lua": {},
+    "gh-6034-election-candidate-promote.test.lua": {},
     "gh-6035-applier-filter.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 14/16] box: extract promote() settings to a separate method
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (12 preceding siblings ...)
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 13/16] box: allow calling promote on a candidate Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 15/16] replication: forbid implicit limbo owner transition Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 16/16] box: introduce `box.ctl.demote` Serge Petrenko via Tarantool-patches
  15 siblings, 0 replies; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

Extract the switch(box_election_mode) { } block to a separate method,
box_check_promote_election_mode(), and make it rule out whether the
instance may issue a promote(), and how the promote() has to perform.

This will simplify the code a bit once demote() is introduced.

Part-of #6034
---
 src/box/box.cc | 44 +++++++++++++++++++++++++++++---------------
 1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 9130fc322..dcfaab29e 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1679,6 +1679,34 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
 	assert(txn_limbo_is_empty(&txn_limbo));
 }
 
+/**
+ * Check whether this instance may run a promote() and set promote parameters
+ * according to its election mode.
+ */
+static int
+box_check_promote_election_mode(bool *try_wait, bool *run_elections)
+{
+	switch (box_election_mode) {
+	case ELECTION_MODE_OFF:
+		if (try_wait != NULL)
+			*try_wait = true;
+		break;
+	case ELECTION_MODE_VOTER:
+		assert(box_raft()->state == RAFT_STATE_FOLLOWER);
+		diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'",
+			 "manual elections");
+		return -1;
+	case ELECTION_MODE_MANUAL:
+	case ELECTION_MODE_CANDIDATE:
+		if (run_elections != NULL)
+			*run_elections = box_raft()->state != RAFT_STATE_LEADER;
+		break;
+	default:
+		unreachable();
+	}
+	return 0;
+}
+
 int
 box_promote(void)
 {
@@ -1707,22 +1735,8 @@ box_promote(void)
 	bool run_elections = false;
 	bool try_wait = false;
 
-	switch (box_election_mode) {
-	case ELECTION_MODE_OFF:
-		try_wait = true;
-		break;
-	case ELECTION_MODE_VOTER:
-		assert(box_raft()->state == RAFT_STATE_FOLLOWER);
-		diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'",
-			 "manual elections");
+	if (box_check_promote_election_mode(&try_wait, &run_elections) < 0)
 		return -1;
-	case ELECTION_MODE_MANUAL:
-	case ELECTION_MODE_CANDIDATE:
-		run_elections = box_raft()->state != RAFT_STATE_LEADER;
-		break;
-	default:
-		unreachable();
-	}
 
 	int64_t wait_lsn = -1;
 
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 15/16] replication: forbid implicit limbo owner transition
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (13 preceding siblings ...)
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 14/16] box: extract promote() settings to a separate method Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 16/16] box: introduce `box.ctl.demote` Serge Petrenko via Tarantool-patches
  15 siblings, 0 replies; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

Forbid limbo ownership transition without an explicit promote.
Make it so that synchronous transactions may be committed only after it
is claimed by some instance via a PROMOTE request.

Make everyone but the limbo owner read-only even when the limbo is
empty.

Part-of #6034

@TarantoolBot document
Title: synchronous replication changes

`box.info.synchro.queue` receives a new field: `owner`. It's a replica
id of the instance owning the synchronous transaction queue.

Once some instance owns the queue, every other instance becomes
read-only. When the queue is unclaimed, e.g.
`box.info.synchro.queue.owner` is `0`, everyone may be writeable, but
cannot create synchronous transactions.

In order to claim or re-claim the queue, you have to issue
`box.ctl.promote()` on the instance you wish to promote.

When elections are enabled, the instance issues `box.ctl.promote()`
automatically once it wins the elections, no additional actions are
required.
---
 src/box/errcode.h                          |   2 +
 src/box/lua/info.c                         |   4 +-
 src/box/txn_limbo.c                        |  32 ++---
 test/box/alter.result                      |   2 +-
 test/box/error.result                      |   2 +
 test/replication/gh-5440-qsync-ro.result   | 133 ---------------------
 test/replication/gh-5440-qsync-ro.test.lua |  53 --------
 test/replication/suite.cfg                 |   1 -
 8 files changed, 19 insertions(+), 210 deletions(-)
 delete mode 100644 test/replication/gh-5440-qsync-ro.result
 delete mode 100644 test/replication/gh-5440-qsync-ro.test.lua

diff --git a/src/box/errcode.h b/src/box/errcode.h
index 49aec4bf6..d42b64ef4 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -278,6 +278,8 @@ struct errcode_record {
 	/*223 */_(ER_INTERFERING_PROMOTE,	"Instance with replica id %u was promoted first") \
 	/*224 */_(ER_RAFT_DISABLED,		"Elections were turned off while running box.ctl.promote()")\
 	/*225 */_(ER_TXN_ROLLBACK,		"Transaction was rolled back") \
+	/*226 */_(ER_SYNC_QUEUE_UNCLAIMED,	"The synchronous transaction queue doesn't belong to any instance")\
+	/*227 */_(ER_SYNC_QUEUE_FOREIGN,	"The synchronous transaction queue belongs to other instance with id %u")\
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/lua/info.c b/src/box/lua/info.c
index 1d8fe7938..ef3d6f5b2 100644
--- a/src/box/lua/info.c
+++ b/src/box/lua/info.c
@@ -615,9 +615,11 @@ lbox_info_synchro(struct lua_State *L)
 
 	/* Queue information. */
 	struct txn_limbo *queue = &txn_limbo;
-	lua_createtable(L, 0, 1);
+	lua_createtable(L, 0, 2);
 	lua_pushnumber(L, queue->len);
 	lua_setfield(L, -2, "len");
+	lua_pushnumber(L, queue->owner_id);
+	lua_setfield(L, -2, "owner");
 	lua_setfield(L, -2, "queue");
 
 	return 1;
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 991c47698..c8c4f587c 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -55,7 +55,8 @@ txn_limbo_create(struct txn_limbo *limbo)
 bool
 txn_limbo_is_ro(struct txn_limbo *limbo)
 {
-	return limbo->owner_id != instance_id && !txn_limbo_is_empty(limbo);
+	return limbo->owner_id != REPLICA_ID_NIL &&
+	       limbo->owner_id != instance_id;
 }
 
 struct txn_limbo_entry *
@@ -95,18 +96,18 @@ txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn)
 	}
 	if (id == 0)
 		id = instance_id;
-	bool make_ro = false;
-	if (limbo->owner_id != id) {
-		if (rlist_empty(&limbo->queue)) {
-			limbo->owner_id = id;
-			limbo->confirmed_lsn = 0;
-			if (id != instance_id)
-				make_ro = true;
+	if  (limbo->owner_id == REPLICA_ID_NIL) {
+		diag_set(ClientError, ER_SYNC_QUEUE_UNCLAIMED);
+		return NULL;
+	} else if (limbo->owner_id != id) {
+		if (txn_limbo_is_empty(limbo)) {
+			diag_set(ClientError, ER_SYNC_QUEUE_FOREIGN,
+				 limbo->owner_id);
 		} else {
 			diag_set(ClientError, ER_UNCOMMITTED_FOREIGN_SYNC_TXNS,
 				 limbo->owner_id);
-			return NULL;
 		}
+		return NULL;
 	}
 	size_t size;
 	struct txn_limbo_entry *e = region_alloc_object(&txn->region,
@@ -122,12 +123,6 @@ txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn)
 	e->is_rollback = false;
 	rlist_add_tail_entry(&limbo->queue, e, in_queue);
 	limbo->len++;
-	/*
-	 * We added new entries from a remote instance to an empty limbo.
-	 * Time to make this instance read-only.
-	 */
-	if (make_ro)
-		box_update_ro_summary();
 	return e;
 }
 
@@ -442,9 +437,6 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
 		assert(e->txn->signature >= 0);
 		txn_complete_success(e->txn);
 	}
-	/* Update is_ro once the limbo is clear. */
-	if (txn_limbo_is_empty(limbo))
-		box_update_ro_summary();
 }
 
 /**
@@ -492,9 +484,6 @@ txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
 		if (e == last_rollback)
 			break;
 	}
-	/* Update is_ro once the limbo is clear. */
-	if (txn_limbo_is_empty(limbo))
-		box_update_ro_summary();
 }
 
 void
@@ -525,6 +514,7 @@ txn_limbo_read_promote(struct txn_limbo *limbo, uint32_t replica_id,
 	txn_limbo_read_rollback(limbo, lsn + 1);
 	assert(txn_limbo_is_empty(&txn_limbo));
 	limbo->owner_id = replica_id;
+	box_update_ro_summary();
 	limbo->confirmed_lsn = 0;
 }
 
diff --git a/test/box/alter.result b/test/box/alter.result
index a7bffce10..6a64f6b84 100644
--- a/test/box/alter.result
+++ b/test/box/alter.result
@@ -1464,7 +1464,7 @@ assert(s.is_sync)
 ...
 s:replace{1}
 ---
-- error: Quorum collection for a synchronous transaction is timed out
+- error: The synchronous transaction queue doesn't belong to any instance
 ...
 -- When not specified or nil - ignored.
 s:alter({is_sync = nil})
diff --git a/test/box/error.result b/test/box/error.result
index 062a90399..8f783f927 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -444,6 +444,8 @@ t;
  |   223: box.error.INTERFERING_PROMOTE
  |   224: box.error.RAFT_DISABLED
  |   225: box.error.TXN_ROLLBACK
+ |   226: box.error.SYNC_QUEUE_UNCLAIMED
+ |   227: box.error.SYNC_QUEUE_FOREIGN
  | ...
 
 test_run:cmd("setopt delimiter ''");
diff --git a/test/replication/gh-5440-qsync-ro.result b/test/replication/gh-5440-qsync-ro.result
deleted file mode 100644
index 1ece26a42..000000000
--- a/test/replication/gh-5440-qsync-ro.result
+++ /dev/null
@@ -1,133 +0,0 @@
--- test-run result file version 2
---
--- gh-5440 everyone but the limbo owner is read-only on non-empty limbo.
---
-env = require('test_run')
- | ---
- | ...
-test_run = env.new()
- | ---
- | ...
-fiber = require('fiber')
- | ---
- | ...
-
-box.schema.user.grant('guest', 'replication')
- | ---
- | ...
-test_run:cmd('create server replica with rpl_master=default, script="replication/replica.lua"')
- | ---
- | - true
- | ...
-test_run:cmd('start server replica with wait=True, wait_load=True')
- | ---
- | - true
- | ...
-
-_ = box.schema.space.create('test', {is_sync=true})
- | ---
- | ...
-_ = box.space.test:create_index('pk')
- | ---
- | ...
-
-old_synchro_quorum = box.cfg.replication_synchro_quorum
- | ---
- | ...
-old_synchro_timeout = box.cfg.replication_synchro_timeout
- | ---
- | ...
-
--- Make sure that the master stalls on commit leaving the limbo non-empty.
-box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=1000}
- | ---
- | ...
-
-f = fiber.new(function() box.space.test:insert{1} end)
- | ---
- | ...
-f:status()
- | ---
- | - suspended
- | ...
-
--- Wait till replica's limbo is non-empty.
-test_run:wait_lsn('replica', 'default')
- | ---
- | ...
-test_run:cmd('switch replica')
- | ---
- | - true
- | ...
-
-box.info.ro
- | ---
- | - true
- | ...
-box.space.test:insert{2}
- | ---
- | - error: Can't modify data because this instance is in read-only mode.
- | ...
-success = false
- | ---
- | ...
-f = require('fiber').new(function() box.ctl.wait_rw() success = true end)
- | ---
- | ...
-f:status()
- | ---
- | - suspended
- | ...
-
-test_run:cmd('switch default')
- | ---
- | - true
- | ...
-
--- Empty the limbo.
-box.cfg{replication_synchro_quorum=2}
- | ---
- | ...
-
-test_run:cmd('switch replica')
- | ---
- | - true
- | ...
-
-test_run:wait_cond(function() return success end)
- | ---
- | - true
- | ...
-box.info.ro
- | ---
- | - false
- | ...
--- Should succeed now.
-box.space.test:insert{2}
- | ---
- | - [2]
- | ...
-
--- Cleanup.
-test_run:cmd('switch default')
- | ---
- | - true
- | ...
-box.cfg{replication_synchro_quorum=old_synchro_quorum,\
-        replication_synchro_timeout=old_synchro_timeout}
- | ---
- | ...
-box.space.test:drop()
- | ---
- | ...
-test_run:cmd('stop server replica')
- | ---
- | - true
- | ...
-test_run:cmd('delete server replica')
- | ---
- | - true
- | ...
-box.schema.user.revoke('guest', 'replication')
- | ---
- | ...
diff --git a/test/replication/gh-5440-qsync-ro.test.lua b/test/replication/gh-5440-qsync-ro.test.lua
deleted file mode 100644
index d63ec9c1e..000000000
--- a/test/replication/gh-5440-qsync-ro.test.lua
+++ /dev/null
@@ -1,53 +0,0 @@
---
--- gh-5440 everyone but the limbo owner is read-only on non-empty limbo.
---
-env = require('test_run')
-test_run = env.new()
-fiber = require('fiber')
-
-box.schema.user.grant('guest', 'replication')
-test_run:cmd('create server replica with rpl_master=default, script="replication/replica.lua"')
-test_run:cmd('start server replica with wait=True, wait_load=True')
-
-_ = box.schema.space.create('test', {is_sync=true})
-_ = box.space.test:create_index('pk')
-
-old_synchro_quorum = box.cfg.replication_synchro_quorum
-old_synchro_timeout = box.cfg.replication_synchro_timeout
-
--- Make sure that the master stalls on commit leaving the limbo non-empty.
-box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=1000}
-
-f = fiber.new(function() box.space.test:insert{1} end)
-f:status()
-
--- Wait till replica's limbo is non-empty.
-test_run:wait_lsn('replica', 'default')
-test_run:cmd('switch replica')
-
-box.info.ro
-box.space.test:insert{2}
-success = false
-f = require('fiber').new(function() box.ctl.wait_rw() success = true end)
-f:status()
-
-test_run:cmd('switch default')
-
--- Empty the limbo.
-box.cfg{replication_synchro_quorum=2}
-
-test_run:cmd('switch replica')
-
-test_run:wait_cond(function() return success end)
-box.info.ro
--- Should succeed now.
-box.space.test:insert{2}
-
--- Cleanup.
-test_run:cmd('switch default')
-box.cfg{replication_synchro_quorum=old_synchro_quorum,\
-        replication_synchro_timeout=old_synchro_timeout}
-box.space.test:drop()
-test_run:cmd('stop server replica')
-test_run:cmd('delete server replica')
-box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 6f42db081..7a6fd7052 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -42,7 +42,6 @@
     "gh-4739-vclock-assert.test.lua": {},
     "gh-4730-applier-rollback.test.lua": {},
     "gh-4928-tx-boundaries.test.lua": {},
-    "gh-5440-qsync-ro.test.lua": {},
     "gh-5435-qsync-clear-synchro-queue-commit-all.test.lua": {},
     "gh-5536-wal-limit.test.lua": {},
     "gh-5566-final-join-synchro.test.lua": {},
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 16/16] box: introduce `box.ctl.demote`
  2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
                   ` (14 preceding siblings ...)
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 15/16] replication: forbid implicit limbo owner transition Serge Petrenko via Tarantool-patches
@ 2021-07-14 18:25 ` Serge Petrenko via Tarantool-patches
  2021-07-15 17:13   ` Serge Petrenko via Tarantool-patches
  2021-07-15 20:11   ` [Tarantool-patches] [PATCH v4 17/16] replication: fix flaky election_qsync.test Serge Petrenko via Tarantool-patches
  15 siblings, 2 replies; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:25 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

Introduce a new journal entry, DEMOTE. The entry has the same meaning as
PROMOTE, with the only difference that it clears limbo ownership instead
of transferring it to the issuer.

Introduce `box.ctl.demote`, a counterpart to `box.ctl.promote`, which
clears the limbo ownership (when elections are off) via writing the DEMOTE, or
simply makes this instance step down from the leader's role (when elections
are enabled in any mode).

A new request was necessary instead of simply writing PROMOTE(origin_id
= 0), because origin_id is deduced from row.replica_id, which cannot be
0 for replicated rows (it's always equal to instance_id of the row
originator).

Closes #6034

@TarantoolBot document
Title: box.ctl.demote

`box.ctl.demote()` is a counterpart to `box.ctl.promote()` which works
as follows:
 - when `box.cfg.election_mode` is not 'off': make the instance give up
   leadership.
 - when `box.cfg.election_mode` is 'off': write a DEMOTE entry to WAL.

`box.ctl.demote()` may only be issued on the synchronous transaction
queue owner (i.e. leader when elections are enabled).

A DEMOTE request (DEMOTE = 32) copies PROMOTE behaviour (it clears the
limbo as well), but clears the synchronous transaction queue  ownership instead
of assigning it to a new instance.
---
 src/box/box.cc                                |  99 ++++++++--
 src/box/box.h                                 |   3 +
 src/box/iproto_constants.h                    |  10 +-
 src/box/lua/ctl.c                             |   9 +
 src/box/txn_limbo.c                           |  37 +++-
 src/box/txn_limbo.h                           |   7 +
 test/replication/election_basic.result        |   3 +
 test/replication/election_basic.test.lua      |   1 +
 test/replication/election_qsync.result        |   3 +
 test/replication/election_qsync.test.lua      |   1 +
 .../gh-5140-qsync-casc-rollback.result        |   6 +
 .../gh-5140-qsync-casc-rollback.test.lua      |   2 +
 .../gh-5144-qsync-dup-confirm.result          |   6 +
 .../gh-5144-qsync-dup-confirm.test.lua        |   2 +
 .../gh-5163-qsync-restart-crash.result        |   6 +
 .../gh-5163-qsync-restart-crash.test.lua      |   2 +
 .../gh-5167-qsync-rollback-snap.result        |   6 +
 .../gh-5167-qsync-rollback-snap.test.lua      |   2 +
 .../gh-5195-qsync-replica-write.result        |  10 +-
 .../gh-5195-qsync-replica-write.test.lua      |   6 +-
 .../gh-5213-qsync-applier-order-3.result      |   9 +
 .../gh-5213-qsync-applier-order-3.test.lua    |   3 +
 .../gh-5213-qsync-applier-order.result        |   6 +
 .../gh-5213-qsync-applier-order.test.lua      |   2 +
 .../replication/gh-5288-qsync-recovery.result |   6 +
 .../gh-5288-qsync-recovery.test.lua           |   2 +
 .../gh-5298-qsync-recovery-snap.result        |   6 +
 .../gh-5298-qsync-recovery-snap.test.lua      |   2 +
 .../gh-5426-election-on-off.result            |   3 +
 .../gh-5426-election-on-off.test.lua          |   1 +
 .../gh-5433-election-restart-recovery.result  |   3 +
 ...gh-5433-election-restart-recovery.test.lua |   1 +
 ...sync-clear-synchro-queue-commit-all.result |   3 +
 ...nc-clear-synchro-queue-commit-all.test.lua |   1 +
 .../replication/gh-5438-election-state.result |   3 +
 .../gh-5438-election-state.test.lua           |   1 +
 .../gh-5446-qsync-eval-quorum.result          |   6 +
 .../gh-5446-qsync-eval-quorum.test.lua        |   2 +
 .../gh-5506-election-on-off.result            |   3 +
 .../gh-5506-election-on-off.test.lua          |   1 +
 .../gh-5566-final-join-synchro.result         |   6 +
 .../gh-5566-final-join-synchro.test.lua       |   2 +
 .../gh-5874-qsync-txn-recovery.result         |   6 +
 .../gh-5874-qsync-txn-recovery.test.lua       |   2 +
 .../gh-6032-promote-wal-write.result          |   3 +
 .../gh-6032-promote-wal-write.test.lua        |   1 +
 .../gh-6034-election-promote-bump-term.result |   5 +
 ...h-6034-election-promote-bump-term.test.lua |   3 +
 .../gh-6034-qsync-limbo-ownership.result      | 186 ++++++++++++++++++
 .../gh-6034-qsync-limbo-ownership.test.lua    |  68 +++++++
 .../gh-6057-qsync-confirm-async-no-wal.result |   7 +
 ...h-6057-qsync-confirm-async-no-wal.test.lua |   3 +
 test/replication/hang_on_synchro_fail.result  |   6 +
 .../replication/hang_on_synchro_fail.test.lua |   2 +
 test/replication/qsync_advanced.result        |  12 ++
 test/replication/qsync_advanced.test.lua      |   4 +
 test/replication/qsync_basic.result           |  33 ++--
 test/replication/qsync_basic.test.lua         |  16 +-
 test/replication/qsync_errinj.result          |   6 +
 test/replication/qsync_errinj.test.lua        |   2 +
 test/replication/qsync_snapshots.result       |   6 +
 test/replication/qsync_snapshots.test.lua     |   2 +
 test/replication/qsync_with_anon.result       |   6 +
 test/replication/qsync_with_anon.test.lua     |   2 +
 test/replication/suite.cfg                    |   1 +
 65 files changed, 623 insertions(+), 52 deletions(-)
 create mode 100644 test/replication/gh-6034-qsync-limbo-ownership.result
 create mode 100644 test/replication/gh-6034-qsync-limbo-ownership.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index dcfaab29e..f68fffcab 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1679,6 +1679,25 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
 	assert(txn_limbo_is_empty(&txn_limbo));
 }
 
+/** Write and process a DEMOTE request. */
+static void
+box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)
+{
+	assert(box_raft()->volatile_term == box_raft()->term);
+	assert(promote_lsn >= 0);
+	txn_limbo_write_demote(&txn_limbo, promote_lsn,
+				box_raft()->term);
+	struct synchro_request req = {
+		.type = IPROTO_DEMOTE,
+		.replica_id = prev_leader_id,
+		.origin_id = instance_id,
+		.lsn = promote_lsn,
+		.term = box_raft()->term,
+	};
+	txn_limbo_process(&txn_limbo, &req);
+	assert(txn_limbo_is_empty(&txn_limbo));
+}
+
 /**
  * Check whether this instance may run a promote() and set promote parameters
  * according to its election mode.
@@ -1707,31 +1726,37 @@ box_check_promote_election_mode(bool *try_wait, bool *run_elections)
 	return 0;
 }
 
+/* A guard to block multiple simultaneous promote()/demote() invocations. */
+static bool box_in_promote = false;
+
 int
 box_promote(void)
 {
-	/* A guard to block multiple simultaneous function invocations. */
-	static bool in_promote = false;
-	if (in_promote) {
+	if (box_in_promote) {
 		diag_set(ClientError, ER_UNSUPPORTED, "box.ctl.promote",
 			 "simultaneous invocations");
 		return -1;
 	}
-	in_promote = true;
+	box_in_promote = true;
 	auto promote_guard = make_scoped_guard([&] {
-		in_promote = false;
+		box_in_promote = false;
 	});
 
-	/*
-	 * Do nothing when box isn't configured and when PROMOTE was already
-	 * written for this term (synchronous replication and leader election
-	 * are in sync, and both chose this node as a leader).
-	 */
 	if (!is_box_configured)
 		return 0;
-	if (txn_limbo_replica_term(&txn_limbo, instance_id) ==
-	    box_raft()->term)
+
+	/*
+	 * Currently active leader (the instance that is seen as leader by both
+	 * raft and txn_limbo) can't issue another PROMOTE.
+	 */
+	bool is_leader = txn_limbo_replica_term(&txn_limbo, instance_id) ==
+			 box_raft()->term && txn_limbo.owner_id == instance_id;
+	if (box_election_mode != ELECTION_MODE_OFF)
+		is_leader = is_leader && box_raft()->state == RAFT_STATE_LEADER;
+
+	if (is_leader)
 		return 0;
+
 	bool run_elections = false;
 	bool try_wait = false;
 
@@ -1752,6 +1777,56 @@ box_promote(void)
 		return -1;
 
 	box_issue_promote(txn_limbo.owner_id, wait_lsn);
+
+	return 0;
+}
+
+int
+box_demote(void)
+{
+	if (box_in_promote) {
+		diag_set(ClientError, ER_UNSUPPORTED, "box.ctl.demote",
+			 "simultaneous invocations");
+		return -1;
+	}
+	box_in_promote = true;
+	auto promote_guard = make_scoped_guard([&] {
+		box_in_promote = false;
+	});
+
+	if (!is_box_configured)
+		return 0;
+
+	/* Currently active leader is the only one who can issue a DEMOTE. */
+	bool is_leader = txn_limbo_replica_term(&txn_limbo, instance_id) ==
+			 box_raft()->term && txn_limbo.owner_id == instance_id;
+	if (box_election_mode != ELECTION_MODE_OFF)
+		is_leader = is_leader && box_raft()->state == RAFT_STATE_LEADER;
+	if (!is_leader)
+		return 0;
+
+	bool try_wait = false;
+
+	if (box_check_promote_election_mode(&try_wait, NULL) < 0)
+		return -1;
+
+	int64_t wait_lsn = -1;
+
+	if (box_trigger_elections() < 0)
+		return -1;
+
+	if (box_election_mode != ELECTION_MODE_OFF)
+		return 0;
+
+	if (try_wait &&
+	    box_try_wait_confirm(2 * replication_synchro_timeout) < 0)
+		return -1;
+
+	if ((wait_lsn = box_wait_limbo_acked()) < 0)
+		return -1;
+
+	box_issue_demote(txn_limbo.owner_id, wait_lsn);
+
 	return 0;
 }
 
diff --git a/src/box/box.h b/src/box/box.h
index ecf32240d..aaf20d9dd 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -276,6 +276,9 @@ typedef struct tuple box_tuple_t;
 int
 box_promote(void);
 
+int
+box_demote(void);
+
 /* box_select is private and used only by FFI */
 API_EXPORT int
 box_select(uint32_t space_id, uint32_t index_id,
diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
index e913801a8..247ca6f37 100644
--- a/src/box/iproto_constants.h
+++ b/src/box/iproto_constants.h
@@ -241,6 +241,8 @@ enum iproto_type {
 	IPROTO_RAFT = 30,
 	/** PROMOTE request. */
 	IPROTO_PROMOTE = 31,
+	/** DEMOTE request. */
+	IPROTO_DEMOTE = 32,
 
 	/** A confirmation message for synchronous transactions. */
 	IPROTO_CONFIRM = 40,
@@ -312,6 +314,8 @@ iproto_type_name(uint16_t type)
 		return "RAFT";
 	case IPROTO_PROMOTE:
 		return "PROMOTE";
+	case IPROTO_DEMOTE:
+		return "DEMOTE";
 	case IPROTO_CONFIRM:
 		return "CONFIRM";
 	case IPROTO_ROLLBACK:
@@ -366,14 +370,14 @@ static inline bool
 iproto_type_is_synchro_request(uint16_t type)
 {
 	return type == IPROTO_CONFIRM || type == IPROTO_ROLLBACK ||
-	       type == IPROTO_PROMOTE;
+	       type == IPROTO_PROMOTE || type == IPROTO_DEMOTE;
 }
 
-/** PROMOTE entry (synchronous replication and leader elections). */
+/** PROMOTE/DEMOTE entry (synchronous replication and leader elections). */
 static inline bool
 iproto_type_is_promote_request(uint32_t type)
 {
-       return type == IPROTO_PROMOTE;
+       return type == IPROTO_PROMOTE || type == IPROTO_DEMOTE;
 }
 
 static inline bool
diff --git a/src/box/lua/ctl.c b/src/box/lua/ctl.c
index 368b9ab60..a613c4111 100644
--- a/src/box/lua/ctl.c
+++ b/src/box/lua/ctl.c
@@ -89,6 +89,14 @@ lbox_ctl_promote(struct lua_State *L)
 	return 0;
 }
 
+static int
+lbox_ctl_demote(struct lua_State *L)
+{
+	if (box_demote() != 0)
+		return luaT_error(L);
+	return 0;
+}
+
 static int
 lbox_ctl_is_recovery_finished(struct lua_State *L)
 {
@@ -127,6 +135,7 @@ static const struct luaL_Reg lbox_ctl_lib[] = {
 	{"promote", lbox_ctl_promote},
 	/* An old alias. */
 	{"clear_synchro_queue", lbox_ctl_promote},
+	{"demote", lbox_ctl_demote},
 	{"is_recovery_finished", lbox_ctl_is_recovery_finished},
 	{"set_on_shutdown_timeout", lbox_ctl_set_on_shutdown_timeout},
 	{NULL, NULL}
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index c8c4f587c..570f77c46 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -518,6 +518,29 @@ txn_limbo_read_promote(struct txn_limbo *limbo, uint32_t replica_id,
 	limbo->confirmed_lsn = 0;
 }
 
+void
+txn_limbo_write_demote(struct txn_limbo *limbo, int64_t lsn, uint64_t term)
+{
+	limbo->confirmed_lsn = lsn;
+	limbo->is_in_rollback = true;
+	struct txn_limbo_entry *e = txn_limbo_last_synchro_entry(limbo);
+	assert(e == NULL || e->lsn <= lsn);
+	(void)e;
+	txn_limbo_write_synchro(limbo, IPROTO_DEMOTE, lsn, term);
+	limbo->is_in_rollback = false;
+}
+
+/**
+ * Process a DEMOTE request, which's like PROMOTE, but clears the limbo
+ * ownership.
+ * @sa txn_limbo_read_promote.
+ */
+static void
+txn_limbo_read_demote(struct txn_limbo *limbo, int64_t lsn)
+{
+	return txn_limbo_read_promote(limbo, REPLICA_ID_NIL, lsn);
+}
+
 void
 txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
 {
@@ -709,12 +732,13 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 		vclock_follow(&limbo->promote_term_map, origin, term);
 		if (term > limbo->promote_greatest_term)
 			limbo->promote_greatest_term = term;
-	} else if (req->type == IPROTO_PROMOTE &&
+	} else if (iproto_type_is_promote_request(req->type) &&
 		   limbo->promote_greatest_term > 1) {
 		/* PROMOTE for outdated term. Ignore. */
-		say_info("RAFT: ignoring PROMOTE request from instance "
+		say_info("RAFT: ignoring %s request from instance "
 			 "id %u for term %llu. Greatest term seen "
-			 "before (%llu) is bigger.", origin, (long long)term,
+			 "before (%llu) is bigger.",
+			 iproto_type_name(req->type), origin, (long long)term,
 			 (long long)limbo->promote_greatest_term);
 		return;
 	}
@@ -725,7 +749,7 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 		 * The limbo was empty on the instance issuing the request.
 		 * This means this instance must empty its limbo as well.
 		 */
-		assert(lsn == 0 && req->type == IPROTO_PROMOTE);
+		assert(lsn == 0 && iproto_type_is_promote_request(req->type));
 	} else if (req->replica_id != limbo->owner_id) {
 		/*
 		 * Ignore CONFIRM/ROLLBACK messages for a foreign master.
@@ -733,7 +757,7 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 		 * data from an old leader, who has just started and written
 		 * confirm right on synchronous transaction recovery.
 		 */
-		if (req->type != IPROTO_PROMOTE)
+		if (!iproto_type_is_promote_request(req->type))
 			return;
 		/*
 		 * Promote has a bigger term, and tries to steal the limbo. It
@@ -753,6 +777,9 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 	case IPROTO_PROMOTE:
 		txn_limbo_read_promote(limbo, req->origin_id, lsn);
 		break;
+	case IPROTO_DEMOTE:
+		txn_limbo_read_demote(limbo, lsn);
+		break;
 	default:
 		unreachable();
 	}
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 7151843f4..53e52f676 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -329,6 +329,13 @@ txn_limbo_checkpoint(const struct txn_limbo *limbo,
 void
 txn_limbo_write_promote(struct txn_limbo *limbo, int64_t lsn, uint64_t term);
 
+/**
+ * Write a DEMOTE request.
+ * It has the same effect as PROMOTE and additionally clears limbo ownership.
+ */
+void
+txn_limbo_write_demote(struct txn_limbo *limbo, int64_t lsn, uint64_t term);
+
 /**
  * Update qsync parameters dynamically.
  */
diff --git a/test/replication/election_basic.result b/test/replication/election_basic.result
index b64028c60..c0323a042 100644
--- a/test/replication/election_basic.result
+++ b/test/replication/election_basic.result
@@ -114,6 +114,9 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
 
 --
 -- See if bootstrap with election enabled works.
diff --git a/test/replication/election_basic.test.lua b/test/replication/election_basic.test.lua
index 77fdf6340..085a499d5 100644
--- a/test/replication/election_basic.test.lua
+++ b/test/replication/election_basic.test.lua
@@ -43,6 +43,7 @@ box.cfg{
     election_mode = 'off',                                                      \
     election_timeout = old_election_timeout                                     \
 }
+box.ctl.demote()
 
 --
 -- See if bootstrap with election enabled works.
diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
index c06400b38..2402c8578 100644
--- a/test/replication/election_qsync.result
+++ b/test/replication/election_qsync.result
@@ -165,6 +165,9 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
 box.schema.user.revoke('guest', 'super')
  | ---
  | ...
diff --git a/test/replication/election_qsync.test.lua b/test/replication/election_qsync.test.lua
index ea6fc4a61..e1aca8351 100644
--- a/test/replication/election_qsync.test.lua
+++ b/test/replication/election_qsync.test.lua
@@ -84,4 +84,5 @@ box.cfg{
     replication = old_replication,                                              \
     replication_synchro_timeout = old_replication_synchro_timeout,              \
 }
+box.ctl.demote()
 box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/gh-5140-qsync-casc-rollback.result b/test/replication/gh-5140-qsync-casc-rollback.result
index da77631dd..d3208e1a4 100644
--- a/test/replication/gh-5140-qsync-casc-rollback.result
+++ b/test/replication/gh-5140-qsync-casc-rollback.result
@@ -73,6 +73,9 @@ _ = box.schema.space.create('async', {is_sync=false, engine = engine})
 _ = _:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 -- Write something to flush the master state to replica.
 box.space.sync:replace{1}
  | ---
@@ -222,3 +225,6 @@ test_run:cmd('delete server replica')
 box.schema.user.revoke('guest', 'super')
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5140-qsync-casc-rollback.test.lua b/test/replication/gh-5140-qsync-casc-rollback.test.lua
index 69fc9ad02..96ddfd260 100644
--- a/test/replication/gh-5140-qsync-casc-rollback.test.lua
+++ b/test/replication/gh-5140-qsync-casc-rollback.test.lua
@@ -48,6 +48,7 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = _:create_index('pk')
 _ = box.schema.space.create('async', {is_sync=false, engine = engine})
 _ = _:create_index('pk')
+box.ctl.promote()
 -- Write something to flush the master state to replica.
 box.space.sync:replace{1}
 
@@ -103,3 +104,4 @@ test_run:cmd('stop server replica')
 test_run:cmd('delete server replica')
 
 box.schema.user.revoke('guest', 'super')
+box.ctl.demote()
diff --git a/test/replication/gh-5144-qsync-dup-confirm.result b/test/replication/gh-5144-qsync-dup-confirm.result
index 9d265d9ff..217e44412 100644
--- a/test/replication/gh-5144-qsync-dup-confirm.result
+++ b/test/replication/gh-5144-qsync-dup-confirm.result
@@ -46,6 +46,9 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = _:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 -- Remember the current LSN. In the end, when the following synchronous
 -- transaction is committed, result LSN should be this value +2: for the
@@ -148,6 +151,9 @@ test_run:cmd('delete server replica2')
  | - true
  | ...
 
+box.ctl.demote()
+ | ---
+ | ...
 box.schema.user.revoke('guest', 'super')
  | ---
  | ...
diff --git a/test/replication/gh-5144-qsync-dup-confirm.test.lua b/test/replication/gh-5144-qsync-dup-confirm.test.lua
index 01a8351e0..1d6af2c62 100644
--- a/test/replication/gh-5144-qsync-dup-confirm.test.lua
+++ b/test/replication/gh-5144-qsync-dup-confirm.test.lua
@@ -19,6 +19,7 @@ box.cfg{replication_synchro_quorum = 2, replication_synchro_timeout = 1000}
 
 _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = _:create_index('pk')
+box.ctl.promote()
 
 -- Remember the current LSN. In the end, when the following synchronous
 -- transaction is committed, result LSN should be this value +2: for the
@@ -69,4 +70,5 @@ test_run:cmd('delete server replica1')
 test_run:cmd('stop server replica2')
 test_run:cmd('delete server replica2')
 
+box.ctl.demote()
 box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/gh-5163-qsync-restart-crash.result b/test/replication/gh-5163-qsync-restart-crash.result
index e57bc76d1..1b4d3d9b5 100644
--- a/test/replication/gh-5163-qsync-restart-crash.result
+++ b/test/replication/gh-5163-qsync-restart-crash.result
@@ -16,6 +16,9 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 box.space.sync:replace{1}
  | ---
@@ -30,3 +33,6 @@ box.space.sync:select{}
 box.space.sync:drop()
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5163-qsync-restart-crash.test.lua b/test/replication/gh-5163-qsync-restart-crash.test.lua
index d5aca4749..c8d54aad2 100644
--- a/test/replication/gh-5163-qsync-restart-crash.test.lua
+++ b/test/replication/gh-5163-qsync-restart-crash.test.lua
@@ -7,8 +7,10 @@ engine = test_run:get_cfg('engine')
 --
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 
 box.space.sync:replace{1}
 test_run:cmd('restart server default')
 box.space.sync:select{}
 box.space.sync:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-5167-qsync-rollback-snap.result b/test/replication/gh-5167-qsync-rollback-snap.result
index 06f58526c..13166720f 100644
--- a/test/replication/gh-5167-qsync-rollback-snap.result
+++ b/test/replication/gh-5167-qsync-rollback-snap.result
@@ -41,6 +41,9 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 -- Write something to flush the current master's state to replica.
 _ = box.space.sync:insert{1}
  | ---
@@ -163,3 +166,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5167-qsync-rollback-snap.test.lua b/test/replication/gh-5167-qsync-rollback-snap.test.lua
index 475727e61..1a2a31b7c 100644
--- a/test/replication/gh-5167-qsync-rollback-snap.test.lua
+++ b/test/replication/gh-5167-qsync-rollback-snap.test.lua
@@ -16,6 +16,7 @@ fiber = require('fiber')
 box.cfg{replication_synchro_quorum = 2, replication_synchro_timeout = 1000}
 _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 -- Write something to flush the current master's state to replica.
 _ = box.space.sync:insert{1}
 _ = box.space.sync:delete{1}
@@ -65,3 +66,4 @@ box.cfg{
     replication_synchro_quorum = orig_synchro_quorum,                           \
     replication_synchro_timeout = orig_synchro_timeout,                         \
 }
+box.ctl.demote()
diff --git a/test/replication/gh-5195-qsync-replica-write.result b/test/replication/gh-5195-qsync-replica-write.result
index 85e00e6ed..bc73bb599 100644
--- a/test/replication/gh-5195-qsync-replica-write.result
+++ b/test/replication/gh-5195-qsync-replica-write.result
@@ -40,6 +40,9 @@ _ = box.schema.space.create('sync', {engine = engine, is_sync = true})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 box.cfg{replication_synchro_timeout = 1000, replication_synchro_quorum = 3}
  | ---
@@ -71,12 +74,12 @@ test_run:wait_lsn('replica', 'default')
  | ---
  | ...
 -- Normal DML is blocked - the limbo is not empty and does not belong to the
--- replica. But synchro queue cleanup also does a WAL write, and propagates LSN
+-- replica. But promote also does a WAL write, and propagates LSN
 -- of the instance.
 box.cfg{replication_synchro_timeout = 0.001}
  | ---
  | ...
-box.ctl.clear_synchro_queue()
+box.ctl.promote()
  | ---
  | ...
 
@@ -144,6 +147,9 @@ test_run:cmd('delete server replica')
  | - true
  | ...
 
+box.ctl.demote()
+ | ---
+ | ...
 box.space.sync:drop()
  | ---
  | ...
diff --git a/test/replication/gh-5195-qsync-replica-write.test.lua b/test/replication/gh-5195-qsync-replica-write.test.lua
index 64c48be99..a59ec154e 100644
--- a/test/replication/gh-5195-qsync-replica-write.test.lua
+++ b/test/replication/gh-5195-qsync-replica-write.test.lua
@@ -17,6 +17,7 @@ test_run:cmd('start server replica with wait=True, wait_load=True')
 --
 _ = box.schema.space.create('sync', {engine = engine, is_sync = true})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 
 box.cfg{replication_synchro_timeout = 1000, replication_synchro_quorum = 3}
 lsn = box.info.lsn
@@ -30,10 +31,10 @@ test_run:wait_cond(function() return box.info.lsn == lsn end)
 test_run:switch('replica')
 test_run:wait_lsn('replica', 'default')
 -- Normal DML is blocked - the limbo is not empty and does not belong to the
--- replica. But synchro queue cleanup also does a WAL write, and propagates LSN
+-- replica. But promote also does a WAL write, and propagates LSN
 -- of the instance.
 box.cfg{replication_synchro_timeout = 0.001}
-box.ctl.clear_synchro_queue()
+box.ctl.promote()
 
 test_run:switch('default')
 -- Wait second ACK receipt.
@@ -59,6 +60,7 @@ test_run:switch('default')
 test_run:cmd('stop server replica')
 test_run:cmd('delete server replica')
 
+box.ctl.demote()
 box.space.sync:drop()
 box.schema.user.revoke('guest', 'super')
 
diff --git a/test/replication/gh-5213-qsync-applier-order-3.result b/test/replication/gh-5213-qsync-applier-order-3.result
index bcb18b5c0..e788eec77 100644
--- a/test/replication/gh-5213-qsync-applier-order-3.result
+++ b/test/replication/gh-5213-qsync-applier-order-3.result
@@ -45,6 +45,9 @@ s = box.schema.space.create('test', {is_sync = true})
 _ = s:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 test_run:cmd('create server replica1 with rpl_master=default,\
               script="replication/replica1.lua"')
@@ -179,6 +182,9 @@ box.cfg{
 -- Replica2 takes the limbo ownership and sends the transaction to the replica1.
 -- Along with the CONFIRM from the default node, which is still not applied
 -- on the replica1.
+box.ctl.promote()
+ | ---
+ | ...
 fiber = require('fiber')
  | ---
  | ...
@@ -261,3 +267,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5213-qsync-applier-order-3.test.lua b/test/replication/gh-5213-qsync-applier-order-3.test.lua
index 37b569da7..304656de0 100644
--- a/test/replication/gh-5213-qsync-applier-order-3.test.lua
+++ b/test/replication/gh-5213-qsync-applier-order-3.test.lua
@@ -30,6 +30,7 @@ box.schema.user.grant('guest', 'super')
 
 s = box.schema.space.create('test', {is_sync = true})
 _ = s:create_index('pk')
+box.ctl.promote()
 
 test_run:cmd('create server replica1 with rpl_master=default,\
               script="replication/replica1.lua"')
@@ -90,6 +91,7 @@ box.cfg{
 -- Replica2 takes the limbo ownership and sends the transaction to the replica1.
 -- Along with the CONFIRM from the default node, which is still not applied
 -- on the replica1.
+box.ctl.promote()
 fiber = require('fiber')
 f = fiber.new(function() box.space.test:replace{2} end)
 
@@ -123,3 +125,4 @@ box.cfg{
     replication_synchro_quorum = old_synchro_quorum,                            \
     replication_synchro_timeout = old_synchro_timeout,                          \
 }
+box.ctl.demote()
diff --git a/test/replication/gh-5213-qsync-applier-order.result b/test/replication/gh-5213-qsync-applier-order.result
index a8c24c289..ba6cdab06 100644
--- a/test/replication/gh-5213-qsync-applier-order.result
+++ b/test/replication/gh-5213-qsync-applier-order.result
@@ -29,6 +29,9 @@ s = box.schema.space.create('test', {is_sync = true})
 _ = s:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 test_run:cmd('create server replica with rpl_master=default,\
               script="replication/gh-5213-replica.lua"')
@@ -300,3 +303,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5213-qsync-applier-order.test.lua b/test/replication/gh-5213-qsync-applier-order.test.lua
index f1eccfa84..39b1912e8 100644
--- a/test/replication/gh-5213-qsync-applier-order.test.lua
+++ b/test/replication/gh-5213-qsync-applier-order.test.lua
@@ -14,6 +14,7 @@ box.schema.user.grant('guest', 'super')
 
 s = box.schema.space.create('test', {is_sync = true})
 _ = s:create_index('pk')
+box.ctl.promote()
 
 test_run:cmd('create server replica with rpl_master=default,\
               script="replication/gh-5213-replica.lua"')
@@ -120,3 +121,4 @@ box.cfg{
     replication_synchro_quorum = old_synchro_quorum,                            \
     replication_synchro_timeout = old_synchro_timeout,                          \
 }
+box.ctl.demote()
diff --git a/test/replication/gh-5288-qsync-recovery.result b/test/replication/gh-5288-qsync-recovery.result
index dc0babef6..704b71d93 100644
--- a/test/replication/gh-5288-qsync-recovery.result
+++ b/test/replication/gh-5288-qsync-recovery.result
@@ -12,6 +12,9 @@ s = box.schema.space.create('sync', {is_sync = true})
 _ = s:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 s:insert{1}
  | ---
  | - [1]
@@ -25,3 +28,6 @@ test_run:cmd('restart server default')
 box.space.sync:drop()
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5288-qsync-recovery.test.lua b/test/replication/gh-5288-qsync-recovery.test.lua
index 00bff7b87..2455f7278 100644
--- a/test/replication/gh-5288-qsync-recovery.test.lua
+++ b/test/replication/gh-5288-qsync-recovery.test.lua
@@ -5,7 +5,9 @@ test_run = require('test_run').new()
 --
 s = box.schema.space.create('sync', {is_sync = true})
 _ = s:create_index('pk')
+box.ctl.promote()
 s:insert{1}
 box.snapshot()
 test_run:cmd('restart server default')
 box.space.sync:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-5298-qsync-recovery-snap.result b/test/replication/gh-5298-qsync-recovery-snap.result
index 922831552..0883fe5f5 100644
--- a/test/replication/gh-5298-qsync-recovery-snap.result
+++ b/test/replication/gh-5298-qsync-recovery-snap.result
@@ -17,6 +17,9 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 for i = 1, 10 do box.space.sync:replace{i} end
  | ---
  | ...
@@ -98,3 +101,6 @@ box.space.sync:drop()
 box.space.loc:drop()
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5298-qsync-recovery-snap.test.lua b/test/replication/gh-5298-qsync-recovery-snap.test.lua
index 187f60d75..084cde963 100644
--- a/test/replication/gh-5298-qsync-recovery-snap.test.lua
+++ b/test/replication/gh-5298-qsync-recovery-snap.test.lua
@@ -8,6 +8,7 @@ engine = test_run:get_cfg('engine')
 --
 _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 for i = 1, 10 do box.space.sync:replace{i} end
 
 -- Local rows could affect this by increasing the signature.
@@ -46,3 +47,4 @@ box.cfg{
 }
 box.space.sync:drop()
 box.space.loc:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-5426-election-on-off.result b/test/replication/gh-5426-election-on-off.result
index 7444ef7f2..2bdc17ec6 100644
--- a/test/replication/gh-5426-election-on-off.result
+++ b/test/replication/gh-5426-election-on-off.result
@@ -168,6 +168,9 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
 box.schema.user.revoke('guest', 'super')
  | ---
  | ...
diff --git a/test/replication/gh-5426-election-on-off.test.lua b/test/replication/gh-5426-election-on-off.test.lua
index bdf06903b..6277e9ef2 100644
--- a/test/replication/gh-5426-election-on-off.test.lua
+++ b/test/replication/gh-5426-election-on-off.test.lua
@@ -69,4 +69,5 @@ box.cfg{
     election_mode = old_election_mode,                                          \
     replication_timeout = old_replication_timeout,                              \
 }
+box.ctl.demote()
 box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/gh-5433-election-restart-recovery.result b/test/replication/gh-5433-election-restart-recovery.result
index f8f32416e..ed63ff409 100644
--- a/test/replication/gh-5433-election-restart-recovery.result
+++ b/test/replication/gh-5433-election-restart-recovery.result
@@ -169,6 +169,9 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
 box.schema.user.revoke('guest', 'super')
  | ---
  | ...
diff --git a/test/replication/gh-5433-election-restart-recovery.test.lua b/test/replication/gh-5433-election-restart-recovery.test.lua
index 4aff000bf..ae1f42c4d 100644
--- a/test/replication/gh-5433-election-restart-recovery.test.lua
+++ b/test/replication/gh-5433-election-restart-recovery.test.lua
@@ -84,4 +84,5 @@ box.cfg{
     election_mode = old_election_mode,                                          \
     replication_timeout = old_replication_timeout,                              \
 }
+box.ctl.demote()
 box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.result b/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.result
index 2699231e5..20fab4072 100644
--- a/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.result
+++ b/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.result
@@ -49,6 +49,9 @@ _ = box.schema.space.create('test', {is_sync=true})
 _ = box.space.test:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 -- Fill the limbo with pending entries. 3 mustn't receive them yet.
 test_run:cmd('stop server election_replica3')
diff --git a/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.test.lua b/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.test.lua
index 03705d96c..ec0f1d77e 100644
--- a/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.test.lua
+++ b/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.test.lua
@@ -21,6 +21,7 @@ box.ctl.wait_rw()
 
 _ = box.schema.space.create('test', {is_sync=true})
 _ = box.space.test:create_index('pk')
+box.ctl.promote()
 
 -- Fill the limbo with pending entries. 3 mustn't receive them yet.
 test_run:cmd('stop server election_replica3')
diff --git a/test/replication/gh-5438-election-state.result b/test/replication/gh-5438-election-state.result
index 6985f026a..68b6bfad8 100644
--- a/test/replication/gh-5438-election-state.result
+++ b/test/replication/gh-5438-election-state.result
@@ -47,6 +47,9 @@ end)
  | ...
 
 -- Cleanup.
+box.ctl.demote()
+ | ---
+ | ...
 box.cfg{election_mode = old_election_mode}
  | ---
  | ...
diff --git a/test/replication/gh-5438-election-state.test.lua b/test/replication/gh-5438-election-state.test.lua
index 60c3366c1..cf0f4ca23 100644
--- a/test/replication/gh-5438-election-state.test.lua
+++ b/test/replication/gh-5438-election-state.test.lua
@@ -22,6 +22,7 @@ test_run:wait_cond(function()\
 end)
 
 -- Cleanup.
+box.ctl.demote()
 box.cfg{election_mode = old_election_mode}
 test_run:cmd('stop server replica')
 test_run:cmd('delete server replica')
diff --git a/test/replication/gh-5446-qsync-eval-quorum.result b/test/replication/gh-5446-qsync-eval-quorum.result
index 5f83b248c..b3c217913 100644
--- a/test/replication/gh-5446-qsync-eval-quorum.result
+++ b/test/replication/gh-5446-qsync-eval-quorum.result
@@ -88,6 +88,9 @@ s = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = s:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 -- Only one master node -> 1/2 + 1 = 1
 s:insert{1} -- should pass
@@ -343,3 +346,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5446-qsync-eval-quorum.test.lua b/test/replication/gh-5446-qsync-eval-quorum.test.lua
index 6b9e324ed..c2901b845 100644
--- a/test/replication/gh-5446-qsync-eval-quorum.test.lua
+++ b/test/replication/gh-5446-qsync-eval-quorum.test.lua
@@ -37,6 +37,7 @@ end
 -- Create a sync space we will operate on
 s = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = s:create_index('pk')
+box.ctl.promote()
 
 -- Only one master node -> 1/2 + 1 = 1
 s:insert{1} -- should pass
@@ -135,3 +136,4 @@ box.cfg{
     replication_synchro_quorum = old_synchro_quorum,                            \
     replication_synchro_timeout = old_synchro_timeout,                          \
 }
+box.ctl.demote()
diff --git a/test/replication/gh-5506-election-on-off.result b/test/replication/gh-5506-election-on-off.result
index b8abd7ecd..a7f2b6a9c 100644
--- a/test/replication/gh-5506-election-on-off.result
+++ b/test/replication/gh-5506-election-on-off.result
@@ -138,3 +138,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5506-election-on-off.test.lua b/test/replication/gh-5506-election-on-off.test.lua
index 476b00ec0..f8915c333 100644
--- a/test/replication/gh-5506-election-on-off.test.lua
+++ b/test/replication/gh-5506-election-on-off.test.lua
@@ -66,3 +66,4 @@ box.cfg{
     election_mode = old_election_mode,                                          \
     replication_timeout = old_replication_timeout,                              \
 }
+box.ctl.demote()
diff --git a/test/replication/gh-5566-final-join-synchro.result b/test/replication/gh-5566-final-join-synchro.result
index a09882ba6..c5ae2f283 100644
--- a/test/replication/gh-5566-final-join-synchro.result
+++ b/test/replication/gh-5566-final-join-synchro.result
@@ -12,6 +12,9 @@ _ = box.schema.space.create('sync', {is_sync=true})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 box.schema.user.grant('guest', 'replication')
  | ---
@@ -137,3 +140,6 @@ test_run:cleanup_cluster()
 box.schema.user.revoke('guest', 'replication')
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5566-final-join-synchro.test.lua b/test/replication/gh-5566-final-join-synchro.test.lua
index 2db2c742f..25f411407 100644
--- a/test/replication/gh-5566-final-join-synchro.test.lua
+++ b/test/replication/gh-5566-final-join-synchro.test.lua
@@ -5,6 +5,7 @@ test_run = require('test_run').new()
 --
 _ = box.schema.space.create('sync', {is_sync=true})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 
 box.schema.user.grant('guest', 'replication')
 box.schema.user.grant('guest', 'write', 'space', 'sync')
@@ -59,3 +60,4 @@ box.cfg{\
 box.space.sync:drop()
 test_run:cleanup_cluster()
 box.schema.user.revoke('guest', 'replication')
+box.ctl.demote()
diff --git a/test/replication/gh-5874-qsync-txn-recovery.result b/test/replication/gh-5874-qsync-txn-recovery.result
index 73f903ca7..01328a9e3 100644
--- a/test/replication/gh-5874-qsync-txn-recovery.result
+++ b/test/replication/gh-5874-qsync-txn-recovery.result
@@ -31,6 +31,9 @@ sync = box.schema.create_space('sync', {is_sync = true, engine = engine})
 _ = sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 -- The transaction fails, but is written to the log anyway.
 box.begin() async:insert{1} sync:insert{1} box.commit()
@@ -160,3 +163,6 @@ sync:drop()
 loc:drop()
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5874-qsync-txn-recovery.test.lua b/test/replication/gh-5874-qsync-txn-recovery.test.lua
index f35eb68de..6ddf164ac 100644
--- a/test/replication/gh-5874-qsync-txn-recovery.test.lua
+++ b/test/replication/gh-5874-qsync-txn-recovery.test.lua
@@ -12,6 +12,7 @@ async = box.schema.create_space('async', {engine = engine})
 _ = async:create_index('pk')
 sync = box.schema.create_space('sync', {is_sync = true, engine = engine})
 _ = sync:create_index('pk')
+box.ctl.promote()
 
 -- The transaction fails, but is written to the log anyway.
 box.begin() async:insert{1} sync:insert{1} box.commit()
@@ -82,3 +83,4 @@ loc:select()
 async:drop()
 sync:drop()
 loc:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-6032-promote-wal-write.result b/test/replication/gh-6032-promote-wal-write.result
index 246c7974f..03112fb8d 100644
--- a/test/replication/gh-6032-promote-wal-write.result
+++ b/test/replication/gh-6032-promote-wal-write.result
@@ -67,3 +67,6 @@ box.cfg{\
 box.space.sync:drop()
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-6032-promote-wal-write.test.lua b/test/replication/gh-6032-promote-wal-write.test.lua
index 8c1859083..9a036a8b4 100644
--- a/test/replication/gh-6032-promote-wal-write.test.lua
+++ b/test/replication/gh-6032-promote-wal-write.test.lua
@@ -26,3 +26,4 @@ box.cfg{\
     replication_synchro_timeout = replication_synchro_timeout,\
 }
 box.space.sync:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-6034-election-promote-bump-term.result b/test/replication/gh-6034-election-promote-bump-term.result
index 8be4e8243..c64da0d5d 100644
--- a/test/replication/gh-6034-election-promote-bump-term.result
+++ b/test/replication/gh-6034-election-promote-bump-term.result
@@ -19,3 +19,8 @@ assert(box.info.election.term == term + 1)
  | ---
  | - true
  | ...
+
+-- Cleanup.
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-6034-election-promote-bump-term.test.lua b/test/replication/gh-6034-election-promote-bump-term.test.lua
index 1e814bf5d..dfb78f906 100644
--- a/test/replication/gh-6034-election-promote-bump-term.test.lua
+++ b/test/replication/gh-6034-election-promote-bump-term.test.lua
@@ -7,3 +7,6 @@ box.cfg{election_mode='off'}
 term = box.info.election.term
 box.ctl.promote()
 assert(box.info.election.term == term + 1)
+
+-- Cleanup.
+box.ctl.demote()
diff --git a/test/replication/gh-6034-qsync-limbo-ownership.result b/test/replication/gh-6034-qsync-limbo-ownership.result
new file mode 100644
index 000000000..0da3e5c2d
--- /dev/null
+++ b/test/replication/gh-6034-qsync-limbo-ownership.result
@@ -0,0 +1,186 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+--
+-- gh-6034: test that transactional limbo isn't accessible without a promotion.
+--
+synchro_quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+election_mode = box.cfg.election_mode
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum = 1, election_mode='off'}
+ | ---
+ | ...
+
+_ = box.schema.space.create('async'):create_index('pk')
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true}):create_index('pk')
+ | ---
+ | ...
+
+-- Limbo is initially unclaimed, everyone is writeable.
+assert(not box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == 0)
+ | ---
+ | - true
+ | ...
+box.space.async:insert{1} -- success.
+ | ---
+ | - [1]
+ | ...
+-- Synchro spaces aren't writeable
+box.space.sync:insert{1} -- error.
+ | ---
+ | - error: The synchronous transaction queue doesn't belong to any instance
+ | ...
+
+box.ctl.promote()
+ | ---
+ | ...
+assert(not box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == box.info.id)
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{1} -- success.
+ | ---
+ | - [1]
+ | ...
+
+-- Everyone but the limbo owner is read-only.
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+test_run:cmd('set variable rpl_listen to "replica.listen"')
+ | ---
+ | - true
+ | ...
+orig_replication = box.cfg.replication
+ | ---
+ | ...
+box.cfg{replication=rpl_listen}
+ | ---
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+assert(box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == test_run:get_server_id('default'))
+ | ---
+ | - true
+ | ...
+box.space.async:insert{2} -- failure.
+ | ---
+ | - error: Can't modify data because this instance is in read-only mode.
+ | ...
+
+-- Promotion on the other node. Default should become ro.
+box.ctl.promote()
+ | ---
+ | ...
+assert(not box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == box.info.id)
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{2} -- success.
+ | ---
+ | - [2]
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+assert(box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == test_run:get_server_id('replica'))
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{3} -- failure.
+ | ---
+ | - error: Can't modify data because this instance is in read-only mode.
+ | ...
+
+box.ctl.promote()
+ | ---
+ | ...
+box.ctl.demote()
+ | ---
+ | ...
+assert(not box.info.ro)
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{3} -- still fails.
+ | ---
+ | - error: The synchronous transaction queue doesn't belong to any instance
+ | ...
+assert(box.info.synchro.queue.owner == 0)
+ | ---
+ | - true
+ | ...
+box.space.async:insert{3} -- success.
+ | ---
+ | - [3]
+ | ...
+
+-- Cleanup.
+box.ctl.demote()
+ | ---
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+box.space.async:drop()
+ | ---
+ | ...
+box.cfg{\
+    replication_synchro_quorum = synchro_quorum,\
+    election_mode = election_mode,\
+    replication = orig_replication,\
+}
+ | ---
+ | ...
diff --git a/test/replication/gh-6034-qsync-limbo-ownership.test.lua b/test/replication/gh-6034-qsync-limbo-ownership.test.lua
new file mode 100644
index 000000000..a0c49575a
--- /dev/null
+++ b/test/replication/gh-6034-qsync-limbo-ownership.test.lua
@@ -0,0 +1,68 @@
+test_run = require('test_run').new()
+
+--
+-- gh-6034: test that transactional limbo isn't accessible without a promotion.
+--
+synchro_quorum = box.cfg.replication_synchro_quorum
+election_mode = box.cfg.election_mode
+box.cfg{replication_synchro_quorum = 1, election_mode='off'}
+
+_ = box.schema.space.create('async'):create_index('pk')
+_ = box.schema.space.create('sync', {is_sync=true}):create_index('pk')
+
+-- Limbo is initially unclaimed, everyone is writeable.
+assert(not box.info.ro)
+assert(box.info.synchro.queue.owner == 0)
+box.space.async:insert{1} -- success.
+-- Synchro spaces aren't writeable
+box.space.sync:insert{1} -- error.
+
+box.ctl.promote()
+assert(not box.info.ro)
+assert(box.info.synchro.queue.owner == box.info.id)
+box.space.sync:insert{1} -- success.
+
+-- Everyone but the limbo owner is read-only.
+box.schema.user.grant('guest', 'replication')
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+test_run:cmd('set variable rpl_listen to "replica.listen"')
+orig_replication = box.cfg.replication
+box.cfg{replication=rpl_listen}
+
+test_run:switch('replica')
+assert(box.info.ro)
+assert(box.info.synchro.queue.owner == test_run:get_server_id('default'))
+box.space.async:insert{2} -- failure.
+
+-- Promotion on the other node. Default should become ro.
+box.ctl.promote()
+assert(not box.info.ro)
+assert(box.info.synchro.queue.owner == box.info.id)
+box.space.sync:insert{2} -- success.
+
+test_run:switch('default')
+assert(box.info.ro)
+assert(box.info.synchro.queue.owner == test_run:get_server_id('replica'))
+box.space.sync:insert{3} -- failure.
+
+box.ctl.promote()
+box.ctl.demote()
+assert(not box.info.ro)
+box.space.sync:insert{3} -- still fails.
+assert(box.info.synchro.queue.owner == 0)
+box.space.async:insert{3} -- success.
+
+-- Cleanup.
+box.ctl.demote()
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.schema.user.revoke('guest', 'replication')
+box.space.sync:drop()
+box.space.async:drop()
+box.cfg{\
+    replication_synchro_quorum = synchro_quorum,\
+    election_mode = election_mode,\
+    replication = orig_replication,\
+}
diff --git a/test/replication/gh-6057-qsync-confirm-async-no-wal.result b/test/replication/gh-6057-qsync-confirm-async-no-wal.result
index 23c77729b..e7beefb2a 100644
--- a/test/replication/gh-6057-qsync-confirm-async-no-wal.result
+++ b/test/replication/gh-6057-qsync-confirm-async-no-wal.result
@@ -40,6 +40,10 @@ _ = s2:create_index('pk')
  | ---
  | ...
 
+box.ctl.promote()
+ | ---
+ | ...
+
 errinj = box.error.injection
  | ---
  | ...
@@ -161,3 +165,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua b/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua
index a11ddc042..bb459ea02 100644
--- a/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua
+++ b/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua
@@ -21,6 +21,8 @@ _ = s:create_index('pk')
 s2 = box.schema.create_space('test2')
 _ = s2:create_index('pk')
 
+box.ctl.promote()
+
 errinj = box.error.injection
 
 function create_hanging_async_after_confirm(sync_key, async_key1, async_key2)   \
@@ -86,3 +88,4 @@ box.cfg{
     replication_synchro_quorum = old_synchro_quorum,                            \
     replication_synchro_timeout = old_synchro_timeout,                          \
 }
+box.ctl.demote()
diff --git a/test/replication/hang_on_synchro_fail.result b/test/replication/hang_on_synchro_fail.result
index 9f6fac00b..dda15af20 100644
--- a/test/replication/hang_on_synchro_fail.result
+++ b/test/replication/hang_on_synchro_fail.result
@@ -19,6 +19,9 @@ _ = box.schema.space.create('sync', {is_sync=true})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 old_synchro_quorum = box.cfg.replication_synchro_quorum
  | ---
@@ -127,4 +130,7 @@ box.space.sync:drop()
 box.schema.user.revoke('guest', 'replication')
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
 
diff --git a/test/replication/hang_on_synchro_fail.test.lua b/test/replication/hang_on_synchro_fail.test.lua
index 6c3b09fab..f0d494eae 100644
--- a/test/replication/hang_on_synchro_fail.test.lua
+++ b/test/replication/hang_on_synchro_fail.test.lua
@@ -8,6 +8,7 @@ box.schema.user.grant('guest', 'replication')
 
 _ = box.schema.space.create('sync', {is_sync=true})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 
 old_synchro_quorum = box.cfg.replication_synchro_quorum
 box.cfg{replication_synchro_quorum=3}
@@ -54,4 +55,5 @@ box.cfg{replication_synchro_quorum=old_synchro_quorum,\
         replication_synchro_timeout=old_synchro_timeout}
 box.space.sync:drop()
 box.schema.user.revoke('guest', 'replication')
+box.ctl.demote()
 
diff --git a/test/replication/qsync_advanced.result b/test/replication/qsync_advanced.result
index 94b19b1f2..72ac0c326 100644
--- a/test/replication/qsync_advanced.result
+++ b/test/replication/qsync_advanced.result
@@ -72,6 +72,9 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 -- Testcase body.
 box.space.sync:insert{1} -- success
  | ---
@@ -468,6 +471,9 @@ box.space.sync:select{} -- 1
 box.cfg{read_only=false} -- promote replica to master
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 test_run:switch('default')
  | ---
  | - true
@@ -508,6 +514,9 @@ test_run:switch('default')
 box.cfg{read_only=false}
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 test_run:switch('replica')
  | ---
  | - true
@@ -781,3 +790,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/qsync_advanced.test.lua b/test/replication/qsync_advanced.test.lua
index 058ece602..37c285b8d 100644
--- a/test/replication/qsync_advanced.test.lua
+++ b/test/replication/qsync_advanced.test.lua
@@ -30,6 +30,7 @@ test_run:switch('default')
 box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 -- Testcase body.
 box.space.sync:insert{1} -- success
 test_run:cmd('switch replica')
@@ -170,6 +171,7 @@ box.space.sync:select{} -- 1
 test_run:switch('replica')
 box.space.sync:select{} -- 1
 box.cfg{read_only=false} -- promote replica to master
+box.ctl.promote()
 test_run:switch('default')
 box.cfg{read_only=true} -- demote master to replica
 test_run:switch('replica')
@@ -181,6 +183,7 @@ box.space.sync:select{} -- 1, 2
 -- Revert cluster configuration.
 test_run:switch('default')
 box.cfg{read_only=false}
+box.ctl.promote()
 test_run:switch('replica')
 box.cfg{read_only=true}
 -- Testcase cleanup.
@@ -279,3 +282,4 @@ box.cfg{
     replication_synchro_quorum = orig_synchro_quorum,                           \
     replication_synchro_timeout = orig_synchro_timeout,                         \
 }
+box.ctl.demote()
diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
index 7e711ba13..bbdfc42fe 100644
--- a/test/replication/qsync_basic.result
+++ b/test/replication/qsync_basic.result
@@ -14,6 +14,9 @@ s1.is_sync
 pk = s1:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 box.begin() s1:insert({1}) s1:insert({2}) box.commit()
  | ---
  | ...
@@ -645,19 +648,12 @@ test_run:switch('default')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
- | ---
- | ...
-f = fiber.create(function() box.space.sync:replace{1} end)
+box.ctl.demote()
  | ---
  | ...
-test_run:wait_lsn('replica', 'default')
+box.space.sync:replace{1}
  | ---
- | ...
-
-test_run:switch('replica')
- | ---
- | - true
+ | - error: The synchronous transaction queue doesn't belong to any instance
  | ...
 function skip_row() return nil end
  | ---
@@ -674,26 +670,22 @@ box.space.sync:replace{2}
 box.space.sync:before_replace(nil, skip_row)
  | ---
  | ...
-assert(box.space.sync:get{2} == nil)
+assert(box.space.sync:get{1} == nil)
  | ---
  | - true
  | ...
-assert(box.space.sync:get{1} ~= nil)
+assert(box.space.sync:get{2} == nil)
  | ---
  | - true
  | ...
-
-test_run:switch('default')
+assert(box.info.lsn == old_lsn + 1)
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum = 2}
+box.ctl.promote()
  | ---
  | ...
-test_run:wait_cond(function() return f:status() == 'dead' end)
- | ---
- | - true
- | ...
+
 box.space.sync:truncate()
  | ---
  | ...
@@ -758,3 +750,6 @@ box.space.sync:drop()
 box.schema.user.revoke('guest', 'replication')
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
index 75c9b222b..eac465e25 100644
--- a/test/replication/qsync_basic.test.lua
+++ b/test/replication/qsync_basic.test.lua
@@ -6,6 +6,7 @@
 s1 = box.schema.create_space('test1', {is_sync = true})
 s1.is_sync
 pk = s1:create_index('pk')
+box.ctl.promote()
 box.begin() s1:insert({1}) s1:insert({2}) box.commit()
 s1:select{}
 
@@ -253,22 +254,18 @@ box.space.sync:count()
 -- instances, but also works for local rows.
 --
 test_run:switch('default')
-box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
-f = fiber.create(function() box.space.sync:replace{1} end)
-test_run:wait_lsn('replica', 'default')
-
-test_run:switch('replica')
+box.ctl.demote()
+box.space.sync:replace{1}
 function skip_row() return nil end
 old_lsn = box.info.lsn
 _ = box.space.sync:before_replace(skip_row)
 box.space.sync:replace{2}
 box.space.sync:before_replace(nil, skip_row)
+assert(box.space.sync:get{1} == nil)
 assert(box.space.sync:get{2} == nil)
-assert(box.space.sync:get{1} ~= nil)
+assert(box.info.lsn == old_lsn + 1)
+box.ctl.promote()
 
-test_run:switch('default')
-box.cfg{replication_synchro_quorum = 2}
-test_run:wait_cond(function() return f:status() == 'dead' end)
 box.space.sync:truncate()
 
 --
@@ -301,3 +298,4 @@ test_run:cmd('delete server replica')
 box.space.test:drop()
 box.space.sync:drop()
 box.schema.user.revoke('guest', 'replication')
+box.ctl.demote()
diff --git a/test/replication/qsync_errinj.result b/test/replication/qsync_errinj.result
index 635bcf939..cf1e30a90 100644
--- a/test/replication/qsync_errinj.result
+++ b/test/replication/qsync_errinj.result
@@ -35,6 +35,9 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 
 --
 -- gh-5100: slow ACK sending shouldn't stun replica for the
@@ -542,3 +545,6 @@ box.space.sync:drop()
 box.schema.user.revoke('guest', 'super')
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/qsync_errinj.test.lua b/test/replication/qsync_errinj.test.lua
index 6a9fd3e1a..e7c85c58c 100644
--- a/test/replication/qsync_errinj.test.lua
+++ b/test/replication/qsync_errinj.test.lua
@@ -12,6 +12,7 @@ test_run:cmd('start server replica with wait=True, wait_load=True')
 
 _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 
 --
 -- gh-5100: slow ACK sending shouldn't stun replica for the
@@ -222,3 +223,4 @@ test_run:cmd('delete server replica')
 
 box.space.sync:drop()
 box.schema.user.revoke('guest', 'super')
+box.ctl.demote()
diff --git a/test/replication/qsync_snapshots.result b/test/replication/qsync_snapshots.result
index cafdd63c8..ca418b168 100644
--- a/test/replication/qsync_snapshots.result
+++ b/test/replication/qsync_snapshots.result
@@ -57,6 +57,9 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 -- Testcase body.
 box.space.sync:insert{1}
  | ---
@@ -299,3 +302,6 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/qsync_snapshots.test.lua b/test/replication/qsync_snapshots.test.lua
index 590610974..82c2e3f7c 100644
--- a/test/replication/qsync_snapshots.test.lua
+++ b/test/replication/qsync_snapshots.test.lua
@@ -23,6 +23,7 @@ test_run:switch('default')
 box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 -- Testcase body.
 box.space.sync:insert{1}
 box.space.sync:select{} -- 1
@@ -130,3 +131,4 @@ box.cfg{
     replication_synchro_quorum = orig_synchro_quorum,                           \
     replication_synchro_timeout = orig_synchro_timeout,                         \
 }
+box.ctl.demote()
diff --git a/test/replication/qsync_with_anon.result b/test/replication/qsync_with_anon.result
index 6a2952a32..99c6fb902 100644
--- a/test/replication/qsync_with_anon.result
+++ b/test/replication/qsync_with_anon.result
@@ -57,6 +57,9 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+box.ctl.promote()
+ | ---
+ | ...
 -- Testcase body.
 test_run:switch('default')
  | ---
@@ -220,6 +223,9 @@ box.cfg{
 }
  | ---
  | ...
+box.ctl.demote()
+ | ---
+ | ...
 test_run:cleanup_cluster()
  | ---
  | ...
diff --git a/test/replication/qsync_with_anon.test.lua b/test/replication/qsync_with_anon.test.lua
index d7ecaa107..e73880ec7 100644
--- a/test/replication/qsync_with_anon.test.lua
+++ b/test/replication/qsync_with_anon.test.lua
@@ -22,6 +22,7 @@ test_run:switch('default')
 box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
+box.ctl.promote()
 -- Testcase body.
 test_run:switch('default')
 box.space.sync:insert{1} -- success
@@ -81,4 +82,5 @@ box.cfg{
     replication_synchro_quorum = orig_synchro_quorum,                           \
     replication_synchro_timeout = orig_synchro_timeout,                         \
 }
+box.ctl.demote()
 test_run:cleanup_cluster()
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 7a6fd7052..9e284b3f2 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -48,6 +48,7 @@
     "gh-5613-bootstrap-prefer-booted.test.lua": {},
     "gh-6027-applier-error-show.test.lua": {},
     "gh-6032-promote-wal-write.test.lua": {},
+    "gh-6034-qsync-limbo-ownership.test.lua": {},
     "gh-6034-election-promote-bump-term.test.lua": {},
     "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
     "gh-6094-rs-uuid-mismatch.test.lua": {},
-- 
2.30.1 (Apple Git-130)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 13/16] box: allow calling promote on a candidate
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 13/16] box: allow calling promote on a candidate Serge Petrenko via Tarantool-patches
@ 2021-07-15 14:06   ` Serge Petrenko via Tarantool-patches
  0 siblings, 0 replies; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-15 14:06 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches



14.07.2021 21:25, Serge Petrenko пишет:
> Part of #6034
> ---
>   src/box/box.cc                                | 15 +---
>   .../gh-6034-election-candidate-promote.result | 84 +++++++++++++++++++
>   ...h-6034-election-candidate-promote.test.lua | 42 ++++++++++
>   test/replication/suite.cfg                    |  1 +
>   4 files changed, 128 insertions(+), 14 deletions(-)
>   create mode 100644 test/replication/gh-6034-election-candidate-promote.result
>   create mode 100644 test/replication/gh-6034-election-candidate-promote.test.lua
>

Fixed an obvious typo:

=========================

diff --git a/test/replication/gh-6034-election-candidate-promote.result 
b/test/replication/gh-6034-election-candidate-promote.result
index 76a54d5a6..2b4bc0213 100644
--- a/test/replication/gh-6034-election-candidate-promote.result
+++ b/test/replication/gh-6034-election-candidate-promote.result
@@ -60,7 +60,7 @@ next_nr = leader_nr % 3 + 1
   | ...
  -- Someone else may become a leader, thus promote may fail. But we're 
testing
  -- that it takes effect at all, so that's fine.
-_ = pcall(test_run:eval('election_replica'..next_nr, 'box.ctl.promote()'))
+_ = pcall(test_run.eval, test_run, 'election_replica'..next_nr, 
'box.ctl.promote()')
   | ---
   | ...
  new_term = test_run:eval('election_replica'..next_nr,\
diff --git 
a/test/replication/gh-6034-election-candidate-promote.test.lua 
b/test/replication/gh-6034-election-candidate-promote.test.lua
index 24c57f4cb..c25f9296d 100644
--- a/test/replication/gh-6034-election-candidate-promote.test.lua
+++ b/test/replication/gh-6034-election-candidate-promote.test.lua
@@ -32,7 +32,7 @@ term = test_run:eval('election_replica'..leader_nr,\
  next_nr = leader_nr % 3 + 1
  -- Someone else may become a leader, thus promote may fail. But we're 
testing
  -- that it takes effect at all, so that's fine.
-_ = pcall(test_run:eval('election_replica'..next_nr, 'box.ctl.promote()'))
+_ = pcall(test_run.eval, test_run, 'election_replica'..next_nr, 
'box.ctl.promote()')
  new_term = test_run:eval('election_replica'..next_nr,\
                           'return box.info.election.term')[1]
  assert(new_term > term)

=========================

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 16/16] box: introduce `box.ctl.demote`
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 16/16] box: introduce `box.ctl.demote` Serge Petrenko via Tarantool-patches
@ 2021-07-15 17:13   ` Serge Petrenko via Tarantool-patches
  2021-07-15 20:11   ` [Tarantool-patches] [PATCH v4 17/16] replication: fix flaky election_qsync.test Serge Petrenko via Tarantool-patches
  1 sibling, 0 replies; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-15 17:13 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches



14.07.2021 21:25, Serge Petrenko пишет:

Another simple fix:

=================================

diff --git a/test/replication/gh-6034-qsync-limbo-ownership.result 
b/test/replication/gh-6034-qsync-limbo-ownership.result
index 0da3e5c2d..b4f53cd2a 100644
--- a/test/replication/gh-6034-qsync-limbo-ownership.result
+++ b/test/replication/gh-6034-qsync-limbo-ownership.result
@@ -120,6 +120,9 @@ test_run:switch('default')
   | ---
   | - true
   | ...
+test_run:wait_lsn('default', 'replica')
+ | ---
+ | ...
  assert(box.info.ro)
   | ---
   | - true
diff --git a/test/replication/gh-6034-qsync-limbo-ownership.test.lua 
b/test/replication/gh-6034-qsync-limbo-ownership.test.lua
index a0c49575a..f9ef5ca41 100644
--- a/test/replication/gh-6034-qsync-limbo-ownership.test.lua
+++ b/test/replication/gh-6034-qsync-limbo-ownership.test.lua
@@ -43,6 +43,7 @@ assert(box.info.synchro.queue.owner == box.info.id)
  box.space.sync:insert{2} -- success.

  test_run:switch('default')
+test_run:wait_lsn('default', 'replica')
  assert(box.info.ro)
  assert(box.info.synchro.queue.owner == test_run:get_server_id('replica'))
  box.space.sync:insert{3} -- failure.

=================================

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Tarantool-patches] [PATCH v4 17/16] replication: fix flaky election_qsync.test
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 16/16] box: introduce `box.ctl.demote` Serge Petrenko via Tarantool-patches
  2021-07-15 17:13   ` Serge Petrenko via Tarantool-patches
@ 2021-07-15 20:11   ` Serge Petrenko via Tarantool-patches
  1 sibling, 0 replies; 30+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-15 20:11 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov; +Cc: tarantool-patches

Fix the test failing occasionally with the following result mismatch:

[001] replication/election_qsync.test.lua memtx           [ fail ]
[001]
[001] Test failed! Result content mismatch:
[001] --- replication/election_qsync.result    Thu Jul 15 17:15:48 2021
[001] +++ var/rejects/replication/election_qsync.reject    Thu Jul 15 
20:46:51 2021
[001] @@ -145,8 +145,7 @@
[001]   | ...
[001]  box.space.test:select{}
[001]   | ---
[001] - | - - [1]
[001] - |   - [2]
[001] + | - - [2]
[001]   | ...
[001]  box.space.test:drop()
[001]   | ---
[001]

The issue happened because row [1] wasn't delivered to the 'default'
instance from the 'replica' at all. The test does try to wait for [1] to
be written to WAL and replicated, but sometimes it fails to wait until
this event happens:

box.ctl.promote() is issued asynchronously once the instance becomes the
Raft leader. So issuing `box.ctl.wait_rw()` doesn't guarantee that the
replica has already written the PROMOTE (the limbo is initially
unclaimed so replica becomes writeable as soon as it becomes the Raft
leader).

Right after `wait_rw()` we wait for lsn propagation and for 'default'
instance to reach replica's lsn. It may happen that lsn propagation
happens due to PROMOTE being written to WAL, and not row [1].

When this is the case, the 'default' instance doesn't receive row [1] at
all, resulting in the test error shown above.

Fix the issue by waiting for the promotion to happen explicitly.

Part of #5430
---
  test/replication/election_qsync.result   | 8 +++++++-
  test/replication/election_qsync.test.lua | 7 ++++++-
  2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/test/replication/election_qsync.result 
b/test/replication/election_qsync.result
index 2402c8578..c6ec5e352 100644
--- a/test/replication/election_qsync.result
+++ b/test/replication/election_qsync.result
@@ -75,13 +75,19 @@ box.cfg{
   | ---
   | ...

-box.ctl.wait_rw()
+-- Promote is written asynchronously to the instance becoming the 
leader, so
+-- wait for it. As soon as it's written, the instance's definitely a 
leader.
+test_run:wait_cond(function() \
+    return box.info.synchro.queue.owner == 
box.info.id                          \
+end)
   | ---
+ | - true
   | ...
  assert(box.info.election.state == 'leader')
   | ---
   | - true
   | ...
+
  lsn = box.info.lsn
   | ---
   | ...
diff --git a/test/replication/election_qsync.test.lua 
b/test/replication/election_qsync.test.lua
index e1aca8351..f3c7c290b 100644
--- a/test/replication/election_qsync.test.lua
+++ b/test/replication/election_qsync.test.lua
@@ -39,8 +39,13 @@ box.cfg{
      replication_timeout = 
0.1,                                                  \
  }

-box.ctl.wait_rw()
+-- Promote is written asynchronously to the instance becoming the 
leader, so
+-- wait for it. As soon as it's written, the instance's definitely a 
leader.
+test_run:wait_cond(function() \
+    return box.info.synchro.queue.owner == 
box.info.id                          \
+end)
  assert(box.info.election.state == 'leader')
+
  lsn = box.info.lsn
  _ = fiber.create(function() \
      ok, err = pcall(box.space.test.replace, box.space.test, 
{1})                \
-- 
2.30.1 (Apple Git-130)



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 06/16] replication: send latest effective promote in initial join
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 06/16] replication: send latest effective promote in initial join Serge Petrenko via Tarantool-patches
@ 2021-07-21 23:24   ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-23  7:44     ` Sergey Petrenko via Tarantool-patches
  0 siblings, 1 reply; 30+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-21 23:24 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches

Hi! Good job on the patch!

> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index 4ebe0fb06..4b102a777 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -442,7 +445,11 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
>  		row.type = IPROTO_JOIN_META;
>  		coio_write_xrow(&relay->io, &row);
>  
> -		/* Empty at the moment. */
> +		char body[XROW_SYNCHRO_BODY_LEN_MAX];
> +		xrow_encode_synchro(&row, body, &req);
> +		row.replica_id = req.replica_id;
> +		row.sync = sync;

Shouldn't we also attach sync to IPROTO_JOIN_SNAPSHOT and
IPROTO_JOIN_META? Although I don't see where it is used in
the applier. But I see it is attached to xrow_encode_vclock
a bit above.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 07/16] replication: send current Raft term in join response
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 07/16] replication: send current Raft term in join response Serge Petrenko via Tarantool-patches
@ 2021-07-21 23:24   ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-23  7:44     ` Sergey Petrenko via Tarantool-patches
  0 siblings, 1 reply; 30+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-21 23:24 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches

Thanks for the patch!

> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index 4b102a777..70f1a045b 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -451,6 +453,10 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
>  		row.sync = sync;
>  		coio_write_xrow(&relay->io, &row);
>  
> +		xrow_encode_raft(&row, &fiber()->gc, &raft_req);
> +		row.sync = sync;
> +		coio_write_xrow(&relay->io, &row);

You might need to send it before the limbo checkpoint. Otherwise
the problem is exactly the same - the master could send the limbo
state, the replica applied it, then the master dies before sending the
raft state, then the replica has limbo's term > raft term.

I wish we could test it somehow simple.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 09/16] box: split promote() into reasonable parts
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 09/16] box: split promote() into reasonable parts Serge Petrenko via Tarantool-patches
@ 2021-07-21 23:26   ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-23  7:45     ` Sergey Petrenko via Tarantool-patches
  0 siblings, 1 reply; 30+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-21 23:26 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches

Thanks for working on this!

See 3 comments below.

> diff --git a/src/box/box.cc b/src/box/box.cc
> index 86370514a..445875f8f 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1527,6 +1527,147 @@ box_wait_quorum(uint32_t lead_id, int64_t target_lsn, int quorum,

<...>

> +
> +/**
> + * Check whether the greatest promote term has changed since it was last read.
> + * IOW check that a foreign PROMOTE arrived while we were sleeping.
> + */
> +static int
> +box_check_promote_term_changed(uint64_t promote_term)

1. Normally you call check functions using the pattern
"check_something_correct". Here the correct behaviour is the term
being intact. So I propose to rename it to box_check_promote_term_intact.

> +{
> +	if (txn_limbo.promote_greatest_term != promote_term) {
> +		diag_set(ClientError, ER_INTERFERING_PROMOTE,
> +			 txn_limbo.owner_id);
> +		return -1;
> +	}
> +	return 0;
> +}

<...>

> +
> +/**
> + * A helper to wait until all limbo entries are ready to be confirmed, i.e.
> + * written to WAL and have gathered a quorum of ACKs from replicas.
> + * Return lsn of the last limbo entry on success, -1 on error.
> + */
> +static int64_t
> +box_wait_limbo_acked(void)
> +{
> +	if (txn_limbo_is_empty(&txn_limbo))
> +		return txn_limbo.confirmed_lsn;
> +
> +	uint64_t promote_term = txn_limbo.promote_greatest_term;
> +	int quorum = replication_synchro_quorum;
> +	struct txn_limbo_entry *last_entry;
> +	last_entry = txn_limbo_last_synchro_entry(&txn_limbo);
> +	/* Wait for the last entries WAL write. */
> +	if (last_entry->lsn < 0) {
> +		int64_t tid = last_entry->txn->id;
> +
> +		if (wal_sync(NULL) < 0)
> +			return -1;
> +
> +		if (box_check_promote_term_changed(promote_term) < 0)

2. Why < 0? It is not a in the code guidelines, but don't we usually
use '!= 0'? '< 0' normally assumes you can get > 0, 0, and < 0 meaning
different things, like it is done in iproto occassionally.

> +			return -1;
> +		if (txn_limbo_is_empty(&txn_limbo))
> +			return txn_limbo.confirmed_lsn;
> +		if (tid != txn_limbo_last_synchro_entry(&txn_limbo)->txn->id) {
> +			diag_set(ClientError, ER_QUORUM_WAIT, quorum,
> +				 "new synchronous transactions appeared");
> +			return -1;
> +		}
> +	}

<...>

> +
> +/** Write and process a PROMOTE request. */
> +static void
> +box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
> +{
> +	assert(box_raft()->volatile_term == box_raft()->term);
> +	assert(promote_lsn >= 0);
> +	txn_limbo_write_promote(&txn_limbo, promote_lsn,
> +				box_raft()->term);

3. Maybe cache box_raft() value in a variable? Its usage would look shorter
then. The same in other places where it is used more than once. Up to
you.

> +	struct synchro_request req = {
> +		.type = IPROTO_PROMOTE,
> +		.replica_id = prev_leader_id,
> +		.origin_id = instance_id,
> +		.lsn = promote_lsn,
> +		.term = box_raft()->term,
> +	};
> +	txn_limbo_process(&txn_limbo, &req);
> +	assert(txn_limbo_is_empty(&txn_limbo));
> +}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 11/16] box: make promote on the current leader a no-op
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 11/16] box: make promote on the current leader a no-op Serge Petrenko via Tarantool-patches
@ 2021-07-21 23:26   ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-23  7:45     ` Sergey Petrenko via Tarantool-patches
  0 siblings, 1 reply; 30+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-21 23:26 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches

Thanks for the patch!

Could you please add a test? Not even necessary in its own file. You
could use qsync_basic/election_basic.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 12/16] box: fix an assertion failure after a spurious wakeup in promote
  2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 12/16] box: fix an assertion failure after a spurious wakeup in promote Serge Petrenko via Tarantool-patches
@ 2021-07-21 23:29   ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-23  7:45     ` Sergey Petrenko via Tarantool-patches
  0 siblings, 1 reply; 30+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-21 23:29 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov; +Cc: tarantool-patches

I appreciate the work you did here!

See 2 comments below.

> diff --git a/test/replication/gh-3055-promote-wakeup-crash.result b/test/replication/gh-3055-promote-wakeup-crash.result
> new file mode 100644
> index 000000000..e508611e5
> --- /dev/null
> +++ b/test/replication/gh-3055-promote-wakeup-crash.result
> @@ -0,0 +1,43 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +--
> +-- gh-3055 follow-up: box.ctl.promote() could crash on an assertion after a
> +-- spurious wakeup.
> +--
> +_ = box.space._cluster:insert{2, require('uuid').str()}
> + | ---
> + | ...
> +box.cfg{election_mode='manual',\
> +        replication_synchro_quorum=2,\
> +        election_timeout=1000}

1. Shouldn't you save and restore the old option values in the end
of the test?

> + | ---
> + | ...
> +
> +fiber = require('fiber')
> + | ---
> + | ...
> +f = fiber.create(function() box.ctl.promote() end)
> + | ---
> + | ...
> +f:set_joinable(true)

2. But you never use :join(). Perhaps you might want to call it
after :cancel(). To ensure the fiber really ended.

> + | ---
> + | ...
> +f:wakeup()
> + | ---
> + | ...
> +fiber.yield()
> + | ---
> + | ...
> +
> +-- Cleanup.
> +f:cancel()
> + | ---
> + | ...
> +box.cfg{election_mode='off'}
> + | ---
> + | ...
> +test_run:cleanup_cluster()
> + | ---
> + | ...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 06/16] replication: send latest effective promote in initial join
  2021-07-21 23:24   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-23  7:44     ` Sergey Petrenko via Tarantool-patches
  0 siblings, 0 replies; 30+ messages in thread
From: Sergey Petrenko via Tarantool-patches @ 2021-07-23  7:44 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches


22.07.2021 01:24, Vladislav Shpilevoy пишет:
> Hi! Good job on the patch!
>
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index 4ebe0fb06..4b102a777 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -442,7 +445,11 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
>>   		row.type = IPROTO_JOIN_META;
>>   		coio_write_xrow(&relay->io, &row);
>>   
>> -		/* Empty at the moment. */
>> +		char body[XROW_SYNCHRO_BODY_LEN_MAX];
>> +		xrow_encode_synchro(&row, body, &req);
>> +		row.replica_id = req.replica_id;
>> +		row.sync = sync;
> Shouldn't we also attach sync to IPROTO_JOIN_SNAPSHOT and
> IPROTO_JOIN_META? Although I don't see where it is used in
> the applier. But I see it is attached to xrow_encode_vclock
> a bit above.

Yes, indeed. Will it be enough to set sync once only when sending the 
vclock?

The field shouldn't be overwritten by anyone.

===========================

diff --git a/src/box/relay.cc b/src/box/relay.cc
index 4b102a777..db1d595e2 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -448,7 +448,6 @@ relay_initial_join(int fd, uint64_t sync, struct 
vclock *vclock,
                 char body[XROW_SYNCHRO_BODY_LEN_MAX];
                 xrow_encode_synchro(&row, body, &req);
                 row.replica_id = req.replica_id;
-               row.sync = sync;
                 coio_write_xrow(&relay->io, &row);

                 /* Mark the end of the metadata stream. */

===========================


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 07/16] replication: send current Raft term in join response
  2021-07-21 23:24   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-23  7:44     ` Sergey Petrenko via Tarantool-patches
  0 siblings, 0 replies; 30+ messages in thread
From: Sergey Petrenko via Tarantool-patches @ 2021-07-23  7:44 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches


22.07.2021 01:24, Vladislav Shpilevoy пишет:
> Thanks for the patch!
>
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index 4b102a777..70f1a045b 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -451,6 +453,10 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
>>   		row.sync = sync;
>>   		coio_write_xrow(&relay->io, &row);
>>   
>> +		xrow_encode_raft(&row, &fiber()->gc, &raft_req);
>> +		row.sync = sync;
>> +		coio_write_xrow(&relay->io, &row);
> You might need to send it before the limbo checkpoint. Otherwise
> the problem is exactly the same - the master could send the limbo
> state, the replica applied it, then the master dies before sending the
> raft state, then the replica has limbo's term > raft term.


Yep, thanks for noticing! Fixed.

===============================

diff --git a/src/box/relay.cc b/src/box/relay.cc
index b413b713c..805b5e7ff 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -447,15 +447,14 @@ relay_initial_join(int fd, uint64_t sync, struct 
vclock *vclock,
                 row.type = IPROTO_JOIN_META;
                 coio_write_xrow(&relay->io, &row);

+               xrow_encode_raft(&row, &fiber()->gc, &raft_req);
+               coio_write_xrow(&relay->io, &row);
+
                 char body[XROW_SYNCHRO_BODY_LEN_MAX];
                 xrow_encode_synchro(&row, body, &req);
                 row.replica_id = req.replica_id;
                 coio_write_xrow(&relay->io, &row);

-               xrow_encode_raft(&row, &fiber()->gc, &raft_req);
-               row.sync = sync;
-               coio_write_xrow(&relay->io, &row);
-
                 /* Mark the end of the metadata stream. */
                 row.type = IPROTO_JOIN_SNAPSHOT;
                 coio_write_xrow(&relay->io, &row);

===============================

> I wish we could test it somehow simple.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 09/16] box: split promote() into reasonable parts
  2021-07-21 23:26   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-23  7:45     ` Sergey Petrenko via Tarantool-patches
  0 siblings, 0 replies; 30+ messages in thread
From: Sergey Petrenko via Tarantool-patches @ 2021-07-23  7:45 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches


22.07.2021 01:26, Vladislav Shpilevoy пишет:
> Thanks for working on this!
>
> See 3 comments below.

Thanks for the review!


>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 86370514a..445875f8f 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1527,6 +1527,147 @@ box_wait_quorum(uint32_t lead_id, int64_t target_lsn, int quorum,
> <...>
>
>> +
>> +/**
>> + * Check whether the greatest promote term has changed since it was last read.
>> + * IOW check that a foreign PROMOTE arrived while we were sleeping.
>> + */
>> +static int
>> +box_check_promote_term_changed(uint64_t promote_term)
> 1. Normally you call check functions using the pattern
> "check_something_correct". Here the correct behaviour is the term
> being intact. So I propose to rename it to box_check_promote_term_intact.


Ok, sure.


>> +{
>> +	if (txn_limbo.promote_greatest_term != promote_term) {
>> +		diag_set(ClientError, ER_INTERFERING_PROMOTE,
>> +			 txn_limbo.owner_id);
>> +		return -1;
>> +	}
>> +	return 0;
>> +}
> <...>
>
>> +
>> +/**
>> + * A helper to wait until all limbo entries are ready to be confirmed, i.e.
>> + * written to WAL and have gathered a quorum of ACKs from replicas.
>> + * Return lsn of the last limbo entry on success, -1 on error.
>> + */
>> +static int64_t
>> +box_wait_limbo_acked(void)
>> +{
>> +	if (txn_limbo_is_empty(&txn_limbo))
>> +		return txn_limbo.confirmed_lsn;
>> +
>> +	uint64_t promote_term = txn_limbo.promote_greatest_term;
>> +	int quorum = replication_synchro_quorum;
>> +	struct txn_limbo_entry *last_entry;
>> +	last_entry = txn_limbo_last_synchro_entry(&txn_limbo);
>> +	/* Wait for the last entries WAL write. */
>> +	if (last_entry->lsn < 0) {
>> +		int64_t tid = last_entry->txn->id;
>> +
>> +		if (wal_sync(NULL) < 0)
>> +			return -1;
>> +
>> +		if (box_check_promote_term_changed(promote_term) < 0)
> 2. Why < 0? It is not a in the code guidelines, but don't we usually
> use '!= 0'? '< 0' normally assumes you can get > 0, 0, and < 0 meaning
> different things, like it is done in iproto occassionally.


I've put '< 0' here without a second thought.

I'm just used to if (smth() < 0) { err; }, I guess.

AFAICS there are more places where we use if (rc != 0) { err;} more,

so I'll change my code accordingly.


>> +			return -1;
>> +		if (txn_limbo_is_empty(&txn_limbo))
>> +			return txn_limbo.confirmed_lsn;
>> +		if (tid != txn_limbo_last_synchro_entry(&txn_limbo)->txn->id) {
>> +			diag_set(ClientError, ER_QUORUM_WAIT, quorum,
>> +				 "new synchronous transactions appeared");
>> +			return -1;
>> +		}
>> +	}
> <...>
>
>> +
>> +/** Write and process a PROMOTE request. */
>> +static void
>> +box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
>> +{
>> +	assert(box_raft()->volatile_term == box_raft()->term);
>> +	assert(promote_lsn >= 0);
>> +	txn_limbo_write_promote(&txn_limbo, promote_lsn,
>> +				box_raft()->term);
> 3. Maybe cache box_raft() value in a variable? Its usage would look shorter
> then. The same in other places where it is used more than once. Up to
> you.

Done.

Incremental diff:

=========================

diff --git a/src/box/box.cc b/src/box/box.cc
index 341857267..d83c30918 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1563,7 +1563,7 @@ box_run_elections(void)
   * IOW check that a foreign PROMOTE arrived while we were sleeping.
   */
  static int
-box_check_promote_term_changed(uint64_t promote_term)
+box_check_promote_term_intact(uint64_t promote_term)
  {
      if (txn_limbo.promote_greatest_term != promote_term) {
          diag_set(ClientError, ER_INTERFERING_PROMOTE,
@@ -1579,7 +1579,7 @@ box_try_wait_confirm(double timeout)
  {
      uint64_t promote_term = txn_limbo.promote_greatest_term;
      txn_limbo_wait_empty(&txn_limbo, timeout);
-    return box_check_promote_term_changed(promote_term);
+    return box_check_promote_term_intact(promote_term);
  }

  /**
@@ -1604,7 +1604,7 @@ box_wait_limbo_acked(void)
          if (wal_sync(NULL) < 0)
              return -1;

-        if (box_check_promote_term_changed(promote_term) < 0)
+        if (box_check_promote_term_intact(promote_term) != 0)
              return -1;
          if (txn_limbo_is_empty(&txn_limbo))
              return txn_limbo.confirmed_lsn;
@@ -1618,10 +1618,10 @@ box_wait_limbo_acked(void)
      int64_t wait_lsn = last_entry->lsn;

      if (box_wait_quorum(txn_limbo.owner_id, wait_lsn, quorum,
-                replication_synchro_timeout) < 0)
+                replication_synchro_timeout) != 0)
          return -1;

-    if (box_check_promote_term_changed(promote_term) < 0)
+    if (box_check_promote_term_intact(promote_term) != 0)
          return -1;

      if (txn_limbo_is_empty(&txn_limbo))
@@ -1722,10 +1722,10 @@ box_promote(void)

      int64_t wait_lsn = -1;

-    if (run_elections && box_run_elections() < 0)
+    if (run_elections && box_run_elections() != 0)
          return -1;
      if (try_wait &&
-        box_try_wait_confirm(2 * replication_synchro_timeout) < 0)
+        box_try_wait_confirm(2 * replication_synchro_timeout) != 0)
          return -1;
      if ((wait_lsn = box_wait_limbo_acked()) < 0)
          return -1;

=========================

>> +	struct synchro_request req = {
>> +		.type = IPROTO_PROMOTE,
>> +		.replica_id = prev_leader_id,
>> +		.origin_id = instance_id,
>> +		.lsn = promote_lsn,
>> +		.term = box_raft()->term,
>> +	};
>> +	txn_limbo_process(&txn_limbo, &req);
>> +	assert(txn_limbo_is_empty(&txn_limbo));
>> +}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 11/16] box: make promote on the current leader a no-op
  2021-07-21 23:26   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-23  7:45     ` Sergey Petrenko via Tarantool-patches
  0 siblings, 0 replies; 30+ messages in thread
From: Sergey Petrenko via Tarantool-patches @ 2021-07-23  7:45 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches


22.07.2021 01:26, Vladislav Shpilevoy пишет:
> Thanks for the patch!
>
> Could you please add a test? Not even necessary in its own file. You
> could use qsync_basic/election_basic.

Ok, sure. I think gh-6034-election-promote-bump-term suits our needs better.

==============================

diff --git a/test/replication/gh-6034-election-promote-bump-term.result 
b/test/replication/gh-6034-election-promote-bump-term.result
index 8be4e8243..3999cfcc9 100644
--- a/test/replication/gh-6034-election-promote-bump-term.result
+++ b/test/replication/gh-6034-election-promote-bump-term.result
@@ -19,3 +19,12 @@ assert(box.info.election.term == term + 1)
   | ---
   | - true
   | ...
+
+-- Consequent promotes are no-ops on the leader.
+box.ctl.promote()
+ | ---
+ | ...
+assert(box.info.election.term == term + 1)
+ | ---
+ | - true
+ | ...
diff --git 
a/test/replication/gh-6034-election-promote-bump-term.test.lua 
b/test/replication/gh-6034-election-promote-bump-term.test.lua
index 1e814bf5d..52429394b 100644
--- a/test/replication/gh-6034-election-promote-bump-term.test.lua
+++ b/test/replication/gh-6034-election-promote-bump-term.test.lua
@@ -7,3 +7,7 @@ box.cfg{election_mode='off'}
  term = box.info.election.term
  box.ctl.promote()
  assert(box.info.election.term == term + 1)
+
+-- Consequent promotes are no-ops on the leader.
+box.ctl.promote()
+assert(box.info.election.term == term + 1)

==============================


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Tarantool-patches] [PATCH v4 12/16] box: fix an assertion failure after a spurious wakeup in promote
  2021-07-21 23:29   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-23  7:45     ` Sergey Petrenko via Tarantool-patches
  0 siblings, 0 replies; 30+ messages in thread
From: Sergey Petrenko via Tarantool-patches @ 2021-07-23  7:45 UTC (permalink / raw)
  To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches


22.07.2021 01:29, Vladislav Shpilevoy пишет:
> I appreciate the work you did here!
>
> See 2 comments below.
>
>> diff --git a/test/replication/gh-3055-promote-wakeup-crash.result b/test/replication/gh-3055-promote-wakeup-crash.result
>> new file mode 100644
>> index 000000000..e508611e5
>> --- /dev/null
>> +++ b/test/replication/gh-3055-promote-wakeup-crash.result
>> @@ -0,0 +1,43 @@
>> +-- test-run result file version 2
>> +test_run = require('test_run').new()
>> + | ---
>> + | ...
>> +--
>> +-- gh-3055 follow-up: box.ctl.promote() could crash on an assertion after a
>> +-- spurious wakeup.
>> +--
>> +_ = box.space._cluster:insert{2, require('uuid').str()}
>> + | ---
>> + | ...
>> +box.cfg{election_mode='manual',\
>> +        replication_synchro_quorum=2,\
>> +        election_timeout=1000}
> 1. Shouldn't you save and restore the old option values in the end
> of the test?

Not anymore. As far as I know the default instance is restarted

between each test run now.

So we only need to reset persistent things now, like

grants, roles, promotes/demotes, spaces and so on.

>> + | ---
>> + | ...
>> +
>> +fiber = require('fiber')
>> + | ---
>> + | ...
>> +f = fiber.create(function() box.ctl.promote() end)
>> + | ---
>> + | ...
>> +f:set_joinable(true)
> 2. But you never use :join(). Perhaps you might want to call it
> after :cancel(). To ensure the fiber really ended.

Yep, my bad. Thanks for noticing!

=====================================

diff --git a/test/replication/gh-3055-promote-wakeup-crash.result 
b/test/replication/gh-3055-promote-wakeup-crash.result
index e508611e5..f915bf359 100644
--- a/test/replication/gh-3055-promote-wakeup-crash.result
+++ b/test/replication/gh-3055-promote-wakeup-crash.result
@@ -35,6 +35,11 @@ fiber.yield()
  f:cancel()
   | ---
   | ...
+f:join()
+ | ---
+ | - false
+ | - fiber is cancelled
+ | ...
  box.cfg{election_mode='off'}
   | ---
   | ...
diff --git a/test/replication/gh-3055-promote-wakeup-crash.test.lua 
b/test/replication/gh-3055-promote-wakeup-crash.test.lua
index 2ac901b08..acf0c05b0 100644
--- a/test/replication/gh-3055-promote-wakeup-crash.test.lua
+++ b/test/replication/gh-3055-promote-wakeup-crash.test.lua
@@ -16,5 +16,6 @@ fiber.yield()

  -- Cleanup.
  f:cancel()
+f:join()
  box.cfg{election_mode='off'}
  test_run:cleanup_cluster()

=====================================

>> + | ---
>> + | ...
>> +f:wakeup()
>> + | ---
>> + | ...
>> +fiber.yield()
>> + | ---
>> + | ...
>> +
>> +-- Cleanup.
>> +f:cancel()
>> + | ---
>> + | ...
>> +box.cfg{election_mode='off'}
>> + | ---
>> + | ...
>> +test_run:cleanup_cluster()
>> + | ---
>> + | ...

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-07-23  7:47 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-14 18:25 [Tarantool-patches] [PATCH v4 00/16] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 01/16] replication: always send raft state to subscribers Serge Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 02/16] txn_limbo: fix promote term filtering Serge Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 03/16] txn_limbo: persist the latest effective promote in snapshot Serge Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 04/16] replication: encode version in JOIN request Serge Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 05/16] replication: add META stage to JOIN Serge Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 06/16] replication: send latest effective promote in initial join Serge Petrenko via Tarantool-patches
2021-07-21 23:24   ` Vladislav Shpilevoy via Tarantool-patches
2021-07-23  7:44     ` Sergey Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 07/16] replication: send current Raft term in join response Serge Petrenko via Tarantool-patches
2021-07-21 23:24   ` Vladislav Shpilevoy via Tarantool-patches
2021-07-23  7:44     ` Sergey Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 08/16] raft: refactor raft_new_term() Serge Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 09/16] box: split promote() into reasonable parts Serge Petrenko via Tarantool-patches
2021-07-21 23:26   ` Vladislav Shpilevoy via Tarantool-patches
2021-07-23  7:45     ` Sergey Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 10/16] box: make promote always bump the term Serge Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 11/16] box: make promote on the current leader a no-op Serge Petrenko via Tarantool-patches
2021-07-21 23:26   ` Vladislav Shpilevoy via Tarantool-patches
2021-07-23  7:45     ` Sergey Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 12/16] box: fix an assertion failure after a spurious wakeup in promote Serge Petrenko via Tarantool-patches
2021-07-21 23:29   ` Vladislav Shpilevoy via Tarantool-patches
2021-07-23  7:45     ` Sergey Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 13/16] box: allow calling promote on a candidate Serge Petrenko via Tarantool-patches
2021-07-15 14:06   ` Serge Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 14/16] box: extract promote() settings to a separate method Serge Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 15/16] replication: forbid implicit limbo owner transition Serge Petrenko via Tarantool-patches
2021-07-14 18:25 ` [Tarantool-patches] [PATCH v4 16/16] box: introduce `box.ctl.demote` Serge Petrenko via Tarantool-patches
2021-07-15 17:13   ` Serge Petrenko via Tarantool-patches
2021-07-15 20:11   ` [Tarantool-patches] [PATCH v4 17/16] replication: fix flaky election_qsync.test Serge Petrenko via Tarantool-patches

Tarantool development patches archive

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://lists.tarantool.org/tarantool-patches/0 tarantool-patches/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 tarantool-patches tarantool-patches/ https://lists.tarantool.org/tarantool-patches \
		tarantool-patches@dev.tarantool.org.
	public-inbox-index tarantool-patches

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git