* [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition
@ 2021-06-28 22:12 Serge Petrenko via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 01/12] replication: always send raft state to subscribers Serge Petrenko via Tarantool-patches
` (13 more replies)
From: Serge Petrenko via Tarantool-patches @ 2021-06-28 22:12 UTC (permalink / raw)
To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
Changes in v3:
 - change demote() behaviour as discussed with Vlad:
   * make it work only on the current leader
   * make it demote the current leader and always bump the term
 - change how limbo and raft snapshots are sent in response to JOIN:
   * encode the replica's version in the JOIN request
   * introduce a special stage, JOIN_META, with raft and limbo
     snapshots; send it based on the replica's version
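For illustration, a rough sketch of the resulting demote() contract
(not part of the patches themselves; see patch 07 for the details):

    box.ctl.promote()  -- claim the limbo, the term is bumped
    box.ctl.demote()   -- release ownership, the term is bumped again
    -- Issuing demote() on an instance which is not the current queue
    -- owner fails with ER_SYNC_QUEUE_FOREIGN.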
https://github.com/tarantool/tarantool/issues/5438
https://github.com/tarantool/tarantool/issues/6034
https://github.com/tarantool/tarantool/tree/sp/gh-6034-empty-limbo-transition
Serge Petrenko (12):
replication: always send raft state to subscribers
txn_limbo: fix promote term filtering
raft: refactor raft_new_term()
box: make promote always bump the term
replication: forbid implicit limbo owner transition
box: allow calling promote on a candidate
box: introduce `box.ctl.demote`
txn_limbo: persist the latest effective promote in snapshot
replication: encode version in JOIN request
replication: add META stage to JOIN
replication: send latest effective promote in initial join
replication: send current Raft term in join response
src/box/applier.cc | 27 ++-
src/box/box.cc | 101 ++++++----
src/box/box.h | 3 +
src/box/errcode.h | 2 +
src/box/iproto_constants.h | 12 +-
src/box/lua/ctl.c | 9 +
src/box/lua/info.c | 4 +-
src/box/memtx_engine.c | 32 +++
src/box/raft.c | 36 ++++
src/box/raft.h | 4 +
src/box/relay.cc | 36 +++-
src/box/relay.h | 4 +-
src/box/txn_limbo.c | 90 ++++++---
src/box/txn_limbo.h | 14 ++
src/box/xrow.c | 4 +-
src/box/xrow.h | 25 ++-
src/lib/raft/raft.c | 3 +-
test/box/alter.result | 2 +-
test/box/error.result | 2 +
test/replication/election_basic.result | 3 +
test/replication/election_basic.test.lua | 1 +
test/replication/election_qsync.result | 3 +
test/replication/election_qsync.test.lua | 1 +
.../gh-4114-local-space-replication.result | 7 +-
.../gh-4114-local-space-replication.test.lua | 4 +-
.../gh-5140-qsync-casc-rollback.result | 6 +
.../gh-5140-qsync-casc-rollback.test.lua | 2 +
.../gh-5144-qsync-dup-confirm.result | 6 +
.../gh-5144-qsync-dup-confirm.test.lua | 2 +
.../gh-5163-qsync-restart-crash.result | 6 +
.../gh-5163-qsync-restart-crash.test.lua | 2 +
.../gh-5167-qsync-rollback-snap.result | 6 +
.../gh-5167-qsync-rollback-snap.test.lua | 2 +
.../gh-5195-qsync-replica-write.result | 10 +-
.../gh-5195-qsync-replica-write.test.lua | 6 +-
.../gh-5213-qsync-applier-order-3.result | 9 +
.../gh-5213-qsync-applier-order-3.test.lua | 3 +
.../gh-5213-qsync-applier-order.result | 6 +
.../gh-5213-qsync-applier-order.test.lua | 2 +
.../replication/gh-5288-qsync-recovery.result | 6 +
.../gh-5288-qsync-recovery.test.lua | 2 +
.../gh-5298-qsync-recovery-snap.result | 6 +
.../gh-5298-qsync-recovery-snap.test.lua | 2 +
.../gh-5426-election-on-off.result | 3 +
.../gh-5426-election-on-off.test.lua | 1 +
.../gh-5433-election-restart-recovery.result | 3 +
...gh-5433-election-restart-recovery.test.lua | 1 +
...sync-clear-synchro-queue-commit-all.result | 3 +
...nc-clear-synchro-queue-commit-all.test.lua | 1 +
test/replication/gh-5438-raft-state.result | 66 ++++++
test/replication/gh-5438-raft-state.test.lua | 29 +++
test/replication/gh-5440-qsync-ro.result | 133 ------------
test/replication/gh-5440-qsync-ro.test.lua | 53 -----
.../gh-5446-qsync-eval-quorum.result | 7 +
.../gh-5446-qsync-eval-quorum.test.lua | 3 +
.../gh-5506-election-on-off.result | 3 +
.../gh-5506-election-on-off.test.lua | 1 +
.../gh-5566-final-join-synchro.result | 6 +
.../gh-5566-final-join-synchro.test.lua | 2 +
.../gh-5874-qsync-txn-recovery.result | 6 +
.../gh-5874-qsync-txn-recovery.test.lua | 2 +
.../gh-6032-promote-wal-write.result | 3 +
.../gh-6032-promote-wal-write.test.lua | 1 +
.../gh-6034-limbo-ownership.result | 189 ++++++++++++++++++
.../gh-6034-limbo-ownership.test.lua | 69 +++++++
.../gh-6034-promote-bump-term.result | 40 ++++
.../gh-6034-promote-bump-term.test.lua | 17 ++
.../gh-6057-qsync-confirm-async-no-wal.result | 7 +
...h-6057-qsync-confirm-async-no-wal.test.lua | 3 +
test/replication/hang_on_synchro_fail.result | 6 +
.../replication/hang_on_synchro_fail.test.lua | 2 +
test/replication/qsync_advanced.result | 12 ++
test/replication/qsync_advanced.test.lua | 4 +
test/replication/qsync_basic.result | 33 ++-
test/replication/qsync_basic.test.lua | 16 +-
test/replication/qsync_errinj.result | 6 +
test/replication/qsync_errinj.test.lua | 2 +
test/replication/qsync_snapshots.result | 6 +
test/replication/qsync_snapshots.test.lua | 2 +
test/replication/qsync_with_anon.result | 6 +
test/replication/qsync_with_anon.test.lua | 2 +
test/replication/replica_rejoin.result | 77 ++++---
test/replication/replica_rejoin.test.lua | 50 ++---
test/replication/suite.cfg | 4 +-
test/unit/raft.c | 15 +-
test/unit/raft.result | 3 +-
86 files changed, 1039 insertions(+), 372 deletions(-)
create mode 100644 test/replication/gh-5438-raft-state.result
create mode 100644 test/replication/gh-5438-raft-state.test.lua
delete mode 100644 test/replication/gh-5440-qsync-ro.result
delete mode 100644 test/replication/gh-5440-qsync-ro.test.lua
create mode 100644 test/replication/gh-6034-limbo-ownership.result
create mode 100644 test/replication/gh-6034-limbo-ownership.test.lua
create mode 100644 test/replication/gh-6034-promote-bump-term.result
create mode 100644 test/replication/gh-6034-promote-bump-term.test.lua
--
2.30.1 (Apple Git-130)
* [Tarantool-patches] [PATCH v3 01/12] replication: always send raft state to subscribers
2021-06-28 22:12 [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
@ 2021-06-28 22:12 ` Serge Petrenko via Tarantool-patches
2021-07-04 12:12 ` Vladislav Shpilevoy via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 02/12] txn_limbo: fix promote term filtering Serge Petrenko via Tarantool-patches
` (12 subsequent siblings)
From: Serge Petrenko via Tarantool-patches @ 2021-06-28 22:12 UTC (permalink / raw)
To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
Tarantool used to send out its Raft state on subscribe only when Raft was
enabled. This was a safeguard against partially-upgraded clusters, where
some nodes had no clue about Raft messages and couldn't handle them
properly.
Actually, the Raft state should always be sent out. For example, promote
will be changed to bump the Raft term even when Raft is disabled, and
it's important that everyone in the cluster has the same term, at least
for the sake of promote.
So, send out the Raft state to every subscriber with version >= 2.6.0
(the version in which Raft was introduced). Do the same for Raft
broadcasts: they should only be sent to replicas with version >= 2.6.0.
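As an illustration (the test added below checks exactly this), a replica
of version >= 2.6.0 now learns the master's term right on subscribe,
even with elections switched off:

    -- On the master:
    box.cfg{election_mode = 'off'}
    master_term = box.info.election.term
    -- On a freshly subscribed replica:
    box.info.election.term  -- == master_term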
Closes #5438
---
src/box/box.cc | 11 ++--
src/box/relay.cc | 4 +-
test/replication/gh-5438-raft-state.result | 63 ++++++++++++++++++++
test/replication/gh-5438-raft-state.test.lua | 28 +++++++++
test/replication/suite.cfg | 1 +
5 files changed, 100 insertions(+), 7 deletions(-)
create mode 100644 test/replication/gh-5438-raft-state.result
create mode 100644 test/replication/gh-5438-raft-state.test.lua
diff --git a/src/box/box.cc b/src/box/box.cc
index ab7d983c9..d6a3b4d97 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -82,6 +82,7 @@
#include "msgpack.h"
#include "raft.h"
#include "trivia/util.h"
+#include "version.h"
enum {
IPROTO_THREADS_MAX = 1000,
@@ -2836,13 +2837,13 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
tt_uuid_str(&replica_uuid), sio_socketname(io->fd));
say_info("remote vclock %s local vclock %s",
vclock_to_string(&replica_clock), vclock_to_string(&vclock));
- if (raft_is_enabled(box_raft())) {
+ if (replica_version_id >= version_id(2, 6, 0) && !anon) {
/*
* Send out the current raft state of the instance. Don't do
- * that if Raft is disabled. It can be that a part of the
- * cluster still contains old versions, which can't handle Raft
- * messages. So when it is disabled, its network footprint
- * should be 0.
+ * that if the remote instance is old. It can be that a part of
+ * the cluster still contains old versions, which can't handle
+ * Raft messages. Raft's network footprint should be 0 as seen
+ * by such instances.
*/
struct raft_request req;
box_raft_checkpoint_remote(&req);
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 115037fc3..60f527b7f 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -800,7 +800,7 @@ relay_subscribe_f(va_list ap)
&relay->relay_pipe, NULL, NULL, cbus_process);
struct relay_is_raft_enabled_msg raft_enabler;
- if (!relay->replica->anon)
+ if (!relay->replica->anon && relay->version_id >= version_id(2, 6, 0))
relay_send_is_raft_enabled(relay, &raft_enabler, true);
/*
@@ -883,7 +883,7 @@ relay_subscribe_f(va_list ap)
cpipe_push(&relay->tx_pipe, &relay->status_msg.msg);
}
- if (!relay->replica->anon)
+ if (!relay->replica->anon && relay->version_id >= version_id(2, 6, 0))
relay_send_is_raft_enabled(relay, &raft_enabler, false);
/*
diff --git a/test/replication/gh-5438-raft-state.result b/test/replication/gh-5438-raft-state.result
new file mode 100644
index 000000000..6985f026a
--- /dev/null
+++ b/test/replication/gh-5438-raft-state.result
@@ -0,0 +1,63 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+--
+-- gh-5438: send out Raft state to subscribers, even when Raft is disabled.
+--
+-- Bump Raft term while the replica's offline.
+term = box.info.election.term
+ | ---
+ | ...
+old_election_mode = box.cfg.election_mode
+ | ---
+ | ...
+box.cfg{election_mode = 'candidate'}
+ | ---
+ | ...
+test_run:wait_cond(function() return box.info.election.term > term end)
+ | ---
+ | - true
+ | ...
+
+-- Make sure the replica receives new term on subscribe.
+box.cfg{election_mode = 'off'}
+ | ---
+ | ...
+
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function()\
+ return test_run:eval('replica', 'return box.info.election.term')[1] ==\
+ box.info.election.term\
+end)
+ | ---
+ | - true
+ | ...
+
+-- Cleanup.
+box.cfg{election_mode = old_election_mode}
+ | ---
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
diff --git a/test/replication/gh-5438-raft-state.test.lua b/test/replication/gh-5438-raft-state.test.lua
new file mode 100644
index 000000000..60c3366c1
--- /dev/null
+++ b/test/replication/gh-5438-raft-state.test.lua
@@ -0,0 +1,28 @@
+test_run = require('test_run').new()
+
+--
+-- gh-5438: send out Raft state to subscribers, even when Raft is disabled.
+--
+-- Bump Raft term while the replica's offline.
+term = box.info.election.term
+old_election_mode = box.cfg.election_mode
+box.cfg{election_mode = 'candidate'}
+test_run:wait_cond(function() return box.info.election.term > term end)
+
+-- Make sure the replica receives new term on subscribe.
+box.cfg{election_mode = 'off'}
+
+box.schema.user.grant('guest', 'replication')
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
+test_run:cmd('start server replica')
+test_run:wait_cond(function()\
+ return test_run:eval('replica', 'return box.info.election.term')[1] ==\
+ box.info.election.term\
+end)
+
+-- Cleanup.
+box.cfg{election_mode = old_election_mode}
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 69f2f3511..c4b3fbd9c 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -19,6 +19,7 @@
"gh-5213-qsync-applier-order-3.test.lua": {},
"gh-5426-election-on-off.test.lua": {},
"gh-5433-election-restart-recovery.test.lua": {},
+ "gh-5438-raft-state.test.lua": {},
"gh-5445-leader-inconsistency.test.lua": {},
"gh-5506-election-on-off.test.lua": {},
"once.test.lua": {},
--
2.30.1 (Apple Git-130)
* [Tarantool-patches] [PATCH v3 02/12] txn_limbo: fix promote term filtering
2021-06-28 22:12 [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 01/12] replication: always send raft state to subscribers Serge Petrenko via Tarantool-patches
@ 2021-06-28 22:12 ` Serge Petrenko via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 03/12] raft: refactor raft_new_term() Serge Petrenko via Tarantool-patches
` (11 subsequent siblings)
From: Serge Petrenko via Tarantool-patches @ 2021-06-28 22:12 UTC (permalink / raw)
To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
txn_limbo_process() used to filter out PROMOTE requests whose term was
equal to the greatest term seen so far. This wasn't correct for PROMOTE
entries with term 1.
Such entries appear when box.ctl.promote() is issued on an instance with
elections disabled. Every PROMOTE entry from such an instance has term 1
but should still be applied. Fix this in the patch.
Also, when an outdated PROMOTE entry arrived from some replica with a
term smaller than one already applied, it wasn't filtered out at all.
Such a situation shouldn't be possible, but handle it as well.
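A sketch of the scenario being fixed (illustrative; elections are
disabled on both instances):

    -- Instance A:
    box.cfg{election_mode = 'off'}
    box.ctl.promote()  -- writes PROMOTE with term 1
    -- Instance B, promoted later:
    box.ctl.promote()  -- also writes PROMOTE with term 1
    -- Before the fix, A dropped B's PROMOTE as a duplicate of an
    -- already seen term, so limbo ownership never moved to B there.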
Part-of #6034
---
src/box/txn_limbo.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 51dc2a186..16181b8a0 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -667,15 +667,18 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
uint32_t origin = req->origin_id;
if (txn_limbo_replica_term(limbo, origin) < term) {
vclock_follow(&limbo->promote_term_map, origin, term);
- if (term > limbo->promote_greatest_term) {
+ if (term > limbo->promote_greatest_term)
limbo->promote_greatest_term = term;
- } else if (req->type == IPROTO_PROMOTE) {
- /*
- * PROMOTE for outdated term. Ignore.
- */
- return;
- }
+ } else if (req->type == IPROTO_PROMOTE &&
+ limbo->promote_greatest_term > 1) {
+ /* PROMOTE for outdated term. Ignore. */
+ say_info("RAFT: ignoring PROMOTE request from instance "
+ "id %u for term %llu. Greatest term seen "
+ "before (%llu) is bigger.", origin, (long long)term,
+ (long long)limbo->promote_greatest_term);
+ return;
}
+
int64_t lsn = req->lsn;
if (req->replica_id == REPLICA_ID_NIL) {
/*
--
2.30.1 (Apple Git-130)
* [Tarantool-patches] [PATCH v3 03/12] raft: refactor raft_new_term()
2021-06-28 22:12 [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 01/12] replication: always send raft state to subscribers Serge Petrenko via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 02/12] txn_limbo: fix promote term filtering Serge Petrenko via Tarantool-patches
@ 2021-06-28 22:12 ` Serge Petrenko via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 04/12] box: make promote always bump the term Serge Petrenko via Tarantool-patches
` (10 subsequent siblings)
From: Serge Petrenko via Tarantool-patches @ 2021-06-28 22:12 UTC (permalink / raw)
To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
Make raft_new_term() always bump the current term, even when Raft is
disabled.
Part-of #6034
---
src/box/box.cc | 2 +-
src/lib/raft/raft.c | 3 +--
test/unit/raft.c | 15 ++++++++++++++-
test/unit/raft.result | 3 ++-
4 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index d6a3b4d97..6a0950f44 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -3514,7 +3514,7 @@ box_cfg_xc(void)
if (!is_bootstrap_leader) {
replicaset_sync();
- } else {
+ } else if (raft_is_enabled(box_raft())) {
/*
* When the cluster is just bootstrapped and this instance is a
* leader, it makes no sense to wait for a leader appearance.
diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
index eacdddb7e..78d55846f 100644
--- a/src/lib/raft/raft.c
+++ b/src/lib/raft/raft.c
@@ -987,8 +987,7 @@ raft_cfg_vclock(struct raft *raft, const struct vclock *vclock)
void
raft_new_term(struct raft *raft)
{
- if (raft->is_enabled)
- raft_sm_schedule_new_term(raft, raft->volatile_term + 1);
+ raft_sm_schedule_new_term(raft, raft->volatile_term + 1);
}
static void
diff --git a/test/unit/raft.c b/test/unit/raft.c
index 6369c42d3..4db4024d9 100644
--- a/test/unit/raft.c
+++ b/test/unit/raft.c
@@ -1176,7 +1176,7 @@ raft_test_death_timeout(void)
static void
raft_test_enable_disable(void)
{
- raft_start_test(10);
+ raft_start_test(11);
struct raft_node node;
raft_node_create(&node);
@@ -1268,7 +1268,20 @@ raft_test_enable_disable(void)
"{0: 2}" /* Vclock. */
), "nothing changed");
+ /* Disabled node still bumps the term when needed. */
+ raft_node_new_term(&node);
+
+ ok(raft_node_check_full_state(&node,
+ RAFT_STATE_FOLLOWER /* State. */,
+ 0 /* Leader. */,
+ 4 /* Term. */,
+ 0 /* Vote. */,
+ 4 /* Volatile term. */,
+ 0 /* Volatile vote. */,
+ "{0: 3}" /* Vclock. */
+ ), "term bump when disabled");
raft_node_destroy(&node);
+
raft_finish_test();
}
diff --git a/test/unit/raft.result b/test/unit/raft.result
index 598a7e760..57229a265 100644
--- a/test/unit/raft.result
+++ b/test/unit/raft.result
@@ -203,7 +203,7 @@ ok 10 - subtests
ok 11 - subtests
*** raft_test_death_timeout: done ***
*** raft_test_enable_disable ***
- 1..10
+ 1..11
ok 1 - accepted a leader notification
ok 2 - leader is seen
ok 3 - death timeout passed
@@ -214,6 +214,7 @@ ok 11 - subtests
ok 8 - became leader
ok 9 - resigned from leader state
ok 10 - nothing changed
+ ok 11 - term bump when disabled
ok 12 - subtests
*** raft_test_enable_disable: done ***
*** raft_test_too_long_wal_write ***
--
2.30.1 (Apple Git-130)
* [Tarantool-patches] [PATCH v3 04/12] box: make promote always bump the term
2021-06-28 22:12 [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
` (2 preceding siblings ...)
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 03/12] raft: refactor raft_new_term() Serge Petrenko via Tarantool-patches
@ 2021-06-28 22:12 ` Serge Petrenko via Tarantool-patches
2021-07-04 12:14 ` Vladislav Shpilevoy via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 05/12] replication: forbid implicit limbo owner transition Serge Petrenko via Tarantool-patches
` (9 subsequent siblings)
From: Serge Petrenko via Tarantool-patches @ 2021-06-28 22:12 UTC (permalink / raw)
To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
When called with elections disabled, promote used to result in multiple
PROMOTE entries for the same term. This is not right: all the promotions
for the same term except the first one would be ignored as already seen.
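With the fix every promote bumps the term first, so consecutive promotes
stay distinguishable (this is exactly what the new test below checks):

    box.cfg{election_mode = 'off'}
    term = box.info.election.term
    box.ctl.promote()  -- bumps the term to term + 1
    box.ctl.promote()  -- bumps it again; the second PROMOTE is not
                       -- filtered out as already seen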
Part-of #6034
---
src/box/box.cc | 13 ++++---
src/box/raft.c | 36 ++++++++++++++++++
src/box/raft.h | 4 ++
.../gh-4114-local-space-replication.result | 7 ++--
.../gh-4114-local-space-replication.test.lua | 4 +-
.../gh-6034-promote-bump-term.result | 37 +++++++++++++++++++
.../gh-6034-promote-bump-term.test.lua | 16 ++++++++
test/replication/suite.cfg | 1 +
8 files changed, 107 insertions(+), 11 deletions(-)
create mode 100644 test/replication/gh-6034-promote-bump-term.result
create mode 100644 test/replication/gh-6034-promote-bump-term.test.lua
diff --git a/src/box/box.cc b/src/box/box.cc
index 6a0950f44..ce37b307d 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1687,16 +1687,19 @@ box_promote(void)
rc = -1;
} else {
promote:
- /* We cannot possibly get here in a volatile state. */
- assert(box_raft()->volatile_term == box_raft()->term);
- txn_limbo_write_promote(&txn_limbo, wait_lsn,
- box_raft()->term);
+ if (try_wait) {
+ raft_new_term(box_raft());
+ if (box_raft_wait_persisted() < 0)
+ return -1;
+ }
+ uint64_t term = box_raft()->term;
+ txn_limbo_write_promote(&txn_limbo, wait_lsn, term);
struct synchro_request req = {
.type = IPROTO_PROMOTE,
.replica_id = former_leader_id,
.origin_id = instance_id,
.lsn = wait_lsn,
- .term = box_raft()->term,
+ .term = term,
};
txn_limbo_process(&txn_limbo, &req);
assert(txn_limbo_is_empty(&txn_limbo));
diff --git a/src/box/raft.c b/src/box/raft.c
index 7f787c0c5..17caf6f54 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -354,6 +354,42 @@ box_raft_wait_leader_found(void)
return 0;
}
+struct raft_wait_persisted_data {
+ struct fiber *waiter;
+ uint64_t term;
+};
+
+static int
+box_raft_wait_persisted_f(struct trigger *trig, void *event)
+{
+ struct raft *raft = event;
+ struct raft_wait_persisted_data *data = trig->data;
+ if (raft->term >= data->term)
+ fiber_wakeup(data->waiter);
+ return 0;
+}
+
+int
+box_raft_wait_persisted(void)
+{
+ if (box_raft()->term == box_raft()->volatile_term)
+ return 0;
+ struct raft_wait_persisted_data data = {
+ .waiter = fiber(),
+ .term = box_raft()->volatile_term,
+ };
+ struct trigger trig;
+ trigger_create(&trig, box_raft_wait_persisted_f, &data, NULL);
+ raft_on_update(box_raft(), &trig);
+ fiber_yield();
+ trigger_clear(&trig);
+ if (fiber_is_cancelled()) {
+ diag_set(FiberIsCancelled);
+ return -1;
+ }
+ return 0;
+}
+
void
box_raft_init(void)
{
diff --git a/src/box/raft.h b/src/box/raft.h
index 6b6136510..6e27b098f 100644
--- a/src/box/raft.h
+++ b/src/box/raft.h
@@ -101,6 +101,10 @@ box_raft_process(struct raft_request *req, uint32_t source);
int
box_raft_wait_leader_found();
+/** Block this fiber until the current volatile term is persisted. */
+int
+box_raft_wait_persisted(void);
+
void
box_raft_init(void);
diff --git a/test/replication/gh-4114-local-space-replication.result b/test/replication/gh-4114-local-space-replication.result
index 9b63a4b99..e71eb60a8 100644
--- a/test/replication/gh-4114-local-space-replication.result
+++ b/test/replication/gh-4114-local-space-replication.result
@@ -45,9 +45,8 @@ test_run:cmd('switch replica')
| ---
| - true
| ...
-box.info.vclock[0]
+a = box.info.vclock[0] or 0
| ---
- | - null
| ...
box.cfg{checkpoint_count=1}
| ---
@@ -77,9 +76,9 @@ box.space.test:insert{3}
| - [3]
| ...
-box.info.vclock[0]
+assert(box.info.vclock[0] == a + 3)
| ---
- | - 3
+ | - true
| ...
test_run:cmd('switch default')
diff --git a/test/replication/gh-4114-local-space-replication.test.lua b/test/replication/gh-4114-local-space-replication.test.lua
index c18fb3b10..65fef3bf6 100644
--- a/test/replication/gh-4114-local-space-replication.test.lua
+++ b/test/replication/gh-4114-local-space-replication.test.lua
@@ -18,7 +18,7 @@ for i = 1,10 do box.space.test:insert{i} end
box.info.vclock[0] == a + 10 or box.info.vclock[0] - a
test_run:cmd('switch replica')
-box.info.vclock[0]
+a = box.info.vclock[0] or 0
box.cfg{checkpoint_count=1}
box.space.test:select{}
box.space.test:insert{1}
@@ -27,7 +27,7 @@ box.space.test:insert{2}
box.snapshot()
box.space.test:insert{3}
-box.info.vclock[0]
+assert(box.info.vclock[0] == a + 3)
test_run:cmd('switch default')
diff --git a/test/replication/gh-6034-promote-bump-term.result b/test/replication/gh-6034-promote-bump-term.result
new file mode 100644
index 000000000..20e352922
--- /dev/null
+++ b/test/replication/gh-6034-promote-bump-term.result
@@ -0,0 +1,37 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+-- gh-6034: test that every box.ctl.promote() bumps
+-- the instance's term. Even when elections are disabled. Even for consequent
+-- promotes on the same instance.
+election_mode = box.cfg.election_mode
+ | ---
+ | ...
+box.cfg{election_mode='off'}
+ | ---
+ | ...
+
+term = box.info.election.term
+ | ---
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+assert(box.info.election.term == term + 1)
+ | ---
+ | - true
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+assert(box.info.election.term == term + 2)
+ | ---
+ | - true
+ | ...
+
+-- Cleanup.
+box.cfg{election_mode=election_mode}
+ | ---
+ | ...
diff --git a/test/replication/gh-6034-promote-bump-term.test.lua b/test/replication/gh-6034-promote-bump-term.test.lua
new file mode 100644
index 000000000..5847dbb8f
--- /dev/null
+++ b/test/replication/gh-6034-promote-bump-term.test.lua
@@ -0,0 +1,16 @@
+test_run = require('test_run').new()
+
+-- gh-6034: test that every box.ctl.promote() bumps
+-- the instance's term. Even when elections are disabled. Even for consequent
+-- promotes on the same instance.
+election_mode = box.cfg.election_mode
+box.cfg{election_mode='off'}
+
+term = box.info.election.term
+box.ctl.promote()
+assert(box.info.election.term == term + 1)
+box.ctl.promote()
+assert(box.info.election.term == term + 2)
+
+-- Cleanup.
+box.cfg{election_mode=election_mode}
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index c4b3fbd9c..496b2e104 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -48,6 +48,7 @@
"gh-5613-bootstrap-prefer-booted.test.lua": {},
"gh-6027-applier-error-show.test.lua": {},
"gh-6032-promote-wal-write.test.lua": {},
+ "gh-6034-promote-bump-term.test.lua": {},
"gh-6057-qsync-confirm-async-no-wal.test.lua": {},
"gh-6094-rs-uuid-mismatch.test.lua": {},
"gh-6127-election-join-new.test.lua": {},
--
2.30.1 (Apple Git-130)
* [Tarantool-patches] [PATCH v3 05/12] replication: forbid implicit limbo owner transition
2021-06-28 22:12 [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
` (3 preceding siblings ...)
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 04/12] box: make promote always bump the term Serge Petrenko via Tarantool-patches
@ 2021-06-28 22:12 ` Serge Petrenko via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 06/12] box: allow calling promote on a candidate Serge Petrenko via Tarantool-patches
` (8 subsequent siblings)
From: Serge Petrenko via Tarantool-patches @ 2021-06-28 22:12 UTC (permalink / raw)
To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
Forbid limbo ownership transition without an explicit promote.
Make it so that synchronous transactions may be committed only after the
limbo is claimed by some instance via a PROMOTE request.
Make everyone but the limbo owner read-only even when the limbo is
empty.
Part-of #6034
@TarantoolBot document
Title: synchronous replication changes
`box.info.synchro.queue` receives a new field: `owner`. It is the
replica id of the instance owning the synchronous transaction queue.
Once some instance owns the queue, every other instance becomes
read-only. When the queue is unclaimed, i.e.
`box.info.synchro.queue.owner` is `0`, every instance may be writable,
but none of them can create synchronous transactions.
In order to claim or re-claim the queue, issue `box.ctl.promote()` on
the instance you wish to promote.
When elections are enabled, the instance issues `box.ctl.promote()`
automatically once it wins the elections; no additional actions are
required.
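A minimal illustration of the new rules (assuming a synchronous space
named 'sync' already exists):

    box.info.synchro.queue.owner  -- 0: the queue is unclaimed
    box.space.sync:replace{1}     -- error: the queue doesn't belong
                                  -- to any instance
    box.ctl.promote()             -- claim the queue
    box.info.synchro.queue.owner  -- == box.info.id
    box.space.sync:replace{1}     -- succeeds; every other instance
                                  -- stays read-only until it promotes
                                  -- itself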
---
src/box/errcode.h | 2 +
src/box/lua/info.c | 4 +-
src/box/txn_limbo.c | 32 ++---
test/box/alter.result | 2 +-
test/box/error.result | 1 +
test/replication/gh-5440-qsync-ro.result | 133 ---------------------
test/replication/gh-5440-qsync-ro.test.lua | 53 --------
test/replication/suite.cfg | 1 -
8 files changed, 18 insertions(+), 210 deletions(-)
delete mode 100644 test/replication/gh-5440-qsync-ro.result
delete mode 100644 test/replication/gh-5440-qsync-ro.test.lua
diff --git a/src/box/errcode.h b/src/box/errcode.h
index 49aec4bf6..d42b64ef4 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -278,6 +278,8 @@ struct errcode_record {
/*223 */_(ER_INTERFERING_PROMOTE, "Instance with replica id %u was promoted first") \
/*224 */_(ER_RAFT_DISABLED, "Elections were turned off while running box.ctl.promote()")\
/*225 */_(ER_TXN_ROLLBACK, "Transaction was rolled back") \
+ /*226 */_(ER_SYNC_QUEUE_UNCLAIMED, "The synchronous transaction queue doesn't belong to any instance")\
+ /*227 */_(ER_SYNC_QUEUE_FOREIGN, "The synchronous transaction queue belongs to other instance with id %u")\
/*
* !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/lua/info.c b/src/box/lua/info.c
index f201b25e3..211d2baea 100644
--- a/src/box/lua/info.c
+++ b/src/box/lua/info.c
@@ -614,9 +614,11 @@ lbox_info_synchro(struct lua_State *L)
/* Queue information. */
struct txn_limbo *queue = &txn_limbo;
- lua_createtable(L, 0, 1);
+ lua_createtable(L, 0, 2);
lua_pushnumber(L, queue->len);
lua_setfield(L, -2, "len");
+ lua_pushnumber(L, queue->owner_id);
+ lua_setfield(L, -2, "owner");
lua_setfield(L, -2, "queue");
return 1;
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 16181b8a0..996f1a3fc 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -55,7 +55,8 @@ txn_limbo_create(struct txn_limbo *limbo)
bool
txn_limbo_is_ro(struct txn_limbo *limbo)
{
- return limbo->owner_id != instance_id && !txn_limbo_is_empty(limbo);
+ return limbo->owner_id != REPLICA_ID_NIL &&
+ limbo->owner_id != instance_id;
}
struct txn_limbo_entry *
@@ -95,18 +96,18 @@ txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn)
}
if (id == 0)
id = instance_id;
- bool make_ro = false;
- if (limbo->owner_id != id) {
- if (rlist_empty(&limbo->queue)) {
- limbo->owner_id = id;
- limbo->confirmed_lsn = 0;
- if (id != instance_id)
- make_ro = true;
+ if (limbo->owner_id == REPLICA_ID_NIL) {
+ diag_set(ClientError, ER_SYNC_QUEUE_UNCLAIMED);
+ return NULL;
+ } else if (limbo->owner_id != id) {
+ if (txn_limbo_is_empty(limbo)) {
+ diag_set(ClientError, ER_SYNC_QUEUE_FOREIGN,
+ limbo->owner_id);
} else {
diag_set(ClientError, ER_UNCOMMITTED_FOREIGN_SYNC_TXNS,
limbo->owner_id);
- return NULL;
}
+ return NULL;
}
size_t size;
struct txn_limbo_entry *e = region_alloc_object(&txn->region,
@@ -122,12 +123,6 @@ txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn)
e->is_rollback = false;
rlist_add_tail_entry(&limbo->queue, e, in_queue);
limbo->len++;
- /*
- * We added new entries from a remote instance to an empty limbo.
- * Time to make this instance read-only.
- */
- if (make_ro)
- box_update_ro_summary();
return e;
}
@@ -432,9 +427,6 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
assert(e->txn->signature >= 0);
txn_complete_success(e->txn);
}
- /* Update is_ro once the limbo is clear. */
- if (txn_limbo_is_empty(limbo))
- box_update_ro_summary();
}
/**
@@ -482,9 +474,6 @@ txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
if (e == last_rollback)
break;
}
- /* Update is_ro once the limbo is clear. */
- if (txn_limbo_is_empty(limbo))
- box_update_ro_summary();
}
void
@@ -515,6 +504,7 @@ txn_limbo_read_promote(struct txn_limbo *limbo, uint32_t replica_id,
txn_limbo_read_rollback(limbo, lsn + 1);
assert(txn_limbo_is_empty(&txn_limbo));
limbo->owner_id = replica_id;
+ box_update_ro_summary();
limbo->confirmed_lsn = 0;
}
diff --git a/test/box/alter.result b/test/box/alter.result
index a7bffce10..6a64f6b84 100644
--- a/test/box/alter.result
+++ b/test/box/alter.result
@@ -1464,7 +1464,7 @@ assert(s.is_sync)
...
s:replace{1}
---
-- error: Quorum collection for a synchronous transaction is timed out
+- error: The synchronous transaction queue doesn't belong to any instance
...
-- When not specified or nil - ignored.
s:alter({is_sync = nil})
diff --git a/test/box/error.result b/test/box/error.result
index 062a90399..dfe593dc2 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -444,6 +444,7 @@ t;
| 223: box.error.INTERFERING_PROMOTE
| 224: box.error.RAFT_DISABLED
| 225: box.error.TXN_ROLLBACK
+ | 226: box.error.SYNC_QUEUE_UNCLAIMED
| ...
test_run:cmd("setopt delimiter ''");
diff --git a/test/replication/gh-5440-qsync-ro.result b/test/replication/gh-5440-qsync-ro.result
deleted file mode 100644
index 1ece26a42..000000000
--- a/test/replication/gh-5440-qsync-ro.result
+++ /dev/null
@@ -1,133 +0,0 @@
--- test-run result file version 2
---
--- gh-5440 everyone but the limbo owner is read-only on non-empty limbo.
---
-env = require('test_run')
- | ---
- | ...
-test_run = env.new()
- | ---
- | ...
-fiber = require('fiber')
- | ---
- | ...
-
-box.schema.user.grant('guest', 'replication')
- | ---
- | ...
-test_run:cmd('create server replica with rpl_master=default, script="replication/replica.lua"')
- | ---
- | - true
- | ...
-test_run:cmd('start server replica with wait=True, wait_load=True')
- | ---
- | - true
- | ...
-
-_ = box.schema.space.create('test', {is_sync=true})
- | ---
- | ...
-_ = box.space.test:create_index('pk')
- | ---
- | ...
-
-old_synchro_quorum = box.cfg.replication_synchro_quorum
- | ---
- | ...
-old_synchro_timeout = box.cfg.replication_synchro_timeout
- | ---
- | ...
-
--- Make sure that the master stalls on commit leaving the limbo non-empty.
-box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=1000}
- | ---
- | ...
-
-f = fiber.new(function() box.space.test:insert{1} end)
- | ---
- | ...
-f:status()
- | ---
- | - suspended
- | ...
-
--- Wait till replica's limbo is non-empty.
-test_run:wait_lsn('replica', 'default')
- | ---
- | ...
-test_run:cmd('switch replica')
- | ---
- | - true
- | ...
-
-box.info.ro
- | ---
- | - true
- | ...
-box.space.test:insert{2}
- | ---
- | - error: Can't modify data because this instance is in read-only mode.
- | ...
-success = false
- | ---
- | ...
-f = require('fiber').new(function() box.ctl.wait_rw() success = true end)
- | ---
- | ...
-f:status()
- | ---
- | - suspended
- | ...
-
-test_run:cmd('switch default')
- | ---
- | - true
- | ...
-
--- Empty the limbo.
-box.cfg{replication_synchro_quorum=2}
- | ---
- | ...
-
-test_run:cmd('switch replica')
- | ---
- | - true
- | ...
-
-test_run:wait_cond(function() return success end)
- | ---
- | - true
- | ...
-box.info.ro
- | ---
- | - false
- | ...
--- Should succeed now.
-box.space.test:insert{2}
- | ---
- | - [2]
- | ...
-
--- Cleanup.
-test_run:cmd('switch default')
- | ---
- | - true
- | ...
-box.cfg{replication_synchro_quorum=old_synchro_quorum,\
- replication_synchro_timeout=old_synchro_timeout}
- | ---
- | ...
-box.space.test:drop()
- | ---
- | ...
-test_run:cmd('stop server replica')
- | ---
- | - true
- | ...
-test_run:cmd('delete server replica')
- | ---
- | - true
- | ...
-box.schema.user.revoke('guest', 'replication')
- | ---
- | ...
diff --git a/test/replication/gh-5440-qsync-ro.test.lua b/test/replication/gh-5440-qsync-ro.test.lua
deleted file mode 100644
index d63ec9c1e..000000000
--- a/test/replication/gh-5440-qsync-ro.test.lua
+++ /dev/null
@@ -1,53 +0,0 @@
---
--- gh-5440 everyone but the limbo owner is read-only on non-empty limbo.
---
-env = require('test_run')
-test_run = env.new()
-fiber = require('fiber')
-
-box.schema.user.grant('guest', 'replication')
-test_run:cmd('create server replica with rpl_master=default, script="replication/replica.lua"')
-test_run:cmd('start server replica with wait=True, wait_load=True')
-
-_ = box.schema.space.create('test', {is_sync=true})
-_ = box.space.test:create_index('pk')
-
-old_synchro_quorum = box.cfg.replication_synchro_quorum
-old_synchro_timeout = box.cfg.replication_synchro_timeout
-
--- Make sure that the master stalls on commit leaving the limbo non-empty.
-box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=1000}
-
-f = fiber.new(function() box.space.test:insert{1} end)
-f:status()
-
--- Wait till replica's limbo is non-empty.
-test_run:wait_lsn('replica', 'default')
-test_run:cmd('switch replica')
-
-box.info.ro
-box.space.test:insert{2}
-success = false
-f = require('fiber').new(function() box.ctl.wait_rw() success = true end)
-f:status()
-
-test_run:cmd('switch default')
-
--- Empty the limbo.
-box.cfg{replication_synchro_quorum=2}
-
-test_run:cmd('switch replica')
-
-test_run:wait_cond(function() return success end)
-box.info.ro
--- Should succeed now.
-box.space.test:insert{2}
-
--- Cleanup.
-test_run:cmd('switch default')
-box.cfg{replication_synchro_quorum=old_synchro_quorum,\
- replication_synchro_timeout=old_synchro_timeout}
-box.space.test:drop()
-test_run:cmd('stop server replica')
-test_run:cmd('delete server replica')
-box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 496b2e104..eb88b9420 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -41,7 +41,6 @@
"gh-4739-vclock-assert.test.lua": {},
"gh-4730-applier-rollback.test.lua": {},
"gh-4928-tx-boundaries.test.lua": {},
- "gh-5440-qsync-ro.test.lua": {},
"gh-5435-qsync-clear-synchro-queue-commit-all.test.lua": {},
"gh-5536-wal-limit.test.lua": {},
"gh-5566-final-join-synchro.test.lua": {},
--
2.30.1 (Apple Git-130)
* [Tarantool-patches] [PATCH v3 06/12] box: allow calling promote on a candidate
2021-06-28 22:12 [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
` (4 preceding siblings ...)
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 05/12] replication: forbid implicit limbo owner transition Serge Petrenko via Tarantool-patches
@ 2021-06-28 22:12 ` Serge Petrenko via Tarantool-patches
2021-07-04 12:14 ` Vladislav Shpilevoy via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote` Serge Petrenko via Tarantool-patches
` (7 subsequent siblings)
From: Serge Petrenko via Tarantool-patches @ 2021-06-28 22:12 UTC (permalink / raw)
To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
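Previously `box.ctl.promote()` on an `election_mode = 'candidate'`
instance failed with ER_UNSUPPORTED ("manual elections") unless the
instance was already an elected leader. Now such a call triggers
elections, exactly as in 'manual' mode, and the promotion proceeds once
the elections are won. An illustration:

    box.cfg{election_mode = 'candidate'}
    -- Used to fail when this node wasn't the leader yet; now it runs
    -- elections first and promotes the node if it wins:
    box.ctl.promote()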
Part-of #6034
---
src/box/box.cc | 21 ++++-----------------
1 file changed, 4 insertions(+), 17 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index ce37b307d..1d894be97 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1538,11 +1538,6 @@ box_promote(void)
return -1;
}
- /*
- * Do nothing when box isn't configured and when PROMOTE was already
- * written for this term (synchronous replication and leader election
- * are in sync, and both chose this node as a leader).
- */
if (!is_box_configured)
return 0;
bool run_elections = false;
@@ -1558,21 +1553,13 @@ box_promote(void)
"manual elections");
return -1;
case ELECTION_MODE_MANUAL:
- if (box_raft()->state == RAFT_STATE_LEADER)
- return 0;
- run_elections = true;
- break;
case ELECTION_MODE_CANDIDATE:
+ run_elections = box_raft()->state != RAFT_STATE_LEADER;
/*
- * Leader elections are enabled, and this instance is allowed to
- * promote only if it's already an elected leader. No manual
- * elections.
+ * Do nothing when PROMOTE was already written for this term
+ * (synchronous replication and leader election are in sync, and
+ * both chose this node as a leader).
*/
- if (box_raft()->state != RAFT_STATE_LEADER) {
- diag_set(ClientError, ER_UNSUPPORTED, "election_mode="
- "'candidate'", "manual elections");
- return -1;
- }
if (txn_limbo_replica_term(&txn_limbo, instance_id) ==
box_raft()->term)
return 0;
--
2.30.1 (Apple Git-130)
* [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`
2021-06-28 22:12 [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
` (5 preceding siblings ...)
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 06/12] box: allow calling promote on a candidate Serge Petrenko via Tarantool-patches
@ 2021-06-28 22:12 ` Serge Petrenko via Tarantool-patches
2021-07-04 12:27 ` Vladislav Shpilevoy via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 08/12] txn_limbo: persist the latest effective promote in snapshot Serge Petrenko via Tarantool-patches
` (6 subsequent siblings)
From: Serge Petrenko via Tarantool-patches @ 2021-06-28 22:12 UTC (permalink / raw)
To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
Introduce a new journal entry, DEMOTE. The entry has the same meaning as
PROMOTE, with the only difference that it clears limbo ownership instead
of transferring it to the issuer.
Introduce `box.ctl.demote`, which works exactly like `box.ctl.promote`,
but results in writing DEMOTE instead of PROMOTE.
A new request was necessary instead of simply writing PROMOTE(origin_id
= 0), because origin_id is deduced from row.replica_id, which cannot be
0 for replicated rows (it's always equal to instance_id of the row
originator).
Closes #6034
@TarantoolBot document
Title: box.ctl.demote
`box.ctl.demote()` is a new function which works exactly like
`box.ctl.promote()`, with the one exception that it makes the instance
write a DEMOTE request to the WAL instead of a PROMOTE request.
A DEMOTE request (DEMOTE = 32) copies PROMOTE behaviour (it clears the
limbo as well), but clears the synchronous transaction queue ownership
instead of assigning it to a new instance.
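A short usage sketch (the full behaviour is covered by the
gh-6034-limbo-ownership test added below):

    box.ctl.promote()             -- claim the queue with a PROMOTE
    box.info.synchro.queue.owner  -- == box.info.id
    box.ctl.demote()              -- write DEMOTE, release ownership
    box.info.synchro.queue.owner  -- 0; synchronous transactions are
                                  -- rejected until the next promote
    -- demote() on an instance which doesn't own the queue fails with
    -- ER_SYNC_QUEUE_FOREIGN; with an unclaimed queue it is a no-op.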
---
src/box/box.cc | 48 ++++-
src/box/box.h | 3 +
src/box/iproto_constants.h | 10 +-
src/box/lua/ctl.c | 9 +
src/box/txn_limbo.c | 37 +++-
src/box/txn_limbo.h | 7 +
test/box/error.result | 1 +
test/replication/election_basic.result | 3 +
test/replication/election_basic.test.lua | 1 +
test/replication/election_qsync.result | 3 +
test/replication/election_qsync.test.lua | 1 +
.../gh-5140-qsync-casc-rollback.result | 6 +
.../gh-5140-qsync-casc-rollback.test.lua | 2 +
.../gh-5144-qsync-dup-confirm.result | 6 +
.../gh-5144-qsync-dup-confirm.test.lua | 2 +
.../gh-5163-qsync-restart-crash.result | 6 +
.../gh-5163-qsync-restart-crash.test.lua | 2 +
.../gh-5167-qsync-rollback-snap.result | 6 +
.../gh-5167-qsync-rollback-snap.test.lua | 2 +
.../gh-5195-qsync-replica-write.result | 10 +-
.../gh-5195-qsync-replica-write.test.lua | 6 +-
.../gh-5213-qsync-applier-order-3.result | 9 +
.../gh-5213-qsync-applier-order-3.test.lua | 3 +
.../gh-5213-qsync-applier-order.result | 6 +
.../gh-5213-qsync-applier-order.test.lua | 2 +
.../replication/gh-5288-qsync-recovery.result | 6 +
.../gh-5288-qsync-recovery.test.lua | 2 +
.../gh-5298-qsync-recovery-snap.result | 6 +
.../gh-5298-qsync-recovery-snap.test.lua | 2 +
.../gh-5426-election-on-off.result | 3 +
.../gh-5426-election-on-off.test.lua | 1 +
.../gh-5433-election-restart-recovery.result | 3 +
...gh-5433-election-restart-recovery.test.lua | 1 +
...sync-clear-synchro-queue-commit-all.result | 3 +
...nc-clear-synchro-queue-commit-all.test.lua | 1 +
test/replication/gh-5438-raft-state.result | 3 +
test/replication/gh-5438-raft-state.test.lua | 1 +
.../gh-5446-qsync-eval-quorum.result | 7 +
.../gh-5446-qsync-eval-quorum.test.lua | 3 +
.../gh-5506-election-on-off.result | 3 +
.../gh-5506-election-on-off.test.lua | 1 +
.../gh-5566-final-join-synchro.result | 6 +
.../gh-5566-final-join-synchro.test.lua | 2 +
.../gh-5874-qsync-txn-recovery.result | 6 +
.../gh-5874-qsync-txn-recovery.test.lua | 2 +
.../gh-6032-promote-wal-write.result | 3 +
.../gh-6032-promote-wal-write.test.lua | 1 +
.../gh-6034-limbo-ownership.result | 189 ++++++++++++++++++
.../gh-6034-limbo-ownership.test.lua | 69 +++++++
.../gh-6034-promote-bump-term.result | 3 +
.../gh-6034-promote-bump-term.test.lua | 1 +
.../gh-6057-qsync-confirm-async-no-wal.result | 7 +
...h-6057-qsync-confirm-async-no-wal.test.lua | 3 +
test/replication/hang_on_synchro_fail.result | 6 +
.../replication/hang_on_synchro_fail.test.lua | 2 +
test/replication/qsync_advanced.result | 12 ++
test/replication/qsync_advanced.test.lua | 4 +
test/replication/qsync_basic.result | 33 ++-
test/replication/qsync_basic.test.lua | 16 +-
test/replication/qsync_errinj.result | 6 +
test/replication/qsync_errinj.test.lua | 2 +
test/replication/qsync_snapshots.result | 6 +
test/replication/qsync_snapshots.test.lua | 2 +
test/replication/qsync_with_anon.result | 6 +
test/replication/qsync_with_anon.test.lua | 2 +
test/replication/suite.cfg | 1 +
66 files changed, 579 insertions(+), 48 deletions(-)
create mode 100644 test/replication/gh-6034-limbo-ownership.result
create mode 100644 test/replication/gh-6034-limbo-ownership.test.lua
diff --git a/src/box/box.cc b/src/box/box.cc
index 1d894be97..86c5967b9 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1527,13 +1527,14 @@ box_wait_quorum(uint32_t lead_id, int64_t target_lsn, int quorum,
return 0;
}
-int
-box_promote(void)
+static int
+box_clear_synchro_queue(bool demote)
{
/* A guard to block multiple simultaneous function invocations. */
static bool in_promote = false;
if (in_promote) {
- diag_set(ClientError, ER_UNSUPPORTED, "box.ctl.promote",
+ diag_set(ClientError, ER_UNSUPPORTED,
+ demote ? "box.ctl.demote" : "box.ctl.promote",
"simultaneous invocations");
return -1;
}
@@ -1554,13 +1555,18 @@ box_promote(void)
return -1;
case ELECTION_MODE_MANUAL:
case ELECTION_MODE_CANDIDATE:
- run_elections = box_raft()->state != RAFT_STATE_LEADER;
+ /*
+ * No elections involved in a DEMOTE or when the instance is
+ * already the leader.
+ */
+ run_elections = box_raft()->state != RAFT_STATE_LEADER &&
+ !demote;
/*
* Do nothing when PROMOTE was already written for this term
* (synchronous replication and leader election are in sync, and
* both chose this node as a leader).
*/
- if (txn_limbo_replica_term(&txn_limbo, instance_id) ==
+ if (!demote && txn_limbo_replica_term(&txn_limbo, instance_id) ==
box_raft()->term)
return 0;
@@ -1674,15 +1680,22 @@ box_promote(void)
rc = -1;
} else {
promote:
- if (try_wait) {
+ if (try_wait || demote) {
raft_new_term(box_raft());
if (box_raft_wait_persisted() < 0)
return -1;
}
uint64_t term = box_raft()->term;
- txn_limbo_write_promote(&txn_limbo, wait_lsn, term);
+ if (demote) {
+ txn_limbo_write_demote(&txn_limbo, wait_lsn,
+ term);
+ } else {
+ txn_limbo_write_promote(&txn_limbo, wait_lsn,
+ term);
+ }
+ uint16_t type = demote ? IPROTO_DEMOTE : IPROTO_PROMOTE;
struct synchro_request req = {
- .type = IPROTO_PROMOTE,
+ .type = type,
.replica_id = former_leader_id,
.origin_id = instance_id,
.lsn = wait_lsn,
@@ -1695,6 +1708,25 @@ promote:
return rc;
}
+int
+box_promote(void)
+{
+ return box_clear_synchro_queue(false);
+}
+
+int
+box_demote(void)
+{
+ if (txn_limbo.owner_id == REPLICA_ID_NIL)
+ return 0;
+ if (txn_limbo.owner_id != instance_id) {
+ diag_set(ClientError, ER_SYNC_QUEUE_FOREIGN,
+ txn_limbo.owner_id);
+ return -1;
+ }
+ return box_clear_synchro_queue(true);
+}
+
int
box_listen(void)
{
diff --git a/src/box/box.h b/src/box/box.h
index ecf32240d..aaf20d9dd 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -276,6 +276,9 @@ typedef struct tuple box_tuple_t;
int
box_promote(void);
+int
+box_demote(void);
+
/* box_select is private and used only by FFI */
API_EXPORT int
box_select(uint32_t space_id, uint32_t index_id,
diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
index 137bee9da..3c9edb7d2 100644
--- a/src/box/iproto_constants.h
+++ b/src/box/iproto_constants.h
@@ -241,6 +241,8 @@ enum iproto_type {
IPROTO_RAFT = 30,
/** PROMOTE request. */
IPROTO_PROMOTE = 31,
+ /** DEMOTE request. */
+ IPROTO_DEMOTE = 32,
/** A confirmation message for synchronous transactions. */
IPROTO_CONFIRM = 40,
@@ -310,6 +312,8 @@ iproto_type_name(uint16_t type)
return "RAFT";
case IPROTO_PROMOTE:
return "PROMOTE";
+ case IPROTO_DEMOTE:
+ return "DEMOTE";
case IPROTO_CONFIRM:
return "CONFIRM";
case IPROTO_ROLLBACK:
@@ -364,14 +368,14 @@ static inline bool
iproto_type_is_synchro_request(uint16_t type)
{
return type == IPROTO_CONFIRM || type == IPROTO_ROLLBACK ||
- type == IPROTO_PROMOTE;
+ type == IPROTO_PROMOTE || type == IPROTO_DEMOTE;
}
-/** PROMOTE entry (synchronous replication and leader elections). */
+/** PROMOTE/DEMOTE entry (synchronous replication and leader elections). */
static inline bool
iproto_type_is_promote_request(uint32_t type)
{
- return type == IPROTO_PROMOTE;
+ return type == IPROTO_PROMOTE || type == IPROTO_DEMOTE;
}
static inline bool
diff --git a/src/box/lua/ctl.c b/src/box/lua/ctl.c
index 368b9ab60..a613c4111 100644
--- a/src/box/lua/ctl.c
+++ b/src/box/lua/ctl.c
@@ -89,6 +89,14 @@ lbox_ctl_promote(struct lua_State *L)
return 0;
}
+static int
+lbox_ctl_demote(struct lua_State *L)
+{
+ if (box_demote() != 0)
+ return luaT_error(L);
+ return 0;
+}
+
static int
lbox_ctl_is_recovery_finished(struct lua_State *L)
{
@@ -127,6 +135,7 @@ static const struct luaL_Reg lbox_ctl_lib[] = {
{"promote", lbox_ctl_promote},
/* An old alias. */
{"clear_synchro_queue", lbox_ctl_promote},
+ {"demote", lbox_ctl_demote},
{"is_recovery_finished", lbox_ctl_is_recovery_finished},
{"set_on_shutdown_timeout", lbox_ctl_set_on_shutdown_timeout},
{NULL, NULL}
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 996f1a3fc..d21f05557 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -508,6 +508,29 @@ txn_limbo_read_promote(struct txn_limbo *limbo, uint32_t replica_id,
limbo->confirmed_lsn = 0;
}
+void
+txn_limbo_write_demote(struct txn_limbo *limbo, int64_t lsn, uint64_t term)
+{
+ limbo->confirmed_lsn = lsn;
+ limbo->is_in_rollback = true;
+ struct txn_limbo_entry *e = txn_limbo_last_synchro_entry(limbo);
+ assert(e == NULL || e->lsn <= lsn);
+ (void)e;
+ txn_limbo_write_synchro(limbo, IPROTO_DEMOTE, lsn, term);
+ limbo->is_in_rollback = false;
+}
+
+/**
+ * Process a DEMOTE request, which is like PROMOTE but clears the limbo
+ * ownership.
+ * @sa txn_limbo_read_promote.
+ */
+static void
+txn_limbo_read_demote(struct txn_limbo *limbo, int64_t lsn)
+{
+ return txn_limbo_read_promote(limbo, REPLICA_ID_NIL, lsn);
+}
+
void
txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
{
@@ -659,12 +682,13 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
vclock_follow(&limbo->promote_term_map, origin, term);
if (term > limbo->promote_greatest_term)
limbo->promote_greatest_term = term;
- } else if (req->type == IPROTO_PROMOTE &&
+ } else if (iproto_type_is_promote_request(req->type) &&
limbo->promote_greatest_term > 1) {
/* PROMOTE for outdated term. Ignore. */
- say_info("RAFT: ignoring PROMOTE request from instance "
+ say_info("RAFT: ignoring %s request from instance "
"id %u for term %llu. Greatest term seen "
- "before (%llu) is bigger.", origin, (long long)term,
+ "before (%llu) is bigger.",
+ iproto_type_name(req->type), origin, (long long)term,
(long long)limbo->promote_greatest_term);
return;
}
@@ -675,7 +699,7 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
* The limbo was empty on the instance issuing the request.
* This means this instance must empty its limbo as well.
*/
- assert(lsn == 0 && req->type == IPROTO_PROMOTE);
+ assert(lsn == 0 && iproto_type_is_promote_request(req->type));
} else if (req->replica_id != limbo->owner_id) {
/*
* Ignore CONFIRM/ROLLBACK messages for a foreign master.
@@ -683,7 +707,7 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
* data from an old leader, who has just started and written
* confirm right on synchronous transaction recovery.
*/
- if (req->type != IPROTO_PROMOTE)
+ if (!iproto_type_is_promote_request(req->type))
return;
/*
* Promote has a bigger term, and tries to steal the limbo. It
@@ -703,6 +727,9 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
case IPROTO_PROMOTE:
txn_limbo_read_promote(limbo, req->origin_id, lsn);
break;
+ case IPROTO_DEMOTE:
+ txn_limbo_read_demote(limbo, lsn);
+ break;
default:
unreachable();
}
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index e409ac657..801a1a0ee 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -318,6 +318,13 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo);
void
txn_limbo_write_promote(struct txn_limbo *limbo, int64_t lsn, uint64_t term);
+/**
+ * Write a DEMOTE request.
+ * It has the same effect as PROMOTE and additionally clears limbo ownership.
+ */
+void
+txn_limbo_write_demote(struct txn_limbo *limbo, int64_t lsn, uint64_t term);
+
/**
* Update qsync parameters dynamically.
*/
diff --git a/test/box/error.result b/test/box/error.result
index dfe593dc2..8f783f927 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -445,6 +445,7 @@ t;
| 224: box.error.RAFT_DISABLED
| 225: box.error.TXN_ROLLBACK
| 226: box.error.SYNC_QUEUE_UNCLAIMED
+ | 227: box.error.SYNC_QUEUE_FOREIGN
| ...
test_run:cmd("setopt delimiter ''");
diff --git a/test/replication/election_basic.result b/test/replication/election_basic.result
index b64028c60..c0323a042 100644
--- a/test/replication/election_basic.result
+++ b/test/replication/election_basic.result
@@ -114,6 +114,9 @@ box.cfg{
}
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
--
-- See if bootstrap with election enabled works.
diff --git a/test/replication/election_basic.test.lua b/test/replication/election_basic.test.lua
index 77fdf6340..085a499d5 100644
--- a/test/replication/election_basic.test.lua
+++ b/test/replication/election_basic.test.lua
@@ -43,6 +43,7 @@ box.cfg{
election_mode = 'off', \
election_timeout = old_election_timeout \
}
+box.ctl.demote()
--
-- See if bootstrap with election enabled works.
diff --git a/test/replication/election_qsync.result b/test/replication/election_qsync.result
index c06400b38..2402c8578 100644
--- a/test/replication/election_qsync.result
+++ b/test/replication/election_qsync.result
@@ -165,6 +165,9 @@ box.cfg{
}
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
box.schema.user.revoke('guest', 'super')
| ---
| ...
diff --git a/test/replication/election_qsync.test.lua b/test/replication/election_qsync.test.lua
index ea6fc4a61..e1aca8351 100644
--- a/test/replication/election_qsync.test.lua
+++ b/test/replication/election_qsync.test.lua
@@ -84,4 +84,5 @@ box.cfg{
replication = old_replication, \
replication_synchro_timeout = old_replication_synchro_timeout, \
}
+box.ctl.demote()
box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/gh-5140-qsync-casc-rollback.result b/test/replication/gh-5140-qsync-casc-rollback.result
index da77631dd..d3208e1a4 100644
--- a/test/replication/gh-5140-qsync-casc-rollback.result
+++ b/test/replication/gh-5140-qsync-casc-rollback.result
@@ -73,6 +73,9 @@ _ = box.schema.space.create('async', {is_sync=false, engine = engine})
_ = _:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
-- Write something to flush the master state to replica.
box.space.sync:replace{1}
| ---
@@ -222,3 +225,6 @@ test_run:cmd('delete server replica')
box.schema.user.revoke('guest', 'super')
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5140-qsync-casc-rollback.test.lua b/test/replication/gh-5140-qsync-casc-rollback.test.lua
index 69fc9ad02..96ddfd260 100644
--- a/test/replication/gh-5140-qsync-casc-rollback.test.lua
+++ b/test/replication/gh-5140-qsync-casc-rollback.test.lua
@@ -48,6 +48,7 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
_ = _:create_index('pk')
_ = box.schema.space.create('async', {is_sync=false, engine = engine})
_ = _:create_index('pk')
+box.ctl.promote()
-- Write something to flush the master state to replica.
box.space.sync:replace{1}
@@ -103,3 +104,4 @@ test_run:cmd('stop server replica')
test_run:cmd('delete server replica')
box.schema.user.revoke('guest', 'super')
+box.ctl.demote()
diff --git a/test/replication/gh-5144-qsync-dup-confirm.result b/test/replication/gh-5144-qsync-dup-confirm.result
index 9d265d9ff..217e44412 100644
--- a/test/replication/gh-5144-qsync-dup-confirm.result
+++ b/test/replication/gh-5144-qsync-dup-confirm.result
@@ -46,6 +46,9 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
_ = _:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
-- Remember the current LSN. In the end, when the following synchronous
-- transaction is committed, result LSN should be this value +2: for the
@@ -148,6 +151,9 @@ test_run:cmd('delete server replica2')
| - true
| ...
+box.ctl.demote()
+ | ---
+ | ...
box.schema.user.revoke('guest', 'super')
| ---
| ...
diff --git a/test/replication/gh-5144-qsync-dup-confirm.test.lua b/test/replication/gh-5144-qsync-dup-confirm.test.lua
index 01a8351e0..1d6af2c62 100644
--- a/test/replication/gh-5144-qsync-dup-confirm.test.lua
+++ b/test/replication/gh-5144-qsync-dup-confirm.test.lua
@@ -19,6 +19,7 @@ box.cfg{replication_synchro_quorum = 2, replication_synchro_timeout = 1000}
_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
_ = _:create_index('pk')
+box.ctl.promote()
-- Remember the current LSN. In the end, when the following synchronous
-- transaction is committed, result LSN should be this value +2: for the
@@ -69,4 +70,5 @@ test_run:cmd('delete server replica1')
test_run:cmd('stop server replica2')
test_run:cmd('delete server replica2')
+box.ctl.demote()
box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/gh-5163-qsync-restart-crash.result b/test/replication/gh-5163-qsync-restart-crash.result
index e57bc76d1..1b4d3d9b5 100644
--- a/test/replication/gh-5163-qsync-restart-crash.result
+++ b/test/replication/gh-5163-qsync-restart-crash.result
@@ -16,6 +16,9 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
_ = box.space.sync:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
box.space.sync:replace{1}
| ---
@@ -30,3 +33,6 @@ box.space.sync:select{}
box.space.sync:drop()
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5163-qsync-restart-crash.test.lua b/test/replication/gh-5163-qsync-restart-crash.test.lua
index d5aca4749..c8d54aad2 100644
--- a/test/replication/gh-5163-qsync-restart-crash.test.lua
+++ b/test/replication/gh-5163-qsync-restart-crash.test.lua
@@ -7,8 +7,10 @@ engine = test_run:get_cfg('engine')
--
_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
_ = box.space.sync:create_index('pk')
+box.ctl.promote()
box.space.sync:replace{1}
test_run:cmd('restart server default')
box.space.sync:select{}
box.space.sync:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-5167-qsync-rollback-snap.result b/test/replication/gh-5167-qsync-rollback-snap.result
index 06f58526c..13166720f 100644
--- a/test/replication/gh-5167-qsync-rollback-snap.result
+++ b/test/replication/gh-5167-qsync-rollback-snap.result
@@ -41,6 +41,9 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
_ = box.space.sync:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
-- Write something to flush the current master's state to replica.
_ = box.space.sync:insert{1}
| ---
@@ -163,3 +166,6 @@ box.cfg{
}
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5167-qsync-rollback-snap.test.lua b/test/replication/gh-5167-qsync-rollback-snap.test.lua
index 475727e61..1a2a31b7c 100644
--- a/test/replication/gh-5167-qsync-rollback-snap.test.lua
+++ b/test/replication/gh-5167-qsync-rollback-snap.test.lua
@@ -16,6 +16,7 @@ fiber = require('fiber')
box.cfg{replication_synchro_quorum = 2, replication_synchro_timeout = 1000}
_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
_ = box.space.sync:create_index('pk')
+box.ctl.promote()
-- Write something to flush the current master's state to replica.
_ = box.space.sync:insert{1}
_ = box.space.sync:delete{1}
@@ -65,3 +66,4 @@ box.cfg{
replication_synchro_quorum = orig_synchro_quorum, \
replication_synchro_timeout = orig_synchro_timeout, \
}
+box.ctl.demote()
diff --git a/test/replication/gh-5195-qsync-replica-write.result b/test/replication/gh-5195-qsync-replica-write.result
index 85e00e6ed..bc73bb599 100644
--- a/test/replication/gh-5195-qsync-replica-write.result
+++ b/test/replication/gh-5195-qsync-replica-write.result
@@ -40,6 +40,9 @@ _ = box.schema.space.create('sync', {engine = engine, is_sync = true})
_ = box.space.sync:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
box.cfg{replication_synchro_timeout = 1000, replication_synchro_quorum = 3}
| ---
@@ -71,12 +74,12 @@ test_run:wait_lsn('replica', 'default')
| ---
| ...
-- Normal DML is blocked - the limbo is not empty and does not belong to the
--- replica. But synchro queue cleanup also does a WAL write, and propagates LSN
+-- replica. But promote also does a WAL write, and propagates LSN
-- of the instance.
box.cfg{replication_synchro_timeout = 0.001}
| ---
| ...
-box.ctl.clear_synchro_queue()
+box.ctl.promote()
| ---
| ...
@@ -144,6 +147,9 @@ test_run:cmd('delete server replica')
| - true
| ...
+box.ctl.demote()
+ | ---
+ | ...
box.space.sync:drop()
| ---
| ...
diff --git a/test/replication/gh-5195-qsync-replica-write.test.lua b/test/replication/gh-5195-qsync-replica-write.test.lua
index 64c48be99..a59ec154e 100644
--- a/test/replication/gh-5195-qsync-replica-write.test.lua
+++ b/test/replication/gh-5195-qsync-replica-write.test.lua
@@ -17,6 +17,7 @@ test_run:cmd('start server replica with wait=True, wait_load=True')
--
_ = box.schema.space.create('sync', {engine = engine, is_sync = true})
_ = box.space.sync:create_index('pk')
+box.ctl.promote()
box.cfg{replication_synchro_timeout = 1000, replication_synchro_quorum = 3}
lsn = box.info.lsn
@@ -30,10 +31,10 @@ test_run:wait_cond(function() return box.info.lsn == lsn end)
test_run:switch('replica')
test_run:wait_lsn('replica', 'default')
-- Normal DML is blocked - the limbo is not empty and does not belong to the
--- replica. But synchro queue cleanup also does a WAL write, and propagates LSN
+-- replica. But promote also does a WAL write, and propagates LSN
-- of the instance.
box.cfg{replication_synchro_timeout = 0.001}
-box.ctl.clear_synchro_queue()
+box.ctl.promote()
test_run:switch('default')
-- Wait second ACK receipt.
@@ -59,6 +60,7 @@ test_run:switch('default')
test_run:cmd('stop server replica')
test_run:cmd('delete server replica')
+box.ctl.demote()
box.space.sync:drop()
box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/gh-5213-qsync-applier-order-3.result b/test/replication/gh-5213-qsync-applier-order-3.result
index bcb18b5c0..e788eec77 100644
--- a/test/replication/gh-5213-qsync-applier-order-3.result
+++ b/test/replication/gh-5213-qsync-applier-order-3.result
@@ -45,6 +45,9 @@ s = box.schema.space.create('test', {is_sync = true})
_ = s:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
test_run:cmd('create server replica1 with rpl_master=default,\
script="replication/replica1.lua"')
@@ -179,6 +182,9 @@ box.cfg{
-- Replica2 takes the limbo ownership and sends the transaction to the replica1.
-- Along with the CONFIRM from the default node, which is still not applied
-- on the replica1.
+box.ctl.promote()
+ | ---
+ | ...
fiber = require('fiber')
| ---
| ...
@@ -261,3 +267,6 @@ box.cfg{
}
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5213-qsync-applier-order-3.test.lua b/test/replication/gh-5213-qsync-applier-order-3.test.lua
index 37b569da7..304656de0 100644
--- a/test/replication/gh-5213-qsync-applier-order-3.test.lua
+++ b/test/replication/gh-5213-qsync-applier-order-3.test.lua
@@ -30,6 +30,7 @@ box.schema.user.grant('guest', 'super')
s = box.schema.space.create('test', {is_sync = true})
_ = s:create_index('pk')
+box.ctl.promote()
test_run:cmd('create server replica1 with rpl_master=default,\
script="replication/replica1.lua"')
@@ -90,6 +91,7 @@ box.cfg{
-- Replica2 takes the limbo ownership and sends the transaction to the replica1.
-- Along with the CONFIRM from the default node, which is still not applied
-- on the replica1.
+box.ctl.promote()
fiber = require('fiber')
f = fiber.new(function() box.space.test:replace{2} end)
@@ -123,3 +125,4 @@ box.cfg{
replication_synchro_quorum = old_synchro_quorum, \
replication_synchro_timeout = old_synchro_timeout, \
}
+box.ctl.demote()
diff --git a/test/replication/gh-5213-qsync-applier-order.result b/test/replication/gh-5213-qsync-applier-order.result
index a8c24c289..ba6cdab06 100644
--- a/test/replication/gh-5213-qsync-applier-order.result
+++ b/test/replication/gh-5213-qsync-applier-order.result
@@ -29,6 +29,9 @@ s = box.schema.space.create('test', {is_sync = true})
_ = s:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
test_run:cmd('create server replica with rpl_master=default,\
script="replication/gh-5213-replica.lua"')
@@ -300,3 +303,6 @@ box.cfg{
}
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5213-qsync-applier-order.test.lua b/test/replication/gh-5213-qsync-applier-order.test.lua
index f1eccfa84..39b1912e8 100644
--- a/test/replication/gh-5213-qsync-applier-order.test.lua
+++ b/test/replication/gh-5213-qsync-applier-order.test.lua
@@ -14,6 +14,7 @@ box.schema.user.grant('guest', 'super')
s = box.schema.space.create('test', {is_sync = true})
_ = s:create_index('pk')
+box.ctl.promote()
test_run:cmd('create server replica with rpl_master=default,\
script="replication/gh-5213-replica.lua"')
@@ -120,3 +121,4 @@ box.cfg{
replication_synchro_quorum = old_synchro_quorum, \
replication_synchro_timeout = old_synchro_timeout, \
}
+box.ctl.demote()
diff --git a/test/replication/gh-5288-qsync-recovery.result b/test/replication/gh-5288-qsync-recovery.result
index dc0babef6..704b71d93 100644
--- a/test/replication/gh-5288-qsync-recovery.result
+++ b/test/replication/gh-5288-qsync-recovery.result
@@ -12,6 +12,9 @@ s = box.schema.space.create('sync', {is_sync = true})
_ = s:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
s:insert{1}
| ---
| - [1]
@@ -25,3 +28,6 @@ test_run:cmd('restart server default')
box.space.sync:drop()
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5288-qsync-recovery.test.lua b/test/replication/gh-5288-qsync-recovery.test.lua
index 00bff7b87..2455f7278 100644
--- a/test/replication/gh-5288-qsync-recovery.test.lua
+++ b/test/replication/gh-5288-qsync-recovery.test.lua
@@ -5,7 +5,9 @@ test_run = require('test_run').new()
--
s = box.schema.space.create('sync', {is_sync = true})
_ = s:create_index('pk')
+box.ctl.promote()
s:insert{1}
box.snapshot()
test_run:cmd('restart server default')
box.space.sync:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-5298-qsync-recovery-snap.result b/test/replication/gh-5298-qsync-recovery-snap.result
index 922831552..0883fe5f5 100644
--- a/test/replication/gh-5298-qsync-recovery-snap.result
+++ b/test/replication/gh-5298-qsync-recovery-snap.result
@@ -17,6 +17,9 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
_ = box.space.sync:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
for i = 1, 10 do box.space.sync:replace{i} end
| ---
| ...
@@ -98,3 +101,6 @@ box.space.sync:drop()
box.space.loc:drop()
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5298-qsync-recovery-snap.test.lua b/test/replication/gh-5298-qsync-recovery-snap.test.lua
index 187f60d75..084cde963 100644
--- a/test/replication/gh-5298-qsync-recovery-snap.test.lua
+++ b/test/replication/gh-5298-qsync-recovery-snap.test.lua
@@ -8,6 +8,7 @@ engine = test_run:get_cfg('engine')
--
_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
_ = box.space.sync:create_index('pk')
+box.ctl.promote()
for i = 1, 10 do box.space.sync:replace{i} end
-- Local rows could affect this by increasing the signature.
@@ -46,3 +47,4 @@ box.cfg{
}
box.space.sync:drop()
box.space.loc:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-5426-election-on-off.result b/test/replication/gh-5426-election-on-off.result
index 7444ef7f2..2bdc17ec6 100644
--- a/test/replication/gh-5426-election-on-off.result
+++ b/test/replication/gh-5426-election-on-off.result
@@ -168,6 +168,9 @@ box.cfg{
}
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
box.schema.user.revoke('guest', 'super')
| ---
| ...
diff --git a/test/replication/gh-5426-election-on-off.test.lua b/test/replication/gh-5426-election-on-off.test.lua
index bdf06903b..6277e9ef2 100644
--- a/test/replication/gh-5426-election-on-off.test.lua
+++ b/test/replication/gh-5426-election-on-off.test.lua
@@ -69,4 +69,5 @@ box.cfg{
election_mode = old_election_mode, \
replication_timeout = old_replication_timeout, \
}
+box.ctl.demote()
box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/gh-5433-election-restart-recovery.result b/test/replication/gh-5433-election-restart-recovery.result
index f8f32416e..ed63ff409 100644
--- a/test/replication/gh-5433-election-restart-recovery.result
+++ b/test/replication/gh-5433-election-restart-recovery.result
@@ -169,6 +169,9 @@ box.cfg{
}
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
box.schema.user.revoke('guest', 'super')
| ---
| ...
diff --git a/test/replication/gh-5433-election-restart-recovery.test.lua b/test/replication/gh-5433-election-restart-recovery.test.lua
index 4aff000bf..ae1f42c4d 100644
--- a/test/replication/gh-5433-election-restart-recovery.test.lua
+++ b/test/replication/gh-5433-election-restart-recovery.test.lua
@@ -84,4 +84,5 @@ box.cfg{
election_mode = old_election_mode, \
replication_timeout = old_replication_timeout, \
}
+box.ctl.demote()
box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.result b/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.result
index 2699231e5..20fab4072 100644
--- a/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.result
+++ b/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.result
@@ -49,6 +49,9 @@ _ = box.schema.space.create('test', {is_sync=true})
_ = box.space.test:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
-- Fill the limbo with pending entries. 3 mustn't receive them yet.
test_run:cmd('stop server election_replica3')
diff --git a/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.test.lua b/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.test.lua
index 03705d96c..ec0f1d77e 100644
--- a/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.test.lua
+++ b/test/replication/gh-5435-qsync-clear-synchro-queue-commit-all.test.lua
@@ -21,6 +21,7 @@ box.ctl.wait_rw()
_ = box.schema.space.create('test', {is_sync=true})
_ = box.space.test:create_index('pk')
+box.ctl.promote()
-- Fill the limbo with pending entries. 3 mustn't receive them yet.
test_run:cmd('stop server election_replica3')
diff --git a/test/replication/gh-5438-raft-state.result b/test/replication/gh-5438-raft-state.result
index 6985f026a..68b6bfad8 100644
--- a/test/replication/gh-5438-raft-state.result
+++ b/test/replication/gh-5438-raft-state.result
@@ -47,6 +47,9 @@ end)
| ...
-- Cleanup.
+box.ctl.demote()
+ | ---
+ | ...
box.cfg{election_mode = old_election_mode}
| ---
| ...
diff --git a/test/replication/gh-5438-raft-state.test.lua b/test/replication/gh-5438-raft-state.test.lua
index 60c3366c1..cf0f4ca23 100644
--- a/test/replication/gh-5438-raft-state.test.lua
+++ b/test/replication/gh-5438-raft-state.test.lua
@@ -22,6 +22,7 @@ test_run:wait_cond(function()\
end)
-- Cleanup.
+box.ctl.demote()
box.cfg{election_mode = old_election_mode}
test_run:cmd('stop server replica')
test_run:cmd('delete server replica')
diff --git a/test/replication/gh-5446-qsync-eval-quorum.result b/test/replication/gh-5446-qsync-eval-quorum.result
index 5f83b248c..1173128a7 100644
--- a/test/replication/gh-5446-qsync-eval-quorum.result
+++ b/test/replication/gh-5446-qsync-eval-quorum.result
@@ -88,6 +88,9 @@ s = box.schema.space.create('sync', {is_sync = true, engine = engine})
_ = s:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
-- Only one master node -> 1/2 + 1 = 1
s:insert{1} -- should pass
@@ -343,3 +346,7 @@ box.cfg{
}
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
+
diff --git a/test/replication/gh-5446-qsync-eval-quorum.test.lua b/test/replication/gh-5446-qsync-eval-quorum.test.lua
index 6b9e324ed..b969df836 100644
--- a/test/replication/gh-5446-qsync-eval-quorum.test.lua
+++ b/test/replication/gh-5446-qsync-eval-quorum.test.lua
@@ -37,6 +37,7 @@ end
-- Create a sync space we will operate on
s = box.schema.space.create('sync', {is_sync = true, engine = engine})
_ = s:create_index('pk')
+box.ctl.promote()
-- Only one master node -> 1/2 + 1 = 1
s:insert{1} -- should pass
@@ -135,3 +136,5 @@ box.cfg{
replication_synchro_quorum = old_synchro_quorum, \
replication_synchro_timeout = old_synchro_timeout, \
}
+box.ctl.demote()
+
diff --git a/test/replication/gh-5506-election-on-off.result b/test/replication/gh-5506-election-on-off.result
index b8abd7ecd..a7f2b6a9c 100644
--- a/test/replication/gh-5506-election-on-off.result
+++ b/test/replication/gh-5506-election-on-off.result
@@ -138,3 +138,6 @@ box.cfg{
}
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5506-election-on-off.test.lua b/test/replication/gh-5506-election-on-off.test.lua
index 476b00ec0..f8915c333 100644
--- a/test/replication/gh-5506-election-on-off.test.lua
+++ b/test/replication/gh-5506-election-on-off.test.lua
@@ -66,3 +66,4 @@ box.cfg{
election_mode = old_election_mode, \
replication_timeout = old_replication_timeout, \
}
+box.ctl.demote()
diff --git a/test/replication/gh-5566-final-join-synchro.result b/test/replication/gh-5566-final-join-synchro.result
index a09882ba6..c5ae2f283 100644
--- a/test/replication/gh-5566-final-join-synchro.result
+++ b/test/replication/gh-5566-final-join-synchro.result
@@ -12,6 +12,9 @@ _ = box.schema.space.create('sync', {is_sync=true})
_ = box.space.sync:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
box.schema.user.grant('guest', 'replication')
| ---
@@ -137,3 +140,6 @@ test_run:cleanup_cluster()
box.schema.user.revoke('guest', 'replication')
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5566-final-join-synchro.test.lua b/test/replication/gh-5566-final-join-synchro.test.lua
index 2db2c742f..25f411407 100644
--- a/test/replication/gh-5566-final-join-synchro.test.lua
+++ b/test/replication/gh-5566-final-join-synchro.test.lua
@@ -5,6 +5,7 @@ test_run = require('test_run').new()
--
_ = box.schema.space.create('sync', {is_sync=true})
_ = box.space.sync:create_index('pk')
+box.ctl.promote()
box.schema.user.grant('guest', 'replication')
box.schema.user.grant('guest', 'write', 'space', 'sync')
@@ -59,3 +60,4 @@ box.cfg{\
box.space.sync:drop()
test_run:cleanup_cluster()
box.schema.user.revoke('guest', 'replication')
+box.ctl.demote()
diff --git a/test/replication/gh-5874-qsync-txn-recovery.result b/test/replication/gh-5874-qsync-txn-recovery.result
index 73f903ca7..01328a9e3 100644
--- a/test/replication/gh-5874-qsync-txn-recovery.result
+++ b/test/replication/gh-5874-qsync-txn-recovery.result
@@ -31,6 +31,9 @@ sync = box.schema.create_space('sync', {is_sync = true, engine = engine})
_ = sync:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
-- The transaction fails, but is written to the log anyway.
box.begin() async:insert{1} sync:insert{1} box.commit()
@@ -160,3 +163,6 @@ sync:drop()
loc:drop()
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-5874-qsync-txn-recovery.test.lua b/test/replication/gh-5874-qsync-txn-recovery.test.lua
index f35eb68de..6ddf164ac 100644
--- a/test/replication/gh-5874-qsync-txn-recovery.test.lua
+++ b/test/replication/gh-5874-qsync-txn-recovery.test.lua
@@ -12,6 +12,7 @@ async = box.schema.create_space('async', {engine = engine})
_ = async:create_index('pk')
sync = box.schema.create_space('sync', {is_sync = true, engine = engine})
_ = sync:create_index('pk')
+box.ctl.promote()
-- The transaction fails, but is written to the log anyway.
box.begin() async:insert{1} sync:insert{1} box.commit()
@@ -82,3 +83,4 @@ loc:select()
async:drop()
sync:drop()
loc:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-6032-promote-wal-write.result b/test/replication/gh-6032-promote-wal-write.result
index 246c7974f..03112fb8d 100644
--- a/test/replication/gh-6032-promote-wal-write.result
+++ b/test/replication/gh-6032-promote-wal-write.result
@@ -67,3 +67,6 @@ box.cfg{\
box.space.sync:drop()
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-6032-promote-wal-write.test.lua b/test/replication/gh-6032-promote-wal-write.test.lua
index 8c1859083..9a036a8b4 100644
--- a/test/replication/gh-6032-promote-wal-write.test.lua
+++ b/test/replication/gh-6032-promote-wal-write.test.lua
@@ -26,3 +26,4 @@ box.cfg{\
replication_synchro_timeout = replication_synchro_timeout,\
}
box.space.sync:drop()
+box.ctl.demote()
diff --git a/test/replication/gh-6034-limbo-ownership.result b/test/replication/gh-6034-limbo-ownership.result
new file mode 100644
index 000000000..e412b8d53
--- /dev/null
+++ b/test/replication/gh-6034-limbo-ownership.result
@@ -0,0 +1,189 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+
+--
+-- gh-6034: test that transactional limbo isn't accessible without a promotion.
+--
+synchro_quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+election_mode = box.cfg.election_mode
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum = 1, election_mode='off'}
+ | ---
+ | ...
+
+_ = box.schema.space.create('async'):create_index('pk')
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true}):create_index('pk')
+ | ---
+ | ...
+
+-- Limbo is initially unclaimed, everyone is writeable.
+assert(not box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == 0)
+ | ---
+ | - true
+ | ...
+box.space.async:insert{1} -- success.
+ | ---
+ | - [1]
+ | ...
+-- Synchro spaces aren't writeable.
+box.space.sync:insert{1} -- error.
+ | ---
+ | - error: The synchronous transaction queue doesn't belong to any instance
+ | ...
+
+box.ctl.promote()
+ | ---
+ | ...
+assert(not box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == box.info.id)
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{1} -- success.
+ | ---
+ | - [1]
+ | ...
+
+-- Everyone but the limbo owner is read-only.
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+test_run:cmd('set variable rpl_listen to "replica.listen"')
+ | ---
+ | - true
+ | ...
+orig_replication = box.cfg.replication
+ | ---
+ | ...
+box.cfg{replication=rpl_listen}
+ | ---
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+assert(box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == test_run:eval('default', 'return box.info.id')[1])
+ | ---
+ | - true
+ | ...
+box.space.async:insert{2} -- failure.
+ | ---
+ | - error: Can't modify data because this instance is in read-only mode.
+ | ...
+
+-- Promotion on the other node. Default should become ro.
+box.ctl.promote()
+ | ---
+ | ...
+assert(not box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == box.info.id)
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{2} -- success.
+ | ---
+ | - [2]
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+assert(box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.synchro.queue.owner == test_run:eval('replica', 'return box.info.id')[1])
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{3} -- failure.
+ | ---
+ | - error: Can't modify data because this instance is in read-only mode.
+ | ...
+
+box.ctl.promote()
+ | ---
+ | ...
+box.ctl.demote()
+ | ---
+ | ...
+assert(not box.info.ro)
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{3} -- still fails.
+ | ---
+ | - error: The synchronous transaction queue doesn't belong to any instance
+ | ...
+assert(box.info.synchro.queue.owner == 0)
+ | ---
+ | - true
+ | ...
+box.space.async:insert{3} -- success.
+ | ---
+ | - [3]
+ | ...
+
+-- Cleanup.
+box.ctl.demote()
+ | ---
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+box.space.async:drop()
+ | ---
+ | ...
+box.cfg{\
+ replication_synchro_quorum = synchro_quorum,\
+ election_mode = election_mode,\
+ replication = orig_replication,\
+}
+ | ---
+ | ...
diff --git a/test/replication/gh-6034-limbo-ownership.test.lua b/test/replication/gh-6034-limbo-ownership.test.lua
new file mode 100644
index 000000000..c17e12fa4
--- /dev/null
+++ b/test/replication/gh-6034-limbo-ownership.test.lua
@@ -0,0 +1,69 @@
+test_run = require('test_run').new()
+fiber = require('fiber')
+
+--
+-- gh-6034: test that transactional limbo isn't accessible without a promotion.
+--
+synchro_quorum = box.cfg.replication_synchro_quorum
+election_mode = box.cfg.election_mode
+box.cfg{replication_synchro_quorum = 1, election_mode='off'}
+
+_ = box.schema.space.create('async'):create_index('pk')
+_ = box.schema.space.create('sync', {is_sync=true}):create_index('pk')
+
+-- Limbo is initially unclaimed, everyone is writeable.
+assert(not box.info.ro)
+assert(box.info.synchro.queue.owner == 0)
+box.space.async:insert{1} -- success.
+-- Synchro spaces aren't writeable.
+box.space.sync:insert{1} -- error.
+
+box.ctl.promote()
+assert(not box.info.ro)
+assert(box.info.synchro.queue.owner == box.info.id)
+box.space.sync:insert{1} -- success.
+
+-- Everyone but the limbo owner is read-only.
+box.schema.user.grant('guest', 'replication')
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+test_run:cmd('set variable rpl_listen to "replica.listen"')
+orig_replication = box.cfg.replication
+box.cfg{replication=rpl_listen}
+
+test_run:switch('replica')
+assert(box.info.ro)
+assert(box.info.synchro.queue.owner == test_run:eval('default', 'return box.info.id')[1])
+box.space.async:insert{2} -- failure.
+
+-- Promotion on the other node. Default should become ro.
+box.ctl.promote()
+assert(not box.info.ro)
+assert(box.info.synchro.queue.owner == box.info.id)
+box.space.sync:insert{2} -- success.
+
+test_run:switch('default')
+assert(box.info.ro)
+assert(box.info.synchro.queue.owner == test_run:eval('replica', 'return box.info.id')[1])
+box.space.sync:insert{3} -- failure.
+
+box.ctl.promote()
+box.ctl.demote()
+assert(not box.info.ro)
+box.space.sync:insert{3} -- still fails.
+assert(box.info.synchro.queue.owner == 0)
+box.space.async:insert{3} -- success.
+
+-- Cleanup.
+box.ctl.demote()
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.schema.user.revoke('guest', 'replication')
+box.space.sync:drop()
+box.space.async:drop()
+box.cfg{\
+ replication_synchro_quorum = synchro_quorum,\
+ election_mode = election_mode,\
+ replication = orig_replication,\
+}
diff --git a/test/replication/gh-6034-promote-bump-term.result b/test/replication/gh-6034-promote-bump-term.result
index 20e352922..e5087507c 100644
--- a/test/replication/gh-6034-promote-bump-term.result
+++ b/test/replication/gh-6034-promote-bump-term.result
@@ -32,6 +32,9 @@ assert(box.info.election.term == term + 2)
| ...
-- Cleanup.
+box.ctl.demote()
+ | ---
+ | ...
box.cfg{election_mode=election_mode}
| ---
| ...
diff --git a/test/replication/gh-6034-promote-bump-term.test.lua b/test/replication/gh-6034-promote-bump-term.test.lua
index 5847dbb8f..af18853f3 100644
--- a/test/replication/gh-6034-promote-bump-term.test.lua
+++ b/test/replication/gh-6034-promote-bump-term.test.lua
@@ -13,4 +13,5 @@ box.ctl.promote()
assert(box.info.election.term == term + 2)
-- Cleanup.
+box.ctl.demote()
box.cfg{election_mode=election_mode}
diff --git a/test/replication/gh-6057-qsync-confirm-async-no-wal.result b/test/replication/gh-6057-qsync-confirm-async-no-wal.result
index 23c77729b..e7beefb2a 100644
--- a/test/replication/gh-6057-qsync-confirm-async-no-wal.result
+++ b/test/replication/gh-6057-qsync-confirm-async-no-wal.result
@@ -40,6 +40,10 @@ _ = s2:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
+
errinj = box.error.injection
| ---
| ...
@@ -161,3 +165,6 @@ box.cfg{
}
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua b/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua
index a11ddc042..bb459ea02 100644
--- a/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua
+++ b/test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua
@@ -21,6 +21,8 @@ _ = s:create_index('pk')
s2 = box.schema.create_space('test2')
_ = s2:create_index('pk')
+box.ctl.promote()
+
errinj = box.error.injection
function create_hanging_async_after_confirm(sync_key, async_key1, async_key2) \
@@ -86,3 +88,4 @@ box.cfg{
replication_synchro_quorum = old_synchro_quorum, \
replication_synchro_timeout = old_synchro_timeout, \
}
+box.ctl.demote()
diff --git a/test/replication/hang_on_synchro_fail.result b/test/replication/hang_on_synchro_fail.result
index 9f6fac00b..dda15af20 100644
--- a/test/replication/hang_on_synchro_fail.result
+++ b/test/replication/hang_on_synchro_fail.result
@@ -19,6 +19,9 @@ _ = box.schema.space.create('sync', {is_sync=true})
_ = box.space.sync:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
old_synchro_quorum = box.cfg.replication_synchro_quorum
| ---
@@ -127,4 +130,7 @@ box.space.sync:drop()
box.schema.user.revoke('guest', 'replication')
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/hang_on_synchro_fail.test.lua b/test/replication/hang_on_synchro_fail.test.lua
index 6c3b09fab..f0d494eae 100644
--- a/test/replication/hang_on_synchro_fail.test.lua
+++ b/test/replication/hang_on_synchro_fail.test.lua
@@ -8,6 +8,7 @@ box.schema.user.grant('guest', 'replication')
_ = box.schema.space.create('sync', {is_sync=true})
_ = box.space.sync:create_index('pk')
+box.ctl.promote()
old_synchro_quorum = box.cfg.replication_synchro_quorum
box.cfg{replication_synchro_quorum=3}
@@ -54,4 +55,5 @@ box.cfg{replication_synchro_quorum=old_synchro_quorum,\
replication_synchro_timeout=old_synchro_timeout}
box.space.sync:drop()
box.schema.user.revoke('guest', 'replication')
+box.ctl.demote()
diff --git a/test/replication/qsync_advanced.result b/test/replication/qsync_advanced.result
index 94b19b1f2..72ac0c326 100644
--- a/test/replication/qsync_advanced.result
+++ b/test/replication/qsync_advanced.result
@@ -72,6 +72,9 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
_ = box.space.sync:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
-- Testcase body.
box.space.sync:insert{1} -- success
| ---
@@ -468,6 +471,9 @@ box.space.sync:select{} -- 1
box.cfg{read_only=false} -- promote replica to master
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
test_run:switch('default')
| ---
| - true
@@ -508,6 +514,9 @@ test_run:switch('default')
box.cfg{read_only=false}
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
test_run:switch('replica')
| ---
| - true
@@ -781,3 +790,6 @@ box.cfg{
}
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/qsync_advanced.test.lua b/test/replication/qsync_advanced.test.lua
index 058ece602..37c285b8d 100644
--- a/test/replication/qsync_advanced.test.lua
+++ b/test/replication/qsync_advanced.test.lua
@@ -30,6 +30,7 @@ test_run:switch('default')
box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
_ = box.space.sync:create_index('pk')
+box.ctl.promote()
-- Testcase body.
box.space.sync:insert{1} -- success
test_run:cmd('switch replica')
@@ -170,6 +171,7 @@ box.space.sync:select{} -- 1
test_run:switch('replica')
box.space.sync:select{} -- 1
box.cfg{read_only=false} -- promote replica to master
+box.ctl.promote()
test_run:switch('default')
box.cfg{read_only=true} -- demote master to replica
test_run:switch('replica')
@@ -181,6 +183,7 @@ box.space.sync:select{} -- 1, 2
-- Revert cluster configuration.
test_run:switch('default')
box.cfg{read_only=false}
+box.ctl.promote()
test_run:switch('replica')
box.cfg{read_only=true}
-- Testcase cleanup.
@@ -279,3 +282,4 @@ box.cfg{
replication_synchro_quorum = orig_synchro_quorum, \
replication_synchro_timeout = orig_synchro_timeout, \
}
+box.ctl.demote()
diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
index 7e711ba13..bbdfc42fe 100644
--- a/test/replication/qsync_basic.result
+++ b/test/replication/qsync_basic.result
@@ -14,6 +14,9 @@ s1.is_sync
pk = s1:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
box.begin() s1:insert({1}) s1:insert({2}) box.commit()
| ---
| ...
@@ -645,19 +648,12 @@ test_run:switch('default')
| ---
| - true
| ...
-box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
- | ---
- | ...
-f = fiber.create(function() box.space.sync:replace{1} end)
+box.ctl.demote()
| ---
| ...
-test_run:wait_lsn('replica', 'default')
+box.space.sync:replace{1}
| ---
- | ...
-
-test_run:switch('replica')
- | ---
- | - true
+ | - error: The synchronous transaction queue doesn't belong to any instance
| ...
function skip_row() return nil end
| ---
@@ -674,26 +670,22 @@ box.space.sync:replace{2}
box.space.sync:before_replace(nil, skip_row)
| ---
| ...
-assert(box.space.sync:get{2} == nil)
+assert(box.space.sync:get{1} == nil)
| ---
| - true
| ...
-assert(box.space.sync:get{1} ~= nil)
+assert(box.space.sync:get{2} == nil)
| ---
| - true
| ...
-
-test_run:switch('default')
+assert(box.info.lsn == old_lsn + 1)
| ---
| - true
| ...
-box.cfg{replication_synchro_quorum = 2}
+box.ctl.promote()
| ---
| ...
-test_run:wait_cond(function() return f:status() == 'dead' end)
- | ---
- | - true
- | ...
+
box.space.sync:truncate()
| ---
| ...
@@ -758,3 +750,6 @@ box.space.sync:drop()
box.schema.user.revoke('guest', 'replication')
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
index 75c9b222b..eac465e25 100644
--- a/test/replication/qsync_basic.test.lua
+++ b/test/replication/qsync_basic.test.lua
@@ -6,6 +6,7 @@
s1 = box.schema.create_space('test1', {is_sync = true})
s1.is_sync
pk = s1:create_index('pk')
+box.ctl.promote()
box.begin() s1:insert({1}) s1:insert({2}) box.commit()
s1:select{}
@@ -253,22 +254,18 @@ box.space.sync:count()
-- instances, but also works for local rows.
--
test_run:switch('default')
-box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
-f = fiber.create(function() box.space.sync:replace{1} end)
-test_run:wait_lsn('replica', 'default')
-
-test_run:switch('replica')
+box.ctl.demote()
+box.space.sync:replace{1}
function skip_row() return nil end
old_lsn = box.info.lsn
_ = box.space.sync:before_replace(skip_row)
box.space.sync:replace{2}
box.space.sync:before_replace(nil, skip_row)
+assert(box.space.sync:get{1} == nil)
assert(box.space.sync:get{2} == nil)
-assert(box.space.sync:get{1} ~= nil)
+assert(box.info.lsn == old_lsn + 1)
+box.ctl.promote()
-test_run:switch('default')
-box.cfg{replication_synchro_quorum = 2}
-test_run:wait_cond(function() return f:status() == 'dead' end)
box.space.sync:truncate()
--
@@ -301,3 +298,4 @@ test_run:cmd('delete server replica')
box.space.test:drop()
box.space.sync:drop()
box.schema.user.revoke('guest', 'replication')
+box.ctl.demote()
diff --git a/test/replication/qsync_errinj.result b/test/replication/qsync_errinj.result
index 635bcf939..cf1e30a90 100644
--- a/test/replication/qsync_errinj.result
+++ b/test/replication/qsync_errinj.result
@@ -35,6 +35,9 @@ _ = box.schema.space.create('sync', {is_sync = true, engine = engine})
_ = box.space.sync:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
--
-- gh-5100: slow ACK sending shouldn't stun replica for the
@@ -542,3 +545,6 @@ box.space.sync:drop()
box.schema.user.revoke('guest', 'super')
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/qsync_errinj.test.lua b/test/replication/qsync_errinj.test.lua
index 6a9fd3e1a..e7c85c58c 100644
--- a/test/replication/qsync_errinj.test.lua
+++ b/test/replication/qsync_errinj.test.lua
@@ -12,6 +12,7 @@ test_run:cmd('start server replica with wait=True, wait_load=True')
_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
_ = box.space.sync:create_index('pk')
+box.ctl.promote()
--
-- gh-5100: slow ACK sending shouldn't stun replica for the
@@ -222,3 +223,4 @@ test_run:cmd('delete server replica')
box.space.sync:drop()
box.schema.user.revoke('guest', 'super')
+box.ctl.demote()
diff --git a/test/replication/qsync_snapshots.result b/test/replication/qsync_snapshots.result
index cafdd63c8..ca418b168 100644
--- a/test/replication/qsync_snapshots.result
+++ b/test/replication/qsync_snapshots.result
@@ -57,6 +57,9 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
_ = box.space.sync:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
-- Testcase body.
box.space.sync:insert{1}
| ---
@@ -299,3 +302,6 @@ box.cfg{
}
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
diff --git a/test/replication/qsync_snapshots.test.lua b/test/replication/qsync_snapshots.test.lua
index 590610974..82c2e3f7c 100644
--- a/test/replication/qsync_snapshots.test.lua
+++ b/test/replication/qsync_snapshots.test.lua
@@ -23,6 +23,7 @@ test_run:switch('default')
box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
_ = box.space.sync:create_index('pk')
+box.ctl.promote()
-- Testcase body.
box.space.sync:insert{1}
box.space.sync:select{} -- 1
@@ -130,3 +131,4 @@ box.cfg{
replication_synchro_quorum = orig_synchro_quorum, \
replication_synchro_timeout = orig_synchro_timeout, \
}
+box.ctl.demote()
diff --git a/test/replication/qsync_with_anon.result b/test/replication/qsync_with_anon.result
index 6a2952a32..99c6fb902 100644
--- a/test/replication/qsync_with_anon.result
+++ b/test/replication/qsync_with_anon.result
@@ -57,6 +57,9 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
_ = box.space.sync:create_index('pk')
| ---
| ...
+box.ctl.promote()
+ | ---
+ | ...
-- Testcase body.
test_run:switch('default')
| ---
@@ -220,6 +223,9 @@ box.cfg{
}
| ---
| ...
+box.ctl.demote()
+ | ---
+ | ...
test_run:cleanup_cluster()
| ---
| ...
diff --git a/test/replication/qsync_with_anon.test.lua b/test/replication/qsync_with_anon.test.lua
index d7ecaa107..e73880ec7 100644
--- a/test/replication/qsync_with_anon.test.lua
+++ b/test/replication/qsync_with_anon.test.lua
@@ -22,6 +22,7 @@ test_run:switch('default')
box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
_ = box.space.sync:create_index('pk')
+box.ctl.promote()
-- Testcase body.
test_run:switch('default')
box.space.sync:insert{1} -- success
@@ -81,4 +82,5 @@ box.cfg{
replication_synchro_quorum = orig_synchro_quorum, \
replication_synchro_timeout = orig_synchro_timeout, \
}
+box.ctl.demote()
test_run:cleanup_cluster()
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index eb88b9420..977deeb40 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -47,6 +47,7 @@
"gh-5613-bootstrap-prefer-booted.test.lua": {},
"gh-6027-applier-error-show.test.lua": {},
"gh-6032-promote-wal-write.test.lua": {},
+ "gh-6034-limbo-ownership.test.lua": {},
"gh-6034-promote-bump-term.test.lua": {},
"gh-6057-qsync-confirm-async-no-wal.test.lua": {},
"gh-6094-rs-uuid-mismatch.test.lua": {},
--
2.30.1 (Apple Git-130)
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Tarantool-patches] [PATCH v3 08/12] txn_limbo: persist the latest effective promote in snapshot
2021-06-28 22:12 [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
` (6 preceding siblings ...)
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote` Serge Petrenko via Tarantool-patches
@ 2021-06-28 22:12 ` Serge Petrenko via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 09/12] replication: encode version in JOIN request Serge Petrenko via Tarantool-patches
` (5 subsequent siblings)
13 siblings, 0 replies; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-28 22:12 UTC (permalink / raw)
To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
Previously PROMOTE entries, just like CONFIRM and ROLLBACK, were only
stored in WALs. This is because snapshots consist solely of confirmed
transactions, so there's nothing to CONFIRM or ROLLBACK.
PROMOTE has gained additional meaning recently: it pins limbo ownership
to a specific instance, rendering everyone else read-only. So now
PROMOTE information must be stored in snapshots as well.
Save the latest limbo state (owner id and latest confirmed lsn) to the
snapshot as a PROMOTE request.
Follow-up #6034
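A quick way to observe the effect (a minimal Lua sketch on a fresh
single instance; the restart step is manual and left out here):

    box.cfg{}
    box.ctl.promote() -- claim the limbo; a PROMOTE row gets written
    box.snapshot()    -- the latest PROMOTE state now ends up in the snap
    -- After restarting from this snapshot alone (old xlogs removed),
    -- box.info.synchro.queue.owner still reports this instance's id.
    -- Before this patch that information existed only in the WAL.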
---
src/box/memtx_engine.c | 32 ++++++++++++++++++++++++++++++++
src/box/txn_limbo.c | 10 ++++++++++
src/box/txn_limbo.h | 7 +++++++
3 files changed, 49 insertions(+)
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index c662a3c8c..a2cfb2615 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -50,6 +50,7 @@
#include "schema.h"
#include "gc.h"
#include "raft.h"
+#include "txn_limbo.h"
/* sync snapshot every 16MB */
#define SNAP_SYNC_INTERVAL (1 << 24)
@@ -225,6 +226,22 @@ memtx_engine_recover_raft(const struct xrow_header *row)
return 0;
}
+static int
+memtx_engine_recover_synchro(const struct xrow_header *row)
+{
+ assert(row->type == IPROTO_PROMOTE);
+ struct synchro_request req;
+ if (xrow_decode_synchro(row, &req) != 0)
+ return -1;
+ /*
+ * Origin id cannot be deduced from row.replica_id in a checkpoint,
+ * because all its rows have a zero replica_id.
+ */
+ req.origin_id = req.replica_id;
+ txn_limbo_process(&txn_limbo, &req);
+ return 0;
+}
+
static int
memtx_engine_recover_snapshot_row(struct memtx_engine *memtx,
struct xrow_header *row, int *is_space_system)
@@ -233,6 +250,8 @@ memtx_engine_recover_snapshot_row(struct memtx_engine *memtx,
if (row->type != IPROTO_INSERT) {
if (row->type == IPROTO_RAFT)
return memtx_engine_recover_raft(row);
+ if (row->type == IPROTO_PROMOTE)
+ return memtx_engine_recover_synchro(row);
diag_set(ClientError, ER_UNKNOWN_REQUEST_TYPE,
(uint32_t) row->type);
return -1;
@@ -542,6 +561,7 @@ struct checkpoint {
struct vclock vclock;
struct xdir dir;
struct raft_request raft;
+ struct synchro_request synchro_state;
/**
* Do nothing, just touch the snapshot file - the
* checkpoint already exists.
@@ -567,6 +587,7 @@ checkpoint_new(const char *snap_dirname, uint64_t snap_io_rate_limit)
xdir_create(&ckpt->dir, snap_dirname, SNAP, &INSTANCE_UUID, &opts);
vclock_create(&ckpt->vclock);
box_raft_checkpoint_local(&ckpt->raft);
+ txn_limbo_checkpoint(&txn_limbo, &ckpt->synchro_state);
ckpt->touch = false;
return ckpt;
}
@@ -655,6 +676,15 @@ finish:
return rc;
}
+static int
+checkpoint_write_synchro(struct xlog *l, const struct synchro_request *req)
+{
+ struct xrow_header row;
+ char body[XROW_SYNCHRO_BODY_LEN_MAX];
+ xrow_encode_synchro(&row, body, req);
+ return checkpoint_write_row(l, &row);
+}
+
static int
checkpoint_f(va_list ap)
{
@@ -692,6 +722,8 @@ checkpoint_f(va_list ap)
}
if (checkpoint_write_raft(&snap, &ckpt->raft) != 0)
goto fail;
+ if (checkpoint_write_synchro(&snap, &ckpt->synchro_state) != 0)
+ goto fail;
if (xlog_flush(&snap) < 0)
goto fail;
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index d21f05557..239b6da76 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -301,6 +301,16 @@ complete:
return 0;
}
+void
+txn_limbo_checkpoint(const struct txn_limbo *limbo,
+ struct synchro_request *req)
+{
+ req->type = IPROTO_PROMOTE;
+ req->replica_id = limbo->owner_id;
+ req->lsn = limbo->confirmed_lsn;
+ req->term = limbo->promote_greatest_term;
+}
+
static void
txn_limbo_write_synchro(struct txn_limbo *limbo, uint16_t type, int64_t lsn,
uint64_t term)
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 801a1a0ee..442f2483e 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -311,6 +311,13 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req);
int
txn_limbo_wait_confirm(struct txn_limbo *limbo);
+/**
+ * Persist limbo state to a given synchro request.
+ */
+void
+txn_limbo_checkpoint(const struct txn_limbo *limbo,
+ struct synchro_request *req);
+
/**
* Write a PROMOTE request, which has the same effect as CONFIRM(@a lsn) and
* ROLLBACK(@a lsn + 1) combined.
--
2.30.1 (Apple Git-130)
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Tarantool-patches] [PATCH v3 09/12] replication: encode version in JOIN request
2021-06-28 22:12 [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
` (7 preceding siblings ...)
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 08/12] txn_limbo: persist the latest effective promote in snapshot Serge Petrenko via Tarantool-patches
@ 2021-06-28 22:12 ` Serge Petrenko via Tarantool-patches
2021-07-04 12:27 ` Vladislav Shpilevoy via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 10/12] replication: add META stage to JOIN Serge Petrenko via Tarantool-patches
` (4 subsequent siblings)
13 siblings, 1 reply; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-28 22:12 UTC (permalink / raw)
To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
The replica's version will be needed once sending the limbo and election
state snapshots is implemented.
Follow-up #6034
@TarantoolBot document
New field in JOIN request
The JOIN request now comes with a new field: the replica's version.
The field uses the IPROTO_SERVER_VERSION key.
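For illustration, assuming the version id uses the usual Tarantool
packing (one byte per version part, as in tarantool_version_id(); the
helper below is hypothetical):

    -- Lua sketch of the assumed packing scheme.
    local function version_id(major, minor, patch)
        return major * 0x10000 + minor * 0x100 + patch
    end
    print(string.format('0x%06x', version_id(2, 8, 2))) -- 0x020802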
---
src/box/box.cc | 7 +++++--
src/box/xrow.c | 4 +++-
src/box/xrow.h | 25 +++++++++++++++----------
3 files changed, 23 insertions(+), 13 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index 86c5967b9..b2c52bc54 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -2522,7 +2522,9 @@ box_process_register(struct ev_io *io, struct xrow_header *header)
struct tt_uuid instance_uuid = uuid_nil;
struct vclock replica_vclock;
- xrow_decode_register_xc(header, &instance_uuid, &replica_vclock);
+ uint32_t replica_version_id;
+ xrow_decode_register_xc(header, &instance_uuid, &replica_vclock,
+ &replica_version_id);
if (!is_box_configured)
tnt_raise(ClientError, ER_LOADING);
@@ -2647,7 +2649,8 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
/* Decode JOIN request */
struct tt_uuid instance_uuid = uuid_nil;
- xrow_decode_join_xc(header, &instance_uuid);
+ uint32_t replica_version_id;
+ xrow_decode_join_xc(header, &instance_uuid, &replica_version_id);
/* Check that bootstrap has been finished */
if (!is_box_configured)
diff --git a/src/box/xrow.c b/src/box/xrow.c
index 16cb2484c..400e0456f 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -1639,10 +1639,12 @@ xrow_encode_join(struct xrow_header *row, const struct tt_uuid *instance_uuid)
return -1;
}
char *data = buf;
- data = mp_encode_map(data, 1);
+ data = mp_encode_map(data, 2);
data = mp_encode_uint(data, IPROTO_INSTANCE_UUID);
/* Greet the remote replica with our replica UUID */
data = xrow_encode_uuid(data, instance_uuid);
+ data = mp_encode_uint(data, IPROTO_SERVER_VERSION);
+ data = mp_encode_uint(data, tarantool_version_id());
assert(data <= buf + size);
row->body[0].iov_base = buf;
diff --git a/src/box/xrow.h b/src/box/xrow.h
index 055fd31e1..0a134d94d 100644
--- a/src/box/xrow.h
+++ b/src/box/xrow.h
@@ -471,15 +471,17 @@ xrow_encode_join(struct xrow_header *row, const struct tt_uuid *instance_uuid);
* Decode JOIN command.
* @param row Row to decode.
* @param[out] instance_uuid.
+ * @param[out] version_id.
*
* @retval 0 Success.
* @retval -1 Memory or format error.
*/
static inline int
-xrow_decode_join(struct xrow_header *row, struct tt_uuid *instance_uuid)
+xrow_decode_join(struct xrow_header *row, struct tt_uuid *instance_uuid,
+ uint32_t *version_id)
{
- return xrow_decode_subscribe(row, NULL, instance_uuid, NULL, NULL, NULL,
- NULL);
+ return xrow_decode_subscribe(row, NULL, instance_uuid, NULL, version_id,
+ NULL, NULL);
}
/**
@@ -487,15 +489,17 @@ xrow_decode_join(struct xrow_header *row, struct tt_uuid *instance_uuid)
* @param row Row to decode.
* @param[out] instance_uuid Instance uuid.
* @param[out] vclock Instance vclock.
+ * @param[out] version_id Replica version id.
+ *
* @retval 0 Success.
* @retval -1 Memory or format error.
*/
static inline int
xrow_decode_register(struct xrow_header *row, struct tt_uuid *instance_uuid,
- struct vclock *vclock)
+ struct vclock *vclock, uint32_t *version_id)
{
- return xrow_decode_subscribe(row, NULL, instance_uuid, vclock, NULL,
- NULL, NULL);
+ return xrow_decode_subscribe(row, NULL, instance_uuid, vclock,
+ version_id, NULL, NULL);
}
/**
@@ -960,18 +964,19 @@ xrow_encode_join_xc(struct xrow_header *row,
/** @copydoc xrow_decode_join. */
static inline void
-xrow_decode_join_xc(struct xrow_header *row, struct tt_uuid *instance_uuid)
+xrow_decode_join_xc(struct xrow_header *row, struct tt_uuid *instance_uuid,
+ uint32_t *version_id)
{
- if (xrow_decode_join(row, instance_uuid) != 0)
+ if (xrow_decode_join(row, instance_uuid, version_id) != 0)
diag_raise();
}
/** @copydoc xrow_decode_register. */
static inline void
xrow_decode_register_xc(struct xrow_header *row, struct tt_uuid *instance_uuid,
- struct vclock *vclock)
+ struct vclock *vclock, uint32_t *version_id)
{
- if (xrow_decode_register(row, instance_uuid, vclock) != 0)
+ if (xrow_decode_register(row, instance_uuid, vclock, version_id) != 0)
diag_raise();
}
--
2.30.1 (Apple Git-130)
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Tarantool-patches] [PATCH v3 10/12] replication: add META stage to JOIN
2021-06-28 22:12 [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
` (8 preceding siblings ...)
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 09/12] replication: encode version in JOIN request Serge Petrenko via Tarantool-patches
@ 2021-06-28 22:12 ` Serge Petrenko via Tarantool-patches
2021-07-04 12:28 ` Vladislav Shpilevoy via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 11/12] replication: send latest effective promote in initial join Serge Petrenko via Tarantool-patches
` (3 subsequent siblings)
13 siblings, 1 reply; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-28 22:12 UTC (permalink / raw)
To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
The new META stage is part of the server's response to a join request.
It's delimited by the IPROTO_JOIN_META and IPROTO_JOIN_SNAPSHOT marker
rows and goes before the actual snapshot data.
Follow-up #6034
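For clarity, here is the order of stages a version-aware replica now
receives in response to JOIN (a Lua sketch of the ordering only; the
names are the IPROTO constants introduced by this patch):

    local stages = {
        'vclock of the read view', -- reply to the JOIN request
        'IPROTO_JOIN_META',        -- sent only when the replica reported
                                   -- a version; metadata rows follow
                                   -- (none yet in this patch)
        'IPROTO_JOIN_SNAPSHOT',    -- end of the metadata stream
        'snapshot data rows',      -- IPROTO_INSERT etc.
    }
    for i, stage in ipairs(stages) do print(i, stage) end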
---
src/box/applier.cc | 17 ++++++++++++++++-
src/box/box.cc | 5 +++--
src/box/iproto_constants.h | 2 ++
src/box/relay.cc | 19 ++++++++++++++++++-
src/box/relay.h | 4 +++-
5 files changed, 42 insertions(+), 5 deletions(-)
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 07fe7f5c7..7abad3a64 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -447,12 +447,26 @@ applier_wait_snapshot(struct applier *applier)
xrow_decode_vclock_xc(&row, &replicaset.vclock);
}
+ coio_read_xrow(coio, ibuf, &row);
+ if (row.type == IPROTO_JOIN_META) {
+ /* Read additional metadata. Empty at the moment. */
+ do {
+ coio_read_xrow(coio, ibuf, &row);
+ if (iproto_type_is_error(row.type))
+ xrow_decode_error_xc(&row);
+ else if (row.type != IPROTO_JOIN_SNAPSHOT) {
+ tnt_raise(ClientError, ER_UNKNOWN_REQUEST_TYPE,
+ (uint32_t)row.type);
+ }
+ } while (row.type != IPROTO_JOIN_SNAPSHOT);
+ coio_read_xrow(coio, ibuf, &row);
+ }
+
/*
* Receive initial data.
*/
uint64_t row_count = 0;
while (true) {
- coio_read_xrow(coio, ibuf, &row);
applier->last_row_time = ev_monotonic_now(loop());
if (iproto_type_is_dml(row.type)) {
if (apply_snapshot_row(&row) != 0)
@@ -477,6 +491,7 @@ applier_wait_snapshot(struct applier *applier)
tnt_raise(ClientError, ER_UNKNOWN_REQUEST_TYPE,
(uint32_t) row.type);
}
+ coio_read_xrow(coio, ibuf, &row);
}
return row_count;
diff --git a/src/box/box.cc b/src/box/box.cc
index b2c52bc54..bc68ee4c8 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -2501,7 +2501,7 @@ box_process_fetch_snapshot(struct ev_io *io, struct xrow_header *header)
/* Send the snapshot data to the instance. */
struct vclock start_vclock;
- relay_initial_join(io->fd, header->sync, &start_vclock);
+ relay_initial_join(io->fd, header->sync, &start_vclock, 0);
say_info("read-view sent.");
/* Remember master's vclock after the last request */
@@ -2699,7 +2699,8 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
* Initial stream: feed replica with dirty data from engines.
*/
struct vclock start_vclock;
- relay_initial_join(io->fd, header->sync, &start_vclock);
+ relay_initial_join(io->fd, header->sync, &start_vclock,
+ replica_version_id);
say_info("initial data sent.");
/**
diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
index 3c9edb7d2..247ca6f37 100644
--- a/src/box/iproto_constants.h
+++ b/src/box/iproto_constants.h
@@ -263,6 +263,8 @@ enum iproto_type {
IPROTO_FETCH_SNAPSHOT = 69,
/** REGISTER request to leave anonymous replication. */
IPROTO_REGISTER = 70,
+ IPROTO_JOIN_META = 71,
+ IPROTO_JOIN_SNAPSHOT = 72,
/** Vinyl run info stored in .index file */
VY_INDEX_RUN_INFO = 100,
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 60f527b7f..4ebe0fb06 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -392,7 +392,8 @@ relay_set_cord_name(int fd)
}
void
-relay_initial_join(int fd, uint64_t sync, struct vclock *vclock)
+relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
+ uint32_t replica_version_id)
{
struct relay *relay = relay_new(NULL);
if (relay == NULL)
@@ -432,6 +433,22 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock)
row.sync = sync;
coio_write_xrow(&relay->io, &row);
+ /*
+ * Version is present starting with 2.7.3, 2.8.2, 2.9.1.
+ * All these versions know of the additional META stage of initial join.
+ */
+ if (replica_version_id > 0) {
+ /* Mark the beginning of the metadata stream. */
+ row.type = IPROTO_JOIN_META;
+ coio_write_xrow(&relay->io, &row);
+
+ /* Empty at the moment. */
+
+ /* Mark the end of the metadata stream. */
+ row.type = IPROTO_JOIN_SNAPSHOT;
+ coio_write_xrow(&relay->io, &row);
+ }
+
/* Send read view to the replica. */
engine_join_xc(&ctx, &relay->stream);
}
diff --git a/src/box/relay.h b/src/box/relay.h
index 615ffb75d..112428ae8 100644
--- a/src/box/relay.h
+++ b/src/box/relay.h
@@ -116,9 +116,11 @@ relay_push_raft(struct relay *relay, const struct raft_request *req);
* @param fd client connection
* @param sync sync from incoming JOIN request
* @param vclock[out] vclock of the read view sent to the replica
+ * @param replica_version_id peer's version
*/
void
-relay_initial_join(int fd, uint64_t sync, struct vclock *vclock);
+relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
+ uint32_t replica_version_id);
/**
* Send final JOIN rows to the replica.
--
2.30.1 (Apple Git-130)
^ permalink raw reply [flat|nested] 42+ messages in thread
* [Tarantool-patches] [PATCH v3 11/12] replication: send latest effective promote in initial join
2021-06-28 22:12 [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
` (9 preceding siblings ...)
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 10/12] replication: add META stage to JOIN Serge Petrenko via Tarantool-patches
@ 2021-06-28 22:12 ` Serge Petrenko via Tarantool-patches
2021-07-04 12:28 ` Vladislav Shpilevoy via Tarantool-patches
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 12/12] replication: send current Raft term in join response Serge Petrenko via Tarantool-patches
` (2 subsequent siblings)
13 siblings, 1 reply; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-28 22:12 UTC (permalink / raw)
To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
A joining instance may never receive the latest PROMOTE request, which
is the only source of information about the limbo owner. Send out the
latest limbo state (i.e. the latest applied PROMOTE request) together
with the initial join snapshot.
Follow-up #6034
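The effect can be checked right after bootstrap on the replica (a
sketch; 'master_uri' is a placeholder, and the master is assumed to
have called box.ctl.promote() before the join):

    -- Lua, on the freshly joined replica.
    box.cfg{replication = 'master_uri', read_only = true}
    -- The limbo owner arrived within the JOIN_META stage, so it is
    -- known at once, even if the original PROMOTE row is long gone
    -- from the master's WALs:
    assert(box.info.synchro.queue.owner ~= 0)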
---
src/box/applier.cc | 7 ++-
src/box/relay.cc | 9 ++-
test/replication/replica_rejoin.result | 77 ++++++++++++++----------
test/replication/replica_rejoin.test.lua | 50 +++++++--------
4 files changed, 86 insertions(+), 57 deletions(-)
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 7abad3a64..482b9446a 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -454,7 +454,12 @@ applier_wait_snapshot(struct applier *applier)
coio_read_xrow(coio, ibuf, &row);
if (iproto_type_is_error(row.type))
xrow_decode_error_xc(&row);
- else if (row.type != IPROTO_JOIN_SNAPSHOT) {
+ else if (iproto_type_is_promote_request(row.type)) {
+ struct synchro_request req;
+ if (xrow_decode_synchro(&row, &req) != 0)
+ diag_raise();
+ txn_limbo_process(&txn_limbo, &req);
+ } else if (row.type != IPROTO_JOIN_SNAPSHOT) {
tnt_raise(ClientError, ER_UNKNOWN_REQUEST_TYPE,
(uint32_t)row.type);
}
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 4ebe0fb06..4b102a777 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -427,6 +427,9 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
if (txn_limbo_wait_confirm(&txn_limbo) != 0)
diag_raise();
+ struct synchro_request req;
+ txn_limbo_checkpoint(&txn_limbo, &req);
+
/* Respond to the JOIN request with the current vclock. */
struct xrow_header row;
xrow_encode_vclock_xc(&row, vclock);
@@ -442,7 +445,11 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
row.type = IPROTO_JOIN_META;
coio_write_xrow(&relay->io, &row);
- /* Empty at the moment. */
+ char body[XROW_SYNCHRO_BODY_LEN_MAX];
+ xrow_encode_synchro(&row, body, &req);
+ row.replica_id = req.replica_id;
+ row.sync = sync;
+ coio_write_xrow(&relay->io, &row);
/* Mark the end of the metadata stream. */
row.type = IPROTO_JOIN_SNAPSHOT;
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index 843333a19..e489c150a 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -7,10 +7,19 @@ test_run = env.new()
log = require('log')
---
...
-engine = test_run:get_cfg('engine')
+test_run:cmd("create server master with script='replication/master1.lua'")
---
+- true
...
-test_run:cleanup_cluster()
+test_run:cmd("start server master")
+---
+- true
+...
+test_run:switch("master")
+---
+- true
+...
+engine = test_run:get_cfg('engine')
---
...
--
@@ -43,7 +52,7 @@ _ = box.space.test:insert{3}
---
...
-- Join a replica, then stop it.
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica_rejoin.lua'")
+test_run:cmd("create server replica with rpl_master=master, script='replication/replica_rejoin.lua'")
---
- true
...
@@ -65,7 +74,7 @@ box.space.test:select()
- [2]
- [3]
...
-test_run:cmd("switch default")
+test_run:cmd("switch master")
---
- true
...
@@ -75,7 +84,7 @@ test_run:cmd("stop server replica")
...
-- Restart the server to purge the replica from
-- the garbage collection state.
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
box.cfg{wal_cleanup_delay = 0}
---
...
@@ -146,7 +155,7 @@ box.space.test:select()
- [20]
- [30]
...
-test_run:cmd("switch default")
+test_run:cmd("switch master")
---
- true
...
@@ -154,7 +163,7 @@ test_run:cmd("switch default")
for i = 10, 30, 10 do box.space.test:update(i, {{'!', 1, i}}) end
---
...
-vclock = test_run:get_vclock('default')
+vclock = test_run:get_vclock('master')
---
...
vclock[0] = nil
@@ -191,7 +200,7 @@ box.space.test:replace{1, 2, 3} -- bumps LSN on the replica
---
- [1, 2, 3]
...
-test_run:cmd("switch default")
+test_run:cmd("switch master")
---
- true
...
@@ -199,7 +208,7 @@ test_run:cmd("stop server replica")
---
- true
...
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
box.cfg{wal_cleanup_delay = 0}
---
...
@@ -253,7 +262,7 @@ box.space.test:select()
-- from the replica.
--
-- Bootstrap a new replica.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
---
- true
...
@@ -295,7 +304,7 @@ box.cfg{replication = ''}
---
...
-- Bump vclock on the master.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
---
- true
...
@@ -317,15 +326,15 @@ vclock = test_run:get_vclock('replica')
vclock[0] = nil
---
...
-_ = test_run:wait_vclock('default', vclock)
+_ = test_run:wait_vclock('master', vclock)
---
...
-- Restart the master and force garbage collection.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
---
- true
...
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
box.cfg{wal_cleanup_delay = 0}
---
...
@@ -373,7 +382,7 @@ vclock = test_run:get_vclock('replica')
vclock[0] = nil
---
...
-_ = test_run:wait_vclock('default', vclock)
+_ = test_run:wait_vclock('master', vclock)
---
...
-- Restart the replica. It should successfully rebootstrap.
@@ -396,38 +405,42 @@ test_run:cmd("switch default")
---
- true
...
-box.cfg{replication = ''}
+test_run:cmd("stop server replica")
---
+- true
...
-test_run:cmd("stop server replica")
+test_run:cmd("delete server replica")
---
- true
...
-test_run:cmd("cleanup server replica")
+test_run:cmd("stop server master")
---
- true
...
-test_run:cmd("delete server replica")
+test_run:cmd("delete server master")
---
- true
...
-test_run:cleanup_cluster()
+--
+-- gh-4107: rebootstrap fails if the replica was deleted from
+-- the cluster on the master.
+--
+test_run:cmd("create server master with script='replication/master1.lua'")
---
+- true
...
-box.space.test:drop()
+test_run:cmd("start server master")
---
+- true
...
-box.schema.user.revoke('guest', 'replication')
+test_run:switch("master")
---
+- true
...
---
--- gh-4107: rebootstrap fails if the replica was deleted from
--- the cluster on the master.
---
box.schema.user.grant('guest', 'replication')
---
...
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica_uuid.lua'")
+test_run:cmd("create server replica with rpl_master=master, script='replication/replica_uuid.lua'")
---
- true
...
@@ -462,11 +475,11 @@ box.space._cluster:get(2) ~= nil
---
- true
...
-test_run:cmd("stop server replica")
+test_run:switch("default")
---
- true
...
-test_run:cmd("cleanup server replica")
+test_run:cmd("stop server replica")
---
- true
...
@@ -474,9 +487,11 @@ test_run:cmd("delete server replica")
---
- true
...
-box.schema.user.revoke('guest', 'replication')
+test_run:cmd("stop server master")
---
+- true
...
-test_run:cleanup_cluster()
+test_run:cmd("delete server master")
---
+- true
...
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index c3ba9bf3f..2563177cf 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -1,9 +1,11 @@
env = require('test_run')
test_run = env.new()
log = require('log')
-engine = test_run:get_cfg('engine')
-test_run:cleanup_cluster()
+test_run:cmd("create server master with script='replication/master1.lua'")
+test_run:cmd("start server master")
+test_run:switch("master")
+engine = test_run:get_cfg('engine')
--
-- gh-5806: this replica_rejoin test relies on the wal cleanup fiber
@@ -23,17 +25,17 @@ _ = box.space.test:insert{2}
_ = box.space.test:insert{3}
-- Join a replica, then stop it.
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica_rejoin.lua'")
+test_run:cmd("create server replica with rpl_master=master, script='replication/replica_rejoin.lua'")
test_run:cmd("start server replica")
test_run:cmd("switch replica")
box.info.replication[1].upstream.status == 'follow' or log.error(box.info)
box.space.test:select()
-test_run:cmd("switch default")
+test_run:cmd("switch master")
test_run:cmd("stop server replica")
-- Restart the server to purge the replica from
-- the garbage collection state.
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
box.cfg{wal_cleanup_delay = 0}
-- Make some checkpoints to remove old xlogs.
@@ -58,11 +60,11 @@ box.info.replication[2].downstream.vclock ~= nil or log.error(box.info)
test_run:cmd("switch replica")
box.info.replication[1].upstream.status == 'follow' or log.error(box.info)
box.space.test:select()
-test_run:cmd("switch default")
+test_run:cmd("switch master")
-- Make sure the replica follows new changes.
for i = 10, 30, 10 do box.space.test:update(i, {{'!', 1, i}}) end
-vclock = test_run:get_vclock('default')
+vclock = test_run:get_vclock('master')
vclock[0] = nil
_ = test_run:wait_vclock('replica', vclock)
test_run:cmd("switch replica")
@@ -76,9 +78,9 @@ box.space.test:select()
-- Check that rebootstrap is NOT initiated unless the replica
-- is strictly behind the master.
box.space.test:replace{1, 2, 3} -- bumps LSN on the replica
-test_run:cmd("switch default")
+test_run:cmd("switch master")
test_run:cmd("stop server replica")
-test_run:cmd("restart server default")
+test_run:cmd("restart server master")
box.cfg{wal_cleanup_delay = 0}
checkpoint_count = box.cfg.checkpoint_count
box.cfg{checkpoint_count = 1}
@@ -99,7 +101,7 @@ box.space.test:select()
--
-- Bootstrap a new replica.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
test_run:cleanup_cluster()
@@ -113,17 +115,17 @@ box.cfg{replication = replica_listen}
test_run:cmd("switch replica")
box.cfg{replication = ''}
-- Bump vclock on the master.
-test_run:cmd("switch default")
+test_run:cmd("switch master")
box.space.test:replace{1}
-- Bump vclock on the replica.
test_run:cmd("switch replica")
for i = 1, 10 do box.space.test:replace{2} end
vclock = test_run:get_vclock('replica')
vclock[0] = nil
-_ = test_run:wait_vclock('default', vclock)
+_ = test_run:wait_vclock('master', vclock)
-- Restart the master and force garbage collection.
-test_run:cmd("switch default")
-test_run:cmd("restart server default")
+test_run:cmd("switch master")
+test_run:cmd("restart server master")
box.cfg{wal_cleanup_delay = 0}
replica_listen = test_run:cmd("eval replica 'return box.cfg.listen'")
replica_listen ~= nil
@@ -139,7 +141,7 @@ test_run:cmd("switch replica")
for i = 1, 10 do box.space.test:replace{2} end
vclock = test_run:get_vclock('replica')
vclock[0] = nil
-_ = test_run:wait_vclock('default', vclock)
+_ = test_run:wait_vclock('master', vclock)
-- Restart the replica. It should successfully rebootstrap.
test_run:cmd("restart server replica with args='true'")
box.space.test:select()
@@ -148,20 +150,20 @@ box.space.test:replace{2}
-- Cleanup.
test_run:cmd("switch default")
-box.cfg{replication = ''}
test_run:cmd("stop server replica")
-test_run:cmd("cleanup server replica")
test_run:cmd("delete server replica")
-test_run:cleanup_cluster()
-box.space.test:drop()
-box.schema.user.revoke('guest', 'replication')
+test_run:cmd("stop server master")
+test_run:cmd("delete server master")
--
-- gh-4107: rebootstrap fails if the replica was deleted from
-- the cluster on the master.
--
+test_run:cmd("create server master with script='replication/master1.lua'")
+test_run:cmd("start server master")
+test_run:switch("master")
box.schema.user.grant('guest', 'replication')
-test_run:cmd("create server replica with rpl_master=default, script='replication/replica_uuid.lua'")
+test_run:cmd("create server replica with rpl_master=master, script='replication/replica_uuid.lua'")
start_cmd = string.format("start server replica with args='%s'", require('uuid').new())
box.space._cluster:get(2) == nil
test_run:cmd(start_cmd)
@@ -170,8 +172,8 @@ test_run:cmd("cleanup server replica")
box.space._cluster:delete(2) ~= nil
test_run:cmd(start_cmd)
box.space._cluster:get(2) ~= nil
+test_run:switch("default")
test_run:cmd("stop server replica")
-test_run:cmd("cleanup server replica")
test_run:cmd("delete server replica")
-box.schema.user.revoke('guest', 'replication')
-test_run:cleanup_cluster()
+test_run:cmd("stop server master")
+test_run:cmd("delete server master")
--
2.30.1 (Apple Git-130)
* [Tarantool-patches] [PATCH v3 12/12] replication: send current Raft term in join response
2021-06-28 22:12 [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
` (10 preceding siblings ...)
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 11/12] replication: send latest effective promote in initial join Serge Petrenko via Tarantool-patches
@ 2021-06-28 22:12 ` Serge Petrenko via Tarantool-patches
2021-07-04 12:29 ` Vladislav Shpilevoy via Tarantool-patches
2021-08-04 22:41 ` [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Vladislav Shpilevoy via Tarantool-patches
2021-08-06 8:31 ` Kirill Yukhin via Tarantool-patches
13 siblings, 1 reply; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-28 22:12 UTC (permalink / raw)
To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
Make Raft nodes send out their latest persisted term to joining
replicas. Otherwise the limbo's term could become bigger than Raft's,
which could lead to data loss via nopification.
Follow-up #6034
---
src/box/applier.cc | 5 +++++
src/box/relay.cc | 6 ++++++
2 files changed, 11 insertions(+)
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 482b9446a..978383e64 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -459,6 +459,11 @@ applier_wait_snapshot(struct applier *applier)
if (xrow_decode_synchro(&row, &req) != 0)
diag_raise();
txn_limbo_process(&txn_limbo, &req);
+ } else if (iproto_type_is_raft_request(row.type)) {
+ struct raft_request req;
+ if (xrow_decode_raft(&row, &req, NULL) != 0)
+ diag_raise();
+ box_raft_recover(&req);
} else if (row.type != IPROTO_JOIN_SNAPSHOT) {
tnt_raise(ClientError, ER_UNKNOWN_REQUEST_TYPE,
(uint32_t)row.type);
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 4b102a777..70f1a045b 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -428,7 +428,9 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
diag_raise();
struct synchro_request req;
+ struct raft_request raft_req;
txn_limbo_checkpoint(&txn_limbo, &req);
+ box_raft_checkpoint_local(&raft_req);
/* Respond to the JOIN request with the current vclock. */
struct xrow_header row;
@@ -451,6 +453,10 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
row.sync = sync;
coio_write_xrow(&relay->io, &row);
+ xrow_encode_raft(&row, &fiber()->gc, &raft_req);
+ row.sync = sync;
+ coio_write_xrow(&relay->io, &row);
+
/* Mark the end of the metadata stream. */
row.type = IPROTO_JOIN_SNAPSHOT;
coio_write_xrow(&relay->io, &row);
--
2.30.1 (Apple Git-130)
* Re: [Tarantool-patches] [PATCH v3 01/12] replication: always send raft state to subscribers
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 01/12] replication: always send raft state to subscribers Serge Petrenko via Tarantool-patches
@ 2021-07-04 12:12 ` Vladislav Shpilevoy via Tarantool-patches
2021-07-09 9:43 ` Serge Petrenko via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-04 12:12 UTC (permalink / raw)
To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Hi! Thanks for the patch!
On 29.06.2021 00:12, Serge Petrenko via Tarantool-patches wrote:
> Tarantool used to send out raft state on subscribe only when raft was
> enabled. This was a safeguard against partially-upgraded clusters, where
> some nodes had no clue about Raft messages and couldn't handle them
> properly.
>
> Actually, Raft state should always be sent out. For example, promote
> will be changed to bump the Raft term even when Raft is disabled, and it's
> important that everyone in the cluster has the same term, for the sake of
> promote at least.
>
> So, send out Raft state to every subscriber with version >= 2.6.0
> (that's when Raft was introduced).
> Do the same for Raft broadcasts. They should be sent only to replicas
> with version >= 2.6.0
>
> Closes #5438
> ---
> src/box/box.cc | 11 ++--
> src/box/relay.cc | 4 +-
> test/replication/gh-5438-raft-state.result | 63 ++++++++++++++++++++
> test/replication/gh-5438-raft-state.test.lua | 28 +++++++++
I propose to rename raft -> election in the test name, to be
consistent with the existing election tests. It also simplifies
running all of them via `python test-run.py election`.
* Re: [Tarantool-patches] [PATCH v3 04/12] box: make promote always bump the term
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 04/12] box: make promote always bump the term Serge Petrenko via Tarantool-patches
@ 2021-07-04 12:14 ` Vladislav Shpilevoy via Tarantool-patches
2021-07-14 18:26 ` Serge Petrenko via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-04 12:14 UTC (permalink / raw)
To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for the patch!
Did you think about making it a NOP when the node is already a leader
(even in manual/off mode)? The current solution is all good except
that it makes the current leader temporarily read-only until it wins
the election again, which looks strange. I would say "unexpected" for
users.
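To illustrate the behaviour above, a sketch (assuming
election_mode = 'manual' on the current leader):

box.ctl.promote()        -- bumps the term; the current leader steps down
box.info.ro              -- true until this node wins the election again
box.info.election.state  -- not 'leader' while the vote is in progress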
See 4 comments below.
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 6a0950f44..ce37b307d 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1687,16 +1687,19 @@ box_promote(void)
> rc = -1;
> } else {
> promote:
> - /* We cannot possibly get here in a volatile state. */
> - assert(box_raft()->volatile_term == box_raft()->term);
> - txn_limbo_write_promote(&txn_limbo, wait_lsn,
> - box_raft()->term);
> + if (try_wait) {
> + raft_new_term(box_raft());
> + if (box_raft_wait_persisted() < 0)
> 1. What if, during the term's WAL write, another node also started a
> promote, won the elections and delivered its PROMOTE to us? I suppose
> after the WAL write we would silently write a PROMOTE for a term which
> was won by somebody else, right? Can it be covered by a test?
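For example, something like this could be a starting point (a rough
sketch; it assumes a debug build with the ERRINJ_WAL_DELAY error
injection, and the foreign PROMOTE delivery is only sketched in a
comment):

fiber = require('fiber')
box.error.injection.set('ERRINJ_WAL_DELAY', true)
-- promote() now blocks on the term WAL write.
f = fiber.create(function() box.ctl.promote() end)
-- ... here make another node win elections in a greater term
-- and deliver its PROMOTE to this node ...
box.error.injection.set('ERRINJ_WAL_DELAY', false)
-- promote() should now fail instead of writing a PROMOTE
-- for a term that somebody else has won.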
> + return -1;
> + }
> + uint64_t term = box_raft()->term;
> + txn_limbo_write_promote(&txn_limbo, wait_lsn, term);
> struct synchro_request req = {
> .type = IPROTO_PROMOTE,
> .replica_id = former_leader_id,
> .origin_id = instance_id,
> .lsn = wait_lsn,
> - .term = box_raft()->term,
> + .term = term,
> };
> txn_limbo_process(&txn_limbo, &req);
> assert(txn_limbo_is_empty(&txn_limbo));
> diff --git a/src/box/raft.c b/src/box/raft.c
> index 7f787c0c5..17caf6f54 100644
> --- a/src/box/raft.c
> +++ b/src/box/raft.c
> @@ -354,6 +354,42 @@ box_raft_wait_leader_found(void)
> return 0;
> }
>
> +struct raft_wait_persisted_data {
> + struct fiber *waiter;
> + uint64_t term;
> +};
> +
> +static int
> +box_raft_wait_persisted_f(struct trigger *trig, void *event)
> +{
> + struct raft *raft = event;
> + struct raft_wait_persisted_data *data = trig->data;
> + if (raft->term >= data->term)
> + fiber_wakeup(data->waiter);
> + return 0;
> +}
> +
> +int
> +box_raft_wait_persisted(void)
> +{
> + if (box_raft()->term == box_raft()->volatile_term)
> 2. Since it only waits for the term being persisted, I would rather
> call it 'wait_term_persisted', because there is also the vote, and
> you do not look at it.
> + return 0;
> + struct raft_wait_persisted_data data = {
> + .waiter = fiber(),
> + .term = box_raft()->volatile_term,
> + };
> + struct trigger trig;
> + trigger_create(&trig, box_raft_wait_persisted_f, &data, NULL);
> + raft_on_update(box_raft(), &trig);
> + fiber_yield();
3. What about spurious wakeups? I could call fiber.wakeup() from
Lua on this fiber.
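A minimal sketch of such a spurious wakeup from the console:

fiber = require('fiber')
f = fiber.create(function()
    box.ctl.promote()  -- yields inside, waiting for the term to persist
end)
f:wakeup()  -- spurious: the term may not be persisted yet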
> + trigger_clear(&trig);
> + if (fiber_is_cancelled()) {
> + diag_set(FiberIsCancelled);
> + return -1;
> + }
> + return 0;
> +}
> diff --git a/test/replication/gh-4114-local-space-replication.result b/test/replication/gh-4114-local-space-replication.result
> index 9b63a4b99..e71eb60a8 100644
> --- a/test/replication/gh-4114-local-space-replication.result
> +++ b/test/replication/gh-4114-local-space-replication.result
> @@ -45,9 +45,8 @@ test_run:cmd('switch replica')
> | ---
> | - true
> | ...
> -box.info.vclock[0]
> +a = box.info.vclock[0] or 0
> | ---
> - | - null
> | ...
> box.cfg{checkpoint_count=1}
> | ---
> @@ -77,9 +76,9 @@ box.space.test:insert{3}
> | - [3]
> | ...
>
> -box.info.vclock[0]
> +assert(box.info.vclock[0] == a + 3)
> | ---
> - | - 3
> + | - true
4. Why do you need these changes? I reverted this test and it passed.
* Re: [Tarantool-patches] [PATCH v3 06/12] box: allow calling promote on a candidate
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 06/12] box: allow calling promote on a candidate Serge Petrenko via Tarantool-patches
@ 2021-07-04 12:14 ` Vladislav Shpilevoy via Tarantool-patches
2021-07-14 18:26 ` Serge Petrenko via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-04 12:14 UTC (permalink / raw)
To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for the patch!
Can you please add a test for that?
* Re: [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote` Serge Petrenko via Tarantool-patches
@ 2021-07-04 12:27 ` Vladislav Shpilevoy via Tarantool-patches
2021-07-14 18:28 ` Serge Petrenko via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-04 12:27 UTC (permalink / raw)
To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for working on this!
See 11 comments below.
1. On this commit a lot of tests fail. Why?
2. On top of this commit and on top of the branch too I tried to
promote a candidate and got a strange error in the logs, although
the promotion was successful:
--
-- Instance 1
--
-- Step 1
box.cfg{
listen = 3313,
election_mode = 'candidate',
replication_synchro_quorum = 2,
replication = {'localhost:3314', 'localhost:3313'}
}
box.schema.user.grant('guest', 'super')
--
-- Instance 2
--
-- Step 2
box.cfg{
listen = 3314,
election_mode = 'voter',
replication_synchro_quorum = 2,
replication = {'localhost:3314', 'localhost:3313'},
read_only = true,
}
-- Step 3
box.cfg{read_only = false, election_mode = 'candidate'}
-- Step 4
box.ctl.promote()
main/112/raft_worker box.cc:1538 E> ER_UNSUPPORTED: box.ctl.promote does not support simultaneous invocations
---
...
3. On top of the branch I tried demote twice on a leader with election
mode 'candidate'. Demote didn't do anything on the second invocation.
The test:
--
-- Instance 1
--
-- Step 1
box.cfg{
listen = 3313,
election_mode = 'candidate',
replication_synchro_quorum = 2,
replication = {'localhost:3314', 'localhost:3313'}
}
box.schema.user.grant('guest', 'super')
-- Step 3
-- I demote self but become leader again because I am a
-- candidate. This is fine.
box.ctl.demote()
-- Step 4
tarantool> box.info.election
---
- state: leader
vote: 1
leader: 1
term: 3
...
-- Step 5
-- But now I try to demote again and nothing happens.
-- I expected the same behaviour - bump term and become
-- a leader again.
tarantool> box.ctl.demote()
---
...
tarantool> box.info.election
---
- state: leader
vote: 1
leader: 1
term: 3 <- the term is the same, so demote didn't do anything.
...
--
-- Instance 2
--
-- Step 2
box.cfg{
listen = 3314,
election_mode = 'voter',
replication_synchro_quorum = 2,
replication = {'localhost:3314', 'localhost:3313'},
read_only = true,
}
4. I tried switching between candidate and manual, trying promote/demote
in different combinations, and ended up with promote not doing anything.
The test is the same as above, but steps >= 3 are different:
tarantool> box.ctl.demote()
2021-07-04 14:22:49.550 [588] main/103/interactive I> RAFT: bump term to 3, follow
2021-07-04 14:22:49.550 [588] main/113/raft_worker I> RAFT: persisted state {term: 3}
2021-07-04 14:22:49.550 [588] main/113/raft_worker I> RAFT: vote for 1, follow
2021-07-04 14:22:49.551 [588] main/113/raft_worker I> RAFT: persisted state {term: 3, vote: 1}
2021-07-04 14:22:49.551 [588] main/113/raft_worker I> RAFT: enter candidate state with 1 self vote
---
...
tarantool> 2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: message {term: 3, state: follower} from 2
2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: message {term: 3, vote: 1, state: follower} from 2
2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: enter leader state with quorum 2
2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: message {term: 3, vote: 1, state: follower} from 2
2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: vote request is skipped - the leader is already known - 1
2021-07-04 14:22:49.553 [588] relay/[::1]:54006/101/main I> recover from `./00000000000000000000.xlog'
---
...
tarantool> box.cfg{election_mode='manual'}
2021-07-04 14:22:55.391 [588] main/103/interactive I> set 'election_mode' configuration option to "manual"
---
...
tarantool> box.info.election
---
- state: follower
vote: 1
leader: 0
term: 3
...
tarantool> box.ctl.promote()
---
...
tarantool> box.info.election
---
- state: follower
vote: 1
leader: 0
term: 3
...
As you can see, the last promote() didn't do anything. The node
stayed a follower, and no errors were raised.
From comments 2-4 it seems to me that the complexity of
box_clear_synchro_queue()/box_promote() is getting out of control, with
the number of ifs and flags sometimes having unexpected meaning (for
example, the try_wait flag causes a term bump for some reason). It might
be good to split it into separate functions and make promotion/demotion
linear and straightforward for each election mode.
> @TarantoolBot document
> Title: box.ctl.demote
>
> `box.ctl.demote()` is a new function, which works exactly like
> `box.ctl.promote()`, with one exception that it results in the instance
> writing DEMOTE request to WAL instead of a PROMOTE request.
>
> A DEMOTE request (DEMOTE = 32) copies PROMOTE behaviour (it clears the
> limbo as well), but clears the synchronous transaction queue ownership instead
> of assigning it to a new instance.
5. It is worth mentioning that demote can only be called on a master.
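For example, the intended usage could look like this (a sketch; the
queue owner becoming 0 after demotion is my assumption based on the
request description):

-- On the master currently owning the limbo:
box.ctl.demote()
box.info.synchro.queue.owner  -- 0: nobody owns the queue anymore
-- On any other instance the call should fail.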
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 1d894be97..86c5967b9 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1674,15 +1680,22 @@ box_promote(void)
> rc = -1;
> } else {
> promote:
> - if (try_wait) {
> + if (try_wait || demote) {
> raft_new_term(box_raft());
> if (box_raft_wait_persisted() < 0)
> return -1;
> 6. What if some other node has now won the elections for the term
> you just bumped? Then you wouldn't have the right to perform a
> DEMOTE, as the term is not yours.
> }
> uint64_t term = box_raft()->term;
> - txn_limbo_write_promote(&txn_limbo, wait_lsn, term);
> + if (demote) {
> + txn_limbo_write_demote(&txn_limbo, wait_lsn,
> + term);
> + } else {
> + txn_limbo_write_promote(&txn_limbo, wait_lsn,
> + term);
> + }
> + uint16_t type = demote ? IPROTO_DEMOTE : IPROTO_PROMOTE;
> struct synchro_request req = {
> - .type = IPROTO_PROMOTE,
> + .type = type,
> .replica_id = former_leader_id,
> .origin_id = instance_id,
> .lsn = wait_lsn,
> diff --git a/test/replication/gh-5446-qsync-eval-quorum.test.lua b/test/replication/gh-5446-qsync-eval-quorum.test.lua
> index 6b9e324ed..b969df836 100644
> --- a/test/replication/gh-5446-qsync-eval-quorum.test.lua
> +++ b/test/replication/gh-5446-qsync-eval-quorum.test.lua
> @@ -135,3 +136,5 @@ box.cfg{
> replication_synchro_quorum = old_synchro_quorum, \
> replication_synchro_timeout = old_synchro_timeout, \
> }
> +box.ctl.demote()
> +
7. Git diff highlights this new line with red as unnecessary.
> diff --git a/test/replication/gh-6034-limbo-ownership.result b/test/replication/gh-6034-limbo-ownership.result
> new file mode 100644
> index 000000000..e412b8d53
> --- /dev/null
> +++ b/test/replication/gh-6034-limbo-ownership.result
> 8. Could you please name qsync tests with the gh-####-qsync-...
> pattern? It is useful when you want to run all qsync tests:
> you can then do `python test-run.py qsync`.
> @@ -0,0 +1,189 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +fiber = require('fiber')
9. Fiber is not used in the test.
> + | ---
> + | ...
> +
<...>
> +assert(box.info.synchro.queue.owner == test_run:eval('default', 'return box.info.id')[1])
10. There is test_run:get_server_id().
> + | ---
> + | - true
> + | ...
> +box.space.async:insert{2} -- failure.
> + | ---
> + | - error: Can't modify data because this instance is in read-only mode.
> + | ...
> +
> +-- Promotion on the other node. Default should become ro.
> +box.ctl.promote()
> + | ---
> + | ...
> +assert(not box.info.ro)
> + | ---
> + | - true
> + | ...
> +assert(box.info.synchro.queue.owner == box.info.id)
> + | ---
> + | - true
> + | ...
> +box.space.sync:insert{2} -- success.
> + | ---
> + | - [2]
> + | ...
> +
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +assert(box.info.ro)
> + | ---
> + | - true
> + | ...
> +assert(box.info.synchro.queue.owner == test_run:eval('replica', 'return box.info.id')[1])
11. Ditto.
* Re: [Tarantool-patches] [PATCH v3 09/12] replication: encode version in JOIN request
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 09/12] replication: encode version in JOIN request Serge Petrenko via Tarantool-patches
@ 2021-07-04 12:27 ` Vladislav Shpilevoy via Tarantool-patches
2021-07-14 18:28 ` Serge Petrenko via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-04 12:27 UTC (permalink / raw)
To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Good job on the patch!
On 29.06.2021 00:12, Serge Petrenko via Tarantool-patches wrote:
> The replica's version will be needed once sending limbo and election
> state snapshot is implemented.
>
> Follow-up #6034
>
> @TarantoolBot document
> New field in JOIN request
You need to use the `Title:` prefix. Otherwise the request is not
considered a valid doc request.
* Re: [Tarantool-patches] [PATCH v3 10/12] replication: add META stage to JOIN
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 10/12] replication: add META stage to JOIN Serge Petrenko via Tarantool-patches
@ 2021-07-04 12:28 ` Vladislav Shpilevoy via Tarantool-patches
2021-07-14 18:28 ` Serge Petrenko via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-04 12:28 UTC (permalink / raw)
To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for the patch!
See 3 comments below.
On 29.06.2021 00:12, Serge Petrenko via Tarantool-patches wrote:
> The new META stage is part of server's response to a join request.
> It's marked by IPROTO_JOIN_META and IPROTO_JOIN_SNAPSHOT requests and goes
> before the actual snapshot data.
>
> Follow-up #6034
1. You probably should document the new keys and their meaning, as well
as what is sent between them.
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 07fe7f5c7..7abad3a64 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -447,12 +447,26 @@ applier_wait_snapshot(struct applier *applier)
> xrow_decode_vclock_xc(&row, &replicaset.vclock);
> }
>
> + coio_read_xrow(coio, ibuf, &row);
> + if (row.type == IPROTO_JOIN_META) {
> + /* Read additional metadata. Empty at the moment. */
> + do {
> + coio_read_xrow(coio, ibuf, &row);
> + if (iproto_type_is_error(row.type))
> + xrow_decode_error_xc(&row);
> + else if (row.type != IPROTO_JOIN_SNAPSHOT) {
2. If one branch of `if` got {}, then all branches should get it too.
> + tnt_raise(ClientError, ER_UNKNOWN_REQUEST_TYPE,
> + (uint32_t)row.type);
> + }
> + } while (row.type != IPROTO_JOIN_SNAPSHOT);
> + coio_read_xrow(coio, ibuf, &row);
> + }
> diff --git a/src/box/box.cc b/src/box/box.cc
> index b2c52bc54..bc68ee4c8 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -2501,7 +2501,7 @@ box_process_fetch_snapshot(struct ev_io *io, struct xrow_header *header)
>
> /* Send the snapshot data to the instance. */
> struct vclock start_vclock;
> - relay_initial_join(io->fd, header->sync, &start_vclock);
> + relay_initial_join(io->fd, header->sync, &start_vclock, 0);
> say_info("read-view sent.");
>
> /* Remember master's vclock after the last request */
> @@ -2699,7 +2699,8 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
> * Initial stream: feed replica with dirty data from engines.
> */
> struct vclock start_vclock;
> - relay_initial_join(io->fd, header->sync, &start_vclock);
> + relay_initial_join(io->fd, header->sync, &start_vclock,
> + replica_version_id);
3. Shouldn't changes to this file be in the previous commit?
* Re: [Tarantool-patches] [PATCH v3 11/12] replication: send latest effective promote in initial join
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 11/12] replication: send latest effective promote in initial join Serge Petrenko via Tarantool-patches
@ 2021-07-04 12:28 ` Vladislav Shpilevoy via Tarantool-patches
2021-07-14 18:28 ` Serge Petrenko via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-04 12:28 UTC (permalink / raw)
To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for the patch!
> diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
> index 843333a19..e489c150a 100644
> --- a/test/replication/replica_rejoin.result
> +++ b/test/replication/replica_rejoin.result
*. Why did you change this test?
* Re: [Tarantool-patches] [PATCH v3 12/12] replication: send current Raft term in join response
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 12/12] replication: send current Raft term in join response Serge Petrenko via Tarantool-patches
@ 2021-07-04 12:29 ` Vladislav Shpilevoy via Tarantool-patches
2021-07-14 18:28 ` Serge Petrenko via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-04 12:29 UTC (permalink / raw)
To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for the patch!
Could you please explain in the commit message why we send it?
(Because otherwise the limbo's term becomes bigger than Raft's, and
that could lead to data loss via nopification.)
* Re: [Tarantool-patches] [PATCH v3 01/12] replication: always send raft state to subscribers
2021-07-04 12:12 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-09 9:43 ` Serge Petrenko via Tarantool-patches
0 siblings, 0 replies; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-09 9:43 UTC (permalink / raw)
To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
04.07.2021 15:12, Vladislav Shpilevoy пишет:
> Hi! Thanks for the patch!
>
> On 29.06.2021 00:12, Serge Petrenko via Tarantool-patches wrote:
>> Tarantool used to send out raft state on subscribe only when raft was
>> enabled. This was a safeguard against partially-upgraded clusters, where
>> some nodes had no clue about Raft messages and couldn't handle them
>> properly.
>>
>> Actually, Raft state should always be sent out. For example, promote
>> will be changed to bump the Raft term even when Raft is disabled, and it's
>> important that everyone in the cluster has the same term, for the sake of
>> promote at least.
>>
>> So, send out Raft state to every subscriber with version >= 2.6.0
>> (that's when Raft was introduced).
>> Do the same for Raft broadcasts. They should be sent only to replicas
>> with version >= 2.6.0
>>
>> Closes #5438
>> ---
>> src/box/box.cc | 11 ++--
>> src/box/relay.cc | 4 +-
>> test/replication/gh-5438-raft-state.result | 63 ++++++++++++++++++++
>> test/replication/gh-5438-raft-state.test.lua | 28 +++++++++
> I propose to rename raft -> election in the test name. To be
> consistent with the existing election tests. Also it simplifies
> running all of them by doing `python test-run.py election`.
Thanks for the review!
Sure, fixed.
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH v3 04/12] box: make promote always bump the term
2021-07-04 12:14 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-14 18:26 ` Serge Petrenko via Tarantool-patches
0 siblings, 0 replies; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:26 UTC (permalink / raw)
To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
04.07.2021 15:14, Vladislav Shpilevoy пишет:
> Thanks for the patch!
Thanks for the review!
> Did you think about making it NOP when the node is already a leader
> (even in manual/off mode)? The current solution is all good except
> that it makes the current leader temporary read-only until it wins
> the election again, which looks strange. I would say "unexpected" for
> users.
Sure, why not. I'll address it in a separate commit.
> See 4 comments below.
>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 6a0950f44..ce37b307d 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1687,16 +1687,19 @@ box_promote(void)
>> rc = -1;
>> } else {
>> promote:
>> - /* We cannot possibly get here in a volatile state. */
>> - assert(box_raft()->volatile_term == box_raft()->term);
>> - txn_limbo_write_promote(&txn_limbo, wait_lsn,
>> - box_raft()->term);
>> + if (try_wait) {
>> + raft_new_term(box_raft());
>> + if (box_raft_wait_persisted() < 0)
> 1. What if during term WAL write another node also started promote,
> won the elections and delivered the promote to us? I suppose after
> the WAL write we will silently write PROMOTE for the term which
> was won by somebody else, right? Can it be covered by a test?
Yep, need to handle that.
>> + return -1;
>> + }
>> + uint64_t term = box_raft()->term;
>> + txn_limbo_write_promote(&txn_limbo, wait_lsn, term);
>> struct synchro_request req = {
>> .type = IPROTO_PROMOTE,
>> .replica_id = former_leader_id,
>> .origin_id = instance_id,
>> .lsn = wait_lsn,
>> - .term = box_raft()->term,
>> + .term = term,
>> };
>> txn_limbo_process(&txn_limbo, &req);
>> assert(txn_limbo_is_empty(&txn_limbo));
>> diff --git a/src/box/raft.c b/src/box/raft.c
>> index 7f787c0c5..17caf6f54 100644
>> --- a/src/box/raft.c
>> +++ b/src/box/raft.c
>> @@ -354,6 +354,42 @@ box_raft_wait_leader_found(void)
>> return 0;
>> }
>>
>> +struct raft_wait_persisted_data {
>> + struct fiber *waiter;
>> + uint64_t term;
>> +};
>> +
>> +static int
>> +box_raft_wait_persisted_f(struct trigger *trig, void *event)
>> +{
>> + struct raft *raft = event;
>> + struct raft_wait_persisted_data *data = trig->data;
>> + if (raft->term >= data->term)
>> + fiber_wakeup(data->waiter);
>> + return 0;
>> +}
>> +
>> +int
>> +box_raft_wait_persisted(void)
>> +{
>> + if (box_raft()->term == box_raft()->volatile_term)
> 2. Since it only waits for term being persisted, I would rather
> call it 'wait_term_persisted'. Because there is also vote, and
> you do not look at it.
Ok.
>> + return 0;
>> + struct raft_wait_persisted_data data = {
>> + .waiter = fiber(),
>> + .term = box_raft()->volatile_term,
>> + };
>> + struct trigger trig;
>> + trigger_create(&trig, box_raft_wait_persisted_f, &data, NULL);
>> + raft_on_update(box_raft(), &trig);
>> + fiber_yield();
> 3. What about spurious wakeups? I could call fiber.wakeup() from
> Lua on this fiber.
Yep, need to handle that.
I"ll do that for box_raft_wait_leader_found() as well. In a separate commit.
Good catch!
>> + trigger_clear(&trig);
>> + if (fiber_is_cancelled()) {
>> + diag_set(FiberIsCancelled);
>> + return -1;
>> + }
>> + return 0;
>> +}
>> diff --git a/test/replication/gh-4114-local-space-replication.result b/test/replication/gh-4114-local-space-replication.result
>> index 9b63a4b99..e71eb60a8 100644
>> --- a/test/replication/gh-4114-local-space-replication.result
>> +++ b/test/replication/gh-4114-local-space-replication.result
>> @@ -45,9 +45,8 @@ test_run:cmd('switch replica')
>> | ---
>> | - true
>> | ...
>> -box.info.vclock[0]
>> +a = box.info.vclock[0] or 0
>> | ---
>> - | - null
>> | ...
>> box.cfg{checkpoint_count=1}
>> | ---
>> @@ -77,9 +76,9 @@ box.space.test:insert{3}
>> | - [3]
>> | ...
>>
>> -box.info.vclock[0]
>> +assert(box.info.vclock[0] == a + 3)
>> | ---
>> - | - 3
>> + | - true
> 4. Why do you need these changes? I reverted this test and it passed.
When the test is run after some election test, the master has a
non-default Raft term, and the replica persists this term, bumping
vclock[0].
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH v3 06/12] box: allow calling promote on a candidate
2021-07-04 12:14 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-14 18:26 ` Serge Petrenko via Tarantool-patches
0 siblings, 0 replies; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:26 UTC (permalink / raw)
To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
04.07.2021 15:14, Vladislav Shpilevoy пишет:
> Thanks for the patch!
>
> Can you please add a test for that?
Thanks for reviewing this!
Sure.
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`
2021-07-04 12:27 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-14 18:28 ` Serge Petrenko via Tarantool-patches
2021-07-21 23:28 ` Vladislav Shpilevoy via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:28 UTC (permalink / raw)
To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
04.07.2021 15:27, Vladislav Shpilevoy пишет:
> Thanks for working on this!
Thanks for a thorough review!
> See 11 comments below.
>
> 1. On this commit a lot of tests fail. Why?
Must be because persisting the limbo and Raft state isn't implemented
yet. I'll move the relevant patches before this one.
> 2. On top of this commit and on top of the branch too I tried to
> promote a candidate and got a strange error in the logs, although
> the promotion was successful:
>
> --
> -- Instance 1
> --
>
> -- Step 1
> box.cfg{
> listen = 3313,
> election_mode = 'candidate',
> replication_synchro_quorum = 2,
> replication = {'localhost:3314', 'localhost:3313'}
> }
> box.schema.user.grant('guest', 'super')
>
>
> --
> -- Instance 2
> --
>
> -- Step 2
> box.cfg{
> listen = 3314,
> election_mode = 'voter',
> replication_synchro_quorum = 2,
> replication = {'localhost:3314', 'localhost:3313'},
> read_only = true,
> }
>
> -- Step 3
> box.cfg{read_only = false, election_mode = 'candidate'}
>
> -- Step 4
> box.ctl.promote()
>
> main/112/raft_worker box.cc:1538 E> ER_UNSUPPORTED: box.ctl.promote does not support simultaneous invocations
> ---
> ...
That's because once a candidate becomes the leader, it tries to issue
`box.ctl.promote()` itself and fails, since we're already inside a
`box.ctl.promote()` call. I'm not sure how to handle that properly.
This doesn't break anything, though.
> 3. On top of the branch I tried demote on a leader with election mode
> candidate 2 times. Demote didn't do anything on the second invocation.
> The test:
>
> --
> -- Instance 1
> --
>
> -- Step 1
> box.cfg{
> listen = 3313,
> election_mode = 'candidate',
> replication_synchro_quorum = 2,
> replication = {'localhost:3314', 'localhost:3313'}
> }
> box.schema.user.grant('guest', 'super')
>
> -- Step 3
> -- I demote self but become leader again because I am a
> -- candidate. This is fine.
> box.ctl.demote()
>
> -- Step 4
> tarantool> box.info.election
> ---
> - state: leader
> vote: 1
> leader: 1
> term: 3
> ...
>
> -- Step 5
> -- But now I try to demote again and nothing happens.
> -- I expected the same behaviour - bump term and become
> -- a leader again.
> tarantool> box.ctl.demote()
> ---
> ...
>
> tarantool> box.info.election
> ---
> - state: leader
> vote: 1
> leader: 1
> term: 3 <- the term is the same, so demote didn't do anything.
> ...
>
> --
> -- Instance 2
> --
>
> -- Step 2
> box.cfg{
> listen = 3314,
> election_mode = 'voter',
> replication_synchro_quorum = 2,
> replication = {'localhost:3314', 'localhost:3313'},
> read_only = true,
> }
>
This one is fixed now.
> 4. I tried switching between candidate and manual, trying promote/demote
> in different combinations and got promote not doing anything. The test
> is the same as above but steps >= 3 are different:
>
> tarantool> box.ctl.demote()
> 2021-07-04 14:22:49.550 [588] main/103/interactive I> RAFT: bump term to 3, follow
> 2021-07-04 14:22:49.550 [588] main/113/raft_worker I> RAFT: persisted state {term: 3}
> 2021-07-04 14:22:49.550 [588] main/113/raft_worker I> RAFT: vote for 1, follow
> 2021-07-04 14:22:49.551 [588] main/113/raft_worker I> RAFT: persisted state {term: 3, vote: 1}
> 2021-07-04 14:22:49.551 [588] main/113/raft_worker I> RAFT: enter candidate state with 1 self vote
> ---
> ...
>
> tarantool> 2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: message {term: 3, state: follower} from 2
> 2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: message {term: 3, vote: 1, state: follower} from 2
> 2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: enter leader state with quorum 2
> 2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: message {term: 3, vote: 1, state: follower} from 2
> 2021-07-04 14:22:49.552 [588] main/110/applier/localhost:3314 I> RAFT: vote request is skipped - the leader is already known - 1
> 2021-07-04 14:22:49.553 [588] relay/[::1]:54006/101/main I> recover from `./00000000000000000000.xlog'
>
> ---
> ...
>
> tarantool> box.cfg{election_mode='manual'}
> 2021-07-04 14:22:55.391 [588] main/103/interactive I> set 'election_mode' configuration option to "manual"
> ---
> ...
>
> tarantool> box.info.election
> ---
> - state: follower
> vote: 1
> leader: 0
> term: 3
> ...
>
> tarantool> box.ctl.promote()
> ---
> ...
>
> tarantool> box.info.election
> ---
> - state: follower
> vote: 1
> leader: 0
> term: 3
> ...
>
> As you can see, the last promote() didn't do anything. The node
> stayed a follower, and no errors were raised.
>
>
> From the comments 2-4 it seems to me the complexity of
> box_clear_synchro_queue()/box_promote() is getting out of control with
> the number of ifs and flags, sometimes having unexpected meaning (for
> example, try_wait flag causes term bump somewhy). It might be good to
> split it into separate functions and make promotion/demotion out of each
> election mode linear and straight.
I agree. I've reworked promote() to be more sane, please see the
corresponding patches
in v4.
>
>> @TarantoolBot document
>> Title: box.ctl.demote
>>
>> `box.ctl.demote()` is a new function, which works exactly like
>> `box.ctl.promote()`, with one exception that it results in the instance
>> writing DEMOTE request to WAL instead of a PROMOTE request.
>>
>> A DEMOTE request (DEMOTE = 32) copies PROMOTE behaviour (it clears the
>> limbo as well), but clears the synchronous transaction queue ownership instead
>> of assigning it to a new instance.
> 5. It is worth mentioning that demote can only be called on a master.
Sure.
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 1d894be97..86c5967b9 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1674,15 +1680,22 @@ box_promote(void)
>> rc = -1;
>> } else {
>> promote:
>> - if (try_wait) {
>> + if (try_wait || demote) {
>> raft_new_term(box_raft());
>> if (box_raft_wait_persisted() < 0)
>> return -1;
> 6. What if now some other node won the elections of the term
> you just bumped? Then you wouldn't have the right to perform
> DEMOTE as the term is not yours.
Yes, indeed. Here's how I reworked this bit: let's not write a DEMOTE
at all when elections are enabled. Simply bumping the term should be
enough: another leader (or maybe this same instance) will take over the
limbo in the new term.
When elections are off, assume nothing can happen other than someone
else writing a PROMOTE/DEMOTE earlier than we do. I handle that case as
well in the new patch.
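So with elections enabled the observable effect is just a term bump,
roughly (a sketch; the exact term increment is not guaranteed):

-- On an election-enabled leader:
term = box.info.election.term
box.ctl.demote()
box.info.election.term > term  -- true: only the term was bumped;
-- no DEMOTE row is written, and the next leader (possibly this same
-- node, if it's a candidate) takes over the limbo in the new term.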
>> }
>> uint64_t term = box_raft()->term;
>> - txn_limbo_write_promote(&txn_limbo, wait_lsn, term);
>> + if (demote) {
>> + txn_limbo_write_demote(&txn_limbo, wait_lsn,
>> + term);
>> + } else {
>> + txn_limbo_write_promote(&txn_limbo, wait_lsn,
>> + term);
>> + }
>> + uint16_t type = demote ? IPROTO_DEMOTE : IPROTO_PROMOTE;
>> struct synchro_request req = {
>> - .type = IPROTO_PROMOTE,
>> + .type = type,
>> .replica_id = former_leader_id,
>> .origin_id = instance_id,
>> .lsn = wait_lsn,
>> diff --git a/test/replication/gh-5446-qsync-eval-quorum.test.lua b/test/replication/gh-5446-qsync-eval-quorum.test.lua
>> index 6b9e324ed..b969df836 100644
>> --- a/test/replication/gh-5446-qsync-eval-quorum.test.lua
>> +++ b/test/replication/gh-5446-qsync-eval-quorum.test.lua
>> @@ -135,3 +136,5 @@ box.cfg{
>> replication_synchro_quorum = old_synchro_quorum, \
>> replication_synchro_timeout = old_synchro_timeout, \
>> }
>> +box.ctl.demote()
>> +
> 7. Git diff highlights this new line with red as unnecessary.
Sure, fixed.
>> diff --git a/test/replication/gh-6034-limbo-ownership.result b/test/replication/gh-6034-limbo-ownership.result
>> new file mode 100644
>> index 000000000..e412b8d53
>> --- /dev/null
>> +++ b/test/replication/gh-6034-limbo-ownership.result
> 8. Could you please write qsync tests with gh-####-qsync-...
> pattern? It is useful when you want to run all qsync tests
> and can do `python test-run.py qsync` then.
Yep.
>> @@ -0,0 +1,189 @@
>> +-- test-run result file version 2
>> +test_run = require('test_run').new()
>> + | ---
>> + | ...
>> +fiber = require('fiber')
> 9. Fiber is not used in the test.
Removed, thanks!
>> + | ---
>> + | ...
>> +
> <...>
>
>> +assert(box.info.synchro.queue.owner == test_run:eval('default', 'return box.info.id')[1])
> 10. There is test_run:get_server_id().
Hm, didn't know that, thanks!
>> + | ---
>> + | - true
>> + | ...
>> +box.space.async:insert{2} -- failure.
>> + | ---
>> + | - error: Can't modify data because this instance is in read-only mode.
>> + | ...
>> +
>> +-- Promotion on the other node. Default should become ro.
>> +box.ctl.promote()
>> + | ---
>> + | ...
>> +assert(not box.info.ro)
>> + | ---
>> + | - true
>> + | ...
>> +assert(box.info.synchro.queue.owner == box.info.id)
>> + | ---
>> + | - true
>> + | ...
>> +box.space.sync:insert{2} -- success.
>> + | ---
>> + | - [2]
>> + | ...
>> +
>> +test_run:switch('default')
>> + | ---
>> + | - true
>> + | ...
>> +assert(box.info.ro)
>> + | ---
>> + | - true
>> + | ...
>> +assert(box.info.synchro.queue.owner == test_run:eval('replica', 'return box.info.id')[1])
> 11. Ditto.
Fixed.
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH v3 09/12] replication: encode version in JOIN request
2021-07-04 12:27 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-14 18:28 ` Serge Petrenko via Tarantool-patches
0 siblings, 0 replies; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:28 UTC (permalink / raw)
To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
04.07.2021 15:27, Vladislav Shpilevoy пишет:
> Good job on the patch!
>
> On 29.06.2021 00:12, Serge Petrenko via Tarantool-patches wrote:
>> The replica's version will be needed once sending limbo and election
>> state snapshot is implemented.
>>
>> Follow-up #6034
>>
>> @TarantoolBot document
>> New field in JOIN request
> You need to use `Title:` prefix. Otherwise it is not considered a
> valid doc request.
Sure, thanks for noticing!
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH v3 10/12] replication: add META stage to JOIN
2021-07-04 12:28 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-14 18:28 ` Serge Petrenko via Tarantool-patches
0 siblings, 0 replies; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:28 UTC (permalink / raw)
To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
04.07.2021 15:28, Vladislav Shpilevoy пишет:
> Thanks for the patch!
>
> See 3 comments below.
>
> On 29.06.2021 00:12, Serge Petrenko via Tarantool-patches wrote:
>> The new META stage is part of server's response to a join request.
>> It's marked by IPROTO_JOIN_META and IPROTO_JOIN_SNAPSHOT requests and goes
>> before the actual snapshot data.
>>
>> Follow-up #6034
> 1. You probably should document the new keys and their meaning. As well
> as what is sent between them.
Yep, will do.
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index 07fe7f5c7..7abad3a64 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -447,12 +447,26 @@ applier_wait_snapshot(struct applier *applier)
>> xrow_decode_vclock_xc(&row, &replicaset.vclock);
>> }
>>
>> + coio_read_xrow(coio, ibuf, &row);
>> + if (row.type == IPROTO_JOIN_META) {
>> + /* Read additional metadata. Empty at the moment. */
>> + do {
>> + coio_read_xrow(coio, ibuf, &row);
>> + if (iproto_type_is_error(row.type))
>> + xrow_decode_error_xc(&row);
>> + else if (row.type != IPROTO_JOIN_SNAPSHOT) {
> 2. If one branch of `if` got {}, then all branches should get it too.
Fixed, thanks!
>> + tnt_raise(ClientError, ER_UNKNOWN_REQUEST_TYPE,
>> + (uint32_t)row.type);
>> + }
>> + } while (row.type != IPROTO_JOIN_SNAPSHOT);
>> + coio_read_xrow(coio, ibuf, &row);
>> + }
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index b2c52bc54..bc68ee4c8 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -2501,7 +2501,7 @@ box_process_fetch_snapshot(struct ev_io *io, struct xrow_header *header)
>>
>> /* Send the snapshot data to the instance. */
>> struct vclock start_vclock;
>> - relay_initial_join(io->fd, header->sync, &start_vclock);
>> + relay_initial_join(io->fd, header->sync, &start_vclock, 0);
>> say_info("read-view sent.");
>>
>> /* Remember master's vclock after the last request */
>> @@ -2699,7 +2699,8 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
>> * Initial stream: feed replica with dirty data from engines.
>> */
>> struct vclock start_vclock;
>> - relay_initial_join(io->fd, header->sync, &start_vclock);
>> + relay_initial_join(io->fd, header->sync, &start_vclock,
>> + replica_version_id);
> 3. Shouldn't changes to this file be in the previous commit?
I think not. I start using the replica's version only in this commit,
so why start passing it to the relay earlier?
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH v3 11/12] replication: send latest effective promote in initial join
2021-07-04 12:28 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-14 18:28 ` Serge Petrenko via Tarantool-patches
0 siblings, 0 replies; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:28 UTC (permalink / raw)
To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
04.07.2021 15:28, Vladislav Shpilevoy пишет:
> Thanks for the patch!
>
>> diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
>> index 843333a19..e489c150a 100644
>> --- a/test/replication/replica_rejoin.result
>> +++ b/test/replication/replica_rejoin.result
> *. Why did you change this test?
Thanks for the review!
This change actually belongs to the next commit. It's needed because
otherwise replica rejoin was broken, I guess due to the Raft term
being written to vclock[0].
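A sketch of the effect (assuming the term is written as a local row,
which is my reading of the behaviour above):

-- Persisting a bumped Raft term is a local WAL write, so it
-- advances the 0-th vclock component:
a = box.info.vclock[0] or 0
box.ctl.promote()       -- bumps and persists a new term
box.info.vclock[0] > a  -- true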
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH v3 12/12] replication: send current Raft term in join response
2021-07-04 12:29 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-14 18:28 ` Serge Petrenko via Tarantool-patches
0 siblings, 0 replies; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-07-14 18:28 UTC (permalink / raw)
To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
04.07.2021 15:29, Vladislav Shpilevoy пишет:
> Thanks for the patch!
>
> Could you please explain in the commit message why do we send it?
> (Because otherwise limbo's term becomes bigger than raft's, and that
> could lead to data loss via the nopification).
Ok, sure!
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`
2021-07-14 18:28 ` Serge Petrenko via Tarantool-patches
@ 2021-07-21 23:28 ` Vladislav Shpilevoy via Tarantool-patches
2021-07-23 7:44 ` Sergey Petrenko via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-21 23:28 UTC (permalink / raw)
To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for the fixes!
>> 2. On top of this commit and on top of the branch too I tried to
>> promote a candidate and got a strange error in the logs, although
>> the promotion was successful:
>>
>> --
>> -- Instance 1
>> --
>>
>> -- Step 1
>> box.cfg{
>> listen = 3313,
>> election_mode = 'candidate',
>> replication_synchro_quorum = 2,
>> replication = {'localhost:3314', 'localhost:3313'}
>> }
>> box.schema.user.grant('guest', 'super')
>>
>>
>> --
>> -- Instance 2
>> --
>>
>> -- Step 2
>> box.cfg{
>> listen = 3314,
>> election_mode = 'voter',
>> replication_synchro_quorum = 2,
>> replication = {'localhost:3314', 'localhost:3313'},
>> read_only = true,
>> }
>>
>> -- Step 3
>> box.cfg{read_only = false, election_mode = 'candidate'}
>>
>> -- Step 4
>> box.ctl.promote()
>>
>> main/112/raft_worker box.cc:1538 E> ER_UNSUPPORTED: box.ctl.promote does not support simultaneous invocations
>> ---
>> ...
>
> That's because once a candidate becomes the leader, it tries to issue `box.ctl.promote()` again,
> and fails, since we're already inside a `box.ctl.promote()` call.
> I'm not sure how to handle that properly. This doesn't break anything though.
Still, the error message really does not look good. There is an option -
make box_promote() for a candidate node just call raft_promote() and set
box_in_promote = false. Then wait for the term outcome. Will it work?
You would need to rebase your patchset on the master branch then.
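Roughly, I mean something like this in the election mode dispatch of
box_promote() (a sketch only, I didn't test it; box_in_promote and
box_run_elections() are the names used in this patchset):

    case ELECTION_MODE_MANUAL:
    case ELECTION_MODE_CANDIDATE:
        /*
         * Start the elections and release the guard. The raft worker
         * issues the actual PROMOTE once the term outcome is known.
         */
        box_in_promote = false;
        return box_run_elections();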
It might be a little easier to do if you apply the diff below. (Warning:
I didn't test it.) The motivation is that one of the main reasons why I
wanted box_promote() simplified was the strange meaning of some flags.
In particular, the try_wait flag for some reason triggered elections
before the waiting, which is really not obvious: how does 'wait' lead
to 'elections'? In the diff I tried to remove these flags entirely. Now
you have a single place in the code of box_promote() where the
ELECTION_MODE_CANDIDATE stuff is handled. Here you could try the
proposal I gave above.
====================
diff --git a/src/box/box.cc b/src/box/box.cc
index f68fffcab..e7765b657 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1698,34 +1698,6 @@ box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)
assert(txn_limbo_is_empty(&txn_limbo));
}
-/**
- * Check whether this instance may run a promote() and set promote parameters
- * according to its election mode.
- */
-static int
-box_check_promote_election_mode(bool *try_wait, bool *run_elections)
-{
- switch (box_election_mode) {
- case ELECTION_MODE_OFF:
- if (try_wait != NULL)
- *try_wait = true;
- break;
- case ELECTION_MODE_VOTER:
- assert(box_raft()->state == RAFT_STATE_FOLLOWER);
- diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'",
- "manual elections");
- return -1;
- case ELECTION_MODE_MANUAL:
- case ELECTION_MODE_CANDIDATE:
- if (run_elections != NULL)
- *run_elections = box_raft()->state != RAFT_STATE_LEADER;
- break;
- default:
- unreachable();
- }
- return 0;
-}
-
/* A guard to block multiple simultaneous promote()/demote() invocations. */
static bool box_in_promote = false;
@@ -1757,27 +1729,35 @@ box_promote(void)
if (is_leader)
return 0;
- bool run_elections = false;
- bool try_wait = false;
-
- if (box_check_promote_election_mode(&try_wait, &run_elections) < 0)
- return -1;
-
- int64_t wait_lsn = -1;
-
- if (run_elections && box_run_elections() < 0)
- return -1;
- if (try_wait) {
- if (box_try_wait_confirm(2 * replication_synchro_timeout) < 0)
+ switch (box_election_mode) {
+ case ELECTION_MODE_OFF:
+ if (box_try_wait_confirm(2 * replication_synchro_timeout) != 0)
+ return -1;
+ if (box_trigger_elections() != 0)
return -1;
- if (box_trigger_elections() < 0)
+ break;
+ case ELECTION_MODE_VOTER:
+ assert(box_raft()->state == RAFT_STATE_FOLLOWER);
+ diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'",
+ "manual elections");
+ return -1;
+ case ELECTION_MODE_MANUAL:
+ case ELECTION_MODE_CANDIDATE:
+ if (box_raft()->state != RAFT_STATE_LEADER &&
+ box_run_elections() != 0)
return -1;
+ break;
+ default:
+ unreachable();
}
- if ((wait_lsn = box_wait_limbo_acked()) < 0)
+ if (box_check_promote_election_mode(&try_wait, &run_elections) < 0)
return -1;
- box_issue_promote(txn_limbo.owner_id, wait_lsn);
+ int64_t wait_lsn = box_wait_limbo_acked();
+ if (wait_lsn < 0)
+ return -1;
+ box_issue_promote(txn_limbo.owner_id, wait_lsn);
return 0;
}
@@ -1804,29 +1784,16 @@ box_demote(void)
is_leader = is_leader && box_raft()->state == RAFT_STATE_LEADER;
if (!is_leader)
return 0;
-
- bool try_wait = false;
-
- if (box_check_promote_election_mode(&try_wait, NULL) < 0)
- return -1;
-
- int64_t wait_lsn = -1;
-
if (box_trigger_elections() < 0)
return -1;
-
if (box_election_mode != ELECTION_MODE_OFF)
return 0;
-
- if (try_wait &&
- box_try_wait_confirm(2 * replication_synchro_timeout) < 0)
+ if (box_try_wait_confirm(2 * replication_synchro_timeout) != 0)
return -1;
-
- if ((wait_lsn = box_wait_limbo_acked()) < 0)
+ int64_t wait_lsn = box_wait_limbo_acked();
+ if (wait_lsn < 0)
return -1;
-
box_issue_demote(txn_limbo.owner_id, wait_lsn);
-
return 0;
}
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`
2021-07-21 23:28 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-23 7:44 ` Sergey Petrenko via Tarantool-patches
2021-07-26 23:50 ` Vladislav Shpilevoy via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Sergey Petrenko via Tarantool-patches @ 2021-07-23 7:44 UTC (permalink / raw)
To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
22.07.2021 01:28, Vladislav Shpilevoy wrote:
> Thanks for the fixes!
>
>>> 2. On top of this commit and on top of the branch too I tried to
>>> promote a candidate and got a strange error in the logs, although
>>> the promotion was successful:
>>>
>>> --
>>> -- Instance 1
>>> --
>>>
>>> -- Step 1
>>> box.cfg{
>>> listen = 3313,
>>> election_mode = 'candidate',
>>> replication_synchro_quorum = 2,
>>> replication = {'localhost:3314', 'localhost:3313'}
>>> }
>>> box.schema.user.grant('guest', 'super')
>>>
>>>
>>> --
>>> -- Instance 2
>>> --
>>>
>>> -- Step 2
>>> box.cfg{
>>> listen = 3314,
>>> election_mode = 'voter',
>>> replication_synchro_quorum = 2,
>>> replication = {'localhost:3314', 'localhost:3313'},
>>> read_only = true,
>>> }
>>>
>>> -- Step 3
>>> box.cfg{read_only = false, election_mode = 'candidate'}
>>>
>>> -- Step 4
>>> box.ctl.promote()
>>>
>>> main/112/raft_worker box.cc:1538 E> ER_UNSUPPORTED: box.ctl.promote does not support simultaneous invocations
>>> ---
>>> ...
>> That's because once a candidate becomes the leader, it tries to issue `box.ctl.promote()` again,
>> and fails, since we're already inside a `box.ctl.promote()` call.
>> I'm not sure how to handle that properly. This doesn't break anything though.
> Still, the error message really does not look good. There is an option -
> make box_promote() for a candidate node just call raft_promote() and set
> box_in_promote = false. Then wait for the term outcome. Will it work?
> You would need to rebase your patchset on the master branch then.
Hmm, I don't like that. This will work, but it will complicate things in
box_promote even more. Now one has to keep in mind that a manual
promote call has two phases: one when it is issued manually, and the
other when it's called automatically once (and if) the instance becomes
the leader.
I think it'd be better to check whether we should run box_promote()
right in box_raft_update_synchro_queue().
Check out the diff (mostly for commit "box: allow calling promote on a
candidate"):
=====================================
diff --git a/src/box/box.cc b/src/box/box.cc
index a30e4f78d..5cdca4bd4 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1692,20 +1692,24 @@ box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)
}
/* A guard to block multiple simultaneous promote()/demote() invocations. */
-static bool box_in_promote = false;
+static bool in_box_promote = false;
+
+bool box_in_promote(void) {
+ return in_box_promote;
+}
int
box_promote(void)
{
- if (box_in_promote) {
+ if (in_box_promote) {
diag_set(ClientError, ER_UNSUPPORTED, "box.ctl.promote",
"simultaneous invocations");
return -1;
}
struct raft *raft = box_raft();
- box_in_promote = true;
+ in_box_promote = true;
auto promote_guard = make_scoped_guard([&] {
- box_in_promote = false;
+ in_box_promote = false;
});
if (!is_box_configured)
@@ -1757,14 +1761,14 @@ box_promote(void)
int
box_demote(void)
{
- if (box_in_promote) {
+ if (in_box_promote) {
diag_set(ClientError, ER_UNSUPPORTED, "box.ctl.demote",
"simultaneous invocations");
return -1;
}
- box_in_promote = true;
+ in_box_promote = true;
auto promote_guard = make_scoped_guard([&] {
- box_in_promote = false;
+ in_box_promote = false;
});
if (!is_box_configured)
diff --git a/src/box/box.h b/src/box/box.h
index aaf20d9dd..344ed90f2 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -273,6 +273,9 @@ extern "C" {
typedef struct tuple box_tuple_t;
+bool
+box_in_promote(void);
+
int
box_promote(void);
diff --git a/src/box/raft.c b/src/box/raft.c
index 35c471f58..5e496c2e4 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -88,11 +88,11 @@ box_raft_update_synchro_queue(struct raft *raft)
{
assert(raft == box_raft());
/*
- * In case these are manual elections, we are already in the middle of a
- * `promote` call. No need to call it once again.
+ * In case the elections were triggered manually, we are already in
+ * the middle of a `promote` call. No need to call it once again.
*/
if (raft->state == RAFT_STATE_LEADER &&
- box_election_mode != ELECTION_MODE_MANUAL) {
+ !box_in_promote()) {
int rc = 0;
uint32_t errcode = 0;
do {
=====================================
>
> It might be a little easier to do if you apply the diff below. (Warning:
> I didn't test it.) The motivation is that one of the main reasons why I
> wanted box_promote() simplified was because of the strange meaning of some
> flags. In particular, try_wait flag somewhy triggered elections before the
> waiting which is super not obvious why. How does 'wait' come to 'elections'?
>
> In the diff I tried to remove these flags entirely. And now you have a
> single place in the code of box_promote(), where ELECTION_MODE_CANDIDATE
> stuff is handled. Here you could try the proposal I gave above.
Thanks for the help! Your diff looks good, I've reworked my patches to
comply.
> ====================
> diff --git a/src/box/box.cc b/src/box/box.cc
> index f68fffcab..e7765b657 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1698,34 +1698,6 @@ box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)
> assert(txn_limbo_is_empty(&txn_limbo));
> }
>
> -/**
> - * Check whether this instance may run a promote() and set promote parameters
> - * according to its election mode.
> - */
> -static int
> -box_check_promote_election_mode(bool *try_wait, bool *run_elections)
> -{
> - switch (box_election_mode) {
> - case ELECTION_MODE_OFF:
> - if (try_wait != NULL)
> - *try_wait = true;
> - break;
> - case ELECTION_MODE_VOTER:
> - assert(box_raft()->state == RAFT_STATE_FOLLOWER);
> - diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'",
> - "manual elections");
> - return -1;
> - case ELECTION_MODE_MANUAL:
> - case ELECTION_MODE_CANDIDATE:
> - if (run_elections != NULL)
> - *run_elections = box_raft()->state != RAFT_STATE_LEADER;
> - break;
> - default:
> - unreachable();
> - }
> - return 0;
> -}
> -
> /* A guard to block multiple simultaneous promote()/demote() invocations. */
> static bool box_in_promote = false;
>
> @@ -1757,27 +1729,35 @@ box_promote(void)
> if (is_leader)
> return 0;
>
> - bool run_elections = false;
> - bool try_wait = false;
> -
> - if (box_check_promote_election_mode(&try_wait, &run_elections) < 0)
> - return -1;
> -
> - int64_t wait_lsn = -1;
> -
> - if (run_elections && box_run_elections() < 0)
> - return -1;
> - if (try_wait) {
> - if (box_try_wait_confirm(2 * replication_synchro_timeout) < 0)
> + switch (box_election_mode) {
> + case ELECTION_MODE_OFF:
> + if (box_try_wait_confirm(2 * replication_synchro_timeout) != 0)
> + return -1;
> + if (box_trigger_elections() != 0)
> return -1;
> - if (box_trigger_elections() < 0)
> + break;
> + case ELECTION_MODE_VOTER:
> + assert(box_raft()->state == RAFT_STATE_FOLLOWER);
> + diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'",
> + "manual elections");
> + return -1;
> + case ELECTION_MODE_MANUAL:
> + case ELECTION_MODE_CANDIDATE:
> + if (box_raft()->state != RAFT_STATE_LEADER &&
> + box_run_elections() != 0)
> return -1;
> + break;
> + default:
> + unreachable();
> }
> - if ((wait_lsn = box_wait_limbo_acked()) < 0)
> + if (box_check_promote_election_mode(&try_wait, &run_elections) < 0)
> return -1;
>
> - box_issue_promote(txn_limbo.owner_id, wait_lsn);
> + int64_t wait_lsn = box_wait_limbo_acked();
> + if (wait_lsn < 0)
> + return -1;
>
> + box_issue_promote(txn_limbo.owner_id, wait_lsn);
> return 0;
> }
>
> @@ -1804,29 +1784,16 @@ box_demote(void)
> is_leader = is_leader && box_raft()->state == RAFT_STATE_LEADER;
> if (!is_leader)
> return 0;
> -
> - bool try_wait = false;
> -
> - if (box_check_promote_election_mode(&try_wait, NULL) < 0)
> - return -1;
> -
> - int64_t wait_lsn = -1;
> -
> if (box_trigger_elections() < 0)
> return -1;
> -
> if (box_election_mode != ELECTION_MODE_OFF)
> return 0;
> -
> - if (try_wait &&
> - box_try_wait_confirm(2 * replication_synchro_timeout) < 0)
> + if (box_try_wait_confirm(2 * replication_synchro_timeout) != 0)
> return -1;
> -
> - if ((wait_lsn = box_wait_limbo_acked()) < 0)
> + int64_t wait_lsn = box_wait_limbo_acked();
> + if (wait_lsn < 0)
> return -1;
> -
> box_issue_demote(txn_limbo.owner_id, wait_lsn);
> -
> return 0;
> }
>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`
2021-07-23 7:44 ` Sergey Petrenko via Tarantool-patches
@ 2021-07-26 23:50 ` Vladislav Shpilevoy via Tarantool-patches
2021-07-29 20:56 ` Sergey Petrenko via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-26 23:50 UTC (permalink / raw)
To: Sergey Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for the fixes!
On 23.07.2021 09:44, Sergey Petrenko wrote:
>
> 22.07.2021 01:28, Vladislav Shpilevoy wrote:
>> Thanks for the fixes!
>>
>>>> 2. On top of this commit and on top of the branch too I tried to
>>>> promote a candidate and got a strange error in the logs, although
>>>> the promotion was successful:
>>>>
>>>> --
>>>> -- Instance 1
>>>> --
>>>>
>>>> -- Step 1
>>>> box.cfg{
>>>> listen = 3313,
>>>> election_mode = 'candidate',
>>>> replication_synchro_quorum = 2,
>>>> replication = {'localhost:3314', 'localhost:3313'}
>>>> }
>>>> box.schema.user.grant('guest', 'super')
>>>>
>>>>
>>>> --
>>>> -- Instance 2
>>>> --
>>>>
>>>> -- Step 2
>>>> box.cfg{
>>>> listen = 3314,
>>>> election_mode = 'voter',
>>>> replication_synchro_quorum = 2,
>>>> replication = {'localhost:3314', 'localhost:3313'},
>>>> read_only = true,
>>>> }
>>>>
>>>> -- Step 3
>>>> box.cfg{read_only = false, election_mode = 'candidate'}
>>>>
>>>> -- Step 4
>>>> box.ctl.promote()
>>>>
>>>> main/112/raft_worker box.cc:1538 E> ER_UNSUPPORTED: box.ctl.promote does not support simultaneous invocations
>>>> ---
>>>> ...
>>> That's because once a candidate becomes the leader, it tries to issue `box.ctl.promote()` again,
>>> and fails, since we're already inside a `box.ctl.promote()` call.
>>> I'm not sure how to handle that properly. This doesn't break anything though.
>> Still, the error message really does not look good. There is an option -
>> make box_promote() for a candidate node just call raft_promote() and set
>> box_in_promote = false. Then wait for the term outcome. Will it work?
>> You would need to rebase your patchset on the master branch then.
>
>
> Hmm, I don't like that. This will work, but it will complicate things in
> box_promote even more. Now one has to keep in mind that a manual
> promote call has two phases: one when it is issued manually, and the
> other when it's called automatically once (and if) the instance becomes the leader.
But is it any different in the new version on the branch? You still need to
check whether you are "in promote", but instead of that being encapsulated
into box_promote(), it now potentially has to be checked at every
box_promote() call site. There are only two of those now, and only one
needs the check. But you can see that now you have to think about whether
being already in promote should be ignored.
> diff --git a/src/box/box.cc b/src/box/box.cc
> index a30e4f78d..5cdca4bd4 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1692,20 +1692,24 @@ box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)
> }
>
> /* A guard to block multiple simultaneous promote()/demote() invocations. */
> -static bool box_in_promote = false;
> +static bool in_box_promote = false;
> +
> +bool box_in_promote(void) {
Could you please put the return value on a separate line and use 'is'
as a suffix? For instance, `box_is_in_promote()`.
> diff --git a/src/box/raft.c b/src/box/raft.c
> index 35c471f58..5e496c2e4 100644
> --- a/src/box/raft.c
> +++ b/src/box/raft.c
> @@ -88,11 +88,11 @@ box_raft_update_synchro_queue(struct raft *raft)
> {
> assert(raft == box_raft());
> /*
> - * In case these are manual elections, we are already in the middle of a
> - * `promote` call. No need to call it once again.
> + * In case the elections were triggered manually, we are already in
> + * the middle of a `promote` call. No need to call it once again.
> */
> if (raft->state == RAFT_STATE_LEADER &&
> - box_election_mode != ELECTION_MODE_MANUAL) {
> + !box_in_promote()) {
It just seems to me that it should not be raft's business whether the
node was promoted manually or not.
> int rc = 0;
> uint32_t errcode = 0;
> do {
> =====================================
>
>
>>
>> It might be a little easier to do if you apply the diff below. (Warning:
>> I didn't test it.) The motivation is that one of the main reasons why I
>> wanted box_promote() simplified was because of the strange meaning of some
>> flags. In particular, try_wait flag somewhy triggered elections before the
>> waiting which is super not obvious why. How does 'wait' come to 'elections'?
>>
>> In the diff I tried to remove these flags entirely. And now you have a
>> single place in the code of box_promote(), where ELECTION_MODE_CANDIDATE
>> stuff is handled. Here you could try the proposal I gave above.
>
>
> Thanks for the help! Your diff looks good, I've reworked my patches to comply.
Would it be too hard to merge this diff into the previous
commits? It looks to me like it could be done this way starting
from the first box_promote-refactoring commit.
> diff --git a/src/box/box.cc b/src/box/box.cc
> index af39f66f4..36302310b 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1701,14 +1701,14 @@ box_promote(void)
> if (!is_box_configured)
> return 0;
> if (txn_limbo_replica_term(&txn_limbo, instance_id) ==
> - box_raft()->term)
> + raft->term)
> return 0;
> - bool run_elections = false;
> - bool try_wait = false;
> -
> switch (box_election_mode) {
> case ELECTION_MODE_OFF:
> - try_wait = true;
> + if (box_try_wait_confirm(2 * replication_synchro_timeout) != 0)
> + return -1;
> + if (box_trigger_elections() != 0)
> + return -1;
> break;
> case ELECTION_MODE_VOTER:
> assert(raft->state == RAFT_STATE_FOLLOWER);
> @@ -1717,23 +1717,17 @@ box_promote(void)
> return -1;
> case ELECTION_MODE_MANUAL:
> case ELECTION_MODE_CANDIDATE:
> - run_elections = raft->state != RAFT_STATE_LEADER;
> + if (raft->state != RAFT_STATE_LEADER &&
> + box_run_elections() != 0) {
> + return -1;
> + }
> break;
> default:
> unreachable();
> }
>
> - int64_t wait_lsn = -1;
> -
> - if (run_elections && box_run_elections() != 0)
> - return -1;
> - if (try_wait) {
> - if (box_try_wait_confirm(2 * replication_synchro_timeout) != 0)
> - return -1;
> - if (box_trigger_elections() != 0)
> - return -1;
> - }
> - if ((wait_lsn = box_wait_limbo_acked()) < 0)
> + int64_t wait_lsn = box_wait_limbo_acked();
> + if (wait_lsn < 0)
> return -1;
>
> box_issue_promote(txn_limbo.owner_id, wait_lsn);
I thought more about how the manual promotion conflicts with the
automatic one and I tried to separate them. See my diff below. It
is done on top of this commit, but I didn't push it, because I
didn't run any tests except the one which used to log an error
in the first version of the patchset.
Thanks to the new small helper functions, I implemented a special
function for automatic promote which does not conflict with
the manual one. If you apply it though, you might want to
do it earlier than this commit.
====================
diff --git a/src/box/box.cc b/src/box/box.cc
index 36302310b..65d064615 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1599,7 +1599,7 @@ box_try_wait_confirm(double timeout)
* Return lsn of the last limbo entry on success, -1 on error.
*/
static int64_t
-box_wait_limbo_acked(void)
+box_wait_limbo_acked(double timeout)
{
if (txn_limbo_is_empty(&txn_limbo))
return txn_limbo.confirmed_lsn;
@@ -1629,7 +1629,7 @@ box_wait_limbo_acked(void)
int64_t wait_lsn = last_entry->lsn;
if (box_wait_quorum(txn_limbo.owner_id, wait_lsn, quorum,
- replication_synchro_timeout) != 0)
+ timeout) != 0)
return -1;
if (box_check_promote_term_intact(promote_term) != 0)
@@ -1675,9 +1675,23 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
/* A guard to block multiple simultaneous box_promote() invocations. */
static bool in_box_promote = false;
-bool box_in_promote(void)
+int
+box_promote_qsync(void)
{
- return in_box_promote;
+ assert(!in_box_promote);
+ assert(is_box_configured);
+ struct raft *raft = box_raft();
+ if (raft->state != RAFT_STATE_LEADER)
+ return 0;
+ assert(box_election_mode == ELECTION_MODE_MANUAL ||
+ box_election_mode == ELECTION_MODE_CANDIDATE);
+ if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term)
+ return 0;
+ int64_t wait_lsn = box_wait_limbo_acked(TIMEOUT_INFINITY);
+ if (wait_lsn < 0)
+ return -1;
+ box_issue_promote(txn_limbo.owner_id, wait_lsn);
+ return 0;
}
int
@@ -1700,8 +1714,8 @@ box_promote(void)
*/
if (!is_box_configured)
return 0;
- if (txn_limbo_replica_term(&txn_limbo, instance_id) ==
- raft->term)
+ if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term ||
+ raft->state == RAFT_STATE_LEADER)
return 0;
switch (box_election_mode) {
case ELECTION_MODE_OFF:
@@ -1715,18 +1729,16 @@ box_promote(void)
diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'",
"manual elections");
return -1;
- case ELECTION_MODE_MANUAL:
case ELECTION_MODE_CANDIDATE:
- if (raft->state != RAFT_STATE_LEADER &&
- box_run_elections() != 0) {
- return -1;
- }
- break;
+ case ELECTION_MODE_MANUAL:
+ /* Let the automatic elections finish the promotion properly. */
+ in_box_promote = false;
+ return box_run_elections();
default:
unreachable();
}
- int64_t wait_lsn = box_wait_limbo_acked();
+ int64_t wait_lsn = box_wait_limbo_acked(replication_synchro_timeout);
if (wait_lsn < 0)
return -1;
diff --git a/src/box/box.h b/src/box/box.h
index db636b058..5e5d0bf6d 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -273,12 +273,12 @@ extern "C" {
typedef struct tuple box_tuple_t;
-bool
-box_in_promote(void);
-
int
box_promote(void);
+int
+box_promote_qsync(void);
+
/* box_select is private and used only by FFI */
API_EXPORT int
box_select(uint32_t space_id, uint32_t index_id,
diff --git a/src/box/raft.c b/src/box/raft.c
index 5e496c2e4..07501157f 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -83,30 +83,6 @@ box_raft_request_to_msg(const struct raft_request *req, struct raft_msg *msg)
};
}
-static void
-box_raft_update_synchro_queue(struct raft *raft)
-{
- assert(raft == box_raft());
- /*
- * In case the elections were triggered manually, we are already in
- * the middle of a `promote` call. No need to call it once again.
- */
- if (raft->state == RAFT_STATE_LEADER &&
- !box_in_promote()) {
- int rc = 0;
- uint32_t errcode = 0;
- do {
- rc = box_promote();
- if (rc != 0) {
- struct error *err = diag_last_error(diag_get());
- errcode = box_error_code(err);
- diag_log();
- }
- } while (rc != 0 && errcode == ER_QUORUM_WAIT &&
- !fiber_is_cancelled());
- }
-}
-
static int
box_raft_worker_f(va_list args)
{
@@ -117,7 +93,12 @@ box_raft_worker_f(va_list args)
box_raft_has_work = false;
raft_process_async(raft);
- box_raft_update_synchro_queue(raft);
+ /*
+ * XXX: perhaps it should not ever fail. Or at least need a
+ * proper support for failures instead of the ignorance.
+ */
+ if (box_promote_qsync() != 0)
+ diag_log();
if (!box_raft_has_work)
fiber_yield();
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`
2021-07-26 23:50 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-29 20:56 ` Sergey Petrenko via Tarantool-patches
2021-08-01 16:19 ` Vladislav Shpilevoy via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Sergey Petrenko via Tarantool-patches @ 2021-07-29 20:56 UTC (permalink / raw)
To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
27.07.2021 02:50, Vladislav Shpilevoy wrote:
> Thanks for the fixes!
>
> On 23.07.2021 09:44, Sergey Petrenko wrote:
>> 22.07.2021 01:28, Vladislav Shpilevoy wrote:
>>> Thanks for the fixes!
>>>
>>>>> 2. On top of this commit and on top of the branch too I tried to
>>>>> promote a candidate and got a strange error in the logs, although
>>>>> the promotion was successful:
>>>>>
>>>>> --
>>>>> -- Instance 1
>>>>> --
>>>>>
>>>>> -- Step 1
>>>>> box.cfg{
>>>>> listen = 3313,
>>>>> election_mode = 'candidate',
>>>>> replication_synchro_quorum = 2,
>>>>> replication = {'localhost:3314', 'localhost:3313'}
>>>>> }
>>>>> box.schema.user.grant('guest', 'super')
>>>>>
>>>>>
>>>>> --
>>>>> -- Instance 2
>>>>> --
>>>>>
>>>>> -- Step 2
>>>>> box.cfg{
>>>>> listen = 3314,
>>>>> election_mode = 'voter',
>>>>> replication_synchro_quorum = 2,
>>>>> replication = {'localhost:3314', 'localhost:3313'},
>>>>> read_only = true,
>>>>> }
>>>>>
>>>>> -- Step 3
>>>>> box.cfg{read_only = false, election_mode = 'candidate'}
>>>>>
>>>>> -- Step 4
>>>>> box.ctl.promote()
>>>>>
>>>>> main/112/raft_worker box.cc:1538 E> ER_UNSUPPORTED: box.ctl.promote does not support simultaneous invocations
>>>>> ---
>>>>> ...
>>>> That's because once a candidate becomes the leader, it tries to issue `box.ctl.promote()` again,
>>>> and fails, since we're already inside a `box.ctl.promote()` call.
>>>> I'm not sure how to handle that properly. This doesn't break anything though.
>>> Still, the error message really does not look good. There is an option -
>>> make box_promote() for a candidate node just call raft_promote() and set
>>> box_in_promote = false. Then wait for the term outcome. Will it work?
>>> You would need to rebase your patchset on the master branch then.
>> Hmm, I don't like that. This will work, but it will complicate things in
>> box_promote even more. Now one has to keep in mind that a manual
>> promote call has two phases: one when it is issued manually, and the
>> other when it's called automatically once (and if) the instance becomes the leader.
> But is it any different in the new version on the branch? You still need to
> check whether you are "in promote", but instead of that being encapsulated
> into box_promote(), it now potentially has to be checked at every
> box_promote() call site. There are only two of those now, and only one
> needs the check. But you can see that now you have to think about whether
> being already in promote should be ignored.
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index a30e4f78d..5cdca4bd4 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1692,20 +1692,24 @@ box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)
>> }
>>
>> /* A guard to block multiple simultaneous promote()/demote() invocations. */
>> -static bool box_in_promote = false;
>> +static bool in_box_promote = false;
>> +
>> +bool box_in_promote(void) {
> Could you please put the return value on a separate line and use 'is'
> as a suffix? For instance, `box_is_in_promote()`.
>> diff --git a/src/box/raft.c b/src/box/raft.c
>> index 35c471f58..5e496c2e4 100644
>> --- a/src/box/raft.c
>> +++ b/src/box/raft.c
>> @@ -88,11 +88,11 @@ box_raft_update_synchro_queue(struct raft *raft)
>> {
>> assert(raft == box_raft());
>> /*
>> - * In case these are manual elections, we are already in the middle of a
>> - * `promote` call. No need to call it once again.
>> + * In case the elections were triggered manually, we are already in
>> + * the middle of a `promote` call. No need to call it once again.
>> */
>> if (raft->state == RAFT_STATE_LEADER &&
>> - box_election_mode != ELECTION_MODE_MANUAL) {
>> + !box_in_promote()) {
> It just seems to me that it should not be raft's business whether the
> node was promoted manually or not.
Ok, I see your point. I'll apply your diff.
>> int rc = 0;
>> uint32_t errcode = 0;
>> do {
>> =====================================
>>
>>
>>> It might be a little easier to do if you apply the diff below. (Warning:
>>> I didn't test it.) The motivation is that one of the main reasons why I
>>> wanted box_promote() simplified was because of the strange meaning of some
>>> flags. In particular, try_wait flag somewhy triggered elections before the
>>> waiting which is super not obvious why. How does 'wait' come to 'elections'?
>>>
>>> In the diff I tried to remove these flags entirely. And now you have a
>>> single place in the code of box_promote(), where ELECTION_MODE_CANDIDATE
>>> stuff is handled. Here you could try the proposal I gave above.
>> Thanks for the help! Your diff looks good, I've reworked my patches to comply.
> Would it be too hard to merge this diff into the previous
> commits? It looks to me like it could be done this way starting
> from the first box_promote-refactoring commit.
No problem.
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index af39f66f4..36302310b 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1701,14 +1701,14 @@ box_promote(void)
>> if (!is_box_configured)
>> return 0;
>> if (txn_limbo_replica_term(&txn_limbo, instance_id) ==
>> - box_raft()->term)
>> + raft->term)
>> return 0;
>> - bool run_elections = false;
>> - bool try_wait = false;
>> -
>> switch (box_election_mode) {
>> case ELECTION_MODE_OFF:
>> - try_wait = true;
>> + if (box_try_wait_confirm(2 * replication_synchro_timeout) != 0)
>> + return -1;
>> + if (box_trigger_elections() != 0)
>> + return -1;
>> break;
>> case ELECTION_MODE_VOTER:
>> assert(raft->state == RAFT_STATE_FOLLOWER);
>> @@ -1717,23 +1717,17 @@ box_promote(void)
>> return -1;
>> case ELECTION_MODE_MANUAL:
>> case ELECTION_MODE_CANDIDATE:
>> - run_elections = raft->state != RAFT_STATE_LEADER;
>> + if (raft->state != RAFT_STATE_LEADER &&
>> + box_run_elections() != 0) {
>> + return -1;
>> + }
>> break;
>> default:
>> unreachable();
>> }
>>
>> - int64_t wait_lsn = -1;
>> -
>> - if (run_elections && box_run_elections() != 0)
>> - return -1;
>> - if (try_wait) {
>> - if (box_try_wait_confirm(2 * replication_synchro_timeout) != 0)
>> - return -1;
>> - if (box_trigger_elections() != 0)
>> - return -1;
>> - }
>> - if ((wait_lsn = box_wait_limbo_acked()) < 0)
>> + int64_t wait_lsn = box_wait_limbo_acked();
>> + if (wait_lsn < 0)
>> return -1;
>>
>> box_issue_promote(txn_limbo.owner_id, wait_lsn);
> I thought more about how the manual promotion conflicts with the
> automatic one and I tried to separate them. See my diff below. It
> is done on top of this commit, but I didn't push it, because I
> didn't run any tests except the one which used to log an error
> in the first version of the patchset.
>
> Thanks to the new small helper functions, I implemented a special
> function for automatic promote which does not conflict with
> the manual one. If you apply it though, you might want to
> do it earlier than this commit.
Thanks for the help! I've applied your diff with minor changes to these
patches:
box: split manual and automatic promotion
box: allow calling promote on a candidate
>
> ====================
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 36302310b..65d064615 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1599,7 +1599,7 @@ box_try_wait_confirm(double timeout)
> * Return lsn of the last limbo entry on success, -1 on error.
> */
> static int64_t
> -box_wait_limbo_acked(void)
> +box_wait_limbo_acked(double timeout)
> {
> if (txn_limbo_is_empty(&txn_limbo))
> return txn_limbo.confirmed_lsn;
> @@ -1629,7 +1629,7 @@ box_wait_limbo_acked(void)
> int64_t wait_lsn = last_entry->lsn;
>
> if (box_wait_quorum(txn_limbo.owner_id, wait_lsn, quorum,
> - replication_synchro_timeout) != 0)
> + timeout) != 0)
> return -1;
>
> if (box_check_promote_term_intact(promote_term) != 0)
> @@ -1675,9 +1675,23 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
> /* A guard to block multiple simultaneous box_promote() invocations. */
> static bool in_box_promote = false;
>
> -bool box_in_promote(void)
> +int
> +box_promote_qsync(void)
> {
> - return in_box_promote;
> + assert(!in_box_promote);
> + assert(is_box_configured);
> + struct raft *raft = box_raft();
> + if (raft->state != RAFT_STATE_LEADER)
> + return 0;
> + assert(box_election_mode == ELECTION_MODE_MANUAL ||
> + box_election_mode == ELECTION_MODE_CANDIDATE);
> + if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term)
> + return 0;
> + int64_t wait_lsn = box_wait_limbo_acked(TIMEOUT_INFINITY);
> + if (wait_lsn < 0)
> + return -1;
> + box_issue_promote(txn_limbo.owner_id, wait_lsn);
> + return 0;
> }
>
> int
> @@ -1700,8 +1714,8 @@ box_promote(void)
> */
> if (!is_box_configured)
> return 0;
> - if (txn_limbo_replica_term(&txn_limbo, instance_id) ==
> - raft->term)
> + if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term ||
> + raft->state == RAFT_STATE_LEADER)
> return 0;
> switch (box_election_mode) {
> case ELECTION_MODE_OFF:
> @@ -1715,18 +1729,16 @@ box_promote(void)
> diag_set(ClientError, ER_UNSUPPORTED, "election_mode='voter'",
> "manual elections");
> return -1;
> - case ELECTION_MODE_MANUAL:
> case ELECTION_MODE_CANDIDATE:
> - if (raft->state != RAFT_STATE_LEADER &&
> - box_run_elections() != 0) {
> - return -1;
> - }
> - break;
> + case ELECTION_MODE_MANUAL:
> + /* Let the automatic elections finish the promotion properly. */
> + in_box_promote = false;
> + return box_run_elections();
> default:
> unreachable();
> }
>
> - int64_t wait_lsn = box_wait_limbo_acked();
> + int64_t wait_lsn = box_wait_limbo_acked(replication_synchro_timeout);
> if (wait_lsn < 0)
> return -1;
>
> diff --git a/src/box/box.h b/src/box/box.h
> index db636b058..5e5d0bf6d 100644
> --- a/src/box/box.h
> +++ b/src/box/box.h
> @@ -273,12 +273,12 @@ extern "C" {
>
> typedef struct tuple box_tuple_t;
>
> -bool
> -box_in_promote(void);
> -
> int
> box_promote(void);
>
> +int
> +box_promote_qsync(void);
> +
> /* box_select is private and used only by FFI */
> API_EXPORT int
> box_select(uint32_t space_id, uint32_t index_id,
> diff --git a/src/box/raft.c b/src/box/raft.c
> index 5e496c2e4..07501157f 100644
> --- a/src/box/raft.c
> +++ b/src/box/raft.c
> @@ -83,30 +83,6 @@ box_raft_request_to_msg(const struct raft_request *req, struct raft_msg *msg)
> };
> }
>
> -static void
> -box_raft_update_synchro_queue(struct raft *raft)
> -{
> - assert(raft == box_raft());
> - /*
> - * In case the elections were triggered manually, we are already in
> - * the middle of a `promote` call. No need to call it once again.
> - */
> - if (raft->state == RAFT_STATE_LEADER &&
> - !box_in_promote()) {
> - int rc = 0;
> - uint32_t errcode = 0;
> - do {
> - rc = box_promote();
> - if (rc != 0) {
> - struct error *err = diag_last_error(diag_get());
> - errcode = box_error_code(err);
> - diag_log();
> - }
> - } while (rc != 0 && errcode == ER_QUORUM_WAIT &&
> - !fiber_is_cancelled());
> - }
> -}
> -
> static int
> box_raft_worker_f(va_list args)
> {
> @@ -117,7 +93,12 @@ box_raft_worker_f(va_list args)
> box_raft_has_work = false;
>
> raft_process_async(raft);
> - box_raft_update_synchro_queue(raft);
> + /*
> + * XXX: perhaps it should not ever fail. Or at least need a
> + * proper support for failures instead of the ignorance.
> + */
> + if (box_promote_qsync() != 0)
> + diag_log();
>
> if (!box_raft_has_work)
> fiber_yield();
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`
2021-07-29 20:56 ` Sergey Petrenko via Tarantool-patches
@ 2021-08-01 16:19 ` Vladislav Shpilevoy via Tarantool-patches
2021-08-03 7:56 ` Serge Petrenko via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-01 16:19 UTC (permalink / raw)
To: Sergey Petrenko, gorcunov; +Cc: tarantool-patches
Hi! Thanks for the patch!
I think we are at the finish line here, see 5 small
comments below. After them and after you fix the failing
vinyl test, the patchset will probably be finished!
There are only 2 things which bother me. They are not bugs
and we can work on them in the next quarter.
1) Assume you have election_mode = 'manual'. And you are a
leader. You call box.ctl.demote() and stop being a leader.
But the limbo is still yours. If now you switch election_mode to
'off', you need to call box.ctl.demote() again to free the
limbo (see the sketch right after these two points).
2) In the last commit I see we make too many actions to ensure
we are a writable leader. Perhaps in the future we should not
report box.info.election.state == 'leader' until the promote is
written, and should not say the instance is writable.
I don't have a plan for either of these ideas yet.
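For 1), here is a minimal sketch of why the limbo stays taken,
distilled from the box_demote() flow shown earlier in this thread (the
function name is made up, the calls are the ones from the patch):

    int
    box_demote_sketch(void)
    {
        /* Resign leadership via the elections machinery. */
        if (box_trigger_elections() < 0)
            return -1;
        /*
         * Under election_mode = 'manual' we stop right here, so the
         * limbo owner never changes...
         */
        if (box_election_mode != ELECTION_MODE_OFF)
            return 0;
        /* ...and only 'off' mode reaches the DEMOTE record below. */
        if (box_try_wait_confirm(2 * replication_synchro_timeout) < 0)
            return -1;
        int64_t wait_lsn = box_wait_limbo_acked();
        if (wait_lsn < 0)
            return -1;
        box_issue_demote(txn_limbo.owner_id, wait_lsn);
        return 0;
    }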
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 41f665e38..a34e05e94 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1679,20 +1679,44 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
> assert(txn_limbo_is_empty(&txn_limbo));
> }
>
> +/* A guard to block multiple simultaneous box_promote() invocations. */
1. For out-of-function comments we usually use /** as an opening.
> +static bool in_box_promote = false;
2. Could you please use `is_` prefix here? `is_in_box_promote`.
> +
> +int
> +box_promote_qsync(void)
> +{
> + assert(!in_box_promote);
> + assert(is_box_configured);
> + struct raft *raft = box_raft();
> + in_box_promote = true;
> + auto promote_guard = make_scoped_guard([&] {
> + in_box_promote = false;
> + });
> + if (raft->state != RAFT_STATE_LEADER)
> + return 0;
3. This condition is not reachable, according to what I see in
box_raft_worker_f().
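For reference, here is the call site in box_raft_worker_f() as I see it
on the branch, which is why the condition above can never fire:

    if (raft->state == RAFT_STATE_LEADER &&
        box_promote_qsync() != 0)
            diag_log();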
> + assert(box_election_mode == ELECTION_MODE_MANUAL ||
> + box_election_mode == ELECTION_MODE_CANDIDATE);
> + if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term)
> + return 0;
> + int64_t wait_lsn = box_wait_limbo_acked(TIMEOUT_INFINITY);
> + if (wait_lsn < 0)
> + return -1;
4. Perhaps this had better be a panic? An infinite timeout should
never be reached. And then the function becomes void, because it
would not be able to fail anymore.
> + box_issue_promote(txn_limbo.owner_id, wait_lsn);
> + return 0;
> +}
> commit 7980cb3096f2616a2851f8d97db8091f0d82879c
> Author: Serge Petrenko <sergepetrenko@tarantool.org>
> Date: Mon Jun 28 11:52:44 2021 +0300
>
> box: allow calling promote on a candidate
>
> Part of #6034
>
> diff --git a/test/replication/gh-6034-election-candidate-promote.result b/test/replication/gh-6034-election-candidate-promote.result
> new file mode 100644
> index 000000000..2b4bc0213
> --- /dev/null
> +++ b/test/replication/gh-6034-election-candidate-promote.result
5. The test name format `gh-####-...` is obligatory only for bug tests.
This patch seems to be adding a feature.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`
2021-08-01 16:19 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-03 7:56 ` Serge Petrenko via Tarantool-patches
2021-08-03 23:25 ` Vladislav Shpilevoy via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-08-03 7:56 UTC (permalink / raw)
To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
01.08.2021 19:19, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
>
> I think we are at the finish line here, see 5 small
> comments below. After them and after you fix the failing
> vinyl test, the patchset will probably be finished!
As discussed, fixed the vinyl/errinj test by making
iproto_write_error() use blocking writes in debug builds.
Here's the relevant commit:
=============================================
commit c6283769a887429f81d0005b2354691ace22514d
Author: Serge Petrenko <sergepetrenko@tarantool.org>
Date: Mon Aug 2 19:33:36 2021 +0300
iproto: make iproto_write_error() blocking in debug
iproto_write_error() used to be blocking until commit
4dac37a66b0674c345e036faa9984c9ae0d70382 (iproto: remove
iproto_write_error_blocking()).
Actually, it should block until the error is written to the socket,
because some tests (vinyl/errinj.test.lua, for example) rely on that.
Do not make iproto_write_error() blocking in release builds for safety
reasons, as stated in the commit above. But make it blocking in debug
builds for testing's sake.
Part-of #6034
diff --git a/src/box/iproto.cc b/src/box/iproto.cc
index 3ed641eea..5cc69b77f 100644
--- a/src/box/iproto.cc
+++ b/src/box/iproto.cc
@@ -32,6 +32,7 @@
#include <string.h>
#include <stdarg.h>
#include <stdio.h>
+#include <fcntl.h>
#include <msgpuck.h>
#include <small/ibuf.h>
@@ -557,6 +558,20 @@ struct iproto_connection
struct iproto_thread *iproto_thread;
};
+#ifdef NDEBUG
+#define iproto_write_error(fd, e, schema_version, sync) \
+ iproto_do_write_error(fd, e, schema_version, sync);
+#else
+#define iproto_write_error(fd, e, schema_version, sync) do { \
+ int flags = fcntl(fd, F_GETFL, 0); \
+ if (flags >= 0) \
+ fcntl(fd, F_SETFL, flags & (~O_NONBLOCK)); \
+ iproto_do_write_error(fd, e, schema_version, sync); \
+ if (flags >= 0) \
+ fcntl(fd, F_SETFL, flags); \
+} while (0);
+#endif
+
/**
* Return true if we have not enough spare messages
* in the message pool.
diff --git a/src/box/xrow.c b/src/box/xrow.c
index 116be01ce..5c5da4808 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -543,8 +543,8 @@ iproto_reply_error(struct obuf *out, const struct error *e, uint64_t sync,
}
void
-iproto_write_error(int fd, const struct error *e, uint32_t schema_version,
- uint64_t sync)
+iproto_do_write_error(int fd, const struct error *e, uint32_t schema_version,
+ uint64_t sync)
{
bool is_error = false;
struct mpstream stream;
diff --git a/src/box/xrow.h b/src/box/xrow.h
index 3867e0c0e..30d6b8639 100644
--- a/src/box/xrow.h
+++ b/src/box/xrow.h
@@ -719,8 +719,8 @@ iproto_reply_chunk(struct obuf *buf, struct obuf_svp *svp, uint64_t sync,
/** Write error directly to a socket. */
void
-iproto_write_error(int fd, const struct error *e, uint32_t schema_version,
- uint64_t sync);
+iproto_do_write_error(int fd, const struct error *e, uint32_t schema_version,
+ uint64_t sync);
enum {
/* Maximal length of protocol name in handshake */
=============================================
I've also noticed some failures in box-py/iproto.test.py,
so I've made the following changes in these 3 commits:
d61435f4a replication: send current Raft term in join response
f0e4d1b73 replication: send latest effective promote in initial join
2d2ba4d35 replication: add META stage to JOIN
=============================================
diff --git a/src/box/box.cc b/src/box/box.cc
index f3da02231..e7b8ddda5 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -2724,7 +2724,7 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
/* Decode JOIN request */
struct tt_uuid instance_uuid = uuid_nil;
- uint32_t replica_version_id;
+ uint32_t replica_version_id = 0;
xrow_decode_join_xc(header, &instance_uuid, &replica_version_id);
/* Check that bootstrap has been finished */
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 805b5e7ff..2947468ba 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -444,20 +444,20 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
*/
if (replica_version_id > 0) {
/* Mark the beginning of the metadata stream. */
- row.type = IPROTO_JOIN_META;
- coio_write_xrow(&relay->io, &row);
+ xrow_encode_type(&row, IPROTO_JOIN_META);
+ xstream_write(&relay->stream, &row);
xrow_encode_raft(&row, &fiber()->gc, &raft_req);
- coio_write_xrow(&relay->io, &row);
+ xstream_write(&relay->stream, &row);
char body[XROW_SYNCHRO_BODY_LEN_MAX];
xrow_encode_synchro(&row, body, &req);
row.replica_id = req.replica_id;
- coio_write_xrow(&relay->io, &row);
+ xstream_write(&relay->stream, &row);
/* Mark the end of the metadata stream. */
- row.type = IPROTO_JOIN_SNAPSHOT;
- coio_write_xrow(&relay->io, &row);
+ xrow_encode_type(&row, IPROTO_JOIN_SNAPSHOT);
+ xstream_write(&relay->stream, &row);
}
/* Send read view to the replica. */
diff --git a/src/box/xrow.c b/src/box/xrow.c
index 5c5da4808..8ab8b2768 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -1730,6 +1730,13 @@ xrow_encode_timestamp(struct xrow_header *row, uint32_t replica_id, double tm)
row->tm = tm;
}
+void
+xrow_encode_type(struct xrow_header *row, uint16_t type)
+{
+ memset(row, 0, sizeof(*row));
+ row->type = type;
+}
+
void
greeting_encode(char *greetingbuf, uint32_t version_id,
const struct tt_uuid *uuid, const char *salt, uint32_t salt_len)
diff --git a/src/box/xrow.h b/src/box/xrow.h
index 30d6b8639..c6e8ed0fd 100644
--- a/src/box/xrow.h
+++ b/src/box/xrow.h
@@ -570,6 +570,14 @@ xrow_decode_subscribe_response(struct xrow_header *row,
void
xrow_encode_timestamp(struct xrow_header *row, uint32_t replica_id,
double tm);
+/**
+ * Encode any bodyless message.
+ * @param row[out] Row to encode into.
+ * @param type Message type.
+ */
+void
+xrow_encode_type(struct xrow_header *row, uint16_t type);
+
/**
* Fast encode xrow header using the specified header fields.
* It is faster than the xrow_header_encode, because uses
=============================================
>
> There are only 2 things which bother me. They are not bugs
> and we can work on them in the next quarter.
>
> 1) Assume you have election_mode = 'manual'. And you are a
> leader. You call box.ctl.demote() and stop being a leader.
> But the limbo is still yours. If now you switch election_mode to
> 'off', you need to call box.ctl.demote() again to free the
> limbo.
This is an inconvenience rather than a bug IMO.
I couldn't find a solution right away.
> 2) In the last commit I see we make too much actions to ensure
> we are a writable leader. Perhaps in the future we should not
> report box.info.election.state == 'leader' until promote is
> written and should not say the instance is writable.
Yes, maybe. Need to think about that.
>
> I don't have a plan for either of these ideas yet.
>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 41f665e38..a34e05e94 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1679,20 +1679,44 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
>> assert(txn_limbo_is_empty(&txn_limbo));
>> }
>>
>> +/* A guard to block multiple simultaneous box_promote() invocations. */
> 1. For out of function comments we usually use /** as an opening.
Fixed, sorry.
>> +static bool in_box_promote = false;
> 2. Could you please use `is_` prefix here? `is_in_box_promote`.
Ok.
>> +
>> +int
>> +box_promote_qsync(void)
>> +{
>> + assert(!in_box_promote);
>> + assert(is_box_configured);
>> + struct raft *raft = box_raft();
>> + in_box_promote = true;
>> + auto promote_guard = make_scoped_guard([&] {
>> + in_box_promote = false;
>> + });
>> + if (raft->state != RAFT_STATE_LEADER)
>> + return 0;
> 3. This condition is not reachable, according to what I see in
> box_raft_worker_f().
Indeed. Changed, please see the diff below.
>> + assert(box_election_mode == ELECTION_MODE_MANUAL ||
>> + box_election_mode == ELECTION_MODE_CANDIDATE);
>> + if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term)
>> + return 0;
>> + int64_t wait_lsn = box_wait_limbo_acked(TIMEOUT_INFINITY);
>> + if (wait_lsn < 0)
>> + return -1;
> 4. Perhaps this better be panic? Because infinity timeout should
> not ever be reached. And then the function becomes void, because
> would not be able to fail anymore.
No. Actually, there are still multiple reasons for box_wait_limbo_acked()
to fail: the quorum may be reached but then increased, the fiber might
be cancelled, and new synchronous transactions might appear while
waiting.
I think box_promote_qsync() should be retried in the raft worker, like
it was done for box_promote().
Also there's box_check_promote_term_intact() inside box_wait_limbo_acked(),
which could cause box_promote_qsync() to fail, but we shouldn't panic in
this case.
Here's the diff for this commit (box: split manual and automatic promotion):
============================================
diff --git a/src/box/box.cc b/src/box/box.cc
index dad5e4557..53ed64e51 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1679,23 +1679,20 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
assert(txn_limbo_is_empty(&txn_limbo));
}
-/* A guard to block multiple simultaneous box_promote() invocations. */
-static bool in_box_promote = false;
+/** A guard to block multiple simultaneous box_promote() invocations. */
+static bool is_in_box_promote = false;
int
box_promote_qsync(void)
{
- assert(!in_box_promote);
+ assert(!is_in_box_promote);
assert(is_box_configured);
struct raft *raft = box_raft();
- in_box_promote = true;
+ is_in_box_promote = true;
auto promote_guard = make_scoped_guard([&] {
- in_box_promote = false;
+ is_in_box_promote = false;
});
- if (raft->state != RAFT_STATE_LEADER)
- return 0;
- assert(box_election_mode == ELECTION_MODE_MANUAL ||
- box_election_mode == ELECTION_MODE_CANDIDATE);
+ assert(raft->state == RAFT_STATE_LEADER);
if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term)
return 0;
int64_t wait_lsn = box_wait_limbo_acked(TIMEOUT_INFINITY);
@@ -1708,15 +1705,15 @@ box_promote_qsync(void)
int
box_promote(void)
{
- if (in_box_promote) {
+ if (is_in_box_promote) {
diag_set(ClientError, ER_UNSUPPORTED, "box.ctl.promote",
"simultaneous invocations");
return -1;
}
struct raft *raft = box_raft();
- in_box_promote = true;
+ is_in_box_promote = true;
auto promote_guard = make_scoped_guard([&] {
- in_box_promote = false;
+ is_in_box_promote = false;
});
/*
* Do nothing when box isn't configured and when PROMOTE was already
@@ -1742,7 +1739,7 @@ box_promote(void)
case ELECTION_MODE_MANUAL:
if (raft->state == RAFT_STATE_LEADER)
return 0;
- in_box_promote = false;
+ is_in_box_promote = false;
return box_run_elections();
case ELECTION_MODE_CANDIDATE:
/*
diff --git a/src/box/raft.c b/src/box/raft.c
index f8d13aa32..bc69f7f1b 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -83,6 +83,24 @@ box_raft_request_to_msg(const struct raft_request *req, struct raft_msg *msg)
};
}
+static void
+box_raft_update_synchro_queue(struct raft *raft)
+{
+ assert(raft == box_raft());
+ if (raft->state != RAFT_STATE_LEADER)
+ return;
+ int rc = 0;
+ uint32_t errcode = 0;
+ do {
+ rc = box_promote_qsync();
+ if (rc != 0) {
+ struct error *err = diag_last_error(diag_get());
+ errcode = box_error_code(err);
+ diag_log();
+ }
+ } while (rc != 0 && errcode == ER_QUORUM_WAIT && !fiber_is_cancelled());
+}
+
static int
box_raft_worker_f(va_list args)
{
@@ -93,9 +111,7 @@ box_raft_worker_f(va_list args)
box_raft_has_work = false;
raft_process_async(raft);
- if (raft->state == RAFT_STATE_LEADER &&
- box_promote_qsync() != 0)
- diag_log();
+ box_raft_update_synchro_queue(raft);
if (!box_raft_has_work)
fiber_yield();
============================================
>> + box_issue_promote(txn_limbo.owner_id, wait_lsn);
>> + return 0;
>> +}
>> commit 7980cb3096f2616a2851f8d97db8091f0d82879c
>> Author: Serge Petrenko<sergepetrenko@tarantool.org>
>> Date: Mon Jun 28 11:52:44 2021 +0300
>>
>> box: allow calling promote on a candidate
>>
>> Part of #6034
>>
>> diff --git a/test/replication/gh-6034-election-candidate-promote.result b/test/replication/gh-6034-election-candidate-promote.result
>> new file mode 100644
>> index 000000000..2b4bc0213
>> --- /dev/null
>> +++ b/test/replication/gh-6034-election-candidate-promote.result
> 5. The test name format `gh-####-...` is obligatory only for bug tests.
> This patch seems to be adding a feature.
Ok:
===============================
diff --git a/test/replication/gh-6034-election-candidate-promote.result b/test/replication/election-candidate-promote.result
similarity index 100%
rename from test/replication/gh-6034-election-candidate-promote.result
rename to test/replication/election-candidate-promote.result
diff --git a/test/replication/gh-6034-election-candidate-promote.test.lua b/test/replication/election-candidate-promote.test.lua
similarity index 100%
rename from test/replication/gh-6034-election-candidate-promote.test.lua
rename to test/replication/election-candidate-promote.test.lua
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 1ec1a94eb..1a3c991f0 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -57,8 +57,8 @@
"gh-6057-qsync-confirm-async-no-wal.test.lua": {},
"gh-6094-rs-uuid-mismatch.test.lua": {},
"gh-6127-election-join-new.test.lua": {},
- "gh-6034-election-candidate-promote.test.lua": {},
"gh-6035-applier-filter.test.lua": {},
+ "election-candidate-promote.test.lua": {},
"*": {
"memtx": {"engine": "memtx"},
"vinyl": {"engine": "vinyl"}
===============================
--
Serge Petrenko
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`
2021-08-03 7:56 ` Serge Petrenko via Tarantool-patches
@ 2021-08-03 23:25 ` Vladislav Shpilevoy via Tarantool-patches
2021-08-04 13:08 ` Serge Petrenko via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-03 23:25 UTC (permalink / raw)
To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Thanks for the fixes!
See 3 comments below.
> diff --git a/src/box/iproto.cc b/src/box/iproto.cc
> index 3ed641eea..5cc69b77f 100644
> --- a/src/box/iproto.cc
> +++ b/src/box/iproto.cc
> @@ -557,6 +558,20 @@ struct iproto_connection
> struct iproto_thread *iproto_thread;
> };
>
> +#ifdef NDEBUG
> +#define iproto_write_error(fd, e, schema_version, sync) \
> + iproto_do_write_error(fd, e, schema_version, sync);
> +#else
> +#define iproto_write_error(fd, e, schema_version, sync) do { \
> + int flags = fcntl(fd, F_GETFL, 0); \
> + if (flags >= 0) \
> + fcntl(fd, F_SETFL, flags & (~O_NONBLOCK)); \
> + iproto_do_write_error(fd, e, schema_version, sync); \
> + if (flags >= 0) \
> + fcntl(fd, F_SETFL, flags); \
> +} while (0);
> +#endif
1. Why didn't you do this right in iproto_write_error? Why do
you need to change iproto.cc code at all?
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index 805b5e7ff..2947468ba 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -444,20 +444,20 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
> */
> if (replica_version_id > 0) {
> /* Mark the beginning of the metadata stream. */
> - row.type = IPROTO_JOIN_META;
> - coio_write_xrow(&relay->io, &row);
> + xrow_encode_type(&row, IPROTO_JOIN_META);
> + xstream_write(&relay->stream, &row);
2. Why do you use xtream_write? And why coio_write_xrow just a few lines
above works just fine then?
> xrow_encode_raft(&row, &fiber()->gc, &raft_req);
> - coio_write_xrow(&relay->io, &row);
> + xstream_write(&relay->stream, &row);
>
> char body[XROW_SYNCHRO_BODY_LEN_MAX];
> xrow_encode_synchro(&row, body, &req);
> row.replica_id = req.replica_id;
> - coio_write_xrow(&relay->io, &row);
> + xstream_write(&relay->stream, &row);
>
> /* Mark the end of the metadata stream. */
> - row.type = IPROTO_JOIN_SNAPSHOT;
> - coio_write_xrow(&relay->io, &row);
> + xrow_encode_type(&row, IPROTO_JOIN_SNAPSHOT);
> + xstream_write(&relay->stream, &row);
> }
> diff --git a/src/box/raft.c b/src/box/raft.c
> index f8d13aa32..bc69f7f1b 100644
> --- a/src/box/raft.c
> +++ b/src/box/raft.c
> @@ -83,6 +83,24 @@ box_raft_request_to_msg(const struct raft_request *req, struct raft_msg *msg)
> };
> }
>
> +static void
> +box_raft_update_synchro_queue(struct raft *raft)
> +{
> + assert(raft == box_raft());
> + if (raft->state != RAFT_STATE_LEADER)
> + return;
> + int rc = 0;
> + uint32_t errcode = 0;
> + do {
> + rc = box_promote_qsync();
> + if (rc != 0) {
> + struct error *err = diag_last_error(diag_get());
> + errcode = box_error_code(err);
> + diag_log();
> + }
3. Could it stop being a leader during the retries? Would it
make sense to continue them then?
> + } while (rc != 0 && errcode == ER_QUORUM_WAIT && !fiber_is_cancelled());
> +}
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 07/12] box: introduce `box.ctl.demote`
2021-08-03 23:25 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-04 13:08 ` Serge Petrenko via Tarantool-patches
0 siblings, 0 replies; 42+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-08-04 13:08 UTC (permalink / raw)
To: Vladislav Shpilevoy, gorcunov; +Cc: tarantool-patches
04.08.2021 02:25, Vladislav Shpilevoy wrote:
> Thanks for the fixes!
>
> See 3 comments below.
>
>> diff --git a/src/box/iproto.cc b/src/box/iproto.cc
>> index 3ed641eea..5cc69b77f 100644
>> --- a/src/box/iproto.cc
>> +++ b/src/box/iproto.cc
>> @@ -557,6 +558,20 @@ struct iproto_connection
>> struct iproto_thread *iproto_thread;
>> };
>>
>> +#ifdef NDEBUG
>> +#define iproto_write_error(fd, e, schema_version, sync) \
>> + iproto_do_write_error(fd, e, schema_version, sync);
>> +#else
>> +#define iproto_write_error(fd, e, schema_version, sync) do { \
>> + int flags = fcntl(fd, F_GETFL, 0); \
>> + if (flags >= 0) \
>> + fcntl(fd, F_SETFL, flags & (~O_NONBLOCK)); \
>> + iproto_do_write_error(fd, e, schema_version, sync); \
>> + if (flags >= 0) \
>> + fcntl(fd, F_SETFL, flags); \
>> +} while (0);
>> +#endif
> 1. Why didn't you do this right inside iproto_write_error()? Why do
> you need to change the iproto.cc code at all?
I've found a commit way back from 2017 where you removed the
fcntl dependency from xrow. I didn't want to introduce it again.
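For illustration, a minimal self-contained sketch of the pattern the
macro relies on (generic POSIX, not Tarantool code; the helper name is
made up for the example):
===============================
#include <fcntl.h>
#include <unistd.h>

/* Temporarily clear O_NONBLOCK so a short diagnostic write cannot
 * fail with EWOULDBLOCK, then restore the original flags. */
static ssize_t
write_blocking(int fd, const void *buf, size_t len)
{
	int flags = fcntl(fd, F_GETFL, 0);
	if (flags >= 0)
		fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
	ssize_t rc = write(fd, buf, len);
	if (flags >= 0)
		fcntl(fd, F_SETFL, flags);
	return rc;
}
===============================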
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index 805b5e7ff..2947468ba 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -444,20 +444,20 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock,
>> */
>> if (replica_version_id > 0) {
>> /* Mark the beginning of the metadata stream. */
>> - row.type = IPROTO_JOIN_META;
>> - coio_write_xrow(&relay->io, &row);
>> + xrow_encode_type(&row, IPROTO_JOIN_META);
>> + xstream_write(&relay->stream, &row);
> 2. Why do you use xstream_write? And why does coio_write_xrow just a few
> lines above work just fine then?
xstream_write() is basically a shortcut for "row.sync = sync;
coio_write_xrow()".
Writing row.sync = sync before every coio_write_xrow() looked rather
ugly IMO, that's why I changed it.
I need to reassign row.sync before each dispatch because the
xrow_encode_* functions usually reset all the xrow fields to 0.
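A rough sketch of what that shortcut amounts to (the callback name and
struct fields here are assumptions for the example, not the exact
relay.cc code):
===============================
/* The stream callback re-stamps row->sync on every dispatch,
 * since the xrow_encode_*() helpers zero the whole header. */
static void
relay_join_stream_write(struct xstream *stream, struct xrow_header *row)
{
	struct relay *relay = container_of(stream, struct relay, stream);
	row->sync = relay->sync;
	coio_write_xrow(&relay->io, row);
}
===============================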
>
>> xrow_encode_raft(&row, &fiber()->gc, &raft_req);
>> - coio_write_xrow(&relay->io, &row);
>> + xstream_write(&relay->stream, &row);
>>
>> char body[XROW_SYNCHRO_BODY_LEN_MAX];
>> xrow_encode_synchro(&row, body, &req);
>> row.replica_id = req.replica_id;
>> - coio_write_xrow(&relay->io, &row);
>> + xstream_write(&relay->stream, &row);
>>
>> /* Mark the end of the metadata stream. */
>> - row.type = IPROTO_JOIN_SNAPSHOT;
>> - coio_write_xrow(&relay->io, &row);
>> + xrow_encode_type(&row, IPROTO_JOIN_SNAPSHOT);
>> + xstream_write(&relay->stream, &row);
>> }
>> diff --git a/src/box/raft.c b/src/box/raft.c
>> index f8d13aa32..bc69f7f1b 100644
>> --- a/src/box/raft.c
>> +++ b/src/box/raft.c
>> @@ -83,6 +83,24 @@ box_raft_request_to_msg(const struct raft_request *req, struct raft_msg *msg)
>> };
>> }
>>
>> +static void
>> +box_raft_update_synchro_queue(struct raft *raft)
>> +{
>> + assert(raft == box_raft());
>> + if (raft->state != RAFT_STATE_LEADER)
>> + return;
>> + int rc = 0;
>> + uint32_t errcode = 0;
>> + do {
>> + rc = box_promote_qsync();
>> + if (rc != 0) {
>> + struct error *err = diag_last_error(diag_get());
>> + errcode = box_error_code(err);
>> + diag_log();
>> + }
> 3. Could it stop being a leader during the retries? Would it
> make sense to continue them then?
Thanks for noticing!
Check out the diff:
===============================
diff --git a/src/box/box.cc b/src/box/box.cc
index 4882b76a4..c7a972992 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1698,6 +1698,10 @@ box_promote_qsync(void)
int64_t wait_lsn = box_wait_limbo_acked(TIMEOUT_INFINITY);
if (wait_lsn < 0)
return -1;
+ if (raft->state != RAFT_STATE_LEADER) {
+ diag_set(ClientError, ER_NOT_LEADER, raft->leader);
+ return -1;
+ }
box_issue_promote(txn_limbo.owner_id, wait_lsn);
return 0;
}
diff --git a/src/box/errcode.h b/src/box/errcode.h
index d2854677f..906cd28b2 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -278,6 +278,7 @@ struct errcode_record {
 	/*223 */_(ER_INTERFERING_PROMOTE, "Instance with replica id %u was promoted first") \
 	/*224 */_(ER_ELECTION_DISABLED, "Elections were turned off")\
 	/*225 */_(ER_TXN_ROLLBACK, "Transaction was rolled back") \
+	/*226 */_(ER_NOT_LEADER, "The instance is not a leader. New leader is %u")\
/*
* !IMPORTANT! Please follow instructions at start of the file
===============================
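With this check in place, losing leadership surfaces as ER_NOT_LEADER
rather than ER_QUORUM_WAIT, so the retry loop stops instead of spinning.
A sketch of the resulting semantics (same loop as in the patch):
===============================
int rc;
uint32_t errcode = 0;
do {
	rc = box_promote_qsync();
	if (rc != 0) {
		struct error *err = diag_last_error(diag_get());
		errcode = box_error_code(err);
		diag_log();
	}
	/* ER_NOT_LEADER != ER_QUORUM_WAIT, so the loop exits
	 * as soon as leadership is lost. */
} while (rc != 0 && errcode == ER_QUORUM_WAIT && !fiber_is_cancelled());
===============================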
>
>> + } while (rc != 0 && errcode == ER_QUORUM_WAIT && !fiber_is_cancelled());
>> +}
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition
2021-06-28 22:12 [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
` (11 preceding siblings ...)
2021-06-28 22:12 ` [Tarantool-patches] [PATCH v3 12/12] replication: send current Raft term in join response Serge Petrenko via Tarantool-patches
@ 2021-08-04 22:41 ` Vladislav Shpilevoy via Tarantool-patches
2021-08-06 7:54 ` Vitaliia Ioffe via Tarantool-patches
2021-08-06 8:31 ` Kirill Yukhin via Tarantool-patches
13 siblings, 1 reply; 42+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-04 22:41 UTC (permalink / raw)
To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Hi! Thanks for the patchset!
LGTM.
* Re: [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition
2021-08-04 22:41 ` [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-06 7:54 ` Vitaliia Ioffe via Tarantool-patches
0 siblings, 0 replies; 42+ messages in thread
From: Vitaliia Ioffe via Tarantool-patches @ 2021-08-06 7:54 UTC (permalink / raw)
Cc: tarantool-patches, Vladislav Shpilevoy
Hi team,
QA LGTM.
--
Vitaliia Ioffe
>Thursday, August 5, 2021, 1:41 +03:00 from Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>:
>
>Hi! Thanks for the patchset!
>
>LGTM.
* Re: [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition
2021-06-28 22:12 [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Serge Petrenko via Tarantool-patches
` (12 preceding siblings ...)
2021-08-04 22:41 ` [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-06 8:31 ` Kirill Yukhin via Tarantool-patches
2021-08-08 10:46 ` Vladislav Shpilevoy via Tarantool-patches
13 siblings, 1 reply; 42+ messages in thread
From: Kirill Yukhin via Tarantool-patches @ 2021-08-06 8:31 UTC (permalink / raw)
To: Serge Petrenko; +Cc: tarantool-patches, v.shpilevoy
Hello,
On 29 Jun 01:12, Serge Petrenko via Tarantool-patches wrote:
> Changes in v3:
> - change demote() behaviour as discussed with Vlad:
> * make it work only on the current leader
> * make it demote the current leader and always
> bump the term
> - change how limbo and raft snapshots are sent in response
> to JOIN:
> * encode replica's version in JOIN request
> * introduce a special stage: JOIN_META with raft and limbo
> snapshots. Send it based on replica's version.
>
>
> https://github.com/tarantool/tarantool/issues/5438
> https://github.com/tarantool/tarantool/issues/6034
> https://github.com/tarantool/tarantool/tree/sp/gh-6034-empty-limbo-transition
I've checked your patchset into master.
--
Regards, Kirill Yukhin
* Re: [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition
2021-08-06 8:31 ` Kirill Yukhin via Tarantool-patches
@ 2021-08-08 10:46 ` Vladislav Shpilevoy via Tarantool-patches
2021-08-09 7:14 ` Kirill Yukhin via Tarantool-patches
0 siblings, 1 reply; 42+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-08 10:46 UTC (permalink / raw)
To: Kirill Yukhin, Serge Petrenko; +Cc: tarantool-patches
On 06.08.2021 11:31, Kirill Yukhin wrote:
> Hello,
>
> On 29 Jun 01:12, Serge Petrenko via Tarantool-patches wrote:
>> Changes in v3:
>> - change demote() behaviour as discussed with Vlad:
>> * make it work only on the current leader
>> * make it demote the current leader and always
>> bump the term
>> - change how limbo and raft snapshots are sent in response
>> to JOIN:
>> * encode replica's version in JOIN request
>> * introduce a special stage: JOIN_META with raft and limbo
>> snapshots. Send it based on replica's version.
>>
>>
>> https://github.com/tarantool/tarantool/issues/5438
>> https://github.com/tarantool/tarantool/issues/6034
>> https://github.com/tarantool/tarantool/tree/sp/gh-6034-empty-limbo-transition
>
> I've checked your patchset into master.
This must be in all branches, not only master. The patch fixes serious
bugs.
* Re: [Tarantool-patches] [PATCH v3 00/12] forbid implicit limbo ownership transition
2021-08-08 10:46 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-09 7:14 ` Kirill Yukhin via Tarantool-patches
0 siblings, 0 replies; 42+ messages in thread
From: Kirill Yukhin via Tarantool-patches @ 2021-08-09 7:14 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: tarantool-patches
Hello,
On 08 Aug 13:46, Vladislav Shpilevoy wrote:
> On 06.08.2021 11:31, Kirill Yukhin wrote:
> > Hello,
> >
> > On 29 июн 01:12, Serge Petrenko via Tarantool-patches wrote:
> >> Changes in v3:
> >> - change demote() behaviour as discussed with Vlad:
> >> * make it work only on the current leader
> >> * make it demote the current leader and always
> >> bump the term
> >> - change how limbo and raft snapshots are sent in response
> >> to JOIN:
> >> * encode replica's version in JOIN request
> >> * introduce a special stage: JOIN_META with raft and limbo
> >> snapshots. Send it based on replica's version.
> >>
> >>
> >> https://github.com/tarantool/tarantool/issues/5438
> >> https://github.com/tarantool/tarantool/issues/6034
> >> https://github.com/tarantool/tarantool/tree/sp/gh-6034-empty-limbo-transition
> >
> > I've checked your patchset into master.
>
> This must be in all branches, not only master. The patch fixes serious
> bugs.
Checked into 2.7 and 2.8 as well.
--
Regards, Kirill Yukhin