Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v17 0/5] qsync: implement packet filtering (part 1)
@ 2021-09-22 13:05 Cyrill Gorcunov via Tarantool-patches
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 1/5] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-09-22 13:05 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Please take a look once time permits. I'm mostly worried about exporting
limbo internals, but without it we can't test the order of promotion tracking.

branch gorcunov/gh-6036-rollback-confirm-17
issue https://github.com/tarantool/tarantool/issues/6036
previous series https://lists.tarantool.org/tarantool-patches/YUso7ImbeNg82tfv@grain/T/#t

Cyrill Gorcunov (5):
  latch: add latch_is_locked helper
  qsync: order access to the limbo terms
  qsync: track confirmed_lsn upon CONFIRM packet
  qsync: export more details on promote tracking
  test: add gh-6036-term-order test

 src/box/applier.cc                           |  16 +-
 src/box/box.cc                               |  71 +++---
 src/box/lua/info.c                           |  11 +-
 src/box/memtx_engine.cc                      |   3 +-
 src/box/txn_limbo.c                          |  48 ++++-
 src/box/txn_limbo.h                          |  49 ++++-
 src/lib/core/latch.h                         |  11 +
 test/replication/gh-6036-order-master.lua    |   1 +
 test/replication/gh-6036-order-node.lua      |  60 ++++++
 test/replication/gh-6036-order-replica1.lua  |   1 +
 test/replication/gh-6036-order-replica2.lua  |   1 +
 test/replication/gh-6036-term-order.result   | 216 +++++++++++++++++++
 test/replication/gh-6036-term-order.test.lua |  94 ++++++++
 test/replication/suite.cfg                   |   1 +
 test/replication/suite.ini                   |   2 +-
 15 files changed, 531 insertions(+), 54 deletions(-)
 create mode 120000 test/replication/gh-6036-order-master.lua
 create mode 100644 test/replication/gh-6036-order-node.lua
 create mode 120000 test/replication/gh-6036-order-replica1.lua
 create mode 120000 test/replication/gh-6036-order-replica2.lua
 create mode 100644 test/replication/gh-6036-term-order.result
 create mode 100644 test/replication/gh-6036-term-order.test.lua


base-commit: e509054ccd9a6ccea8a0c201a4ebd12b07826026
-- 
2.31.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Tarantool-patches] [PATCH v17 1/5] latch: add latch_is_locked helper
  2021-09-22 13:05 [Tarantool-patches] [PATCH v17 0/5] qsync: implement packet filtering (part 1) Cyrill Gorcunov via Tarantool-patches
@ 2021-09-22 13:05 ` Cyrill Gorcunov via Tarantool-patches
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 2/5] qsync: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-09-22 13:05 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Add a helper to test whether a latch is locked.

Part-of #6036

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/lib/core/latch.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/lib/core/latch.h b/src/lib/core/latch.h
index 49c59cf63..0aaa8b634 100644
--- a/src/lib/core/latch.h
+++ b/src/lib/core/latch.h
@@ -95,6 +95,17 @@ latch_owner(struct latch *l)
 	return l->owner;
 }
 
+/**
+ * Return true if the latch is locked.
+ *
+ * @param l - latch to be tested.
+ */
+static inline bool
+latch_is_locked(const struct latch *l)
+{
+	return l->owner != NULL;
+}
+
 /**
  * Lock a latch. If the latch is already locked by another fiber,
  * waits for timeout.
-- 
2.31.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Tarantool-patches] [PATCH v17 2/5] qsync: order access to the limbo terms
  2021-09-22 13:05 [Tarantool-patches] [PATCH v17 0/5] qsync: implement packet filtering (part 1) Cyrill Gorcunov via Tarantool-patches
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 1/5] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
@ 2021-09-22 13:05 ` Cyrill Gorcunov via Tarantool-patches
  2021-09-26 14:29   ` Vladislav Shpilevoy via Tarantool-patches
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 3/5] qsync: track confirmed_lsn upon CONFIRM packet Cyrill Gorcunov via Tarantool-patches
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-09-22 13:05 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Limbo term tracking is shared between appliers: while one applier
is waiting for a write to complete inside the journal_write()
routine, another may need to read the term value to figure out
whether a promote request is valid to apply. Due to cooperative
multitasking, access to the terms is not consistent, so we must
make sure that other fibers read up-to-date terms (i.e. terms
already written to the WAL).

For this sake we use a latching mechanism: when one fiber takes
the lock for an update, other readers wait until the operation
is complete.

For example here is a call graph of two appliers

applier 1
---------
applier_apply_tx
  (promote term = 3
   current max term = 2)
  applier_synchro_filter_tx
  apply_synchro_row
    journal_write
      (sleeping)

at this moment another applier comes in with obsolete
data and term 2

                              applier 2
                              ---------
                              applier_apply_tx
                                (term 2)
                                applier_synchro_filter_tx
                                  txn_limbo_is_replica_outdated -> false
                                journal_write (sleep)

applier 1
---------
journal wakes up
  apply_synchro_row_cb
    set max term to 3

Thus applier 2 did not notice that term 3 had already been seen
and wrote obsolete data. With locking, applier 2 will wait until
applier 1 has finished its write.

We introduce the following helpers:

1) txn_limbo_process_begin: takes the lock (and in the future
   will validate the request for the sake of split-brain
   detection)
2) txn_limbo_process_commit and txn_limbo_process_rollback:
   simply release the lock, but have different names for better
   semantics
3) txn_limbo_process: a general function which uses the begin
   and commit helpers internally

Since txn_limbo_process_begin() can fail, we had to change all
callers to return an error.

In particular, box.ctl.promote() and box.ctl.demote() may now
fail. Internally they write a promote or demote packet to the
WAL and process the synchro queue, so to eliminate code
duplication we use the new box_issue_synchro() helper, which
respects the locking mechanism.

Part-of #6036

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/applier.cc      | 16 +++++++---
 src/box/box.cc          | 71 +++++++++++++++++++----------------------
 src/box/memtx_engine.cc |  3 +-
 src/box/txn_limbo.c     | 34 ++++++++++++++++++--
 src/box/txn_limbo.h     | 49 +++++++++++++++++++++++++---
 5 files changed, 121 insertions(+), 52 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index b981bd436..f0751b68a 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -458,7 +458,8 @@ applier_wait_snapshot(struct applier *applier)
 				struct synchro_request req;
 				if (xrow_decode_synchro(&row, &req) != 0)
 					diag_raise();
-				txn_limbo_process(&txn_limbo, &req);
+				if (txn_limbo_process(&txn_limbo, &req) != 0)
+					diag_raise();
 			} else if (iproto_type_is_raft_request(row.type)) {
 				struct raft_request req;
 				if (xrow_decode_raft(&row, &req, NULL) != 0)
@@ -857,7 +858,7 @@ apply_synchro_row_cb(struct journal_entry *entry)
 		applier_rollback_by_wal_io(entry->res);
 	} else {
 		replica_txn_wal_write_cb(synchro_entry->rcb);
-		txn_limbo_process(&txn_limbo, synchro_entry->req);
+		txn_limbo_process_run(&txn_limbo, synchro_entry->req);
 		trigger_run(&replicaset.applier.on_wal_write, NULL);
 	}
 	fiber_wakeup(synchro_entry->owner);
@@ -873,6 +874,9 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 	if (xrow_decode_synchro(row, &req) != 0)
 		goto err;
 
+	if (txn_limbo_process_begin(&txn_limbo, &req) != 0)
+		goto err;
+
 	struct replica_cb_data rcb_data;
 	struct synchro_entry entry;
 	/*
@@ -910,12 +914,16 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 	 * transactions side, including the async ones.
 	 */
 	if (journal_write(&entry.base) != 0)
-		goto err;
+		goto err_rollback;
 	if (entry.base.res < 0) {
 		diag_set_journal_res(entry.base.res);
-		goto err;
+		goto err_rollback;
 	}
+	txn_limbo_process_commit(&txn_limbo);
 	return 0;
+
+err_rollback:
+	txn_limbo_process_rollback(&txn_limbo);
 err:
 	diag_log();
 	return -1;
diff --git a/src/box/box.cc b/src/box/box.cc
index 7b11d56d6..19e67b186 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -424,8 +424,7 @@ wal_stream_apply_synchro_row(struct wal_stream *stream, struct xrow_header *row)
 		say_error("couldn't decode a synchro request");
 		return -1;
 	}
-	txn_limbo_process(&txn_limbo, &syn_req);
-	return 0;
+	return txn_limbo_process(&txn_limbo, &syn_req);
 }
 
 static int
@@ -1670,48 +1669,43 @@ box_wait_limbo_acked(double timeout)
 	return wait_lsn;
 }
 
-/** Write and process a PROMOTE request. */
-static void
-box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
+/** Write and process PROMOTE or DEMOTE request. */
+static int
+box_issue_synchro(uint16_t type, uint32_t prev_leader_id, int64_t promote_lsn)
 {
 	struct raft *raft = box_raft();
+
+	assert(type == IPROTO_RAFT_PROMOTE ||
+	       type == IPROTO_RAFT_DEMOTE);
 	assert(raft->volatile_term == raft->term);
 	assert(promote_lsn >= 0);
-	txn_limbo_write_promote(&txn_limbo, promote_lsn,
-				raft->term);
+
 	struct synchro_request req = {
-		.type = IPROTO_RAFT_PROMOTE,
-		.replica_id = prev_leader_id,
-		.origin_id = instance_id,
-		.lsn = promote_lsn,
-		.term = raft->term,
+		.type		= type,
+		.replica_id	= prev_leader_id,
+		.origin_id	= instance_id,
+		.lsn		= promote_lsn,
+		.term		= raft->term,
 	};
-	txn_limbo_process(&txn_limbo, &req);
+
+	if (txn_limbo_process_begin(&txn_limbo, &req) != 0)
+		return -1;
+
+	if (type == IPROTO_RAFT_PROMOTE)
+		txn_limbo_write_promote(&txn_limbo, req.lsn, req.term);
+	else
+		txn_limbo_write_demote(&txn_limbo, req.lsn, req.term);
+
+	txn_limbo_process_run(&txn_limbo, &req);
 	assert(txn_limbo_is_empty(&txn_limbo));
+
+	txn_limbo_process_commit(&txn_limbo);
+	return 0;
 }
 
 /** A guard to block multiple simultaneous promote()/demote() invocations. */
 static bool is_in_box_promote = false;
 
-/** Write and process a DEMOTE request. */
-static void
-box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)
-{
-	assert(box_raft()->volatile_term == box_raft()->term);
-	assert(promote_lsn >= 0);
-	txn_limbo_write_demote(&txn_limbo, promote_lsn,
-				box_raft()->term);
-	struct synchro_request req = {
-		.type = IPROTO_RAFT_DEMOTE,
-		.replica_id = prev_leader_id,
-		.origin_id = instance_id,
-		.lsn = promote_lsn,
-		.term = box_raft()->term,
-	};
-	txn_limbo_process(&txn_limbo, &req);
-	assert(txn_limbo_is_empty(&txn_limbo));
-}
-
 int
 box_promote_qsync(void)
 {
@@ -1732,8 +1726,8 @@ box_promote_qsync(void)
 		diag_set(ClientError, ER_NOT_LEADER, raft->leader);
 		return -1;
 	}
-	box_issue_promote(txn_limbo.owner_id, wait_lsn);
-	return 0;
+	return box_issue_synchro(IPROTO_RAFT_PROMOTE,
+				 txn_limbo.owner_id, wait_lsn);
 }
 
 int
@@ -1789,9 +1783,8 @@ box_promote(void)
 	if (wait_lsn < 0)
 		return -1;
 
-	box_issue_promote(txn_limbo.owner_id, wait_lsn);
-
-	return 0;
+	return box_issue_synchro(IPROTO_RAFT_PROMOTE,
+				 txn_limbo.owner_id, wait_lsn);
 }
 
 int
@@ -1826,8 +1819,8 @@ box_demote(void)
 	int64_t wait_lsn = box_wait_limbo_acked(replication_synchro_timeout);
 	if (wait_lsn < 0)
 		return -1;
-	box_issue_demote(txn_limbo.owner_id, wait_lsn);
-	return 0;
+	return box_issue_synchro(IPROTO_RAFT_DEMOTE,
+				 txn_limbo.owner_id, wait_lsn);
 }
 
 int
diff --git a/src/box/memtx_engine.cc b/src/box/memtx_engine.cc
index de918c335..09f1d671c 100644
--- a/src/box/memtx_engine.cc
+++ b/src/box/memtx_engine.cc
@@ -238,8 +238,7 @@ memtx_engine_recover_synchro(const struct xrow_header *row)
 	 * because all its rows have a zero replica_id.
 	 */
 	req.origin_id = req.replica_id;
-	txn_limbo_process(&txn_limbo, &req);
-	return 0;
+	return txn_limbo_process(&txn_limbo, &req);
 }
 
 static int
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 70447caaf..eb9aa7780 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -47,6 +47,7 @@ txn_limbo_create(struct txn_limbo *limbo)
 	vclock_create(&limbo->vclock);
 	vclock_create(&limbo->promote_term_map);
 	limbo->promote_greatest_term = 0;
+	latch_create(&limbo->promote_latch);
 	limbo->confirmed_lsn = 0;
 	limbo->rollback_count = 0;
 	limbo->is_in_rollback = false;
@@ -724,11 +725,14 @@ txn_limbo_wait_empty(struct txn_limbo *limbo, double timeout)
 }
 
 void
-txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
+txn_limbo_process_run(struct txn_limbo *limbo,
+		      const struct synchro_request *req)
 {
+	assert(latch_is_locked(&limbo->promote_latch));
+
 	uint64_t term = req->term;
 	uint32_t origin = req->origin_id;
-	if (txn_limbo_replica_term(limbo, origin) < term) {
+	if (vclock_get(&limbo->promote_term_map, origin) < (int64_t)term) {
 		vclock_follow(&limbo->promote_term_map, origin, term);
 		if (term > limbo->promote_greatest_term)
 			limbo->promote_greatest_term = term;
@@ -786,6 +790,32 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 	return;
 }
 
+int
+txn_limbo_process_begin(struct txn_limbo *limbo,
+			const struct synchro_request *req)
+{
+	latch_lock(&limbo->promote_latch);
+	/*
+	 * FIXME: For now we only take the lock, but the idea
+	 * is to verify incoming requests here to detect
+	 * split-brain situations. Thus we change the code
+	 * semantics in advance.
+	 */
+	(void)req;
+	return 0;
+}
+
+int
+txn_limbo_process(struct txn_limbo *limbo,
+		  const struct synchro_request *req)
+{
+	if (txn_limbo_process_begin(limbo, req) != 0)
+		return -1;
+	txn_limbo_process_run(limbo, req);
+	txn_limbo_process_commit(limbo);
+	return 0;
+}
+
 void
 txn_limbo_on_parameters_change(struct txn_limbo *limbo)
 {
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 53e52f676..afe9b429f 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -31,6 +31,7 @@
  */
 #include "small/rlist.h"
 #include "vclock/vclock.h"
+#include "latch.h"
 
 #include <stdint.h>
 
@@ -147,6 +148,10 @@ struct txn_limbo {
 	 * limbo and raft are in sync and the terms are the same.
 	 */
 	uint64_t promote_greatest_term;
+	/**
+	 * To order access to the promote data.
+	 */
+	struct latch promote_latch;
 	/**
 	 * Maximal LSN gathered quorum and either already confirmed in WAL, or
 	 * whose confirmation is in progress right now. Any attempt to confirm
@@ -216,9 +221,12 @@ txn_limbo_last_entry(struct txn_limbo *limbo)
  * @a replica_id.
  */
 static inline uint64_t
-txn_limbo_replica_term(const struct txn_limbo *limbo, uint32_t replica_id)
+txn_limbo_replica_term(struct txn_limbo *limbo, uint32_t replica_id)
 {
-	return vclock_get(&limbo->promote_term_map, replica_id);
+	latch_lock(&limbo->promote_latch);
+	uint64_t v = vclock_get(&limbo->promote_term_map, replica_id);
+	latch_unlock(&limbo->promote_latch);
+	return v;
 }
 
 /**
@@ -226,11 +234,14 @@ txn_limbo_replica_term(const struct txn_limbo *limbo, uint32_t replica_id)
  * data from it. The check is only valid when elections are enabled.
  */
 static inline bool
-txn_limbo_is_replica_outdated(const struct txn_limbo *limbo,
+txn_limbo_is_replica_outdated(struct txn_limbo *limbo,
 			      uint32_t replica_id)
 {
-	return txn_limbo_replica_term(limbo, replica_id) <
-	       limbo->promote_greatest_term;
+	latch_lock(&limbo->promote_latch);
+	uint64_t v = vclock_get(&limbo->promote_term_map, replica_id);
+	bool res = v < limbo->promote_greatest_term;
+	latch_unlock(&limbo->promote_latch);
+	return res;
 }
 
 /**
@@ -300,8 +311,36 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn);
 int
 txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
 
+/**
+ * Initiate execution of a synchronous replication request.
+ */
+int
+txn_limbo_process_begin(struct txn_limbo *limbo,
+			const struct synchro_request *req);
+
+/** Commit a synchronous replication request. */
+static inline void
+txn_limbo_process_commit(struct txn_limbo *limbo)
+{
+	assert(latch_is_locked(&limbo->promote_latch));
+	latch_unlock(&limbo->promote_latch);
+}
+
+/** Rollback a synchronous replication request. */
+static inline void
+txn_limbo_process_rollback(struct txn_limbo *limbo)
+{
+	assert(latch_is_locked(&limbo->promote_latch));
+	latch_unlock(&limbo->promote_latch);
+}
+
 /** Execute a synchronous replication request. */
 void
+txn_limbo_process_run(struct txn_limbo *limbo,
+		      const struct synchro_request *req);
+
+/** Process a synchronous replication request. */
+int
 txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req);
 
 /**
-- 
2.31.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Tarantool-patches] [PATCH v17 3/5] qsync: track confirmed_lsn upon CONFIRM packet
  2021-09-22 13:05 [Tarantool-patches] [PATCH v17 0/5] qsync: implement packet filtering (part 1) Cyrill Gorcunov via Tarantool-patches
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 1/5] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 2/5] qsync: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
@ 2021-09-22 13:05 ` Cyrill Gorcunov via Tarantool-patches
  2021-09-26 14:29   ` Vladislav Shpilevoy via Tarantool-patches
  2021-09-27  7:05   ` Serge Petrenko via Tarantool-patches
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 4/5] qsync: export more details on promote tracking Cyrill Gorcunov via Tarantool-patches
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 5/5] test: add gh-6036-term-order test Cyrill Gorcunov via Tarantool-patches
  4 siblings, 2 replies; 16+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-09-22 13:05 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

While investigating various cluster split-brain scenarios and
trying to work out the valid domains and ranges of incoming
synchro requests, we discovered that limbo::confirmed_lsn is not
updated densely enough to cover our needs.

In particular, this variable is always updated by the limbo owner
upon the write of a synchro entry (to the journal), while a replica
just reads such a record without updating confirmed_lsn. So when the
replica becomes the cluster leader, it sends a promote request back
to the former leader where the packet carries a zero LSN instead of
the previous confirmed_lsn, and validation of such a packet won't pass.

Note that packet validation is not yet implemented in this patch; it
is rather preparatory work for the future.

Part-of #6036

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/txn_limbo.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index eb9aa7780..959811309 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -774,6 +774,20 @@ txn_limbo_process_run(struct txn_limbo *limbo,
 	switch (req->type) {
 	case IPROTO_RAFT_CONFIRM:
 		txn_limbo_read_confirm(limbo, lsn);
+		/*
+		 * We have to adjust confirmed_lsn according
+		 * to the LSN coming in the request, because
+		 * we will need to report it as the old limbo
+		 * owner's LSN inside PROMOTE requests (if the
+		 * administrator or election engine makes us so).
+		 *
+		 * We could update confirmed_lsn on every
+		 * txn_limbo_read_confirm call, but that function
+		 * is usually paired with txn_limbo_write_confirm,
+		 * so to eliminate a redundant update we do it
+		 * once here, explicitly.
+		 */
+		limbo->confirmed_lsn = req->lsn;
 		break;
 	case IPROTO_RAFT_ROLLBACK:
 		txn_limbo_read_rollback(limbo, lsn);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Tarantool-patches] [PATCH v17 4/5] qsync: export more details on promote tracking
  2021-09-22 13:05 [Tarantool-patches] [PATCH v17 0/5] qsync: implement packet filtering (part 1) Cyrill Gorcunov via Tarantool-patches
                   ` (2 preceding siblings ...)
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 3/5] qsync: track confirmed_lsn upon CONFIRM packet Cyrill Gorcunov via Tarantool-patches
@ 2021-09-22 13:05 ` Cyrill Gorcunov via Tarantool-patches
  2021-09-27  7:00   ` Serge Petrenko via Tarantool-patches
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 5/5] test: add gh-6036-term-order test Cyrill Gorcunov via Tarantool-patches
  4 siblings, 1 reply; 16+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-09-22 13:05 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

The patch introduces `promote` leaf to `box.info.synchro` table.

 | tarantool> box.info.synchro
 | ---
 | - queue:
 |     len: 0
 |     owner: 1
 |   quorum: 1
 |   promote:
 |     term_max: 4
 |     term_map: {1: 4}
 | ...

The idea is to be able to track changes of seen requests. Since this
is an internal implementation detail, I prefer not to document it.
Actually, it would be better to mark it as non-API somehow.

Part-of #6036

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/lua/info.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/box/lua/info.c b/src/box/lua/info.c
index 040af306a..144fba12d 100644
--- a/src/box/lua/info.c
+++ b/src/box/lua/info.c
@@ -607,7 +607,7 @@ lbox_info_election(struct lua_State *L)
 static int
 lbox_info_synchro(struct lua_State *L)
 {
-	lua_createtable(L, 0, 2);
+	lua_createtable(L, 0, 3);
 
 	/* Quorum value may be evaluated via formula */
 	lua_pushinteger(L, replication_synchro_quorum);
@@ -622,6 +622,15 @@ lbox_info_synchro(struct lua_State *L)
 	lua_setfield(L, -2, "owner");
 	lua_setfield(L, -2, "queue");
 
+	/* Promote related info. Suitable for debug. */
+	lua_createtable(L, 0, 2);
+	lua_pushnumber(L, queue->promote_greatest_term);
+	lua_setfield(L, -2, "term_max");
+	lua_pushstring(L, "term_map");
+	lbox_pushvclock(L, &queue->promote_term_map);
+	lua_settable(L, -3);
+	lua_setfield(L, -2, "promote");
+
 	return 1;
 }
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Tarantool-patches] [PATCH v17 5/5] test: add gh-6036-term-order test
  2021-09-22 13:05 [Tarantool-patches] [PATCH v17 0/5] qsync: implement packet filtering (part 1) Cyrill Gorcunov via Tarantool-patches
                   ` (3 preceding siblings ...)
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 4/5] qsync: export more details on promote tracking Cyrill Gorcunov via Tarantool-patches
@ 2021-09-22 13:05 ` Cyrill Gorcunov via Tarantool-patches
  2021-09-26 14:30   ` Vladislav Shpilevoy via Tarantool-patches
  2021-09-27  7:13   ` Serge Petrenko via Tarantool-patches
  4 siblings, 2 replies; 16+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-09-22 13:05 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Test that promotion requests are handled only once the corresponding
WAL write completes, because we update in-memory data before the
write finishes.

Part-of #6036

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 test/replication/gh-6036-order-master.lua    |   1 +
 test/replication/gh-6036-order-node.lua      |  60 ++++++
 test/replication/gh-6036-order-replica1.lua  |   1 +
 test/replication/gh-6036-order-replica2.lua  |   1 +
 test/replication/gh-6036-term-order.result   | 216 +++++++++++++++++++
 test/replication/gh-6036-term-order.test.lua |  94 ++++++++
 test/replication/suite.cfg                   |   1 +
 test/replication/suite.ini                   |   2 +-
 8 files changed, 375 insertions(+), 1 deletion(-)
 create mode 120000 test/replication/gh-6036-order-master.lua
 create mode 100644 test/replication/gh-6036-order-node.lua
 create mode 120000 test/replication/gh-6036-order-replica1.lua
 create mode 120000 test/replication/gh-6036-order-replica2.lua
 create mode 100644 test/replication/gh-6036-term-order.result
 create mode 100644 test/replication/gh-6036-term-order.test.lua

diff --git a/test/replication/gh-6036-order-master.lua b/test/replication/gh-6036-order-master.lua
new file mode 120000
index 000000000..82a6073a1
--- /dev/null
+++ b/test/replication/gh-6036-order-master.lua
@@ -0,0 +1 @@
+gh-6036-order-node.lua
\ No newline at end of file
diff --git a/test/replication/gh-6036-order-node.lua b/test/replication/gh-6036-order-node.lua
new file mode 100644
index 000000000..b22a7cb4c
--- /dev/null
+++ b/test/replication/gh-6036-order-node.lua
@@ -0,0 +1,60 @@
+local INSTANCE_ID = string.match(arg[0], "gh%-6036%-order%-(.+)%.lua")
+
+local function unix_socket(name)
+    return "unix/:./" .. name .. '.sock';
+end
+
+require('console').listen(os.getenv('ADMIN'))
+
+if INSTANCE_ID == "master" then
+    box.cfg({
+        listen                      = unix_socket(INSTANCE_ID),
+        replication                 = {
+            unix_socket(INSTANCE_ID),
+            unix_socket("replica1"),
+            unix_socket("replica2"),
+        },
+        replication_connect_quorum  = 1,
+        replication_synchro_quorum  = 1,
+        replication_synchro_timeout = 10000,
+        replication_sync_timeout    = 5,
+        read_only                   = false,
+        election_mode               = "off",
+    })
+elseif INSTANCE_ID == "replica1" then
+    box.cfg({
+        listen                      = unix_socket(INSTANCE_ID),
+        replication                 = {
+            unix_socket("master"),
+            unix_socket(INSTANCE_ID),
+            unix_socket("replica2"),
+        },
+        replication_connect_quorum  = 1,
+        replication_synchro_quorum  = 1,
+        replication_synchro_timeout = 10000,
+        replication_sync_timeout    = 5,
+        read_only                   = false,
+        election_mode               = "off",
+    })
+else
+    assert(INSTANCE_ID == "replica2")
+    box.cfg({
+        listen                      = unix_socket(INSTANCE_ID),
+        replication                 = {
+            unix_socket("master"),
+            unix_socket("replica1"),
+            unix_socket(INSTANCE_ID),
+        },
+        replication_connect_quorum  = 1,
+        replication_synchro_quorum  = 1,
+        replication_synchro_timeout = 10000,
+        replication_sync_timeout    = 5,
+        read_only                   = true,
+        election_mode               = "off",
+    })
+end
+
+--box.ctl.wait_rw()
+box.once("bootstrap", function()
+    box.schema.user.grant('guest', 'super')
+end)
diff --git a/test/replication/gh-6036-order-replica1.lua b/test/replication/gh-6036-order-replica1.lua
new file mode 120000
index 000000000..82a6073a1
--- /dev/null
+++ b/test/replication/gh-6036-order-replica1.lua
@@ -0,0 +1 @@
+gh-6036-order-node.lua
\ No newline at end of file
diff --git a/test/replication/gh-6036-order-replica2.lua b/test/replication/gh-6036-order-replica2.lua
new file mode 120000
index 000000000..82a6073a1
--- /dev/null
+++ b/test/replication/gh-6036-order-replica2.lua
@@ -0,0 +1 @@
+gh-6036-order-node.lua
\ No newline at end of file
diff --git a/test/replication/gh-6036-term-order.result b/test/replication/gh-6036-term-order.result
new file mode 100644
index 000000000..6b19fc2c8
--- /dev/null
+++ b/test/replication/gh-6036-term-order.result
@@ -0,0 +1,216 @@
+-- test-run result file version 2
+--
+-- gh-6036: verify that terms are locked when we're inside journal
+-- write routine, because parallel appliers may ignore the fact that
+-- the term is updated already but not yet written leading to data
+-- inconsistency.
+--
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+test_run:cmd('create server master with script="replication/gh-6036-order-master.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica1 with script="replication/gh-6036-order-replica1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica2 with script="replication/gh-6036-order-replica2.lua"')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('start server master with wait=False')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica1 with wait=False')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica2 with wait=False')
+ | ---
+ | - true
+ | ...
+
+test_run:wait_fullmesh({"master", "replica1", "replica2"})
+ | ---
+ | ...
+
+test_run:switch("master")
+ | ---
+ | - true
+ | ...
+box.ctl.demote()
+ | ---
+ | ...
+
+test_run:switch("replica1")
+ | ---
+ | - true
+ | ...
+box.ctl.demote()
+ | ---
+ | ...
+
+test_run:switch("replica2")
+ | ---
+ | - true
+ | ...
+box.ctl.demote()
+ | ---
+ | ...
+
+--
+-- Drop connection between master and replica1.
+test_run:switch("master")
+ | ---
+ | - true
+ | ...
+box.cfg({                                   \
+    replication = {                         \
+        "unix/:./master.sock",              \
+        "unix/:./replica2.sock",            \
+    },                                      \
+})
+ | ---
+ | ...
+test_run:switch("replica1")
+ | ---
+ | - true
+ | ...
+box.cfg({                                   \
+    replication = {                         \
+        "unix/:./replica1.sock",            \
+        "unix/:./replica2.sock",            \
+    },                                      \
+})
+ | ---
+ | ...
+
+--
+-- Initiate disk delay and remember the max term seen so far.
+test_run:switch("replica2")
+ | ---
+ | - true
+ | ...
+box.error.injection.set('ERRINJ_WAL_DELAY', true)
+ | ---
+ | - ok
+ | ...
+term_max_replica2 = box.info.synchro.promote.term_max
+ | ---
+ | ...
+
+--
+-- Ping-pong the promote action between master and
+-- replica1 nodes, the term updates get queued on
+-- replica2 because of disk being disabled.
+test_run:switch("master")
+ | ---
+ | - true
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+box.ctl.demote()
+ | ---
+ | ...
+
+test_run:switch("replica1")
+ | ---
+ | - true
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+box.ctl.demote()
+ | ---
+ | ...
+
+test_run:switch("master")
+ | ---
+ | - true
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+
+--
+-- Since we're guarding max promote term make sure that
+-- 1) The max term has not yet been updated because WAL
+--    is in sleeping state.
+-- 2) Max term on master and replica1 nodes are greater
+--    than we have now because terms update is locked.
+-- 3) Once WAL is unlocked we make sure that terms has
+--    reached us.
+test_run:switch("replica2")
+ | ---
+ | - true
+ | ...
+assert(term_max_replica2 == box.info.synchro.promote.term_max)
+ | ---
+ | - true
+ | ...
+
+term_max_master = test_run:eval('master', 'box.info.synchro.promote.term_max')[1]
+ | ---
+ | ...
+term_max_replica1 = test_run:eval('replica1', 'box.info.synchro.promote.term_max')[1]
+ | ---
+ | ...
+assert(term_max_master > term_max_replica2)
+ | ---
+ | - true
+ | ...
+assert(term_max_replica1 > term_max_replica2)
+ | ---
+ | - true
+ | ...
+term_max_wait4 = term_max_master
+ | ---
+ | ...
+if term_max_wait4 < term_max_replica1 then term_max_wait4 = term_max_replica1 end
+ | ---
+ | ...
+
+box.error.injection.set('ERRINJ_WAL_DELAY', false)
+ | ---
+ | - ok
+ | ...
+test_run:wait_cond(function() return box.info.synchro.promote.term_max == term_max_wait4 end)
+ | ---
+ | - true
+ | ...
+
+test_run:switch("default")
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica2')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('delete server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica2')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-6036-term-order.test.lua b/test/replication/gh-6036-term-order.test.lua
new file mode 100644
index 000000000..01d73ac55
--- /dev/null
+++ b/test/replication/gh-6036-term-order.test.lua
@@ -0,0 +1,94 @@
+--
+-- gh-6036: verify that terms are locked while we're inside the
+-- journal write routine, because parallel appliers may miss the
+-- fact that the term is already updated but not yet written,
+-- leading to data inconsistency.
+--
+test_run = require('test_run').new()
+
+test_run:cmd('create server master with script="replication/gh-6036-order-master.lua"')
+test_run:cmd('create server replica1 with script="replication/gh-6036-order-replica1.lua"')
+test_run:cmd('create server replica2 with script="replication/gh-6036-order-replica2.lua"')
+
+test_run:cmd('start server master with wait=False')
+test_run:cmd('start server replica1 with wait=False')
+test_run:cmd('start server replica2 with wait=False')
+
+test_run:wait_fullmesh({"master", "replica1", "replica2"})
+
+test_run:switch("master")
+box.ctl.demote()
+
+test_run:switch("replica1")
+box.ctl.demote()
+
+test_run:switch("replica2")
+box.ctl.demote()
+
+--
+-- Drop connection between master and replica1.
+test_run:switch("master")
+box.cfg({                                   \
+    replication = {                         \
+        "unix/:./master.sock",              \
+        "unix/:./replica2.sock",            \
+    },                                      \
+})
+test_run:switch("replica1")
+box.cfg({                                   \
+    replication = {                         \
+        "unix/:./replica1.sock",            \
+        "unix/:./replica2.sock",            \
+    },                                      \
+})
+
+--
+-- Initiate disk delay and remember the max term seen so far.
+test_run:switch("replica2")
+box.error.injection.set('ERRINJ_WAL_DELAY', true)
+term_max_replica2 = box.info.synchro.promote.term_max
+
+--
+-- Ping-pong the promote action between the master and
+-- replica1 nodes; the term updates get queued on
+-- replica2 because its disk is disabled.
+test_run:switch("master")
+box.ctl.promote()
+box.ctl.demote()
+
+test_run:switch("replica1")
+box.ctl.promote()
+box.ctl.demote()
+
+test_run:switch("master")
+box.ctl.promote()
+
+--
+-- Since we're guarding the max promote term, make sure that:
+-- 1) The max term has not yet been updated because the WAL
+--    is sleeping.
+-- 2) The max terms on the master and replica1 nodes are
+--    greater than ours because term updates are locked.
+-- 3) Once the WAL is unlocked, make sure the terms have
+--    reached us.
+test_run:switch("replica2")
+assert(term_max_replica2 == box.info.synchro.promote.term_max)
+
+term_max_master = test_run:eval('master', 'box.info.synchro.promote.term_max')[1]
+term_max_replica1 = test_run:eval('replica1', 'box.info.synchro.promote.term_max')[1]
+assert(term_max_master > term_max_replica2)
+assert(term_max_replica1 > term_max_replica2)
+term_max_wait4 = term_max_master
+if term_max_wait4 < term_max_replica1 then term_max_wait4 = term_max_replica1 end
+
+box.error.injection.set('ERRINJ_WAL_DELAY', false)
+test_run:wait_cond(function() return box.info.synchro.promote.term_max == term_max_wait4 end)
+
+test_run:switch("default")
+test_run:cmd('stop server master')
+test_run:cmd('stop server replica1')
+test_run:cmd('stop server replica2')
+
+test_run:cmd('delete server master')
+test_run:cmd('delete server replica1')
+test_run:cmd('delete server replica2')
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 3eee0803c..ac2bedfd9 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -59,6 +59,7 @@
     "gh-6094-rs-uuid-mismatch.test.lua": {},
     "gh-6127-election-join-new.test.lua": {},
     "gh-6035-applier-filter.test.lua": {},
+    "gh-6036-term-order.test.lua": {},
     "election-candidate-promote.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 77eb95f49..16840e01f 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -3,7 +3,7 @@ core = tarantool
 script =  master.lua
 description = tarantool/box, replication
 disabled = consistent.test.lua
-release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5430-qsync-promote-crash.test.lua gh-5430-cluster-mvcc.test.lua  gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6027-applier-error-show.test.lua gh-6032-promote-wal-write.test.lua gh-6057-qsync-confirm-async-no-wal.test.lua gh-5447-downstream-lag.test.lua gh-4040-invalid-msgpack.test.lua
+release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5430-qsync-promote-crash.test.lua gh-5430-cluster-mvcc.test.lua  gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6027-applier-error-show.test.lua gh-6032-promote-wal-write.test.lua gh-6057-qsync-confirm-async-no-wal.test.lua gh-5447-downstream-lag.test.lua gh-4040-invalid-msgpack.test.lua gh-6036-term-order.test.lua
 config = suite.cfg
 lua_libs = lua/fast_replica.lua lua/rlimit.lua
 use_unix_sockets = True
-- 
2.31.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Tarantool-patches] [PATCH v17 2/5] qsync: order access to the limbo terms
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 2/5] qsync: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
@ 2021-09-26 14:29   ` Vladislav Shpilevoy via Tarantool-patches
  2021-09-27  7:42     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 16+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-09-26 14:29 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Hi! Thanks for the patch!

I consider this series an independent patchset which fixes the
ordering, not the split-brain, like you said.

But right now it is not independent. The main problem is that you
blended in a few changes from the split-brain patches. I point
them out in my comments.

See 4 comments below.

>  src/box/applier.cc      | 16 +++++++---
>  src/box/box.cc          | 71 +++++++++++++++++++----------------------
>  src/box/memtx_engine.cc |  3 +-
>  src/box/txn_limbo.c     | 34 ++++++++++++++++++--
>  src/box/txn_limbo.h     | 49 +++++++++++++++++++++++++---
>  5 files changed, 121 insertions(+), 52 deletions(-)
> 
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index b981bd436..f0751b68a 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -458,7 +458,8 @@ applier_wait_snapshot(struct applier *applier)
>  				struct synchro_request req;
>  				if (xrow_decode_synchro(&row, &req) != 0)
>  					diag_raise();
> -				txn_limbo_process(&txn_limbo, &req);
> +				if (txn_limbo_process(&txn_limbo, &req) != 0)
> +					diag_raise();

1. How can txn_limbo_process() fail? You just fixed the ordering. Essentially,
you added a few yields in some places. You didn't add any validation or any new
errors in this patchset. Please drop the empty 'return 0's from this patchset.
They can't be, and are not, tested here anyway.

Adding a lock-unlock pair to txn_limbo_process didn't affect whether it
can fail. It just blocks execution for a while sometimes.

>  			} else if (iproto_type_is_raft_request(row.type)) {
>  				struct raft_request req;
>  				if (xrow_decode_raft(&row, &req, NULL) != 0)
> @@ -857,7 +858,7 @@ apply_synchro_row_cb(struct journal_entry *entry)
>  		applier_rollback_by_wal_io(entry->res);
>  	} else {
>  		replica_txn_wal_write_cb(synchro_entry->rcb);
> -		txn_limbo_process(&txn_limbo, synchro_entry->req);
> +		txn_limbo_process_run(&txn_limbo, synchro_entry->req);

2. _run is usually used for infinite loops in fibers, or for handling a
sequence of something, like trigger_run(). Here you handle a single
request. The only difference from process() is that the lock is taken.
I propose renaming it to _do, _ex, _in_tx, or something similar.

> diff --git a/src/box/box.cc b/src/box/box.cc
> index 7b11d56d6..19e67b186 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1670,48 +1669,43 @@ box_wait_limbo_acked(double timeout)
>  	return wait_lsn;
>  }
>  
> -/** Write and process a PROMOTE request. */
> -static void
> -box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
> +/** Write and process PROMOTE or DEMOTE request. */
> +static int
> +box_issue_synchro(uint16_t type, uint32_t prev_leader_id, int64_t promote_lsn)
>  {
>  	struct raft *raft = box_raft();
> +
> +	assert(type == IPROTO_RAFT_PROMOTE ||
> +	       type == IPROTO_RAFT_DEMOTE);
>  	assert(raft->volatile_term == raft->term);
>  	assert(promote_lsn >= 0);
> -	txn_limbo_write_promote(&txn_limbo, promote_lsn,
> -				raft->term);
> +
>  	struct synchro_request req = {
> -		.type = IPROTO_RAFT_PROMOTE,
> -		.replica_id = prev_leader_id,
> -		.origin_id = instance_id,
> -		.lsn = promote_lsn,
> -		.term = raft->term,
> +		.type		= type,
> +		.replica_id	= prev_leader_id,
> +		.origin_id	= instance_id,
> +		.lsn		= promote_lsn,
> +		.term		= raft->term,
>  	};
> -	txn_limbo_process(&txn_limbo, &req);
> +
> +	if (txn_limbo_process_begin(&txn_limbo, &req) != 0)
> +		return -1;
> +
> +	if (type == IPROTO_RAFT_PROMOTE)
> +		txn_limbo_write_promote(&txn_limbo, req.lsn, req.term);
> +	else
> +		txn_limbo_write_demote(&txn_limbo, req.lsn, req.term);
> +
> +	txn_limbo_process_run(&txn_limbo, &req);
>  	assert(txn_limbo_is_empty(&txn_limbo));
> +
> +	txn_limbo_process_commit(&txn_limbo);
> +	return 0;
>  }
>  
>  /** A guard to block multiple simultaneous promote()/demote() invocations. */
>  static bool is_in_box_promote = false;
>  
> -/** Write and process a DEMOTE request. */
> -static void
> -box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)
> -{
> -	assert(box_raft()->volatile_term == box_raft()->term);
> -	assert(promote_lsn >= 0);
> -	txn_limbo_write_demote(&txn_limbo, promote_lsn,
> -				box_raft()->term);
> -	struct synchro_request req = {
> -		.type = IPROTO_RAFT_DEMOTE,
> -		.replica_id = prev_leader_id,
> -		.origin_id = instance_id,
> -		.lsn = promote_lsn,
> -		.term = box_raft()->term,
> -	};
> -	txn_limbo_process(&txn_limbo, &req);
> -	assert(txn_limbo_is_empty(&txn_limbo));

3. Why did you merge these 2 functions? AFAIR, their split was
deliberate, to make each of them simpler to understand and maintain.

> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index 70447caaf..eb9aa7780 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -786,6 +790,32 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
>  	return;
>  }
>  
> +int
> +txn_limbo_process_begin(struct txn_limbo *limbo,
> +			const struct synchro_request *req)
> +{
> +	latch_lock(&limbo->promote_latch);
> +	/*
> +	 * FIXME: For now we take a lock only but idea
> +	 * is to verify incoming requests to detect split
> +	 * brain situation. Thus we need to change the code
> +	 * semantics in advance.
> +	 */
> +	(void)req;
> +	return 0;

4. The return value is part of the split-brain patch, not of the
ordering patch. This is clearly seen from this patchset, because
the series never changes `return 0` to anything else.

I get that you want to merge something; hence we are working on this
independent issue of reordering. But then let's make it truly
independent.
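As an aside for readers following this API discussion: below is a minimal standalone sketch of the begin/do/commit shape under review. This is not Tarantool's code — the struct names are hypothetical and a pthread mutex stands in for the fiber latch — but it illustrates why begin() cannot fail today and why the int return only becomes meaningful once request filtering is added:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the real limbo and request types. */
struct synchro_request {
	long term;
};

struct limbo {
	/* A pthread mutex stands in for the fiber latch. */
	pthread_mutex_t promote_latch;
	long promote_greatest_term;
};

/*
 * begin() only takes the latch today, so it cannot fail; the int
 * return is reserved for the future split-brain filtering.
 */
int
limbo_process_begin(struct limbo *l, const struct synchro_request *req)
{
	pthread_mutex_lock(&l->promote_latch);
	(void)req; /* future: validate the request, return -1 on error */
	return 0;
}

/* Apply the request while the latch is held. */
void
limbo_process_do(struct limbo *l, const struct synchro_request *req)
{
	if (req->term > l->promote_greatest_term)
		l->promote_greatest_term = req->term;
}

void
limbo_process_commit(struct limbo *l)
{
	pthread_mutex_unlock(&l->promote_latch);
}
```

The lock only makes the term read and its application atomic with respect to other appliers; nothing in the pair above introduces a new failure path, which is the reviewer's point.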

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Tarantool-patches] [PATCH v17 3/5] qsync: track confirmed_lsn upon CONFIRM packet
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 3/5] qsync: track confirmed_lsn upon CONFIRM packet Cyrill Gorcunov via Tarantool-patches
@ 2021-09-26 14:29   ` Vladislav Shpilevoy via Tarantool-patches
  2021-09-27  7:05   ` Serge Petrenko via Tarantool-patches
  1 sibling, 0 replies; 16+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-09-26 14:29 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

How is this patch related to the ordering issue? Your test passes without it.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Tarantool-patches] [PATCH v17 5/5] test: add gh-6036-term-order test
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 5/5] test: add gh-6036-term-order test Cyrill Gorcunov via Tarantool-patches
@ 2021-09-26 14:30   ` Vladislav Shpilevoy via Tarantool-patches
  2021-09-27 10:08     ` Cyrill Gorcunov via Tarantool-patches
  2021-09-27  7:13   ` Serge Petrenko via Tarantool-patches
  1 sibling, 1 reply; 16+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-09-26 14:30 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Thanks for the patch!

See 6 comments below.

> diff --git a/test/replication/gh-6036-order-node.lua b/test/replication/gh-6036-order-node.lua
> new file mode 100644
> index 000000000..b22a7cb4c
> --- /dev/null
> +++ b/test/replication/gh-6036-order-node.lua
> @@ -0,0 +1,60 @@
> +local INSTANCE_ID = string.match(arg[0], "gh%-6036%-order%-(.+)%.lua")
> +
> +local function unix_socket(name)
> +    return "unix/:./" .. name .. '.sock';
> +end
> +
> +require('console').listen(os.getenv('ADMIN'))
> +
> +if INSTANCE_ID == "master" then
> +    box.cfg({
> +        listen                      = unix_socket(INSTANCE_ID),
> +        replication                 = {
> +            unix_socket(INSTANCE_ID),
> +            unix_socket("replica1"),
> +            unix_socket("replica2"),
> +        },
> +        replication_connect_quorum  = 1,
> +        replication_synchro_quorum  = 1,
> +        replication_synchro_timeout = 10000,
> +        replication_sync_timeout    = 5,

1. Why do you need sync_timeout 5?

> +        read_only                   = false,
> +        election_mode               = "off",
> +    })
> +elseif INSTANCE_ID == "replica1" then
> +    box.cfg({
> +        listen                      = unix_socket(INSTANCE_ID),
> +        replication                 = {
> +            unix_socket("master"),
> +            unix_socket(INSTANCE_ID),
> +            unix_socket("replica2"),
> +        },
> +        replication_connect_quorum  = 1,
> +        replication_synchro_quorum  = 1,
> +        replication_synchro_timeout = 10000,
> +        replication_sync_timeout    = 5,
> +        read_only                   = false,
> +        election_mode               = "off",
> +    })
> +else
> +    assert(INSTANCE_ID == "replica2")
> +    box.cfg({
> +        listen                      = unix_socket(INSTANCE_ID),
> +        replication                 = {
> +            unix_socket("master"),
> +            unix_socket("replica1"),
> +            unix_socket(INSTANCE_ID),
> +        },
> +        replication_connect_quorum  = 1,
> +        replication_synchro_quorum  = 1,
> +        replication_synchro_timeout = 10000,
> +        replication_sync_timeout    = 5,
> +        read_only                   = true,
> +        election_mode               = "off",
> +    })
> +end
> +
> +--box.ctl.wait_rw()

2. Please, remove commented out code.

> +box.once("bootstrap", function()
> +    box.schema.user.grant('guest', 'super')
> +end)
> diff --git a/test/replication/gh-6036-term-order.result b/test/replication/gh-6036-term-order.result
> new file mode 100644
> index 000000000..6b19fc2c8
> --- /dev/null
> +++ b/test/replication/gh-6036-term-order.result

3. Please use the gh-####-qsync prefix to be consistent with other qsync tests. Having
'qsync' in the test name helps run all qsync tests with a single command:

	python test-run.py qsync

> @@ -0,0 +1,216 @@
> +-- test-run result file version 2
> +--
> +-- gh-6036: verify that terms are locked while we're inside the
> +-- journal write routine, because parallel appliers may miss the
> +-- fact that the term is already updated but not yet written,
> +-- leading to data inconsistency.
> +--
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +test_run:cmd('create server master with script="replication/gh-6036-order-master.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server replica1 with script="replication/gh-6036-order-replica1.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server replica2 with script="replication/gh-6036-order-replica2.lua"')
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('start server master with wait=False')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica1 with wait=False')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica2 with wait=False')
> + | ---
> + | - true
> + | ...
> +
> +test_run:wait_fullmesh({"master", "replica1", "replica2"})
> + | ---
> + | ...
> +
> +test_run:switch("master")
> + | ---
> + | - true
> + | ...
> +box.ctl.demote()
> + | ---
> + | ...
> +
> +test_run:switch("replica1")
> + | ---
> + | - true
> + | ...
> +box.ctl.demote()
> + | ---
> + | ...
> +
> +test_run:switch("replica2")
> + | ---
> + | - true
> + | ...
> +box.ctl.demote()

4. I dropped all 3 demotes and the test passed. Why do you need them?

> + | ---
> + | ...
> +
> +--
> +-- Drop connection between master and replica1.
> +test_run:switch("master")
> + | ---
> + | - true
> + | ...
> +box.cfg({                                   \
> +    replication = {                         \
> +        "unix/:./master.sock",              \
> +        "unix/:./replica2.sock",            \
> +    },                                      \
> +})
> + | ---
> + | ...
> +test_run:switch("replica1")
> + | ---
> + | - true
> + | ...
> +box.cfg({                                   \
> +    replication = {                         \
> +        "unix/:./replica1.sock",            \
> +        "unix/:./replica2.sock",            \
> +    },                                      \
> +})
> + | ---
> + | ...
> +
> +--
> +-- Initiate disk delay and remember the max term seen so far.
> +test_run:switch("replica2")
> + | ---
> + | - true
> + | ...
> +box.error.injection.set('ERRINJ_WAL_DELAY', true)
> + | ---
> + | - ok
> + | ...
> +term_max_replica2 = box.info.synchro.promote.term_max
> + | ---
> + | ...
> +
> +--
> +-- Ping-pong the promote action between the master and
> +-- replica1 nodes; the term updates get queued on
> +-- replica2 because its disk is disabled.
> +test_run:switch("master")
> + | ---
> + | - true
> + | ...
> +box.ctl.promote()
> + | ---
> + | ...
> +box.ctl.demote()
> + | ---
> + | ...
> +
> +test_run:switch("replica1")
> + | ---
> + | - true
> + | ...
> +box.ctl.promote()
> + | ---
> + | ...
> +box.ctl.demote()
> + | ---
> + | ...
> +
> +test_run:switch("master")
> + | ---
> + | - true
> + | ...
> +box.ctl.promote()
> + | ---
> + | ...
> +
> +--
> +-- Since we're guarding the max promote term, make sure that:
> +-- 1) The max term has not yet been updated because the WAL
> +--    is sleeping.
> +-- 2) The max terms on the master and replica1 nodes are
> +--    greater than ours because term updates are locked.
> +-- 3) Once the WAL is unlocked, make sure the terms have
> +--    reached us.
> +test_run:switch("replica2")
> + | ---
> + | - true
> + | ...
> +assert(term_max_replica2 == box.info.synchro.promote.term_max)
> + | ---
> + | - true
> + | ...
> +
> +term_max_master = test_run:eval('master', 'box.info.synchro.promote.term_max')[1]
> + | ---
> + | ...
> +term_max_replica1 = test_run:eval('replica1', 'box.info.synchro.promote.term_max')[1]
> + | ---
> + | ...
> +assert(term_max_master > term_max_replica2)
> + | ---
> + | - true
> + | ...
> +assert(term_max_replica1 > term_max_replica2)
> + | ---
> + | - true
> + | ...
> +term_max_wait4 = term_max_master
> + | ---
> + | ...
> +if term_max_wait4 < term_max_replica1 then term_max_wait4 = term_max_replica1 end

5. How is that possible? The master did more promotes; it should
certainly have a bigger term.

> + | ---
> + | ...
> +
> +box.error.injection.set('ERRINJ_WAL_DELAY', false)
> + | ---
> + | - ok
> + | ...
> +test_run:wait_cond(function() return box.info.synchro.promote.term_max == term_max_wait4 end)
> + | ---
> + | - true
> + | ...
> +
> +test_run:switch("default")
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica2')
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('delete server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica2')

6. Can you add some data to the test — data which was applied before the
patches and is now rejected/nopified? Otherwise you added a lock and tested
the lock, but if I move the data filtering before the lock, your test will
still pass.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Tarantool-patches] [PATCH v17 4/5] qsync: export more details on promote tracking
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 4/5] qsync: export more details on promote tracking Cyrill Gorcunov via Tarantool-patches
@ 2021-09-27  7:00   ` Serge Petrenko via Tarantool-patches
  2021-09-27  7:58     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 16+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-09-27  7:00 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml; +Cc: Vladislav Shpilevoy



22.09.2021 16:05, Cyrill Gorcunov wrote:
> The patch introduces a `promote` leaf in the `box.info.synchro` table.
>
>   | tarantool> box.info.synchro
>   | ---
>   | - queue:
>   |     len: 0
>   |     owner: 1
>   |   quorum: 1
>   |   promote:
>   |     term_max: 4
>   |     term_map: {1: 4}
>   | ...
>
> The idea is to be able to track changes of seen requests. Since these
> are internal implementation details, I prefer not to document them. It
> would actually be better to mark this as non-API somehow.

I think this info might be useful, so maybe document it as well?

I'd call it `journal`, probably.
box.info.synchro.journal.term - what you call term_max
box.info.synchro.journal.term_map

>
> Part-of #6036
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
>   src/box/lua/info.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/src/box/lua/info.c b/src/box/lua/info.c
> index 040af306a..144fba12d 100644
> --- a/src/box/lua/info.c
> +++ b/src/box/lua/info.c
> @@ -607,7 +607,7 @@ lbox_info_election(struct lua_State *L)
>   static int
>   lbox_info_synchro(struct lua_State *L)
>   {
> -	lua_createtable(L, 0, 2);
> +	lua_createtable(L, 0, 3);
>   
>   	/* Quorum value may be evaluated via formula */
>   	lua_pushinteger(L, replication_synchro_quorum);
> @@ -622,6 +622,15 @@ lbox_info_synchro(struct lua_State *L)
>   	lua_setfield(L, -2, "owner");
>   	lua_setfield(L, -2, "queue");
>   
> +	/* Promote related info. Suitable for debug. */
> +	lua_createtable(L, 0, 2);
> +	lua_pushnumber(L, queue->promote_greatest_term);
> +	lua_setfield(L, -2, "term_max");
> +	lua_pushstring(L, "term_map");
> +	lbox_pushvclock(L, &queue->promote_term_map);
> +	lua_settable(L, -3);
> +	lua_setfield(L, -2, "promote");
> +
>   	return 1;
>   }
>   

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Tarantool-patches] [PATCH v17 3/5] qsync: track confirmed_lsn upon CONFIRM packet
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 3/5] qsync: track confirmed_lsn upon CONFIRM packet Cyrill Gorcunov via Tarantool-patches
  2021-09-26 14:29   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-09-27  7:05   ` Serge Petrenko via Tarantool-patches
  1 sibling, 0 replies; 16+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-09-27  7:05 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml; +Cc: Vladislav Shpilevoy



22.09.2021 16:05, Cyrill Gorcunov wrote:
> While investigating various cluster split-brain scenarios and trying
> to gather valid incoming synchro request domains and ranges, we
> discovered that limbo::confirmed_lsn is not updated densely enough to
> cover our needs.
>
> In particular, this variable is always updated by the limbo owner upon
> the write of a synchro entry (to the journal), while a replica just
> reads such a record without updating confirmed_lsn. So when the replica
> becomes the cluster leader, it sends a promote request back to the
> former leader where the packet carries a zero LSN instead of the
> previous confirmed_lsn, and validation of such a packet won't pass.
>
> Note that packet validation is not yet implemented in this patch, so
> this is rather preparatory work for the future.
>
> Part-of #6036
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
>   src/box/txn_limbo.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
>
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index eb9aa7780..959811309 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -774,6 +774,20 @@ txn_limbo_process_run(struct txn_limbo *limbo,
>   	switch (req->type) {
>   	case IPROTO_RAFT_CONFIRM:
>   		txn_limbo_read_confirm(limbo, lsn);
> +		/*
> +		 * We have to adjust confirmed_lsn according to
> +		 * the LSN coming from the request, because we
> +		 * will need to report it as the old limbo owner's
> +		 * LSN inside PROMOTE requests (if the administrator
> +		 * or the election engine makes us the new leader).
> +		 *
> +		 * We could update confirmed_lsn on every
> +		 * txn_limbo_read_confirm call, but this function
> +		 * usually comes paired with
> +		 * txn_limbo_write_confirm, so to eliminate a
> +		 * redundant variable update we do it once, explicitly.
> +		 */
> +		limbo->confirmed_lsn = req->lsn;
>   		break;
>   	case IPROTO_RAFT_ROLLBACK:
>   		txn_limbo_read_rollback(limbo, lsn);
LGTM.

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Tarantool-patches] [PATCH v17 5/5] test: add gh-6036-term-order test
  2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 5/5] test: add gh-6036-term-order test Cyrill Gorcunov via Tarantool-patches
  2021-09-26 14:30   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-09-27  7:13   ` Serge Petrenko via Tarantool-patches
  2021-09-27  7:33     ` Cyrill Gorcunov via Tarantool-patches
  1 sibling, 1 reply; 16+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-09-27  7:13 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml; +Cc: Vladislav Shpilevoy



22.09.2021 16:05, Cyrill Gorcunov wrote:
> Test that promotion requests are handled only when the corresponding
> WAL write completes, because we update the in-memory data before the
> write finishes.
>
> Part-of #6036
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
>   test/replication/gh-6036-order-master.lua    |   1 +
>   test/replication/gh-6036-order-node.lua      |  60 ++++++
>   test/replication/gh-6036-order-replica1.lua  |   1 +
>   test/replication/gh-6036-order-replica2.lua  |   1 +
>   test/replication/gh-6036-term-order.result   | 216 +++++++++++++++++++
>   test/replication/gh-6036-term-order.test.lua |  94 ++++++++
>   test/replication/suite.cfg                   |   1 +
>   test/replication/suite.ini                   |   2 +-
>   8 files changed, 375 insertions(+), 1 deletion(-)
>   create mode 120000 test/replication/gh-6036-order-master.lua
>   create mode 100644 test/replication/gh-6036-order-node.lua
>   create mode 120000 test/replication/gh-6036-order-replica1.lua
>   create mode 120000 test/replication/gh-6036-order-replica2.lua
>   create mode 100644 test/replication/gh-6036-term-order.result
>   create mode 100644 test/replication/gh-6036-term-order.test.lua
>
> diff --git a/test/replication/gh-6036-order-master.lua b/test/replication/gh-6036-order-master.lua
> new file mode 120000
> index 000000000..82a6073a1
> --- /dev/null
> +++ b/test/replication/gh-6036-order-master.lua
> @@ -0,0 +1 @@
> +gh-6036-order-node.lua
> \ No newline at end of file
> diff --git a/test/replication/gh-6036-order-node.lua b/test/replication/gh-6036-order-node.lua
> new file mode 100644
> index 000000000..b22a7cb4c
> --- /dev/null
> +++ b/test/replication/gh-6036-order-node.lua
> @@ -0,0 +1,60 @@
> +local INSTANCE_ID = string.match(arg[0], "gh%-6036%-order%-(.+)%.lua")
> +
> +local function unix_socket(name)
> +    return "unix/:./" .. name .. '.sock';
> +end
> +
> +require('console').listen(os.getenv('ADMIN'))
> +
> +if INSTANCE_ID == "master" then
> +    box.cfg({
> +        listen                      = unix_socket(INSTANCE_ID),
> +        replication                 = {
> +            unix_socket(INSTANCE_ID),
> +            unix_socket("replica1"),
> +            unix_socket("replica2"),
> +        },
> +        replication_connect_quorum  = 1,
> +        replication_synchro_quorum  = 1,
> +        replication_synchro_timeout = 10000,
> +        replication_sync_timeout    = 5,
> +        read_only                   = false,
> +        election_mode               = "off",
> +    })
> +elseif INSTANCE_ID == "replica1" then
> +    box.cfg({
> +        listen                      = unix_socket(INSTANCE_ID),
> +        replication                 = {
> +            unix_socket("master"),
> +            unix_socket(INSTANCE_ID),
> +            unix_socket("replica2"),
> +        },
> +        replication_connect_quorum  = 1,
> +        replication_synchro_quorum  = 1,
> +        replication_synchro_timeout = 10000,
> +        replication_sync_timeout    = 5,
> +        read_only                   = false,
> +        election_mode               = "off",
> +    })
> +else
> +    assert(INSTANCE_ID == "replica2")
> +    box.cfg({
> +        listen                      = unix_socket(INSTANCE_ID),
> +        replication                 = {
> +            unix_socket("master"),
> +            unix_socket("replica1"),
> +            unix_socket(INSTANCE_ID),
> +        },
> +        replication_connect_quorum  = 1,
> +        replication_synchro_quorum  = 1,
> +        replication_synchro_timeout = 10000,
> +        replication_sync_timeout    = 5,
> +        read_only                   = true,
> +        election_mode               = "off",
> +    })
> +end

I think it would be simpler to do something like:

cfg_tbl = {every common parameter}

if INSTANCE_ID == 'replica2' then cfg_tbl.read_only = true

The box.cfg{} calls for master and replica1 are identical; replica2
differs only in read_only = true.
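A possible shape of that simplification — a sketch, not the author's code. It assumes the same INSTANCE_ID detection and unix_socket helper as gh-6036-order-node.lua, and relies on the fact that all three replication lists in the patch contain the same three sockets, just in different orders:

```lua
local INSTANCE_ID = string.match(arg[0], "gh%-6036%-order%-(.+)%.lua")

local function unix_socket(name)
    return "unix/:./" .. name .. ".sock"
end

require('console').listen(os.getenv('ADMIN'))

-- One common configuration table for every instance.
local cfg = {
    listen                      = unix_socket(INSTANCE_ID),
    replication                 = {
        unix_socket("master"),
        unix_socket("replica1"),
        unix_socket("replica2"),
    },
    replication_connect_quorum  = 1,
    replication_synchro_quorum  = 1,
    replication_synchro_timeout = 10000,
    replication_sync_timeout    = 5,
    read_only                   = false,
    election_mode               = "off",
}

-- Only replica2 differs: it is read-only.
if INSTANCE_ID == "replica2" then
    cfg.read_only = true
end

box.cfg(cfg)
```

This is an instance-configuration fragment, runnable only inside a test-run tarantool instance.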

> +
> +--box.ctl.wait_rw()
> +box.once("bootstrap", function()
> +    box.schema.user.grant('guest', 'super')
> +end)
> diff --git a/test/replication/gh-6036-order-replica1.lua b/test/replication/gh-6036-order-replica1.lua
> new file mode 120000
> index 000000000..82a6073a1
> --- /dev/null
> +++ b/test/replication/gh-6036-order-replica1.lua
> @@ -0,0 +1 @@
> +gh-6036-order-node.lua
> \ No newline at end of file
> diff --git a/test/replication/gh-6036-order-replica2.lua b/test/replication/gh-6036-order-replica2.lua
> new file mode 120000
> index 000000000..82a6073a1
> --- /dev/null
> +++ b/test/replication/gh-6036-order-replica2.lua
> @@ -0,0 +1 @@
> +gh-6036-order-node.lua
> \ No newline at end of file
> diff --git a/test/replication/gh-6036-term-order.result b/test/replication/gh-6036-term-order.result
> new file mode 100644
> index 000000000..6b19fc2c8
> --- /dev/null
> +++ b/test/replication/gh-6036-term-order.result
> @@ -0,0 +1,216 @@
> +-- test-run result file version 2
> +--
> +-- gh-6036: verify that terms are locked while we're inside the journal
> +-- write routine, because parallel appliers may ignore the fact that
> +-- the term is already updated but not yet written, leading to data
> +-- inconsistency.
> +--
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +test_run:cmd('create server master with script="replication/gh-6036-order-master.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server replica1 with script="replication/gh-6036-order-replica1.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server replica2 with script="replication/gh-6036-order-replica2.lua"')
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('start server master with wait=False')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica1 with wait=False')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica2 with wait=False')
> + | ---
> + | - true
> + | ...
> +
> +test_run:wait_fullmesh({"master", "replica1", "replica2"})
> + | ---
> + | ...
> +
> +test_run:switch("master")
> + | ---
> + | - true
> + | ...
> +box.ctl.demote()
> + | ---
> + | ...
> +
> +test_run:switch("replica1")
> + | ---
> + | - true
> + | ...
> +box.ctl.demote()
> + | ---
> + | ...
> +
> +test_run:switch("replica2")
> + | ---
> + | - true
> + | ...
> +box.ctl.demote()
> + | ---
> + | ...
> +
> +--
> +-- Drop connection between master and replica1.
> +test_run:switch("master")
> + | ---
> + | - true
> + | ...
> +box.cfg({                                   \
> +    replication = {                         \
> +        "unix/:./master.sock",              \
> +        "unix/:./replica2.sock",            \
> +    },                                      \
> +})
> + | ---
> + | ...
> +test_run:switch("replica1")
> + | ---
> + | - true
> + | ...
> +box.cfg({                                   \
> +    replication = {                         \
> +        "unix/:./replica1.sock",            \
> +        "unix/:./replica2.sock",            \
> +    },                                      \
> +})
> + | ---
> + | ...
> +
> +--
> +-- Initiate disk delay and remember the max term seen so far.
> +test_run:switch("replica2")
> + | ---
> + | - true
> + | ...
> +box.error.injection.set('ERRINJ_WAL_DELAY', true)
> + | ---
> + | - ok
> + | ...
> +term_max_replica2 = box.info.synchro.promote.term_max
> + | ---
> + | ...
> +
> +--
> +-- Ping-pong the promote action between the master and
> +-- replica1 nodes; the term updates get queued on
> +-- replica2 because its disk is disabled.
> +test_run:switch("master")
> + | ---
> + | - true
> + | ...
> +box.ctl.promote()
> + | ---
> + | ...
> +box.ctl.demote()
> + | ---
> + | ...
> +
> +test_run:switch("replica1")
> + | ---
> + | - true
> + | ...
> +box.ctl.promote()
> + | ---
> + | ...
> +box.ctl.demote()
> + | ---
> + | ...
> +
> +test_run:switch("master")
> + | ---
> + | - true
> + | ...
> +box.ctl.promote()
> + | ---
> + | ...
> +
> +--
> +-- Since we're guarding the max promote term, make sure that:
> +-- 1) The max term has not yet been updated because the WAL
> +--    is in sleeping state.
> +-- 2) The max term on the master and replica1 nodes is greater
> +--    than what we have now because the term update is locked.
> +-- 3) Once the WAL is unlocked, the terms have
> +--    reached us.
> +test_run:switch("replica2")
> + | ---
> + | - true
> + | ...
> +assert(term_max_replica2 == box.info.synchro.promote.term_max)
> + | ---
> + | - true
> + | ...
> +
> +term_max_master = test_run:eval('master', 'box.info.synchro.promote.term_max')[1]
> + | ---
> + | ...
> +term_max_replica1 = test_run:eval('replica1', 'box.info.synchro.promote.term_max')[1]
> + | ---
> + | ...
> +assert(term_max_master > term_max_replica2)
> + | ---
> + | - true
> + | ...
> +assert(term_max_replica1 > term_max_replica2)
> + | ---
> + | - true
> + | ...
> +term_max_wait4 = term_max_master
> + | ---
> + | ...
> +if term_max_wait4 < term_max_replica1 then term_max_wait4 = term_max_replica1 end
> + | ---
> + | ...
> +
> +box.error.injection.set('ERRINJ_WAL_DELAY', false)
> + | ---
> + | - ok
> + | ...
> +test_run:wait_cond(function() return box.info.synchro.promote.term_max == term_max_wait4 end)
> + | ---
> + | - true
> + | ...
> +
> +test_run:switch("default")
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica2')
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('delete server master')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica2')
> + | ---
> + | - true
> + | ...
> diff --git a/test/replication/gh-6036-term-order.test.lua b/test/replication/gh-6036-term-order.test.lua
> new file mode 100644
> index 000000000..01d73ac55
> --- /dev/null
> +++ b/test/replication/gh-6036-term-order.test.lua
> @@ -0,0 +1,94 @@
> +--
> +-- gh-6036: verify that terms are locked while we're inside the journal
> +-- write routine, because parallel appliers may ignore the fact that
> +-- the term is already updated but not yet written, leading to data
> +-- inconsistency.
> +--
> +test_run = require('test_run').new()
> +
> +test_run:cmd('create server master with script="replication/gh-6036-order-master.lua"')
> +test_run:cmd('create server replica1 with script="replication/gh-6036-order-replica1.lua"')
> +test_run:cmd('create server replica2 with script="replication/gh-6036-order-replica2.lua"')
> +
> +test_run:cmd('start server master with wait=False')
> +test_run:cmd('start server replica1 with wait=False')
> +test_run:cmd('start server replica2 with wait=False')
> +
> +test_run:wait_fullmesh({"master", "replica1", "replica2"})
> +
> +test_run:switch("master")
> +box.ctl.demote()
> +
> +test_run:switch("replica1")
> +box.ctl.demote()
> +
> +test_run:switch("replica2")
> +box.ctl.demote()
> +
> +--
> +-- Drop connection between master and replica1.
> +test_run:switch("master")
> +box.cfg({                                   \
> +    replication = {                         \
> +        "unix/:./master.sock",              \
> +        "unix/:./replica2.sock",            \
> +    },                                      \
> +})
> +test_run:switch("replica1")
> +box.cfg({                                   \
> +    replication = {                         \
> +        "unix/:./replica1.sock",            \
> +        "unix/:./replica2.sock",            \
> +    },                                      \
> +})
> +
> +--
> +-- Initiate disk delay and remember the max term seen so far.
> +test_run:switch("replica2")
> +box.error.injection.set('ERRINJ_WAL_DELAY', true)
> +term_max_replica2 = box.info.synchro.promote.term_max
> +
> +--
> +-- Ping-pong the promote action between the master and
> +-- replica1 nodes; the term updates get queued on
> +-- replica2 because its disk is disabled.
> +test_run:switch("master")
> +box.ctl.promote()
> +box.ctl.demote()
> +
> +test_run:switch("replica1")
> +box.ctl.promote()
> +box.ctl.demote()
> +
> +test_run:switch("master")
> +box.ctl.promote()
> +
> +--
> +-- Since we're guarding the max promote term, make sure that:
> +-- 1) The max term has not yet been updated because the WAL
> +--    is in sleeping state.
> +-- 2) The max term on the master and replica1 nodes is greater
> +--    than what we have now because the term update is locked.
> +-- 3) Once the WAL is unlocked, the terms have
> +--    reached us.
> +test_run:switch("replica2")
> +assert(term_max_replica2 == box.info.synchro.promote.term_max)
> +
> +term_max_master = test_run:eval('master', 'box.info.synchro.promote.term_max')[1]
> +term_max_replica1 = test_run:eval('replica1', 'box.info.synchro.promote.term_max')[1]
> +assert(term_max_master > term_max_replica2)
> +assert(term_max_replica1 > term_max_replica2)
> +term_max_wait4 = term_max_master
> +if term_max_wait4 < term_max_replica1 then term_max_wait4 = term_max_replica1 end
> +
> +box.error.injection.set('ERRINJ_WAL_DELAY', false)
> +test_run:wait_cond(function() return box.info.synchro.promote.term_max == term_max_wait4 end)
> +
> +test_run:switch("default")
> +test_run:cmd('stop server master')
> +test_run:cmd('stop server replica1')
> +test_run:cmd('stop server replica2')
> +
> +test_run:cmd('delete server master')
> +test_run:cmd('delete server replica1')
> +test_run:cmd('delete server replica2')
> diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
> index 3eee0803c..ac2bedfd9 100644
> --- a/test/replication/suite.cfg
> +++ b/test/replication/suite.cfg
> @@ -59,6 +59,7 @@
>       "gh-6094-rs-uuid-mismatch.test.lua": {},
>       "gh-6127-election-join-new.test.lua": {},
>       "gh-6035-applier-filter.test.lua": {},
> +    "gh-6036-term-order.test.lua": {},
>       "election-candidate-promote.test.lua": {},
>       "*": {
>           "memtx": {"engine": "memtx"},
> diff --git a/test/replication/suite.ini b/test/replication/suite.ini
> index 77eb95f49..16840e01f 100644
> --- a/test/replication/suite.ini
> +++ b/test/replication/suite.ini
> @@ -3,7 +3,7 @@ core = tarantool
>   script =  master.lua
>   description = tarantool/box, replication
>   disabled = consistent.test.lua
> -release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5430-qsync-promote-crash.test.lua gh-5430-cluster-mvcc.test.lua  gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6027-applier-error-show.test.lua gh-6032-promote-wal-write.test.lua gh-6057-qsync-confirm-async-no-wal.test.lua gh-5447-downstream-lag.test.lua gh-4040-invalid-msgpack.test.lua
> +release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5430-qsync-promote-crash.test.lua gh-5430-cluster-mvcc.test.lua  gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6027-applier-error-show.test.lua gh-6032-promote-wal-write.test.lua gh-6057-qsync-confirm-async-no-wal.test.lua gh-5447-downstream-lag.test.lua gh-4040-invalid-msgpack.test.lua gh-6036-term-order.test.lua
>   config = suite.cfg
>   lua_libs = lua/fast_replica.lua lua/rlimit.lua
>   use_unix_sockets = True

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Tarantool-patches] [PATCH v17 5/5] test: add gh-6036-term-order test
  2021-09-27  7:13   ` Serge Petrenko via Tarantool-patches
@ 2021-09-27  7:33     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 16+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-09-27  7:33 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tml, Vladislav Shpilevoy

On Mon, Sep 27, 2021 at 10:13:32AM +0300, Serge Petrenko wrote:
> 
> I think it would be simpler to do something like:
> 
> cfg_tbl = {every common parameter}
> 
> if INSTANCE_ID == 'replica2' then cfg_tbl.read_only = true
> 
> box.cfg{} calls for master and replica 1 are identical, replica2 only
> differs in read_only = true.

Good point! Will do, thanks!

	Cyrill


* Re: [Tarantool-patches] [PATCH v17 2/5] qsync: order access to the limbo terms
  2021-09-26 14:29   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-09-27  7:42     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 16+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-09-27  7:42 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Sun, Sep 26, 2021 at 04:29:13PM +0200, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!

Hi! Thanks for review!

> I consider this series as an independent patchset which fixes
> the ordering, not split-brain. Like you said.
> 
> But now it is not independent. The main problem is that you
> just blended in a few changes from the split-brain patches. I
> point them out in my comments.

It can't be completely separate and self-consistent because of the
locks we introduce to order operations. In the split-brain series we
agreed that locking should be hidden inside txn_process_begin(),
which can fail. As a result, a plain txn_process() call becomes a thin
wrapper over begin/commit|rollback, and since we're changing the
architecture anyway, I think altering the semantics right away will
make the next patches less intrusive.

If you prefer it this way I can do so, but I don't think it is that
good an idea; then again, I don't have a strong preference here.

> > 
> > diff --git a/src/box/applier.cc b/src/box/applier.cc
> > index b981bd436..f0751b68a 100644
> > --- a/src/box/applier.cc
> > +++ b/src/box/applier.cc
> > @@ -458,7 +458,8 @@ applier_wait_snapshot(struct applier *applier)
> >  				struct synchro_request req;
> >  				if (xrow_decode_synchro(&row, &req) != 0)
> >  					diag_raise();
> > -				txn_limbo_process(&txn_limbo, &req);
> > +				if (txn_limbo_process(&txn_limbo, &req) != 0)
> > +					diag_raise();
> 
> 1. How can txn_limbo_process() fail? You just fixed the ordering. Essentially, you added
> a few yields in some places. You didn't add any validation, any new errors in
> this patchset. Please, drop the empty 'return 0's from this patchset. They
> can't be and are not tested here anyway.

It is not just an ordering fix but also scaffolding for the second series
of patches on top. Anyway, I'll do it the way you're asking.

> Addition of a lock-unlock pair to txn_limbo_process didn't affect whether it
> can fail. It just blocks the execution sometimes for a while.

I changed the semantics early on purpose; I'll rework it to fit your request.

> >  			} else if (iproto_type_is_raft_request(row.type)) {
> >  				struct raft_request req;
> >  				if (xrow_decode_raft(&row, &req, NULL) != 0)
> > @@ -857,7 +858,7 @@ apply_synchro_row_cb(struct journal_entry *entry)
> >  		applier_rollback_by_wal_io(entry->res);
> >  	} else {
> >  		replica_txn_wal_write_cb(synchro_entry->rcb);
> > -		txn_limbo_process(&txn_limbo, synchro_entry->req);
> > +		txn_limbo_process_run(&txn_limbo, synchro_entry->req);
> 
> 2. _run is usually used for infinite loops in fibers, or for handling a
> sequence of something. Like trigger_run(). Here you handle a single
> request. The only difference from process() is that the lock is taken.
> I propose to rename it to _do or _ex or _in_tx or something.

OK

> 
> > diff --git a/src/box/box.cc b/src/box/box.cc
> > index 7b11d56d6..19e67b186 100644
> > --- a/src/box/box.cc
> > +++ b/src/box/box.cc
> > @@ -1670,48 +1669,43 @@ box_wait_limbo_acked(double timeout)
> >  	return wait_lsn;
> >  }
> >  
> > -/** Write and process a PROMOTE request. */
> > -static void
> > -box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
> > +/** Write and process PROMOTE or DEMOTE request. */
> > +static int
> > +box_issue_synchro(uint16_t type, uint32_t prev_leader_id, int64_t promote_lsn)
> >  {
> >  	struct raft *raft = box_raft();
> > +
> > +	assert(type == IPROTO_RAFT_PROMOTE ||
> > +	       type == IPROTO_RAFT_DEMOTE);
> >  	assert(raft->volatile_term == raft->term);
> >  	assert(promote_lsn >= 0);
> > -	txn_limbo_write_promote(&txn_limbo, promote_lsn,
> > -				raft->term);
> > +
> >  	struct synchro_request req = {
> > -		.type = IPROTO_RAFT_PROMOTE,
> > -		.replica_id = prev_leader_id,
> > -		.origin_id = instance_id,
> > -		.lsn = promote_lsn,
> > -		.term = raft->term,
> > +		.type		= type,
> > +		.replica_id	= prev_leader_id,
> > +		.origin_id	= instance_id,
> > +		.lsn		= promote_lsn,
> > +		.term		= raft->term,
> >  	};
> > -	txn_limbo_process(&txn_limbo, &req);
> > +
> > +	if (txn_limbo_process_begin(&txn_limbo, &req) != 0)
> > +		return -1;
> > +
> > +	if (type == IPROTO_RAFT_PROMOTE)
> > +		txn_limbo_write_promote(&txn_limbo, req.lsn, req.term);
> > +	else
> > +		txn_limbo_write_demote(&txn_limbo, req.lsn, req.term);
> > +
> > +	txn_limbo_process_run(&txn_limbo, &req);
> >  	assert(txn_limbo_is_empty(&txn_limbo));
> > +
> > +	txn_limbo_process_commit(&txn_limbo);
> > +	return 0;
> >  }
> >  
> >  /** A guard to block multiple simultaneous promote()/demote() invocations. */
> >  static bool is_in_box_promote = false;
> >  
> > -/** Write and process a DEMOTE request. */
> > -static void
> > -box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)
> > -{
> > -	assert(box_raft()->volatile_term == box_raft()->term);
> > -	assert(promote_lsn >= 0);
> > -	txn_limbo_write_demote(&txn_limbo, promote_lsn,
> > -				box_raft()->term);
> > -	struct synchro_request req = {
> > -		.type = IPROTO_RAFT_DEMOTE,
> > -		.replica_id = prev_leader_id,
> > -		.origin_id = instance_id,
> > -		.lsn = promote_lsn,
> > -		.term = box_raft()->term,
> > -	};
> > -	txn_limbo_process(&txn_limbo, &req);
> > -	assert(txn_limbo_is_empty(&txn_limbo));
> 
> 3. Why did you merge these 2 functions? AFAIR, their split was
> deliberate. To make each of them simpler to understand and maintain.

To eliminate code duplication. These two functions are _completely_
identical in terms of operations: they write a promote/demote packet,
and the two packets are the same except for the type. So I don't
follow how the split is easier to understand and maintain: if the
functions are the same, better to have one instance, no?

But since you're asking I'll move these two functions back, no problem.

> > diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> > index 70447caaf..eb9aa7780 100644
> > --- a/src/box/txn_limbo.c
> > +++ b/src/box/txn_limbo.c
> > @@ -786,6 +790,32 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
> >  	return;
> >  }
> >  
> > +int
> > +txn_limbo_process_begin(struct txn_limbo *limbo,
> > +			const struct synchro_request *req)
> > +{
> > +	latch_lock(&limbo->promote_latch);
> > +	/*
> > +	 * FIXME: For now we take a lock only but idea
> > +	 * is to verify incoming requests to detect split
> > +	 * brain situation. Thus we need to change the code
> > +	 * semantics in advance.
> > +	 */
> > +	(void)req;
> > +	return 0;
> 
> 4. Return value is a part of the split-brain patch, not of the
> ordering patch. It is clearly seen from this patchset, because
> this series never changes `return 0` to anything else.
> 
> I get that you want to merge something. Hence we are working on this
> independent issue of reordering. But then let's make it truly
> independent.

OK. Thanks for comments!

	Cyrill


* Re: [Tarantool-patches] [PATCH v17 4/5] qsync: export more details on promote tracking
  2021-09-27  7:00   ` Serge Petrenko via Tarantool-patches
@ 2021-09-27  7:58     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 16+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-09-27  7:58 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tml, Vladislav Shpilevoy

On Mon, Sep 27, 2021 at 10:00:28AM +0300, Serge Petrenko wrote:
> 
> 
> 22.09.2021 16:05, Cyrill Gorcunov пишет:
> > The patch introduces `promote` leaf to `box.info.synchro` table.
> > 
> >   | tarantool> box.info.synchro
> >   | ---
> >   | - queue:
> >   |     len: 0
> >   |     owner: 1
> >   |   quorum: 1
> >   |   promote:
> >   |     term_max: 4
> >   |     term_map: {1: 4}
> >   | ...
> > 
> > An idea is to be able to track changes of seen requests. Since these are
> > internal implementation details I prefer not to document it. Actually, it
> > would be better to mark it as non-API somehow.
> 
> I think this info might be useful, so maybe document it as well?
> 
> I'd call it `journal`, probably.
> box.info.synchro.journal.term - what you call term_max
> box.info.synchro.journal.term_map

You know, I don't mind :) We need to choose the names of "info" leaves
and branches very carefully though, because they become API. So I guess we
could consider a journal branch where we would gather not only synchro-related
information but also extend the output for debug/stat purposes in the future?

box.info
  journal:
    synchro: (replication related info)
      term: 4
      term_map: {1: 4}
    queue: (flush queue related info)
      len: 12
    stat:
      wrote: 400 bytes

I don't have a strong opinion on which form is better; I'm fine with
either, just sharing an idea.


* Re: [Tarantool-patches] [PATCH v17 5/5] test: add gh-6036-term-order test
  2021-09-26 14:30   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-09-27 10:08     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 16+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-09-27 10:08 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Sun, Sep 26, 2021 at 04:30:38PM +0200, Vladislav Shpilevoy wrote:
> > +if INSTANCE_ID == "master" then
> > +    box.cfg({
> > +        listen                      = unix_socket(INSTANCE_ID),
> > +        replication                 = {
> > +            unix_socket(INSTANCE_ID),
> > +            unix_socket("replica1"),
> > +            unix_socket("replica2"),
> > +        },
> > +        replication_connect_quorum  = 1,
> > +        replication_synchro_quorum  = 1,
> > +        replication_synchro_timeout = 10000,
> > +        replication_sync_timeout    = 5,
> 
> 1. Why do you need sync_timeout 5?

To make sure it has a sane short value; our default of 300 seconds is
too big, I think.

> > +
> > +--box.ctl.wait_rw()
> 
> 2. Please, remove commented out code.

ok

> > +box.once("bootstrap", function()
> > +    box.schema.user.grant('guest', 'super')
> > +end)
> > diff --git a/test/replication/gh-6036-term-order.result b/test/replication/gh-6036-term-order.result
> > new file mode 100644
> > index 000000000..6b19fc2c8
> > --- /dev/null
> > +++ b/test/replication/gh-6036-term-order.result
> 
> 3. Please, use prefix gh-####-qsync to be consistent with other qsync tests. Having
> 'qsync' in the test name helps to run all qsync tests in a single command
> 
> 	python test-run.py qsync

sure, will do

> > +test_run:switch("replica2")
> > + | ---
> > + | - true
> > + | ...
> > +box.ctl.demote()
> 
> 4. I dropped all 3 demotes and the test passed. Why do you need them?

To make sure none of the freshly booted nodes owns the limbo, even
if something changes inside the test-run engine in the future (the test
engine is not a stable API, while our demote() operation is part of the
API, so I don't have to care how exactly the nodes were started: none of
them will own the limbo after this command).

> > +term_max_wait4 = term_max_master
> > + | ---
> > + | ...
> > +if term_max_wait4 < term_max_replica1 then term_max_wait4 = term_max_replica1 end
> 
> 5. How is it possible? The master did more promotes, it should have a
> bigger term for sure.

IIRC I was hunting a racy test case and left this snippet untouched. It
simply sneaked in; it is harmless, I'll clean it up, thanks!

> 
> > +test_run:cmd('delete server replica2')
> 
> 6. Can you add some data to the test? Data which was applied before the
> patches, and now is rejected/nopified. Otherwise you added a lock, tested
> the lock, but if I move the data filtering before the lock, your test
> will still pass.

Will try, thanks!


end of thread, other threads:[~2021-09-27 10:08 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-22 13:05 [Tarantool-patches] [PATCH v17 0/5] qsync: implement packet filtering (part 1) Cyrill Gorcunov via Tarantool-patches
2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 1/5] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 2/5] qsync: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
2021-09-26 14:29   ` Vladislav Shpilevoy via Tarantool-patches
2021-09-27  7:42     ` Cyrill Gorcunov via Tarantool-patches
2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 3/5] qsync: track confirmed_lsn upon CONFIRM packet Cyrill Gorcunov via Tarantool-patches
2021-09-26 14:29   ` Vladislav Shpilevoy via Tarantool-patches
2021-09-27  7:05   ` Serge Petrenko via Tarantool-patches
2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 4/5] qsync: export more details on promote tracking Cyrill Gorcunov via Tarantool-patches
2021-09-27  7:00   ` Serge Petrenko via Tarantool-patches
2021-09-27  7:58     ` Cyrill Gorcunov via Tarantool-patches
2021-09-22 13:05 ` [Tarantool-patches] [PATCH v17 5/5] test: add gh-6036-term-order test Cyrill Gorcunov via Tarantool-patches
2021-09-26 14:30   ` Vladislav Shpilevoy via Tarantool-patches
2021-09-27 10:08     ` Cyrill Gorcunov via Tarantool-patches
2021-09-27  7:13   ` Serge Petrenko via Tarantool-patches
2021-09-27  7:33     ` Cyrill Gorcunov via Tarantool-patches
