Tarantool development patches archive
* [Tarantool-patches] [PATCH v9 0/5] limbo: implement packets filtering
@ 2021-07-30 11:35 Cyrill Gorcunov via Tarantool-patches
  2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 1/5] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-07-30 11:35 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Replication tests are passing but GitHub CI tests are not: the series
is based on Serge's branch `sp/gh-6034-empty-limbo-transition`, which
is still failing.

Vlad, I removed the txn_limbo_terms structure, which forced me to
rework the series. Please take a look and let me know if you have any
other comments to address.

branch gorcunov/gh-6036-rollback-confirm-09-notest
issue https://github.com/tarantool/tarantool/issues/6036

v6:
 - use txn_limbo_terms name for structure
 - rebase on fresh sp/gh-6034-empty-limbo-transition branch
 - rework filtering chains
v8:
 - add ability to disable filtering for local recovery
   and join stages
 - update tests
v9:
 - opencode terms tracking
 - fix tests to use a wait function since log output might
   be deferred by the OS

Cyrill Gorcunov (5):
  latch: add latch_is_locked helper
  say: introduce panic_on helper
  limbo: order access to the limbo terms
  limbo: filter incoming synchro requests
  test: add replication/gh-6036-rollback-confirm

 src/box/applier.cc                            |  31 +-
 src/box/box.cc                                |  23 +-
 src/box/memtx_engine.c                        |   3 +-
 src/box/txn_limbo.c                           | 323 ++++++++++++++++--
 src/box/txn_limbo.h                           | 103 +++++-
 src/lib/core/latch.h                          |  11 +
 src/lib/core/say.h                            |   1 +
 test/replication/gh-6036-master.lua           |   1 +
 test/replication/gh-6036-node.lua             |  33 ++
 test/replication/gh-6036-replica.lua          |   1 +
 .../gh-6036-rollback-confirm.result           | 180 ++++++++++
 .../gh-6036-rollback-confirm.test.lua         |  92 +++++
 12 files changed, 759 insertions(+), 43 deletions(-)
 create mode 120000 test/replication/gh-6036-master.lua
 create mode 100644 test/replication/gh-6036-node.lua
 create mode 120000 test/replication/gh-6036-replica.lua
 create mode 100644 test/replication/gh-6036-rollback-confirm.result
 create mode 100644 test/replication/gh-6036-rollback-confirm.test.lua


base-commit: 354a4897eb8e5a60a4d6673e17b00a2e597b719e
-- 
2.31.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Tarantool-patches] [PATCH v9 1/5] latch: add latch_is_locked helper
  2021-07-30 11:35 [Tarantool-patches] [PATCH v9 0/5] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
@ 2021-07-30 11:35 ` Cyrill Gorcunov via Tarantool-patches
  2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 2/5] say: introduce panic_on helper Cyrill Gorcunov via Tarantool-patches
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-07-30 11:35 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Add a helper to test whether a latch is currently locked.
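As a hedged sketch (struct latch is mocked down to its owner field here;
the real definition lives in src/lib/core/latch.h), the helper reduces to
a NULL check on the owning fiber:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct latch: only the owner
 * field matters for this helper. */
struct latch {
	void *owner; /* owning fiber, or NULL when unlocked */
};

/* Mirrors the helper added by this patch: a latch is locked
 * exactly when some fiber owns it. */
static inline bool
latch_is_locked(const struct latch *l)
{
	return l->owner != NULL;
}
```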

In-scope-of #6036

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/lib/core/latch.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/lib/core/latch.h b/src/lib/core/latch.h
index 49c59cf63..0aaa8b634 100644
--- a/src/lib/core/latch.h
+++ b/src/lib/core/latch.h
@@ -95,6 +95,17 @@ latch_owner(struct latch *l)
 	return l->owner;
 }
 
+/**
+ * Return true if the latch is locked.
+ *
+ * @param l - latch to be tested.
+ */
+static inline bool
+latch_is_locked(const struct latch *l)
+{
+	return l->owner != NULL;
+}
+
 /**
  * Lock a latch. If the latch is already locked by another fiber,
  * waits for timeout.
-- 
2.31.1



* [Tarantool-patches] [PATCH v9 2/5] say: introduce panic_on helper
  2021-07-30 11:35 [Tarantool-patches] [PATCH v9 0/5] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
  2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 1/5] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
@ 2021-07-30 11:35 ` Cyrill Gorcunov via Tarantool-patches
  2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 3/5] limbo: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-07-30 11:35 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

In-scope-of #6036
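A hedged usage sketch of the new helper (panic()/panic_status() below are
simplified stand-ins for the say.h machinery, and the do/while wrapping is
an assumption for safe nesting under if/else):

```c
#include <stdio.h>
#include <stdlib.h>

/* Simplified mock-ups of Tarantool's say.h panic machinery,
 * just enough to show how the new macro composes. */
#define panic_status(status, ...) \
	(fprintf(stderr, __VA_ARGS__), exit(status))
#define panic(...)		panic_status(EXIT_FAILURE, __VA_ARGS__)
/* The new helper: panic only when the condition holds. The
 * do/while(0) wrapping makes it behave as one statement. */
#define panic_on(cond, ...)	do { if (cond) panic(__VA_ARGS__); } while (0)
```

With a false condition the macro is a no-op, which is what later patches
rely on for cheap invariant checks on hot paths.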

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/lib/core/say.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/lib/core/say.h b/src/lib/core/say.h
index e1fec8c60..4bb1645fd 100644
--- a/src/lib/core/say.h
+++ b/src/lib/core/say.h
@@ -348,6 +348,7 @@ CFORMAT(printf, 5, 6) extern sayfunc_t _say;
 
 #define panic_status(status, ...)	({ say(S_FATAL, NULL, __VA_ARGS__); exit(status); })
 #define panic(...)			panic_status(EXIT_FAILURE, __VA_ARGS__)
+#define panic_on(cond, ...)		do { if (cond) panic(__VA_ARGS__); } while (0)
 #define panic_syserror(...)		({ say(S_FATAL, strerror(errno), __VA_ARGS__); exit(EXIT_FAILURE); })
 
 enum {
-- 
2.31.1



* [Tarantool-patches] [PATCH v9 3/5] limbo: order access to the limbo terms
  2021-07-30 11:35 [Tarantool-patches] [PATCH v9 0/5] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
  2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 1/5] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
  2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 2/5] say: introduce panic_on helper Cyrill Gorcunov via Tarantool-patches
@ 2021-07-30 11:35 ` Cyrill Gorcunov via Tarantool-patches
  2021-08-02 23:48   ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests Cyrill Gorcunov via Tarantool-patches
  2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 5/5] test: add replication/gh-6036-rollback-confirm Cyrill Gorcunov via Tarantool-patches
  4 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-07-30 11:35 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Limbo term tracking is shared between appliers, and while one of the
appliers is waiting for a write to complete inside the
journal_write() routine, another one may need to read the term
value to figure out whether a promote request is valid to apply.
Due to cooperative multitasking, access to the terms is not
consistent, so we need to make sure that other fibers read only
up-to-date terms (ie ones already written to the WAL).

For this sake we use a latching mechanism: while one fiber holds
the terms lock for updating, other readers wait until the
operation is complete.

For example here is a call graph of two appliers

applier 1
---------
applier_apply_tx
  (promote term = 3
   current max term = 2)
  applier_synchro_filter_tx
  apply_synchro_row
    journal_write
      (sleeping)

at this moment another applier comes in with obsolete
data and term 2

                              applier 2
                              ---------
                              applier_apply_tx
                                (term 2)
                                applier_synchro_filter_tx
                                  txn_limbo_is_replica_outdated -> false
                                journal_write (sleep)

applier 1
---------
journal wakes up
  apply_synchro_row_cb
    set max term to 3

So applier 2 didn't notice that term 3 had already been seen and
wrote obsolete data. With locking, applier 2 will wait until
applier 1 has finished its write.
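The locking scheme described above can be condensed into a toy model
(no fibers here; the toy_* names are hypothetical stand-ins for the
patch's txn_limbo helpers, and a plain depth counter replaces the latch):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the limbo term state. */
struct toy_limbo {
	uint64_t promote_term_max;
	int lock_depth; /* stands in for promote_latch */
};

static void toy_term_lock(struct toy_limbo *l) { l->lock_depth++; }
static void toy_term_unlock(struct toy_limbo *l) { l->lock_depth--; }

/* Writer: update the max term only under the lock, the way
 * apply_synchro_row()/txn_limbo_process_locked() do around
 * the journal write. */
static void
toy_process(struct toy_limbo *l, uint64_t term)
{
	toy_term_lock(l);
	if (term > l->promote_term_max)
		l->promote_term_max = term;
	toy_term_unlock(l);
}

/* Reader: the locked read other appliers must use instead of
 * touching promote_term_max directly. */
static uint64_t
toy_term_max(struct toy_limbo *l)
{
	toy_term_lock(l);
	uint64_t v = l->promote_term_max;
	toy_term_unlock(l);
	return v;
}
```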

Part-of #6036

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/applier.cc  | 10 +++++--
 src/box/box.cc      | 12 ++++----
 src/box/txn_limbo.c | 27 ++++++++++++-----
 src/box/txn_limbo.h | 72 +++++++++++++++++++++++++++++++++++++++++----
 4 files changed, 98 insertions(+), 23 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index f621fa657..a7f472714 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -856,7 +856,7 @@ apply_synchro_row_cb(struct journal_entry *entry)
 		applier_rollback_by_wal_io(entry->res);
 	} else {
 		replica_txn_wal_write_cb(synchro_entry->rcb);
-		txn_limbo_process(&txn_limbo, synchro_entry->req);
+		txn_limbo_process_locked(&txn_limbo, synchro_entry->req);
 		trigger_run(&replicaset.applier.on_wal_write, NULL);
 	}
 	fiber_wakeup(synchro_entry->owner);
@@ -872,6 +872,7 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 	if (xrow_decode_synchro(row, &req) != 0)
 		goto err;
 
+	txn_limbo_term_lock(&txn_limbo);
 	struct replica_cb_data rcb_data;
 	struct synchro_entry entry;
 	/*
@@ -909,12 +910,15 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 	 * transactions side, including the async ones.
 	 */
 	if (journal_write(&entry.base) != 0)
-		goto err;
+		goto err_unlock;
 	if (entry.base.res < 0) {
 		diag_set_journal_res(entry.base.res);
-		goto err;
+		goto err_unlock;
 	}
+	txn_limbo_term_unlock(&txn_limbo);
 	return 0;
+err_unlock:
+	txn_limbo_term_unlock(&txn_limbo);
 err:
 	diag_log();
 	return -1;
diff --git a/src/box/box.cc b/src/box/box.cc
index 535f30292..5ca617e32 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1573,7 +1573,7 @@ box_run_elections(void)
 static int
 box_check_promote_term_intact(uint64_t promote_term)
 {
-	if (txn_limbo.promote_greatest_term != promote_term) {
+	if (txn_limbo_term_max_raw(&txn_limbo) != promote_term) {
 		diag_set(ClientError, ER_INTERFERING_PROMOTE,
 			 txn_limbo.owner_id);
 		return -1;
@@ -1585,7 +1585,7 @@ box_check_promote_term_intact(uint64_t promote_term)
 static int
 box_trigger_elections(void)
 {
-	uint64_t promote_term = txn_limbo.promote_greatest_term;
+	uint64_t promote_term = txn_limbo_term_max_raw(&txn_limbo);
 	raft_new_term(box_raft());
 	if (box_raft_wait_term_persisted() < 0)
 		return -1;
@@ -1596,7 +1596,7 @@ box_trigger_elections(void)
 static int
 box_try_wait_confirm(double timeout)
 {
-	uint64_t promote_term = txn_limbo.promote_greatest_term;
+	uint64_t promote_term = txn_limbo_term_max_raw(&txn_limbo);
 	txn_limbo_wait_empty(&txn_limbo, timeout);
 	return box_check_promote_term_intact(promote_term);
 }
@@ -1612,7 +1612,7 @@ box_wait_limbo_acked(void)
 	if (txn_limbo_is_empty(&txn_limbo))
 		return txn_limbo.confirmed_lsn;
 
-	uint64_t promote_term = txn_limbo.promote_greatest_term;
+	uint64_t promote_term = txn_limbo_term_max_raw(&txn_limbo);
 	int quorum = replication_synchro_quorum;
 	struct txn_limbo_entry *last_entry;
 	last_entry = txn_limbo_last_synchro_entry(&txn_limbo);
@@ -1728,7 +1728,7 @@ box_promote(void)
 	 * Currently active leader (the instance that is seen as leader by both
 	 * raft and txn_limbo) can't issue another PROMOTE.
 	 */
-	bool is_leader = txn_limbo_replica_term(&txn_limbo, instance_id) ==
+	bool is_leader = txn_limbo_term(&txn_limbo, instance_id) ==
 			 raft->term && txn_limbo.owner_id == instance_id;
 	if (box_election_mode != ELECTION_MODE_OFF)
 		is_leader = is_leader && raft->state == RAFT_STATE_LEADER;
@@ -1784,7 +1784,7 @@ box_demote(void)
 		return 0;
 
 	/* Currently active leader is the only one who can issue a DEMOTE. */
-	bool is_leader = txn_limbo_replica_term(&txn_limbo, instance_id) ==
+	bool is_leader = txn_limbo_term(&txn_limbo, instance_id) ==
 			 box_raft()->term && txn_limbo.owner_id == instance_id;
 	if (box_election_mode != ELECTION_MODE_OFF)
 		is_leader = is_leader && box_raft()->state == RAFT_STATE_LEADER;
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 570f77c46..be5e0adf5 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -46,7 +46,8 @@ txn_limbo_create(struct txn_limbo *limbo)
 	fiber_cond_create(&limbo->wait_cond);
 	vclock_create(&limbo->vclock);
 	vclock_create(&limbo->promote_term_map);
-	limbo->promote_greatest_term = 0;
+	limbo->promote_term_max = 0;
+	latch_create(&limbo->promote_latch);
 	limbo->confirmed_lsn = 0;
 	limbo->rollback_count = 0;
 	limbo->is_in_rollback = false;
@@ -308,7 +309,7 @@ txn_limbo_checkpoint(const struct txn_limbo *limbo,
 	req->type = IPROTO_PROMOTE;
 	req->replica_id = limbo->owner_id;
 	req->lsn = limbo->confirmed_lsn;
-	req->term = limbo->promote_greatest_term;
+	req->term = limbo->promote_term_max;
 }
 
 static void
@@ -724,22 +725,23 @@ txn_limbo_wait_empty(struct txn_limbo *limbo, double timeout)
 }
 
 void
-txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
+txn_limbo_process_locked(struct txn_limbo *limbo,
+			 const struct synchro_request *req)
 {
 	uint64_t term = req->term;
 	uint32_t origin = req->origin_id;
-	if (txn_limbo_replica_term(limbo, origin) < term) {
+	if (txn_limbo_term_locked(limbo, origin) < term) {
 		vclock_follow(&limbo->promote_term_map, origin, term);
-		if (term > limbo->promote_greatest_term)
-			limbo->promote_greatest_term = term;
+		if (term > limbo->promote_term_max)
+			limbo->promote_term_max = term;
 	} else if (iproto_type_is_promote_request(req->type) &&
-		   limbo->promote_greatest_term > 1) {
+		   limbo->promote_term_max > 1) {
 		/* PROMOTE for outdated term. Ignore. */
 		say_info("RAFT: ignoring %s request from instance "
 			 "id %u for term %llu. Greatest term seen "
 			 "before (%llu) is bigger.",
 			 iproto_type_name(req->type), origin, (long long)term,
-			 (long long)limbo->promote_greatest_term);
+			 (long long)limbo->promote_term_max);
 		return;
 	}
 
@@ -786,6 +788,15 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 	return;
 }
 
+void
+txn_limbo_process(struct txn_limbo *limbo,
+		  const struct synchro_request *req)
+{
+	txn_limbo_term_lock(limbo);
+	txn_limbo_process_locked(limbo, req);
+	txn_limbo_term_unlock(limbo);
+}
+
 void
 txn_limbo_on_parameters_change(struct txn_limbo *limbo)
 {
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 53e52f676..25faffd2b 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -31,6 +31,7 @@
  */
 #include "small/rlist.h"
 #include "vclock/vclock.h"
+#include "latch.h"
 
 #include <stdint.h>
 
@@ -146,7 +147,11 @@ struct txn_limbo {
 	 * instance hasn't read its PROMOTE request yet. During other times the
 	 * limbo and raft are in sync and the terms are the same.
 	 */
-	uint64_t promote_greatest_term;
+	uint64_t promote_term_max;
+	/**
+	 * To order access to the promote data.
+	 */
+	struct latch promote_latch;
 	/**
 	 * Maximal LSN gathered quorum and either already confirmed in WAL, or
 	 * whose confirmation is in progress right now. Any attempt to confirm
@@ -211,14 +216,61 @@ txn_limbo_last_entry(struct txn_limbo *limbo)
 				in_queue);
 }
 
+/** Lock promote data. */
+static inline void
+txn_limbo_term_lock(struct txn_limbo *limbo)
+{
+	latch_lock(&limbo->promote_latch);
+}
+
+/** Unlock promote data. */
+static inline void
+txn_limbo_term_unlock(struct txn_limbo *limbo)
+{
+	latch_unlock(&limbo->promote_latch);
+}
+
+/** Test if promote data is locked. */
+static inline bool
+txn_limbo_term_is_locked(const struct txn_limbo *limbo)
+{
+	return latch_is_locked(&limbo->promote_latch);
+}
+
+/** Fetch replica's term with lock taken. */
+static inline uint64_t
+txn_limbo_term_locked(struct txn_limbo *limbo, uint32_t replica_id)
+{
+	panic_on(!txn_limbo_term_is_locked(limbo),
+		 "limbo: unlocked term read for replica_id %u",
+		 replica_id);
+	return vclock_get(&limbo->promote_term_map, replica_id);
+}
+
 /**
  * Return the latest term as seen in PROMOTE requests from instance with id
  * @a replica_id.
  */
 static inline uint64_t
-txn_limbo_replica_term(const struct txn_limbo *limbo, uint32_t replica_id)
+txn_limbo_term(struct txn_limbo *limbo, uint32_t replica_id)
 {
-	return vclock_get(&limbo->promote_term_map, replica_id);
+	txn_limbo_term_lock(limbo);
+	uint64_t v = txn_limbo_term_locked(limbo, replica_id);
+	txn_limbo_term_unlock(limbo);
+	return v;
+}
+
+/**
+ * Yield-unsafe read of @a promote_term_max.
+ *
+ * Use it only when you are interested in the current
+ * value alone and are prepared for it to change if a
+ * yield happens after the read.
+ */
+static inline uint64_t
+txn_limbo_term_max_raw(struct txn_limbo *limbo)
+{
+	return limbo->promote_term_max;
 }
 
 /**
@@ -226,11 +278,14 @@ txn_limbo_replica_term(const struct txn_limbo *limbo, uint32_t replica_id)
  * data from it. The check is only valid when elections are enabled.
  */
 static inline bool
-txn_limbo_is_replica_outdated(const struct txn_limbo *limbo,
+txn_limbo_is_replica_outdated(struct txn_limbo *limbo,
 			      uint32_t replica_id)
 {
-	return txn_limbo_replica_term(limbo, replica_id) <
-	       limbo->promote_greatest_term;
+	txn_limbo_term_lock(limbo);
+	bool res = txn_limbo_term_locked(limbo, replica_id) <
+		limbo->promote_term_max;
+	txn_limbo_term_unlock(limbo);
+	return res;
 }
 
 /**
@@ -302,6 +357,11 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
 
 /** Execute a synchronous replication request. */
 void
+txn_limbo_process_locked(struct txn_limbo *limbo,
+			 const struct synchro_request *req);
+
+/** Lock limbo terms and execute a synchronous replication request. */
+void
 txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req);
 
 /**
-- 
2.31.1



* [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests
  2021-07-30 11:35 [Tarantool-patches] [PATCH v9 0/5] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
                   ` (2 preceding siblings ...)
  2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 3/5] limbo: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
@ 2021-07-30 11:35 ` Cyrill Gorcunov via Tarantool-patches
  2021-08-02 23:50   ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-03 10:51   ` Serge Petrenko via Tarantool-patches
  2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 5/5] test: add replication/gh-6036-rollback-confirm Cyrill Gorcunov via Tarantool-patches
  4 siblings, 2 replies; 12+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-07-30 11:35 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

When we receive synchro requests we can't just apply them blindly,
because in the worst case they may come from a split-brain
configuration (where a cluster has split into several subclusters,
each one has elected its own leader, and then the subclusters try
to merge back into the original cluster). We need to do our best
to detect such configurations and force these nodes to rejoin from
scratch for the sake of data consistency.

Thus when we're processing requests we pass them to the packet
filter first, which validates their contents and refuses to apply
them if they do not match.

Depending on the request type, each packet traverses the
appropriate chain(s):

FILTER_IN
 - Common chain for any synchro packet. We verify
   that if replica_id is nil then it must be a
   PROMOTE request with lsn 0 to migrate the limbo owner

FILTER_CONFIRM
FILTER_ROLLBACK
 - Neither confirm nor rollback requests may come
   with an empty limbo, since that means the synchro
   queue has already been processed and the peer
   didn't notice it

FILTER_PROMOTE
 - A promote request should come in with a new term only,
   otherwise it means the peer didn't notice the election

 - If the limbo's confirmed_lsn is equal to the promote LSN then
   it is a valid request to process

 - If the limbo's confirmed_lsn is bigger than the requested one
   then it is valid in one case only -- limbo migration, so the
   queue must be empty

 - If the limbo's confirmed_lsn is less than the promote LSN then
   - If the queue is empty then the transactions have already
     been rolled back and the request is invalid
   - If the queue is not empty then its first entry's LSN might be
     greater than the promote LSN, meaning the old data has either
     been committed or rolled back already and the request is
     invalid

FILTER_DEMOTE
 - NOP, reserved for future use
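The PROMOTE rules above can be condensed into a small self-contained
sketch (toy_filter_promote is a hypothetical reduction of the patch's
filter_promote(); it works on plain values instead of struct txn_limbo):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns 0 when the PROMOTE request may be applied, -1 for a
 * split-brain reject. first_lsn is the LSN of the first queued
 * entry and is ignored when queue_empty is true. */
static int
toy_filter_promote(uint64_t term_max, uint64_t req_term,
		   int64_t confirmed_lsn, int64_t promote_lsn,
		   bool queue_empty, int64_t first_lsn)
{
	/* A promote must carry a new term (once any term was seen). */
	if (term_max > 1 && term_max > req_term)
		return -1;
	/* Exact LSN match: valid request. */
	if (confirmed_lsn == promote_lsn)
		return 0;
	/* Backward LSN: only valid as owner migration on an empty queue. */
	if (confirmed_lsn > promote_lsn)
		return queue_empty ? 0 : -1;
	/* Forward LSN with an empty queue: already rolled back. */
	if (queue_empty)
		return -1;
	/* Forward LSN: the first queued entry must not be past it. */
	return first_lsn > promote_lsn ? -1 : 0;
}
```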

Closes #6036

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/applier.cc     |  21 ++-
 src/box/box.cc         |  11 +-
 src/box/memtx_engine.c |   3 +-
 src/box/txn_limbo.c    | 304 ++++++++++++++++++++++++++++++++++++++---
 src/box/txn_limbo.h    |  33 ++++-
 5 files changed, 347 insertions(+), 25 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index a7f472714..fefaa4ced 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -458,7 +458,8 @@ applier_wait_snapshot(struct applier *applier)
 				struct synchro_request req;
 				if (xrow_decode_synchro(&row, &req) != 0)
 					diag_raise();
-				txn_limbo_process(&txn_limbo, &req);
+				if (txn_limbo_process(&txn_limbo, &req) != 0)
+					diag_raise();
 			} else if (iproto_type_is_raft_request(row.type)) {
 				struct raft_request req;
 				if (xrow_decode_raft(&row, &req, NULL) != 0)
@@ -514,6 +515,11 @@ applier_fetch_snapshot(struct applier *applier)
 	struct ev_io *coio = &applier->io;
 	struct xrow_header row;
 
+	txn_limbo_filter_disable(&txn_limbo);
+	auto filter_guard = make_scoped_guard([&]{
+		txn_limbo_filter_enable(&txn_limbo);
+	});
+
 	memset(&row, 0, sizeof(row));
 	row.type = IPROTO_FETCH_SNAPSHOT;
 	coio_write_xrow(coio, &row);
@@ -587,6 +593,11 @@ applier_register(struct applier *applier, bool was_anon)
 	struct ev_io *coio = &applier->io;
 	struct xrow_header row;
 
+	txn_limbo_filter_disable(&txn_limbo);
+	auto filter_guard = make_scoped_guard([&]{
+		txn_limbo_filter_enable(&txn_limbo);
+	});
+
 	memset(&row, 0, sizeof(row));
 	/*
 	 * Send this instance's current vclock together
@@ -620,6 +631,11 @@ applier_join(struct applier *applier)
 	struct xrow_header row;
 	uint64_t row_count;
 
+	txn_limbo_filter_disable(&txn_limbo);
+	auto filter_guard = make_scoped_guard([&]{
+		txn_limbo_filter_enable(&txn_limbo);
+	});
+
 	xrow_encode_join_xc(&row, &INSTANCE_UUID);
 	coio_write_xrow(coio, &row);
 
@@ -873,6 +889,9 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 		goto err;
 
 	txn_limbo_term_lock(&txn_limbo);
+	if (txn_limbo_filter_locked(&txn_limbo, &req) != 0)
+		goto err_unlock;
+
 	struct replica_cb_data rcb_data;
 	struct synchro_entry entry;
 	/*
diff --git a/src/box/box.cc b/src/box/box.cc
index 5ca617e32..000887aa6 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1676,7 +1676,8 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
 		.lsn = promote_lsn,
 		.term = raft->term,
 	};
-	txn_limbo_process(&txn_limbo, &req);
+	if (txn_limbo_process(&txn_limbo, &req) != 0)
+		diag_raise();
 	assert(txn_limbo_is_empty(&txn_limbo));
 }
 
@@ -1703,7 +1704,8 @@ box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)
 		.lsn = promote_lsn,
 		.term = box_raft()->term,
 	};
-	txn_limbo_process(&txn_limbo, &req);
+	if (txn_limbo_process(&txn_limbo, &req) != 0)
+		diag_raise();
 	assert(txn_limbo_is_empty(&txn_limbo));
 }
 
@@ -3281,6 +3283,11 @@ local_recovery(const struct tt_uuid *instance_uuid,
 
 	say_info("instance uuid %s", tt_uuid_str(&INSTANCE_UUID));
 
+	txn_limbo_filter_disable(&txn_limbo);
+	auto filter_guard = make_scoped_guard([&]{
+		txn_limbo_filter_enable(&txn_limbo);
+	});
+
 	struct wal_stream wal_stream;
 	wal_stream_create(&wal_stream);
 	auto stream_guard = make_scoped_guard([&]{
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 0b06e5e63..4aed24fe3 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -238,7 +238,8 @@ memtx_engine_recover_synchro(const struct xrow_header *row)
 	 * because all its rows have a zero replica_id.
 	 */
 	req.origin_id = req.replica_id;
-	txn_limbo_process(&txn_limbo, &req);
+	if (txn_limbo_process(&txn_limbo, &req) != 0)
+		return -1;
 	return 0;
 }
 
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index be5e0adf5..b01b5a572 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -51,6 +51,7 @@ txn_limbo_create(struct txn_limbo *limbo)
 	limbo->confirmed_lsn = 0;
 	limbo->rollback_count = 0;
 	limbo->is_in_rollback = false;
+	limbo->is_filtering = true;
 }
 
 bool
@@ -724,35 +725,291 @@ txn_limbo_wait_empty(struct txn_limbo *limbo, double timeout)
 	return 0;
 }
 
+enum filter_chain {
+	FILTER_IN,
+	FILTER_CONFIRM,
+	FILTER_ROLLBACK,
+	FILTER_PROMOTE,
+	FILTER_DEMOTE,
+	FILTER_MAX,
+};
+
+/**
+ * Common chain for any incoming packet.
+ */
+static int
+filter_in(struct txn_limbo *limbo, const struct synchro_request *req)
+{
+	(void)limbo;
+
+	if (req->replica_id == REPLICA_ID_NIL) {
+		/*
+		 * The limbo was empty on the instance issuing
+		 * the request. This means this instance must
+		 * empty its limbo as well.
+		 */
+		if (req->lsn != 0 ||
+		    !iproto_type_is_promote_request(req->type)) {
+			say_info("RAFT: rejecting %s request from "
+				 "instance id %u for term %llu. "
+				 "req->replica_id = 0 but lsn %lld.",
+				 iproto_type_name(req->type),
+				 req->origin_id, (long long)req->term,
+				 (long long)req->lsn);
+
+			diag_set(ClientError, ER_UNSUPPORTED,
+				 "Replication",
+				 "empty replica_id with nonzero LSN");
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * Filter CONFIRM and ROLLBACK packets.
+ */
+static int
+filter_confirm_rollback(struct txn_limbo *limbo,
+			const struct synchro_request *req)
+{
+	/*
+	 * When limbo is empty we have nothing to
+	 * confirm/commit and if this request comes
+	 * in it means the split brain has happened.
+	 */
+	if (!txn_limbo_is_empty(limbo))
+		return 0;
+
+	say_info("RAFT: rejecting %s request from "
+		 "instance id %u for term %llu. "
+		 "Empty limbo detected.",
+		 iproto_type_name(req->type),
+		 req->origin_id,
+		 (long long)req->term);
+
+	diag_set(ClientError, ER_UNSUPPORTED,
+		 "Replication",
+		 "confirm/rollback with empty limbo");
+	return -1;
+}
+
+/**
+ * Filter PROMOTE packets.
+ */
+static int
+filter_promote(struct txn_limbo *limbo, const struct synchro_request *req)
+{
+	int64_t promote_lsn = req->lsn;
+
+	/*
+	 * If the term has already been seen, the request comes
+	 * from a node which didn't notice the new elections,
+	 * thus it has been living in its own subcluster and
+	 * its data is no longer consistent.
+	 */
+	if (limbo->promote_term_max > 1 &&
+	    limbo->promote_term_max > req->term) {
+		say_info("RAFT: rejecting %s request from "
+			 "instance id %u for term %llu. "
+			 "Max term seen is %llu.",
+			 iproto_type_name(req->type),
+			 req->origin_id,
+			 (long long)req->term,
+			 (long long)limbo->promote_term_max);
+
+		diag_set(ClientError, ER_UNSUPPORTED,
+			 "Replication", "obsolete terms");
+		return -1;
+	}
+
+	/*
+	 * Either the limbo is empty or new promote will
+	 * rollback all waiting transactions. Which
+	 * is fine.
+	 */
+	if (limbo->confirmed_lsn == promote_lsn)
+		return 0;
+
+	/*
+	 * Explicit split brain situation. Promote
+	 * comes in with an old LSN which we've already
+	 * processed.
+	 */
+	if (limbo->confirmed_lsn > promote_lsn) {
+		/*
+		 * If limbo is empty we're migrating
+		 * the owner.
+		 */
+		if (txn_limbo_is_empty(limbo))
+			return 0;
+
+		say_info("RAFT: rejecting %s request from "
+			 "instance id %u for term %llu. "
+			 "confirmed_lsn %lld > promote_lsn %lld.",
+			 iproto_type_name(req->type),
+			 req->origin_id, (long long)req->term,
+			 (long long)limbo->confirmed_lsn,
+			 (long long)promote_lsn);
+
+		diag_set(ClientError, ER_UNSUPPORTED,
+			 "Replication",
+			 "backward promote LSN (split brain)");
+		return -1;
+	}
+
+	/*
+	 * The last case requires a few subcases.
+	 */
+	assert(limbo->confirmed_lsn < promote_lsn);
+
+	if (txn_limbo_is_empty(limbo)) {
+		/*
+		 * Transactions are already rolled back
+		 * since the limbo is empty.
+		 */
+		say_info("RAFT: rejecting %s request from "
+			 "instance id %u for term %llu. "
+			 "confirmed_lsn %lld < promote_lsn %lld "
+			 "and empty limbo.",
+			 iproto_type_name(req->type),
+			 req->origin_id, (long long)req->term,
+			 (long long)limbo->confirmed_lsn,
+			 (long long)promote_lsn);
+
+		diag_set(ClientError, ER_UNSUPPORTED,
+			 "Replication",
+			 "forward promote LSN "
+			 "(empty limbo, split brain)");
+		return -1;
+	} else {
+		/*
+		 * Some entries are present in the limbo,
+		 * and if the first entry's LSN is greater than
+		 * requested then the old data was either committed
+		 * or rolled back already, so we can't continue.
+		 */
+		struct txn_limbo_entry *first;
+
+		first = txn_limbo_first_entry(limbo);
+		if (first->lsn > promote_lsn) {
+			say_info("RAFT: rejecting %s request from "
+				 "instance id %u for term %llu. "
+				 "confirmed_lsn %lld < promote_lsn %lld "
+				 "and limbo first lsn %lld.",
+				 iproto_type_name(req->type),
+				 req->origin_id, (long long)req->term,
+				 (long long)limbo->confirmed_lsn,
+				 (long long)promote_lsn,
+				 (long long)first->lsn);
+
+			diag_set(ClientError, ER_UNSUPPORTED,
+				 "Replication",
+				 "promote LSN conflict "
+				 "(limbo LSN ahead, split brain)");
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * Filter DEMOTE packets.
+ */
+static int
+filter_demote(struct txn_limbo *limbo, const struct synchro_request *req)
+{
+	(void)limbo;
+	(void)req;
+	return 0;
+}
+
+static int (*filter_req[FILTER_MAX])
+(struct txn_limbo *limbo, const struct synchro_request *req) = {
+	[FILTER_IN]		= filter_in,
+	[FILTER_CONFIRM]	= filter_confirm_rollback,
+	[FILTER_ROLLBACK]	= filter_confirm_rollback,
+	[FILTER_PROMOTE]	= filter_promote,
+	[FILTER_DEMOTE]		= filter_demote,
+};
+
+int
+txn_limbo_filter_locked(struct txn_limbo *limbo,
+			const struct synchro_request *req)
+{
+	unsigned int mask = (1u << FILTER_IN);
+	unsigned int pos = 0;
+
+#ifndef NDEBUG
+	say_info("limbo: filter %s replica_id %u origin_id %u "
+		 "term %lld lsn %lld, queue owner_id %u len %lld "
+		 "confirmed_lsn %lld (%s)",
+		 iproto_type_name(req->type),
+		 req->replica_id, req->origin_id,
+		 (long long)req->term, (long long)req->lsn,
+		 limbo->owner_id, (long long)limbo->len,
+		 (long long)limbo->confirmed_lsn,
+		 limbo->is_filtering ? "on" : "off");
+#endif
+
+	if (!limbo->is_filtering)
+		return 0;
+
+	switch (req->type) {
+	case IPROTO_CONFIRM:
+		mask |= (1u << FILTER_CONFIRM);
+		break;
+	case IPROTO_ROLLBACK:
+		mask |= (1u << FILTER_ROLLBACK);
+		break;
+	case IPROTO_PROMOTE:
+		mask |= (1u << FILTER_PROMOTE);
+		break;
+	case IPROTO_DEMOTE:
+		mask |= (1u << FILTER_DEMOTE);
+		break;
+	default:
+		say_info("RAFT: rejecting unexpected %d "
+			 "request from instance id %u "
+			 "for term %llu.",
+			 req->type, req->origin_id,
+			 (long long)req->term);
+		diag_set(ClientError, ER_UNSUPPORTED,
+			 "Replication",
+			 "unexpected request type");
+		return -1;
+	}
+
+	while (mask != 0) {
+		if ((mask & 1) != 0) {
+			assert(pos < lengthof(filter_req));
+			if (filter_req[pos](limbo, req) != 0)
+				return -1;
+		}
+		pos++;
+		mask >>= 1;
+	};
+
+	return 0;
+}
+
 void
 txn_limbo_process_locked(struct txn_limbo *limbo,
 			 const struct synchro_request *req)
 {
 	uint64_t term = req->term;
 	uint32_t origin = req->origin_id;
+
 	if (txn_limbo_term_locked(limbo, origin) < term) {
 		vclock_follow(&limbo->promote_term_map, origin, term);
 		if (term > limbo->promote_term_max)
 			limbo->promote_term_max = term;
-	} else if (iproto_type_is_promote_request(req->type) &&
-		   limbo->promote_term_max > 1) {
-		/* PROMOTE for outdated term. Ignore. */
-		say_info("RAFT: ignoring %s request from instance "
-			 "id %u for term %llu. Greatest term seen "
-			 "before (%llu) is bigger.",
-			 iproto_type_name(req->type), origin, (long long)term,
-			 (long long)limbo->promote_term_max);
-		return;
 	}
 
 	int64_t lsn = req->lsn;
-	if (req->replica_id == REPLICA_ID_NIL) {
-		/*
-		 * The limbo was empty on the instance issuing the request.
-		 * This means this instance must empty its limbo as well.
-		 */
-		assert(lsn == 0 && iproto_type_is_promote_request(req->type));
-	} else if (req->replica_id != limbo->owner_id) {
+	if (req->replica_id != limbo->owner_id) {
 		/*
 		 * Ignore CONFIRM/ROLLBACK messages for a foreign master.
 		 * These are most likely outdated messages for already confirmed
@@ -783,18 +1040,25 @@ txn_limbo_process_locked(struct txn_limbo *limbo,
 		txn_limbo_read_demote(limbo, lsn);
 		break;
 	default:
-		unreachable();
+		panic("limbo: unexpected request type %d",
+		      req->type);
+		break;
 	}
-	return;
 }
 
-void
+int
 txn_limbo_process(struct txn_limbo *limbo,
 		  const struct synchro_request *req)
 {
+	int rc;
+
 	txn_limbo_term_lock(limbo);
-	txn_limbo_process_locked(limbo, req);
+	rc = txn_limbo_filter_locked(limbo, req);
+	if (rc == 0)
+		txn_limbo_process_locked(limbo, req);
 	txn_limbo_term_unlock(limbo);
+
+	return rc;
 }
 
 void
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 25faffd2b..eb74dda00 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -184,6 +184,14 @@ struct txn_limbo {
 	 * by the 'reversed rollback order' rule - contradiction.
 	 */
 	bool is_in_rollback;
+	/**
+	 * Whether the limbo should filter incoming requests.
+	 * During local recovery from the WAL and during the
+	 * applier's join phase we fully trust the incoming data,
+	 * because it forms the initial limbo state, so such
+	 * requests must not be filtered out.
+	 */
+	bool is_filtering;
 };
 
 /**
@@ -355,15 +363,38 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn);
 int
 txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
 
+/**
+ * Verify that the request is valid for processing.
+ */
+int
+txn_limbo_filter_locked(struct txn_limbo *limbo,
+			const struct synchro_request *req);
+
 /** Execute a synchronous replication request. */
 void
 txn_limbo_process_locked(struct txn_limbo *limbo,
 			 const struct synchro_request *req);
 
 /** Lock limbo terms and execute a synchronous replication request. */
-void
+int
 txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req);
 
+/** Enable filtering of synchro requests. */
+static inline void
+txn_limbo_filter_enable(struct txn_limbo *limbo)
+{
+	limbo->is_filtering = true;
+	say_info("limbo: filter enabled");
+}
+
+/** Disable filtering of synchro requests. */
+static inline void
+txn_limbo_filter_disable(struct txn_limbo *limbo)
+{
+	limbo->is_filtering = false;
+	say_info("limbo: filter disabled");
+}
+
 /**
  * Waiting for confirmation of all "sync" transactions
  * during confirm timeout or fail.
-- 
2.31.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Tarantool-patches] [PATCH v9 5/5] test: add replication/gh-6036-rollback-confirm
  2021-07-30 11:35 [Tarantool-patches] [PATCH v9 0/5] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
                   ` (3 preceding siblings ...)
  2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests Cyrill Gorcunov via Tarantool-patches
@ 2021-07-30 11:35 ` Cyrill Gorcunov via Tarantool-patches
  4 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-07-30 11:35 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Follow-up #6036

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 test/replication/gh-6036-master.lua           |   1 +
 test/replication/gh-6036-node.lua             |  33 ++++
 test/replication/gh-6036-replica.lua          |   1 +
 .../gh-6036-rollback-confirm.result           | 180 ++++++++++++++++++
 .../gh-6036-rollback-confirm.test.lua         |  92 +++++++++
 5 files changed, 307 insertions(+)
 create mode 120000 test/replication/gh-6036-master.lua
 create mode 100644 test/replication/gh-6036-node.lua
 create mode 120000 test/replication/gh-6036-replica.lua
 create mode 100644 test/replication/gh-6036-rollback-confirm.result
 create mode 100644 test/replication/gh-6036-rollback-confirm.test.lua

diff --git a/test/replication/gh-6036-master.lua b/test/replication/gh-6036-master.lua
new file mode 120000
index 000000000..65baed5de
--- /dev/null
+++ b/test/replication/gh-6036-master.lua
@@ -0,0 +1 @@
+gh-6036-node.lua
\ No newline at end of file
diff --git a/test/replication/gh-6036-node.lua b/test/replication/gh-6036-node.lua
new file mode 100644
index 000000000..ac701b7a2
--- /dev/null
+++ b/test/replication/gh-6036-node.lua
@@ -0,0 +1,33 @@
+local INSTANCE_ID = string.match(arg[0], "gh%-6036%-(.+)%.lua")
+
+local function unix_socket(name)
+    return "unix/:./" .. name .. '.sock';
+end
+
+require('console').listen(os.getenv('ADMIN'))
+
+if INSTANCE_ID == "master" then
+    box.cfg({
+        listen = unix_socket("master"),
+        replication_connect_quorum = 0,
+        election_mode = 'candidate',
+        replication_synchro_quorum = 3,
+        replication_synchro_timeout = 1000,
+    })
+elseif INSTANCE_ID == "replica" then
+    box.cfg({
+        listen = unix_socket("replica"),
+        replication = {
+            unix_socket("master"),
+            unix_socket("replica")
+        },
+        read_only = true,
+        election_mode = 'voter',
+        replication_synchro_quorum = 2,
+        replication_synchro_timeout = 1000,
+    })
+end
+
+box.once("bootstrap", function()
+    box.schema.user.grant('guest', 'super')
+end)
diff --git a/test/replication/gh-6036-replica.lua b/test/replication/gh-6036-replica.lua
new file mode 120000
index 000000000..65baed5de
--- /dev/null
+++ b/test/replication/gh-6036-replica.lua
@@ -0,0 +1 @@
+gh-6036-node.lua
\ No newline at end of file
diff --git a/test/replication/gh-6036-rollback-confirm.result b/test/replication/gh-6036-rollback-confirm.result
new file mode 100644
index 000000000..27419ce21
--- /dev/null
+++ b/test/replication/gh-6036-rollback-confirm.result
@@ -0,0 +1,180 @@
+-- test-run result file version 2
+--
+-- gh-6036: Test for record collision detection. We have a cluster
+-- of two nodes: master and replica. The master initiates a synchro
+-- write but fails to gather a quorum. Before it rolls back the
+-- record, a network breakage occurs: the replica lives on with the
+-- dirty data while the master node goes offline. The replica becomes
+-- the new raft leader and commits the dirty data; at the same time
+-- the master node rolls back this record and tries to reconnect to
+-- the new raft leader. Such a connection should be refused because
+-- the old master node is no longer consistent.
+--
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+test_run:cmd('create server master with script="replication/gh-6036-master.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica with script="replication/gh-6036-replica.lua"')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('start server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica')
+ | ---
+ | - true
+ | ...
+
+--
+-- Connect master to the replica and write a record. Since the quorum
+-- value is bigger than number of nodes in a cluster it will be rolled
+-- back later.
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.cfg({                                       \
+    replication = {                             \
+            "unix/:./master.sock",              \
+            "unix/:./replica.sock",             \
+    },                                          \
+})
+ | ---
+ | ...
+_ = box.schema.create_space('sync', {is_sync = true})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+
+--
+-- Wait for the record to appear on the master.
+f = require('fiber').create(function() box.space.sync:replace{1} end)
+ | ---
+ | ...
+test_run:wait_cond(function() return box.space.sync:get({1}) ~= nil end, 100)
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | ...
+
+--
+-- Wait for the record from the master to get written,
+-- then drop replication.
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.space.sync:get({1}) ~= nil end, 100)
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | ...
+box.cfg{replication = {}}
+ | ---
+ | ...
+
+--
+-- Then we jump back to the master and drop replication,
+-- so the unconfirmed record gets rolled back.
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.cfg({                                       \
+    replication = {},                           \
+    replication_synchro_timeout = 0.001,        \
+    election_mode = 'manual',                   \
+})
+ | ---
+ | ...
+while f:status() ~= 'dead' do require('fiber').sleep(0.1) end
+ | ---
+ | ...
+test_run:wait_cond(function() return box.space.sync:get({1}) == nil end, 100)
+ | ---
+ | - true
+ | ...
+
+--
+-- Force the replica to become a RAFT leader and
+-- commit this new record.
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.cfg({                                       \
+    replication_synchro_quorum = 1,             \
+    election_mode = 'manual'                    \
+})
+ | ---
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | ...
+
+--
+-- Connect the master back to the replica; the
+-- connection should be refused.
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.cfg({                                       \
+    replication = {                             \
+            "unix/:./replica.sock",             \
+    },                                          \
+})
+ | ---
+ | ...
+box.space.sync:select{}
+ | ---
+ | - []
+ | ...
+test_run:wait_cond(function() return            \
+    test_run:grep_log('master',                 \
+        'rejecting PROMOTE') ~= nil end, 100)   \
+test_run:wait_cond(function() return            \
+    test_run:grep_log('master',                 \
+        'ER_UNSUPPORTED') ~= nil end, 100)
+ | ---
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-6036-rollback-confirm.test.lua b/test/replication/gh-6036-rollback-confirm.test.lua
new file mode 100644
index 000000000..f84f3dd45
--- /dev/null
+++ b/test/replication/gh-6036-rollback-confirm.test.lua
@@ -0,0 +1,92 @@
+--
+-- gh-6036: Test for record collision detection. We have a cluster
+-- of two nodes: master and replica. The master initiates a synchro
+-- write but fails to gather a quorum. Before it rolls back the
+-- record, a network breakage occurs: the replica lives on with the
+-- dirty data while the master node goes offline. The replica becomes
+-- the new raft leader and commits the dirty data; at the same time
+-- the master node rolls back this record and tries to reconnect to
+-- the new raft leader. Such a connection should be refused because
+-- the old master node is no longer consistent.
+--
+test_run = require('test_run').new()
+
+test_run:cmd('create server master with script="replication/gh-6036-master.lua"')
+test_run:cmd('create server replica with script="replication/gh-6036-replica.lua"')
+
+test_run:cmd('start server master')
+test_run:cmd('start server replica')
+
+--
+-- Connect master to the replica and write a record. Since the quorum
+-- value is bigger than number of nodes in a cluster it will be rolled
+-- back later.
+test_run:switch('master')
+box.cfg({                                       \
+    replication = {                             \
+            "unix/:./master.sock",              \
+            "unix/:./replica.sock",             \
+    },                                          \
+})
+_ = box.schema.create_space('sync', {is_sync = true})
+_ = box.space.sync:create_index('pk')
+
+--
+-- Wait for the record to appear on the master.
+f = require('fiber').create(function() box.space.sync:replace{1} end)
+test_run:wait_cond(function() return box.space.sync:get({1}) ~= nil end, 100)
+box.space.sync:select{}
+
+--
+-- Wait for the record from the master to get written,
+-- then drop replication.
+test_run:switch('replica')
+test_run:wait_cond(function() return box.space.sync:get({1}) ~= nil end, 100)
+box.space.sync:select{}
+box.cfg{replication = {}}
+
+--
+-- Then we jump back to the master and drop replication,
+-- so the unconfirmed record gets rolled back.
+test_run:switch('master')
+box.cfg({                                       \
+    replication = {},                           \
+    replication_synchro_timeout = 0.001,        \
+    election_mode = 'manual',                   \
+})
+while f:status() ~= 'dead' do require('fiber').sleep(0.1) end
+test_run:wait_cond(function() return box.space.sync:get({1}) == nil end, 100)
+
+--
+-- Force the replica to become a RAFT leader and
+-- commit this new record.
+test_run:switch('replica')
+box.cfg({                                       \
+    replication_synchro_quorum = 1,             \
+    election_mode = 'manual'                    \
+})
+box.ctl.promote()
+box.space.sync:select{}
+
+--
+-- Connect the master back to the replica; the
+-- connection should be refused.
+test_run:switch('master')
+box.cfg({                                       \
+    replication = {                             \
+            "unix/:./replica.sock",             \
+    },                                          \
+})
+box.space.sync:select{}
+test_run:wait_cond(function() return            \
+    test_run:grep_log('master',                 \
+        'rejecting PROMOTE') ~= nil end, 100)   \
+test_run:wait_cond(function() return            \
+    test_run:grep_log('master',                 \
+        'ER_UNSUPPORTED') ~= nil end, 100)
+
+test_run:switch('default')
+test_run:cmd('stop server master')
+test_run:cmd('delete server master')
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
-- 
2.31.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH v9 3/5] limbo: order access to the limbo terms
  2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 3/5] limbo: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
@ 2021-08-02 23:48   ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-03 11:23     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-02 23:48 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Hi! Thanks for the patch!

See 6 comments below.

> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index f621fa657..a7f472714 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -909,12 +910,15 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
>  	 * transactions side, including the async ones.
>  	 */
>  	if (journal_write(&entry.base) != 0)
> -		goto err;
> +		goto err_unlock;
>  	if (entry.base.res < 0) {
>  		diag_set_journal_res(entry.base.res);
> -		goto err;
> +		goto err_unlock;
>  	}
> +	txn_limbo_term_unlock(&txn_limbo);
>  	return 0;
> +err_unlock:
> +	txn_limbo_term_unlock(&txn_limbo);

1. Could be done simpler:

====================
@@ -908,7 +909,9 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 	 * before trying to commit. But that requires extra steps from the
 	 * transactions side, including the async ones.
 	 */
-	if (journal_write(&entry.base) != 0)
+	int rc = journal_write(&entry.base);
+	txn_limbo_term_unlock(&txn_limbo);
+	if (rc != 0)
 		goto err;
 	if (entry.base.res < 0) {
 		diag_set_journal_res(entry.base.res);
====================

> diff --git a/src/box/box.cc b/src/box/box.cc
> index 535f30292..5ca617e32 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1573,7 +1573,7 @@ box_run_elections(void)
>  static int
>  box_check_promote_term_intact(uint64_t promote_term)
>  {
> -	if (txn_limbo.promote_greatest_term != promote_term) {
> +	if (txn_limbo_term_max_raw(&txn_limbo) != promote_term) {

2. In raft terminology we call such data 'volatile' instead
of 'raw'.

>  		diag_set(ClientError, ER_INTERFERING_PROMOTE,
>  			 txn_limbo.owner_id);
>  		return -1;
> @@ -1728,7 +1728,7 @@ box_promote(void)
>  	 * Currently active leader (the instance that is seen as leader by both
>  	 * raft and txn_limbo) can't issue another PROMOTE.
>  	 */
> -	bool is_leader = txn_limbo_replica_term(&txn_limbo, instance_id) ==
> +	bool is_leader = txn_limbo_term(&txn_limbo, instance_id) ==

3. Why did you change the name? The old one was closer to reality. When
you say 'limbo term', it is not clear whether you mean the max term or
term of one of the nodes.

Please, extract all such renames into a separate commit. Otherwise
it is harder to find the functional changes in this one.

> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index 570f77c46..be5e0adf5 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -724,22 +725,23 @@ txn_limbo_wait_empty(struct txn_limbo *limbo, double timeout)
>  }
>  
>  void
> -txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
> +txn_limbo_process_locked(struct txn_limbo *limbo,
> +			 const struct synchro_request *req)
>  {

4. Please, add an assertion that the latch is occupied. The same for the
other _locked functions.

>  	uint64_t term = req->term;
>  	uint32_t origin = req->origin_id;
> -	if (txn_limbo_replica_term(limbo, origin) < term) {
> +	if (txn_limbo_term_locked(limbo, origin) < term) {
>  		vclock_follow(&limbo->promote_term_map, origin, term);
> -		if (term > limbo->promote_greatest_term)
> -			limbo->promote_greatest_term = term;
> +		if (term > limbo->promote_term_max)
> +			limbo->promote_term_max = term;
>  	} else if (iproto_type_is_promote_request(req->type) &&
> -		   limbo->promote_greatest_term > 1) {
> +		   limbo->promote_term_max > 1) {
>  		/* PROMOTE for outdated term. Ignore. */
>  		say_info("RAFT: ignoring %s request from instance "
>  			 "id %u for term %llu. Greatest term seen "
>  			 "before (%llu) is bigger.",
>  			 iproto_type_name(req->type), origin, (long long)term,
> -			 (long long)limbo->promote_greatest_term);
> +			 (long long)limbo->promote_term_max);
>  		return;
>  	}
> diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
> index 53e52f676..25faffd2b 100644
> --- a/src/box/txn_limbo.h
> +++ b/src/box/txn_limbo.h
> @@ -211,14 +216,61 @@ txn_limbo_last_entry(struct txn_limbo *limbo)
>  				in_queue);
>  }
>  
> +/** Lock promote data. */
> +static inline void
> +txn_limbo_term_lock(struct txn_limbo *limbo)
> +{
> +	latch_lock(&limbo->promote_latch);
> +}
> +
> +/** Unlock promote data. */
> +static void

5. Why isn't it inline while the others are?

> +txn_limbo_term_unlock(struct txn_limbo *limbo)
> +{
> +	latch_unlock(&limbo->promote_latch);
> +}
> +
> +/** Test if promote data is locked. */
> +static inline bool
> +txn_limbo_term_is_locked(const struct txn_limbo *limbo)
> +{
> +	return latch_is_locked(&limbo->promote_latch);
> +}
> +
> +/** Fetch replica's term with lock taken. */
> +static inline uint64_t
> +txn_limbo_term_locked(struct txn_limbo *limbo, uint32_t replica_id)
> +{
> +	panic_on(!txn_limbo_term_is_locked(limbo),
> +		 "limbo: unlocked term read for replica_id %u",
> +		 replica_id);

6. This new macro seems counter-intuitive because works vice versa
compared to assert(). Can you try to rename it somehow and make it
accept a condition which must be true instead of false?

> +	return vclock_get(&limbo->promote_term_map, replica_id);
> +}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests
  2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests Cyrill Gorcunov via Tarantool-patches
@ 2021-08-02 23:50   ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-03 13:25     ` Cyrill Gorcunov via Tarantool-patches
  2021-08-03 10:51   ` Serge Petrenko via Tarantool-patches
  1 sibling, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-02 23:50 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Thanks for the patch!

On 30.07.2021 13:35, Cyrill Gorcunov wrote:
> When we receive synchro requests we can't just apply
> them blindly because in worse case they may come from
> split-brain configuration (where a cluster splitted into

splitted -> split.

> several subclusters and each one has own leader elected,
> then subclisters are trying to merge back into original

subclisters -> subclusters.

> cluster). We need to do our best to detect such configs
> and force these nodes to rejoin from the scratch for
> data consistency sake.
> 
> Thus when we're processing requests we pass them to the
> packet filter first which validates their contents and
> refuse to apply if they are not matched.
> 
> Depending on request type each packet traverse an
> appropriate chain(s)
> 
> FILTER_IN
>  - Common chain for any synchro packet. We verify
>    that if replica_id is nil then it shall be
>    PROMOTE request with lsn 0 to migrate limbo owner

How can it be 0 for non PROMOTE/DEMOTE requests?
Do we ever encode such rows at all? Why isn't this
a part of FILTER_PROMOTE?

> FILTER_CONFIRM
> FILTER_ROLLBACK
>  - Both confirm and rollback requests shall not come
>    with empty limbo since it measn the synchro queue

measn -> means.

>    is already processed and the peer didn't notice
>    that

Is it the only issue? What about ROLLBACK coming to
an instance, which already made PROMOTE on the rolled back
data? That is a part of the original problem in the ticket.

> FILTER_PROMOTE
>  - Promote request should come in with new terms only,
>    otherwise it means the peer didn't notice election
> 
>  - If limbo's confirmed_lsn is equal to promote LSN then
>    it is a valid request to process
> 
>  - If limbo's confirmed_lsn is bigger than requested then
>    it is valid in one case only -- limbo migration so the
>    queue shall be empty

I don't understand. How is it valid? PROMOTE(lsn) rolls
back everything > lsn. If the local confirmed_lsn > lsn, it
means that data can't be rolled back now and the data becomes
inconsistent.

>  - If limbo's confirmed_lsn is less than promote LSN then
>    - If queue is empty then it means the transactions are
>      already rolled back and request is invalid
>    - If queue is not empty then its first entry might be
>      greater than promote LSN and it means that old data
>      either committed or rolled back already and request
>      is invalid

If the first entry's LSN in the limbo > promote LSN, it
means it wasn't committed yet. The promote will roll it back
and it is fine. This will make the data consistent.

The problem appears if there were some other sync txns
rolled back or even committed with quorum=1 before this
hanging txn. And I don't remember we figured a way to
distinguish between these situations. Did we?

I didn't get to the code yet. Will do later.

> FILTER_DEMOTE
>  - NOP, reserved for future use
> 
> Closes #6036

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests
  2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests Cyrill Gorcunov via Tarantool-patches
  2021-08-02 23:50   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-03 10:51   ` Serge Petrenko via Tarantool-patches
  2021-08-03 13:49     ` Cyrill Gorcunov via Tarantool-patches
  1 sibling, 1 reply; 12+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-08-03 10:51 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml; +Cc: Vladislav Shpilevoy



30.07.2021 14:35, Cyrill Gorcunov пишет:
> When we receive synchro requests we can't just apply
> them blindly because in worse case they may come from
> split-brain configuration (where a cluster splitted into
> several subclusters and each one has own leader elected,
> then subclisters are trying to merge back into original
> cluster). We need to do our best to detect such configs
> and force these nodes to rejoin from the scratch for
> data consistency sake.
>
> Thus when we're processing requests we pass them to the
> packet filter first which validates their contents and
> refuse to apply if they are not matched.
>
> Depending on request type each packet traverse an
> appropriate chain(s)
>
> FILTER_IN
>   - Common chain for any synchro packet. We verify
>     that if replica_id is nil then it shall be
>     PROMOTE request with lsn 0 to migrate limbo owner
>
> FILTER_CONFIRM
> FILTER_ROLLBACK
>   - Both confirm and rollback requests shall not come
>     with empty limbo since it measn the synchro queue
>     is already processed and the peer didn't notice
>     that
>
> FILTER_PROMOTE
>   - Promote request should come in with new terms only,
>     otherwise it means the peer didn't notice election
>
>   - If limbo's confirmed_lsn is equal to promote LSN then
>     it is a valid request to process
>
>   - If limbo's confirmed_lsn is bigger than requested then
>     it is valid in one case only -- limbo migration so the
>     queue shall be empty
>
>   - If limbo's confirmed_lsn is less than promote LSN then
>     - If queue is empty then it means the transactions are
>       already rolled back and request is invalid
>     - If queue is not empty then its first entry might be
>       greater than promote LSN and it means that old data
>       either committed or rolled back already and request
>       is invalid
>
> FILTER_DEMOTE
>   - NOP, reserved for future use
>
> Closes #6036

Thanks for the patch!
You're definitely moving in the right direction here.
Please, find a couple of comments below.

>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
>   src/box/applier.cc     |  21 ++-
>   src/box/box.cc         |  11 +-
>   src/box/memtx_engine.c |   3 +-
>   src/box/txn_limbo.c    | 304 ++++++++++++++++++++++++++++++++++++++---
>   src/box/txn_limbo.h    |  33 ++++-
>   5 files changed, 347 insertions(+), 25 deletions(-)
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index a7f472714..fefaa4ced 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
...

> diff --git a/src/box/box.cc b/src/box/box.cc
> index 5ca617e32..000887aa6 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc

...

> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index be5e0adf5..b01b5a572 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -51,6 +51,7 @@ txn_limbo_create(struct txn_limbo *limbo)
>   	limbo->confirmed_lsn = 0;
>   	limbo->rollback_count = 0;
>   	limbo->is_in_rollback = false;
> +	limbo->is_filtering = true;
>   }
>   
>   bool
> @@ -724,35 +725,291 @@ txn_limbo_wait_empty(struct txn_limbo *limbo, double timeout)
>   	return 0;
>   }
>   
> +enum filter_chain {
> +	FILTER_IN,
> +	FILTER_CONFIRM,
> +	FILTER_ROLLBACK,
> +	FILTER_PROMOTE,
> +	FILTER_DEMOTE,
> +	FILTER_MAX,
> +};
> +
> +/**
> + * Common chain for any incoming packet.
> + */
> +static int
> +filter_in(struct txn_limbo *limbo, const struct synchro_request *req)
> +{
> +	(void)limbo;
> +
> +	if (req->replica_id == REPLICA_ID_NIL) {
> +		/*
> +		 * The limbo was empty on the instance issuing
> +		 * the request. This means this instance must
> +		 * empty its limbo as well.
> +		 */
> +		if (req->lsn != 0 ||
> +		    !iproto_type_is_promote_request(req->type)) {
> +			say_info("RAFT: rejecting %s request from "
> +				 "instance id %u for term %llu. "
> +				 "req->replica_id = 0 but lsn %lld.",
> +				 iproto_type_name(req->type),
> +				 req->origin_id, (long long)req->term,
> +				 (long long)req->lsn);
> +
> +			diag_set(ClientError, ER_UNSUPPORTED,
> +				 "Replication",
> +				 "empty replica_id with nonzero LSN");
> +			return -1;
> +		}
> +	}

I agree with Vlad. This may be moved to filter_confirm_rollback.

> +
> +	return 0;
> +}
> +
> +/**
> + * Filter CONFIRM and ROLLBACK packets.
> + */
> +static int
> +filter_confirm_rollback(struct txn_limbo *limbo,
> +			const struct synchro_request *req)
> +{
> +	/*
> +	 * When limbo is empty we have nothing to
> +	 * confirm/commit and if this request comes
> +	 * in it means the split brain has happened.
> +	 */
> +	if (!txn_limbo_is_empty(limbo))
> +		return 0;
> +
> +	say_info("RAFT: rejecting %s request from "
> +		 "instance id %u for term %llu. "
> +		 "Empty limbo detected.",
> +		 iproto_type_name(req->type),
> +		 req->origin_id,
> +		 (long long)req->term);
> +
> +	diag_set(ClientError, ER_UNSUPPORTED,
> +		 "Replication",
> +		 "confirm/rollback with empty limbo");
> +	return -1;
> +}
> +
> +/**
> + * Filter PROMOTE packets.
> + */
> +static int
> +filter_promote(struct txn_limbo *limbo, const struct synchro_request *req)
> +{
> +	int64_t promote_lsn = req->lsn;
> +
> +	/*
> +	 * If the term is already seen it means it comes
> +	 * from a node which didn't notice new elections,
> +	 * thus been living in subdomain and its data is
> +	 * no longer consistent.
> +	 */
> +	if (limbo->promote_term_max > 1 &&
> +	    limbo->promote_term_max > req->term) {
> +		say_info("RAFT: rejecting %s request from "
> +			 "instance id %u for term %llu. "
> +			 "Max term seen is %llu.",
> +			 iproto_type_name(req->type),
> +			 req->origin_id,
> +			 (long long)req->term,
> +			 (long long)limbo->promote_term_max);
> +
> +		diag_set(ClientError, ER_UNSUPPORTED,
> +			 "Replication", "obsolete terms");
> +		return -1;
> +	}
> +
> +	/*
> +	 * Either the limbo is empty or new promote will
> +	 * rollback all waiting transactions. Which
> +	 * is fine.
> +	 */
> +	if (limbo->confirmed_lsn == promote_lsn)
> +		return 0;
> +
> +	/*
> +	 * Explicit split brain situation. Promote
> +	 * comes in with an old LSN which we've already
> +	 * processed.
> +	 */
> +	if (limbo->confirmed_lsn > promote_lsn) {
> +		/*
> +		 * If limbo is empty we're migrating
> +		 * the owner.
> +		 */
> +		if (txn_limbo_is_empty(limbo))
> +			return 0;

I don't understand this part. Are you sure this check is needed?
We're always migrating the owner with a promote.

> +
> +		say_info("RAFT: rejecting %s request from "
> +			 "instance id %u for term %llu. "
> +			 "confirmed_lsn %lld > promote_lsn %lld.",
> +			 iproto_type_name(req->type),
> +			 req->origin_id, (long long)req->term,
> +			 (long long)limbo->confirmed_lsn,
> +			 (long long)promote_lsn);
> +
> +		diag_set(ClientError, ER_UNSUPPORTED,
> +			 "Replication",
> +			 "backward promote LSN (split brain)");
> +		return -1;
> +	}
> +
> +	/*
> +	 * The last case requires a few subcases.
> +	 */
> +	assert(limbo->confirmed_lsn < promote_lsn);
> +
> +	if (txn_limbo_is_empty(limbo)) {
> +		/*
> +		 * Transactions are already rolled back
> +		 * since the limbo is empty.
> +		 */
> +		say_info("RAFT: rejecting %s request from "
> +			 "instance id %u for term %llu. "
> +			 "confirmed_lsn %lld < promote_lsn %lld "
> +			 "and empty limbo.",
> +			 iproto_type_name(req->type),
> +			 req->origin_id, (long long)req->term,
> +			 (long long)limbo->confirmed_lsn,
> +			 (long long)promote_lsn);
> +
> +		diag_set(ClientError, ER_UNSUPPORTED,
> +			 "Replication",
> +			 "forward promote LSN "
> +			 "(empty limbo, split brain)");
> +		return -1;

I think it'd be better to have a separate error code for this purpose.
Say, ER_SPLITBRAIN or something.
Then applier would have more control over what to do when such an error
is raised. Say, never reconnect. (It doesn't reconnect on ER_UNSUPPORTED,
I believe, but a distinct error is still better.)

> +	} else {
> +		/*
> +		 * Some entries are present in the limbo,
> +		 * and if first entry's LSN is greater than
> +		 * requested then old data either commited
> +		 * or rolled back, so can't continue.
> +		 */
> +		struct txn_limbo_entry *first;
> +
> +		first = txn_limbo_first_entry(limbo);
> +		if (first->lsn > promote_lsn) {

This should only happen when confirmed_lsn > promote_lsn, shouldn't it?
If yes, then you've already handled it above.

> +			say_info("RAFT: rejecting %s request from "
> +				 "instance id %u for term %llu. "
> +				 "confirmed_lsn %lld < promote_lsn %lld "
> +				 "and limbo first lsn %lld.",
> +				 iproto_type_name(req->type),
> +				 req->origin_id, (long long)req->term,
> +				 (long long)limbo->confirmed_lsn,
> +				 (long long)promote_lsn,
> +				 (long long)first->lsn);
> +
> +			diag_set(ClientError, ER_UNSUPPORTED,
> +				 "Replication",
> +				 "promote LSN confilict "
> +				 "(limbo LSN ahead, split brain)");
> +			return -1;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * Filter DEMOTE packets.
> + */
> +static int
> +filter_demote(struct txn_limbo *limbo, const struct synchro_request *req)
> +{
> +	(void)limbo;
> +	(void)req;
> +	return 0;
> +}
> +
> +static int (*filter_req[FILTER_MAX])
> +(struct txn_limbo *limbo, const struct synchro_request *req) = {
> +	[FILTER_IN]		= filter_in,
> +	[FILTER_CONFIRM]	= filter_confirm_rollback,
> +	[FILTER_ROLLBACK]	= filter_confirm_rollback,
> +	[FILTER_PROMOTE]	= filter_promote,
> +	[FILTER_DEMOTE]		= filter_demote,

Demote should be filtered the same way as promote, they are
basically the same requests with the same meaning.

Demote is just a promote for replica id 0, because we couldn't
do a promote for replica id 0.

> +};
> +
> +int
> +txn_limbo_filter_locked(struct txn_limbo *limbo,
> +			const struct synchro_request *req)
> +{
> +	unsigned int mask = (1u << FILTER_IN);
> +	unsigned int pos = 0;
> +
> +#ifndef NDEBUG
> +	say_info("limbo: filter %s replica_id %u origin_id %u "
> +		 "term %lld lsn %lld, queue owner_id %u len %lld "
> +		 "confirmed_lsn %lld (%s)",
> +		 iproto_type_name(req->type),
> +		 req->replica_id, req->origin_id,
> +		 (long long)req->term, (long long)req->lsn,
> +		 limbo->owner_id, (long long)limbo->len,
> +		 (long long)limbo->confirmed_lsn,
> +		 limbo->is_filtering ? "on" : "off");
> +#endif
> +
> +	if (!limbo->is_filtering)
> +		return 0;
> +
> +	switch (req->type) {
> +	case IPROTO_CONFIRM:
> +		mask |= (1u << FILTER_CONFIRM);
> +		break;
> +	case IPROTO_ROLLBACK:
> +		mask |= (1u << FILTER_ROLLBACK);
> +		break;
> +	case IPROTO_PROMOTE:
> +		mask |= (1u << FILTER_PROMOTE);
> +		break;
> +	case IPROTO_DEMOTE:
> +		mask |= (1u << FILTER_DEMOTE);
> +		break;
> +	default:
> +		say_info("RAFT: rejecting unexpected %d "
> +			 "request from instance id %u "
> +			 "for term %llu.",
> +			 req->type, req->origin_id,
> +			 (long long)req->term);
> +		diag_set(ClientError, ER_UNSUPPORTED,
> +			 "Replication",
> +			 "unexpected request type");
> +		return -1;
> +	}
> +
> +	while (mask != 0) {
> +		if ((mask & 1) != 0) {
> +			assert(pos < lengthof(filter_req));
> +			if (filter_req[pos](limbo, req) != 0)
> +				return -1;
> +		}
> +		pos++;
> +		mask >>= 1;
> +	};
> +
> +	return 0;
> +}
> +
>   void
>   txn_limbo_process_locked(struct txn_limbo *limbo,
>   			 const struct synchro_request *req)
>   {
>   	uint64_t term = req->term;
>   	uint32_t origin = req->origin_id;
> +
>   	if (txn_limbo_term_locked(limbo, origin) < term) {
>   		vclock_follow(&limbo->promote_term_map, origin, term);
>   		if (term > limbo->promote_term_max)
>   			limbo->promote_term_max = term;
> -	} else if (iproto_type_is_promote_request(req->type) &&
> -		   limbo->promote_term_max > 1) {
> -		/* PROMOTE for outdated term. Ignore. */
> -		say_info("RAFT: ignoring %s request from instance "
> -			 "id %u for term %llu. Greatest term seen "
> -			 "before (%llu) is bigger.",
> -			 iproto_type_name(req->type), origin, (long long)term,
> -			 (long long)limbo->promote_term_max);
> -		return;
>   	}
>   
>   	int64_t lsn = req->lsn;
> -	if (req->replica_id == REPLICA_ID_NIL) {
> -		/*
> -		 * The limbo was empty on the instance issuing the request.
> -		 * This means this instance must empty its limbo as well.
> -		 */
> -		assert(lsn == 0 && iproto_type_is_promote_request(req->type));
> -	} else if (req->replica_id != limbo->owner_id) {
> +	if (req->replica_id != limbo->owner_id) {
>   		/*
>   		 * Ignore CONFIRM/ROLLBACK messages for a foreign master.
>   		 * These are most likely outdated messages for already confirmed

We should raise an error when req->replica_id != limbo->owner_id,
for every entry type: promote/demote/confirm/rollback.

req->replica_id != limbo->owner_id means that the remote instance has
taken some actions in the past (say, confirmed something) of which we
didn't know until now. This is basically a split brain again.


> @@ -783,18 +1040,25 @@ txn_limbo_process_locked(struct txn_limbo *limbo,
>   		txn_limbo_read_demote(limbo, lsn);
>   		break;
>   	default:
> -		unreachable();
> +		panic("limbo: unexpected request type %d",
> +		      req->type);
> +		break;
>   	}
> -	return;
>   }
>   
> -void
> +int
>   txn_limbo_process(struct txn_limbo *limbo,
>   		  const struct synchro_request *req)
>   {
> +	int rc;
> +
>   	txn_limbo_term_lock(limbo);
> -	txn_limbo_process_locked(limbo, req);
> +	rc = txn_limbo_filter_locked(limbo, req);
> +	if (rc == 0)
> +		txn_limbo_process_locked(limbo, req);
>   	txn_limbo_term_unlock(limbo);
> +
> +	return rc;
>   }
>   
>   void
> diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
> index 25faffd2b..eb74dda00 100644
> --- a/src/box/txn_limbo.h
> +++ b/src/box/txn_limbo.h
> @@ -184,6 +184,14 @@ struct txn_limbo {
>   	 * by the 'reversed rollback order' rule - contradiction.
>   	 */
>   	bool is_in_rollback;
> +	/**
> +	 * Whether the limbo should filter incoming requests.
> +	 * The phases of local recovery from WAL file and on applier's
> +	 * join phase we are in complete trust of incoming data because
> +	 * this data forms an initial limbo state and should not
> +	 * filter out requests.
> +	 */
> +	bool is_filtering;
>   };
>   
>   /**
> @@ -355,15 +363,38 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn);
>   int
>   txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
>   
> +/**
> + * Verify if the request is valid for processing.
> + */
> +int
> +txn_limbo_filter_locked(struct txn_limbo *limbo,
> +			const struct synchro_request *req);
> +
>   /** Execute a synchronous replication request. */
>   void
>   txn_limbo_process_locked(struct txn_limbo *limbo,
>   			 const struct synchro_request *req);
>   
>   /** Lock limbo terms and execute a synchronous replication request. */
> -void
> +int
>   txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req);
>   
> +/** Enable filtering of synchro requests. */
> +static inline void
> +txn_limbo_filter_enable(struct txn_limbo *limbo)
> +{
> +	limbo->is_filtering = true;
> +	say_info("limbo: filter enabled");
> +}
> +
> +/** Disable filtering of synchro requests. */
> +static inline void
> +txn_limbo_filter_disable(struct txn_limbo *limbo)
> +{
> +	limbo->is_filtering = false;
> +	say_info("limbo: filter disabled");
> +}
> +
>   /**
>    * Waiting for confirmation of all "sync" transactions
>    * during confirm timeout or fail.

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH v9 3/5] limbo: order access to the limbo terms
  2021-08-02 23:48   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-03 11:23     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-03 11:23 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Tue, Aug 03, 2021 at 01:48:18AM +0200, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> See 6 comments below.
> 
> > diff --git a/src/box/applier.cc b/src/box/applier.cc
> > index f621fa657..a7f472714 100644
> > --- a/src/box/applier.cc
> > +++ b/src/box/applier.cc
> > @@ -909,12 +910,15 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
> >  	 * transactions side, including the async ones.
> >  	 */
> >  	if (journal_write(&entry.base) != 0)
> > -		goto err;
> > +		goto err_unlock;
> >  	if (entry.base.res < 0) {
> >  		diag_set_journal_res(entry.base.res);
> > -		goto err;
> > +		goto err_unlock;
> >  	}
> > +	txn_limbo_term_unlock(&txn_limbo);
> >  	return 0;
> > +err_unlock:
> > +	txn_limbo_term_unlock(&txn_limbo);
> 
> 1. Could be done simpler:
> 
> ====================
> @@ -908,7 +909,9 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
>  	 * before trying to commit. But that requires extra steps from the
>  	 * transactions side, including the async ones.
>  	 */
> -	if (journal_write(&entry.base) != 0)
> +	int rc = journal_write(&entry.base);
> +	txn_limbo_term_unlock(&txn_limbo);
> +	if (rc != 0)
>  		goto err;
>  	if (entry.base.res < 0) {
>  		diag_set_journal_res(entry.base.res);
> ====================
> 

I thought about it, and initially used exactly this approach. But I think
the error code is bound to the execution context, and we should set up
the fiber's error while still holding the lock, simply because our fiber
scheduler could be reworked in the future so that unlock causes an
immediate reschedule with a fiber switch (say, if we introduce fiber
priorities); as a result another fiber would get running while this one
sits without an error assigned.

Anyway, since a scheduler rework doesn't seem to be happening in the
near future, I'll update the code and use your snippet above. Note that
I have to declare @rc earlier, because otherwise I get

/home/cyrill/projects/tarantool/tarantool.git/src/box/applier.cc:912:13: note:   crosses initialization of ‘int rc’
  912 |         int rc = journal_write(&entry.base);
      |             ^~


> > diff --git a/src/box/box.cc b/src/box/box.cc
> > index 535f30292..5ca617e32 100644
> > --- a/src/box/box.cc
> > +++ b/src/box/box.cc
> > @@ -1573,7 +1573,7 @@ box_run_elections(void)
> >  static int
> >  box_check_promote_term_intact(uint64_t promote_term)
> >  {
> > -	if (txn_limbo.promote_greatest_term != promote_term) {
> > +	if (txn_limbo_term_max_raw(&txn_limbo) != promote_term) {
> 
> 2. In raft terminology we call such data 'volatile' instead
> of 'raw'.

We need to access the value without the lock, for read-only purposes,
knowing that it might change after we've read it, and the _raw suffix
is commonly used for such things. Since you don't like the suffix, I
think it's better to simply access the @promote_greatest_term variable
directly, as a _volatile suffix would be too long.

> 
> >  		diag_set(ClientError, ER_INTERFERING_PROMOTE,
> >  			 txn_limbo.owner_id);
> >  		return -1;
> > @@ -1728,7 +1728,7 @@ box_promote(void)
> >  	 * Currently active leader (the instance that is seen as leader by both
> >  	 * raft and txn_limbo) can't issue another PROMOTE.
> >  	 */
> > -	bool is_leader = txn_limbo_replica_term(&txn_limbo, instance_id) ==
> > +	bool is_leader = txn_limbo_term(&txn_limbo, instance_id) ==
> 
> 3. Why did you change the name? The old one was closer to reality. When
> you say 'limbo term', it is not clear whether you mean the max term or
> term of one of the nodes.
> 
> Please, extract all such renames into a separate commit. Otherwise
> it is harder to find the functional changes in this one.

Because we pass the replica_id as an argument, which implies that we
fetch the replica's term. But it's not a problem, I'll revert it back
to the original naming.

> > diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> > index 570f77c46..be5e0adf5 100644
> > --- a/src/box/txn_limbo.c
> > +++ b/src/box/txn_limbo.c
> > @@ -724,22 +725,23 @@ txn_limbo_wait_empty(struct txn_limbo *limbo, double timeout)
> >  }
> >  
> >  void
> > -txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
> > +txn_limbo_process_locked(struct txn_limbo *limbo,
> > +			 const struct synchro_request *req)
> >  {
> 
> 4. Please, add an assertion that the latch is occupied. The same for the
> other _locked functions.

OK

> >  
> > +/** Lock promote data. */
> > +static inline void
> > +txn_limbo_term_lock(struct txn_limbo *limbo)
> > +{
> > +	latch_lock(&limbo->promote_latch);
> > +}
> > +
> > +/** Unlock promote data. */
> > +static void
> 
> 5. Why isn't it inline while the others are?

Seems I lost it while reordering the code, will
add it back, thanks!

> > +
> > +/** Fetch replica's term with lock taken. */
> > +static inline uint64_t
> > +txn_limbo_term_locked(struct txn_limbo *limbo, uint32_t replica_id)
> > +{
> > +	panic_on(!txn_limbo_term_is_locked(limbo),
> > +		 "limbo: unlocked term read for replica_id %u",
> > +		 replica_id);
> 
> 6. This new macro seems counter-intuitive because works vice versa
> compared to assert(). Can you try to rename it somehow and make it
> accept a condition which must be true instead of false?

I'll drop it.
---
Here is a diff on top

 - drop panic_on helper
 - access promote_greatest_term directly
 - bring txn_limbo_replica_term name back
 - use assert(latch_is_locked(&limbo->promote_latch)) test in _locked helpers
 - drop txn_limbo_term_is_locked helper

I haven't pushed it out yet because of the next patch, which we have to
discuss and rework, so most likely I'll send another series later; this
diff rather shows what has been changed.
---
 src/box/applier.cc  | 12 ++++++------
 src/box/box.cc      | 12 ++++++------
 src/box/txn_limbo.c | 14 ++++++++------
 src/box/txn_limbo.h | 32 +++++---------------------------
 4 files changed, 25 insertions(+), 45 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index a7f472714..9db286ae2 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -867,6 +867,7 @@ static int
 apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 {
 	assert(iproto_type_is_synchro_request(row->type));
+	int rc = 0;
 
 	struct synchro_request req;
 	if (xrow_decode_synchro(row, &req) != 0)
@@ -909,16 +910,15 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 	 * before trying to commit. But that requires extra steps from the
 	 * transactions side, including the async ones.
 	 */
-	if (journal_write(&entry.base) != 0)
-		goto err_unlock;
+	rc = journal_write(&entry.base);
+	txn_limbo_term_unlock(&txn_limbo);
+	if (rc != 0)
+		goto err;
 	if (entry.base.res < 0) {
 		diag_set_journal_res(entry.base.res);
-		goto err_unlock;
+		goto err;
 	}
-	txn_limbo_term_unlock(&txn_limbo);
 	return 0;
-err_unlock:
-	txn_limbo_term_unlock(&txn_limbo);
 err:
 	diag_log();
 	return -1;
diff --git a/src/box/box.cc b/src/box/box.cc
index a9f6da19d..31adb6c7d 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1573,7 +1573,7 @@ box_run_elections(void)
 static int
 box_check_promote_term_intact(uint64_t promote_term)
 {
-	if (txn_limbo_term_max_raw(&txn_limbo) != promote_term) {
+	if (txn_limbo.promote_greatest_term != promote_term) {
 		diag_set(ClientError, ER_INTERFERING_PROMOTE,
 			 txn_limbo.owner_id);
 		return -1;
@@ -1585,7 +1585,7 @@ box_check_promote_term_intact(uint64_t promote_term)
 static int
 box_trigger_elections(void)
 {
-	uint64_t promote_term = txn_limbo_term_max_raw(&txn_limbo);
+	uint64_t promote_term = txn_limbo.promote_greatest_term;
 	raft_new_term(box_raft());
 	if (box_raft_wait_term_persisted() < 0)
 		return -1;
@@ -1596,7 +1596,7 @@ box_trigger_elections(void)
 static int
 box_try_wait_confirm(double timeout)
 {
-	uint64_t promote_term = txn_limbo_term_max_raw(&txn_limbo);
+	uint64_t promote_term = txn_limbo.promote_greatest_term;
 	txn_limbo_wait_empty(&txn_limbo, timeout);
 	return box_check_promote_term_intact(promote_term);
 }
@@ -1612,7 +1612,7 @@ box_wait_limbo_acked(void)
 	if (txn_limbo_is_empty(&txn_limbo))
 		return txn_limbo.confirmed_lsn;
 
-	uint64_t promote_term = txn_limbo_term_max_raw(&txn_limbo);
+	uint64_t promote_term = txn_limbo.promote_greatest_term;
 	int quorum = replication_synchro_quorum;
 	struct txn_limbo_entry *last_entry;
 	last_entry = txn_limbo_last_synchro_entry(&txn_limbo);
@@ -1728,7 +1728,7 @@ box_promote(void)
 	 * Currently active leader (the instance that is seen as leader by both
 	 * raft and txn_limbo) can't issue another PROMOTE.
 	 */
-	bool is_leader = txn_limbo_term(&txn_limbo, instance_id) ==
+	bool is_leader = txn_limbo_replica_term(&txn_limbo, instance_id) ==
 			 raft->term && txn_limbo.owner_id == instance_id;
 	if (box_election_mode != ELECTION_MODE_OFF)
 		is_leader = is_leader && raft->state == RAFT_STATE_LEADER;
@@ -1784,7 +1784,7 @@ box_demote(void)
 		return 0;
 
 	/* Currently active leader is the only one who can issue a DEMOTE. */
-	bool is_leader = txn_limbo_term(&txn_limbo, instance_id) ==
+	bool is_leader = txn_limbo_replica_term(&txn_limbo, instance_id) ==
 			 box_raft()->term && txn_limbo.owner_id == instance_id;
 	if (box_election_mode != ELECTION_MODE_OFF)
 		is_leader = is_leader && box_raft()->state == RAFT_STATE_LEADER;
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index be5e0adf5..5a5565a70 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -46,7 +46,7 @@ txn_limbo_create(struct txn_limbo *limbo)
 	fiber_cond_create(&limbo->wait_cond);
 	vclock_create(&limbo->vclock);
 	vclock_create(&limbo->promote_term_map);
-	limbo->promote_term_max = 0;
+	limbo->promote_greatest_term = 0;
 	latch_create(&limbo->promote_latch);
 	limbo->confirmed_lsn = 0;
 	limbo->rollback_count = 0;
@@ -309,7 +309,7 @@ txn_limbo_checkpoint(const struct txn_limbo *limbo,
 	req->type = IPROTO_PROMOTE;
 	req->replica_id = limbo->owner_id;
 	req->lsn = limbo->confirmed_lsn;
-	req->term = limbo->promote_term_max;
+	req->term = limbo->promote_greatest_term;
 }
 
 static void
@@ -728,20 +728,22 @@ void
 txn_limbo_process_locked(struct txn_limbo *limbo,
 			 const struct synchro_request *req)
 {
+	assert(latch_is_locked(&limbo->promote_latch));
+
 	uint64_t term = req->term;
 	uint32_t origin = req->origin_id;
 	if (txn_limbo_term_locked(limbo, origin) < term) {
 		vclock_follow(&limbo->promote_term_map, origin, term);
-		if (term > limbo->promote_term_max)
-			limbo->promote_term_max = term;
+		if (term > limbo->promote_greatest_term)
+			limbo->promote_greatest_term = term;
 	} else if (iproto_type_is_promote_request(req->type) &&
-		   limbo->promote_term_max > 1) {
+		   limbo->promote_greatest_term > 1) {
 		/* PROMOTE for outdated term. Ignore. */
 		say_info("RAFT: ignoring %s request from instance "
 			 "id %u for term %llu. Greatest term seen "
 			 "before (%llu) is bigger.",
 			 iproto_type_name(req->type), origin, (long long)term,
-			 (long long)limbo->promote_term_max);
+			 (long long)limbo->promote_greatest_term);
 		return;
 	}
 
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 25faffd2b..abec9f72a 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -147,7 +147,7 @@ struct txn_limbo {
 	 * instance hasn't read its PROMOTE request yet. During other times the
 	 * limbo and raft are in sync and the terms are the same.
 	 */
-	uint64_t promote_term_max;
+	uint64_t promote_greatest_term;
 	/**
 	 * To order access to the promote data.
 	 */
@@ -224,26 +224,17 @@ txn_limbo_term_lock(struct txn_limbo *limbo)
 }
 
 /** Unlock promote data. */
-static void
+static inline void
 txn_limbo_term_unlock(struct txn_limbo *limbo)
 {
 	latch_unlock(&limbo->promote_latch);
 }
 
-/** Test if promote data is locked. */
-static inline bool
-txn_limbo_term_is_locked(const struct txn_limbo *limbo)
-{
-	return latch_is_locked(&limbo->promote_latch);
-}
-
 /** Fetch replica's term with lock taken. */
 static inline uint64_t
 txn_limbo_term_locked(struct txn_limbo *limbo, uint32_t replica_id)
 {
-	panic_on(!txn_limbo_term_is_locked(limbo),
-		 "limbo: unlocked term read for replica_id %u",
-		 replica_id);
+	assert(latch_is_locked(&limbo->promote_latch));
 	return vclock_get(&limbo->promote_term_map, replica_id);
 }
 
@@ -252,7 +243,7 @@ txn_limbo_term_locked(struct txn_limbo *limbo, uint32_t replica_id)
  * @a replica_id.
  */
 static inline uint64_t
-txn_limbo_term(struct txn_limbo *limbo, uint32_t replica_id)
+txn_limbo_replica_term(struct txn_limbo *limbo, uint32_t replica_id)
 {
 	txn_limbo_term_lock(limbo);
 	uint64_t v = txn_limbo_term_locked(limbo, replica_id);
@@ -260,19 +251,6 @@ txn_limbo_term(struct txn_limbo *limbo, uint32_t replica_id)
 	return v;
 }
 
-/**
- * Fiber's preempt not safe read of @a terms_max.
- *
- * Use it if you're interested in current value
- * only and ready that the value is getting updated
- * if after the read yield happens.
- */
-static inline uint64_t
-txn_limbo_term_max_raw(struct txn_limbo *limbo)
-{
-	return limbo->promote_term_max;
-}
-
 /**
  * Check whether replica with id @a source_id is too old to apply synchronous
  * data from it. The check is only valid when elections are enabled.
@@ -283,7 +261,7 @@ txn_limbo_is_replica_outdated(struct txn_limbo *limbo,
 {
 	txn_limbo_term_lock(limbo);
 	bool res = txn_limbo_term_locked(limbo, replica_id) <
-		limbo->promote_term_max;
+		limbo->promote_greatest_term;
 	txn_limbo_term_unlock(limbo);
 	return res;
 }
-- 
2.31.1



* Re: [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests
  2021-08-02 23:50   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-03 13:25     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-03 13:25 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Tue, Aug 03, 2021 at 01:50:49AM +0200, Vladislav Shpilevoy wrote:
> Thanks for the patch!
> 
> On 30.07.2021 13:35, Cyrill Gorcunov wrote:
> > When we receive synchro requests we can't just apply
> > them blindly because in worse case they may come from
> > split-brain configuration (where a cluster splitted into
> 
> splitted -> split.
> 
> > several subclusters and each one has own leader elected,
> > then subclisters are trying to merge back into original
> 
> subclisters -> subclusters.

Thanks!

> 
> > cluster). We need to do our best to detect such configs
> > and force these nodes to rejoin from the scratch for
> > data consistency sake.
> > 
> > Thus when we're processing requests we pass them to the
> > packet filter first which validates their contents and
> > refuse to apply if they are not matched.
> > 
> > Depending on request type each packet traverse an
> > appropriate chain(s)
> > 
> > FILTER_IN
> >  - Common chain for any synchro packet. We verify
> >    that if replica_id is nil then it shall be
> >    PROMOTE request with lsn 0 to migrate limbo owner
> 
> How can it be 0 for non PROMOTE/DEMOTE requests?
> Do we ever encode such rows at all? Why isn't this
> a part of FILTER_PROMOTE?

There could be network errors, for example, so when we
see a synchro-type packet we need to verify its common
attributes first before passing it to the next chain. These
attributes don't depend on the limbo state, I think.

In this particular case, if we see a packet whose replica_id
is nil, then it must be a promote/demote request. Otherwise
I'd have to add these tests to every non-promote/demote
packet.

For example, imagine a packet {rollback | lsn = 0}: it is
obviously wrong because we don't have lsn = 0 records at
all. Thus I should either put this test inside the
confirm/rollback chains or make a common helper which does
a general validation of incoming synchro packets. The
filter-in chain was introduced for this purpose.

> 
> > FILTER_CONFIRM
> > FILTER_ROLLBACK
> >  - Both confirm and rollback requests shall not come
> >    with empty limbo since it measn the synchro queue
> 
> measn -> means.

thanks

> 
> >    is already processed and the peer didn't notice
> >    that
> 
> Is it the only issue? What about ROLLBACK coming to
> an instance, which already made PROMOTE on the rolled back
> data? That is a part of the original problem in the ticket.

Then it is an error, as far as I understand. There is no more
queued data, and the promote request has basically dropped any
information we had in memory related to the limbo state. The promote
request implies that the node where it is executed is a raft leader
and its data is the only valid point of truth for all other nodes in
the cluster. Thus if we receive a confirm/rollback request for rows
which were already committed (or rolled back) via a promote request,
then the peer should exit with an error. Shouldn't it? Or am I
missing something?

> > FILTER_PROMOTE
> >  - Promote request should come in with new terms only,
> >    otherwise it means the peer didn't notice election
> > 
> >  - If limbo's confirmed_lsn is equal to promote LSN then
> >    it is a valid request to process
> > 
> >  - If limbo's confirmed_lsn is bigger than requested then
> >    it is valid in one case only -- limbo migration so the
> >    queue shall be empty
> 
> I don't understand. How is it valid? PROMOTE(lsn) rolls
> back everything > lsn. If the local confirmed_lsn > lsn, it
> means that data can't be rolled back now and the data becomes
> inconsistent.

IIRC this was a scenario where we're migrating the limbo owner.
The owner-migration scenario is still unclear to me, I need to
revisit this point.

> 
> >  - If limbo's confirmed_lsn is less than promote LSN then
> >    - If queue is empty then it means the transactions are
> >      already rolled back and request is invalid
> >    - If queue is not empty then its first entry might be
> >      greater than promote LSN and it means that old data
> >      either committed or rolled back already and request
> >      is invalid
> 
> If the first entry's LSN in the limbo > promote LSN, it
> means it wasn't committed yet. The promote will roll it back
> and it is fine. This will make the data consistent.

 quoting you:
  > Первая транзакция лимба имеет lsn > promote lsn. Это уже конец.
  > Потому что старый мастер уже старые данные либо закатил, либо
  > откатил, уже неважно, и это сплит бреин.
  translation:
  > First limbo transaction lsn > promote lsn. This is the end.
  > Because old master has committed or rolled back the data
  > already, it doesn't matter, this is a split brain situation.

Maybe I got you wrong?

> 
> The problem appears if there were some other sync txns
> rolled back or even committed with quorum=1 before this
> hanging txn. And I don't remember we figured a way to
> distinguish between these situations. Did we?

Seems like not yet. I need more time to think about whether there can
be other scenarios here...


* Re: [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests
  2021-08-03 10:51   ` Serge Petrenko via Tarantool-patches
@ 2021-08-03 13:49     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-03 13:49 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tml, Vladislav Shpilevoy

On Tue, Aug 03, 2021 at 01:51:44PM +0300, Serge Petrenko wrote:
> 
> > +		if (req->lsn != 0 ||
> > +		    !iproto_type_is_promote_request(req->type)) {
> > +			say_info("RAFT: rejecting %s request from "
> > +				 "instance id %u for term %llu. "
> > +				 "req->replica_id = 0 but lsn %lld.",
> > +				 iproto_type_name(req->type),
> > +				 req->origin_id, (long long)req->term,
> > +				 (long long)req->lsn);
> > +
> > +			diag_set(ClientError, ER_UNSUPPORTED,
> > +				 "Replication",
> > +				 "empty replica_id with nonzero LSN");
> > +			return -1;
> > +		}
> > +	}
> 
> I agree with Vlad. This may be moved to filter_confirm_rollback.

You know, as I replied to Vlad, I think we should do a common
attribute verification at an early stage, not only because of
network errors but for security reasons as well: say, some fuzzer
on the network could feed us corrupted data, and since we sort
packets into chains for processing, we can do an early verification
first.

But I won't insist; surely I can move it to the promote chain.

> > +
> > +	/*
> > +	 * Explicit split brain situation. Promote
> > +	 * comes in with an old LSN which we've already
> > +	 * processed.
> > +	 */
> > +	if (limbo->confirmed_lsn > promote_lsn) {
> > +		/*
> > +		 * If limbo is empty we're migrating
> > +		 * the owner.
> > +		 */
> > +		if (txn_limbo_is_empty(limbo))
> > +			return 0;
> 
> I don't understand this part. Are you sure this check is needed?
> We're always migrating the owner with a promote.

Serge, I think this code hunk is due to my misunderstanding of
owner migration; I should remove it, thanks!

> > +
> > +	if (txn_limbo_is_empty(limbo)) {
> > +		/*
> > +		 * Transactions are already rolled back
> > +		 * since the limbo is empty.
> > +		 */
> > +		say_info("RAFT: rejecting %s request from "
> > +			 "instance id %u for term %llu. "
> > +			 "confirmed_lsn %lld < promote_lsn %lld "
> > +			 "and empty limbo.",
> > +			 iproto_type_name(req->type),
> > +			 req->origin_id, (long long)req->term,
> > +			 (long long)limbo->confirmed_lsn,
> > +			 (long long)promote_lsn);
> > +
> > +		diag_set(ClientError, ER_UNSUPPORTED,
> > +			 "Replication",
> > +			 "forward promote LSN "
> > +			 "(empty limbo, split brain)");
> > +		return -1;
> 
> I think it'd be better to have a separate error code for this purpose.
> Say, ER_SPLITBRAIN or something.
> Then applier would have more control over what to do when such an error
> is raised. Say, never reconnect. (I doesn't reconnect on ER_UNSUPPORTED,
> I believe, but a distinct error is still better).

Split-brain is just one particular case; since more reasons to refuse
to proceed may appear in the future, maybe ER_REJECT with a rejection
reason (just like netfilter does)?

> > +	} else {
> > +		/*
> > +		 * Some entries are present in the limbo,
> > +		 * and if first entry's LSN is greater than
> > +		 * requested then old data either commited
> > +		 * or rolled back, so can't continue.
> > +		 */
> > +		struct txn_limbo_entry *first;
> > +
> > +		first = txn_limbo_first_entry(limbo);
> > +		if (first->lsn > promote_lsn) {
> 
> This should only happen when confirmed_lsn > promote_lsn, shouldn't it?
> If yes, than you've already handled it above.

Nope. This is the case where the data is still in the limbo and we
haven't committed it yet, but the master has already committed or
rolled it back. Unless I'm missing something obvious.

> > +
> > +static int (*filter_req[FILTER_MAX])
> > +(struct txn_limbo *limbo, const struct synchro_request *req) = {
> > +	[FILTER_IN]		= filter_in,
> > +	[FILTER_CONFIRM]	= filter_confirm_rollback,
> > +	[FILTER_ROLLBACK]	= filter_confirm_rollback,
> > +	[FILTER_PROMOTE]	= filter_promote,
> > +	[FILTER_DEMOTE]		= filter_demote,
> 
> Demote should be filtered the same way as promote, they are
> basically the same requests with same meaning.
> 
> demote is just a promote for replica id 0, because we couldn't
> do promote replica id 0.

Aha, thanks, will update.

> > -	if (req->replica_id == REPLICA_ID_NIL) {
> > -		/*
> > -		 * The limbo was empty on the instance issuing the request.
> > -		 * This means this instance must empty its limbo as well.
> > -		 */
> > -		assert(lsn == 0 && iproto_type_is_promote_request(req->type));
> > -	} else if (req->replica_id != limbo->owner_id) {
> > +	if (req->replica_id != limbo->owner_id) {
> >   		/*
> >   		 * Ignore CONFIRM/ROLLBACK messages for a foreign master.
> >   		 * These are most likely outdated messages for already confirmed
> 
> We should error when request -> replica_id != limbo->owner_id.
> For every entry type: promote/demote/confirm/rollback.
> 
> req->replica_id != limbo->owner_id means that the remote instance has
> taken some actions in the past (say, confirmed something) of which we
> didn't know until now. This is basically a splitbrain again.

Good point. I think we should put it into filter-in chain then.

	Cyrill


end of thread, other threads:[~2021-08-03 13:50 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
2021-07-30 11:35 [Tarantool-patches] [PATCH v9 0/5] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 1/5] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 2/5] say: introduce panic_on helper Cyrill Gorcunov via Tarantool-patches
2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 3/5] limbo: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
2021-08-02 23:48   ` Vladislav Shpilevoy via Tarantool-patches
2021-08-03 11:23     ` Cyrill Gorcunov via Tarantool-patches
2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests Cyrill Gorcunov via Tarantool-patches
2021-08-02 23:50   ` Vladislav Shpilevoy via Tarantool-patches
2021-08-03 13:25     ` Cyrill Gorcunov via Tarantool-patches
2021-08-03 10:51   ` Serge Petrenko via Tarantool-patches
2021-08-03 13:49     ` Cyrill Gorcunov via Tarantool-patches
2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 5/5] test: add replication/gh-6036-rollback-confirm Cyrill Gorcunov via Tarantool-patches
