Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v10 0/4] limbo: implement packets filtering
@ 2021-08-04 19:07 Cyrill Gorcunov via Tarantool-patches
  2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 1/4] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-04 19:07 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Replication tests are passing but gihutb tests are not: the series
is on top of Serge's branch `sp/gh-6034-empty-limbo-transition` which
is failing yet.

Vlad, back to your question about one single test for
CONFIRM/ROLLBACK packet filtering

 > Is it the only issue? What about ROLLBACK coming to
 > an instance, which already made PROMOTE on the rolled back
 > data? That is a part of the original problem in the ticket.

if any confirm/rollback comes at the moment when our limbo is
already empty I think we can't do anything, there is no data
to investigate and the node should rejoin from the scratch,
or you have something else in mind? If yes, then we can extend
our confirm/rollback filter chain with more cases.

We've a long discussion with Serge about current packet validation
in PROMOTE/DEMOTE filter so I reworked the validation a lot, please
take a look.

previous series https://lists.tarantool.org/tarantool-patches/20210730113539.563318-1-gorcunov@gmail.com/
branch gorcunov/gh-6036-rollback-confirm-10-notest
issue https://github.com/tarantool/tarantool/issues/6036

v6:
 - use txn_limbo_terms name for structure
 - rebase on fresh sp/gh-6034-empty-limbo-transition branch
 - rework filtering chains
v8:
 - add ability to disable filtering for local recovery
   and join stages
 - update tests
v9:
 - opencode terms tracking
 - fix tests to use wait function since log output might
   be deferred by OS
v10:
 - rework FILTER_IN and FILTER_PROMOTE chains with more
   detailed packets inspection
 - preserve old naming for terms manipulations
 - require the packet's replica_id to match limbo owner_id
   all the time

Cyrill Gorcunov (4):
  latch: add latch_is_locked helper
  limbo: order access to the limbo terms
  limbo: filter incoming synchro requests
  test: add replication/gh-6036-rollback-confirm

 src/box/applier.cc                            |  31 +-
 src/box/box.cc                                |  11 +-
 src/box/memtx_engine.c                        |   3 +-
 src/box/txn_limbo.c                           | 367 ++++++++++++++++--
 src/box/txn_limbo.h                           |  79 +++-
 src/lib/core/latch.h                          |  11 +
 test/replication/gh-6036-master.lua           |   1 +
 test/replication/gh-6036-node.lua             |  33 ++
 test/replication/gh-6036-replica.lua          |   1 +
 .../gh-6036-rollback-confirm.result           | 180 +++++++++
 .../gh-6036-rollback-confirm.test.lua         |  92 +++++
 11 files changed, 757 insertions(+), 52 deletions(-)
 create mode 120000 test/replication/gh-6036-master.lua
 create mode 100644 test/replication/gh-6036-node.lua
 create mode 120000 test/replication/gh-6036-replica.lua
 create mode 100644 test/replication/gh-6036-rollback-confirm.result
 create mode 100644 test/replication/gh-6036-rollback-confirm.test.lua


base-commit: 9b668f4f8508ee890f4b643b0f9ec16024939bff
-- 
2.31.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Tarantool-patches] [PATCH v10 1/4] latch: add latch_is_locked helper
  2021-08-04 19:07 [Tarantool-patches] [PATCH v10 0/4] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
@ 2021-08-04 19:07 ` Cyrill Gorcunov via Tarantool-patches
  2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-04 19:07 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

To test if latch is locked.

In-scope-of #6036

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/lib/core/latch.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/lib/core/latch.h b/src/lib/core/latch.h
index 49c59cf63..0aaa8b634 100644
--- a/src/lib/core/latch.h
+++ b/src/lib/core/latch.h
@@ -95,6 +95,17 @@ latch_owner(struct latch *l)
 	return l->owner;
 }
 
+/**
+ * Return true if the latch is locked.
+ *
+ * @param l - latch to be tested.
+ */
+static inline bool
+latch_is_locked(const struct latch *l)
+{
+	return l->owner != NULL;
+}
+
 /**
  * Lock a latch. If the latch is already locked by another fiber,
  * waits for timeout.
-- 
2.31.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms
  2021-08-04 19:07 [Tarantool-patches] [PATCH v10 0/4] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
  2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 1/4] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
@ 2021-08-04 19:07 ` Cyrill Gorcunov via Tarantool-patches
  2021-08-05 23:29   ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 3/4] limbo: filter incoming synchro requests Cyrill Gorcunov via Tarantool-patches
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-04 19:07 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Limbo terms tracking is shared between appliers and when
one of appliers is waiting for write to complete inside
journal_write() routine, an other may need to access read
term value to figure out if promote request is valid to
apply. Due to cooperative multitasking access to the terms
is not consistent so we need to be sure that other fibers
either read up to date terms (ie written to the WAL).

For this sake we use latching mechanism, when one fiber
took terms-lock for updating other readers are waiting
until the operation is complete.

For example here is a call graph of two appliers

applier 1
---------
applier_apply_tx
  (promote term = 3
   current max term = 2)
  applier_synchro_filter_tx
  apply_synchro_row
    journal_write
      (sleeping)

at this moment another applier comes in with obsolete
data and term 2

                              applier 2
                              ---------
                              applier_apply_tx
                                (term 2)
                                applier_synchro_filter_tx
                                  txn_limbo_is_replica_outdated -> false
                                journal_write (sleep)

applier 1
---------
journal wakes up
  apply_synchro_row_cb
    set max term to 3

So the applier 2 didn't notice that term 3 is already seen
and wrote obsolete data. With locking the applier 2 will
wait until applier 1 has finished its write.

Part-of #6036

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/applier.cc  |  8 ++++++--
 src/box/txn_limbo.c | 17 ++++++++++++++--
 src/box/txn_limbo.h | 48 ++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 64 insertions(+), 9 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index f621fa657..9db286ae2 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -856,7 +856,7 @@ apply_synchro_row_cb(struct journal_entry *entry)
 		applier_rollback_by_wal_io(entry->res);
 	} else {
 		replica_txn_wal_write_cb(synchro_entry->rcb);
-		txn_limbo_process(&txn_limbo, synchro_entry->req);
+		txn_limbo_process_locked(&txn_limbo, synchro_entry->req);
 		trigger_run(&replicaset.applier.on_wal_write, NULL);
 	}
 	fiber_wakeup(synchro_entry->owner);
@@ -867,11 +867,13 @@ static int
 apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 {
 	assert(iproto_type_is_synchro_request(row->type));
+	int rc = 0;
 
 	struct synchro_request req;
 	if (xrow_decode_synchro(row, &req) != 0)
 		goto err;
 
+	txn_limbo_term_lock(&txn_limbo);
 	struct replica_cb_data rcb_data;
 	struct synchro_entry entry;
 	/*
@@ -908,7 +910,9 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 	 * before trying to commit. But that requires extra steps from the
 	 * transactions side, including the async ones.
 	 */
-	if (journal_write(&entry.base) != 0)
+	rc = journal_write(&entry.base);
+	txn_limbo_term_unlock(&txn_limbo);
+	if (rc != 0)
 		goto err;
 	if (entry.base.res < 0) {
 		diag_set_journal_res(entry.base.res);
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 570f77c46..a718c55a2 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -47,6 +47,7 @@ txn_limbo_create(struct txn_limbo *limbo)
 	vclock_create(&limbo->vclock);
 	vclock_create(&limbo->promote_term_map);
 	limbo->promote_greatest_term = 0;
+	latch_create(&limbo->promote_latch);
 	limbo->confirmed_lsn = 0;
 	limbo->rollback_count = 0;
 	limbo->is_in_rollback = false;
@@ -724,11 +725,14 @@ txn_limbo_wait_empty(struct txn_limbo *limbo, double timeout)
 }
 
 void
-txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
+txn_limbo_process_locked(struct txn_limbo *limbo,
+			 const struct synchro_request *req)
 {
+	assert(latch_is_locked(&limbo->promote_latch));
+
 	uint64_t term = req->term;
 	uint32_t origin = req->origin_id;
-	if (txn_limbo_replica_term(limbo, origin) < term) {
+	if (txn_limbo_replica_term_locked(limbo, origin) < term) {
 		vclock_follow(&limbo->promote_term_map, origin, term);
 		if (term > limbo->promote_greatest_term)
 			limbo->promote_greatest_term = term;
@@ -786,6 +790,15 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 	return;
 }
 
+void
+txn_limbo_process(struct txn_limbo *limbo,
+		  const struct synchro_request *req)
+{
+	txn_limbo_term_lock(limbo);
+	txn_limbo_process_locked(limbo, req);
+	txn_limbo_term_unlock(limbo);
+}
+
 void
 txn_limbo_on_parameters_change(struct txn_limbo *limbo)
 {
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 53e52f676..c77c501e9 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -31,6 +31,7 @@
  */
 #include "small/rlist.h"
 #include "vclock/vclock.h"
+#include "latch.h"
 
 #include <stdint.h>
 
@@ -147,6 +148,10 @@ struct txn_limbo {
 	 * limbo and raft are in sync and the terms are the same.
 	 */
 	uint64_t promote_greatest_term;
+	/**
+	 * To order access to the promote data.
+	 */
+	struct latch promote_latch;
 	/**
 	 * Maximal LSN gathered quorum and either already confirmed in WAL, or
 	 * whose confirmation is in progress right now. Any attempt to confirm
@@ -211,14 +216,39 @@ txn_limbo_last_entry(struct txn_limbo *limbo)
 				in_queue);
 }
 
+/** Lock promote data. */
+static inline void
+txn_limbo_term_lock(struct txn_limbo *limbo)
+{
+	latch_lock(&limbo->promote_latch);
+}
+
+/** Unlock promote data. */
+static inline void
+txn_limbo_term_unlock(struct txn_limbo *limbo)
+{
+	latch_unlock(&limbo->promote_latch);
+}
+
+/** Fetch replica's term with lock taken. */
+static inline uint64_t
+txn_limbo_replica_term_locked(struct txn_limbo *limbo, uint32_t replica_id)
+{
+	assert(latch_is_locked(&limbo->promote_latch));
+	return vclock_get(&limbo->promote_term_map, replica_id);
+}
+
 /**
  * Return the latest term as seen in PROMOTE requests from instance with id
  * @a replica_id.
  */
 static inline uint64_t
-txn_limbo_replica_term(const struct txn_limbo *limbo, uint32_t replica_id)
+txn_limbo_replica_term(struct txn_limbo *limbo, uint32_t replica_id)
 {
-	return vclock_get(&limbo->promote_term_map, replica_id);
+	txn_limbo_term_lock(limbo);
+	uint64_t v = txn_limbo_replica_term_locked(limbo, replica_id);
+	txn_limbo_term_unlock(limbo);
+	return v;
 }
 
 /**
@@ -226,11 +256,14 @@ txn_limbo_replica_term(const struct txn_limbo *limbo, uint32_t replica_id)
  * data from it. The check is only valid when elections are enabled.
  */
 static inline bool
-txn_limbo_is_replica_outdated(const struct txn_limbo *limbo,
+txn_limbo_is_replica_outdated(struct txn_limbo *limbo,
 			      uint32_t replica_id)
 {
-	return txn_limbo_replica_term(limbo, replica_id) <
-	       limbo->promote_greatest_term;
+	txn_limbo_term_lock(limbo);
+	bool res = txn_limbo_replica_term_locked(limbo, replica_id) <
+		limbo->promote_greatest_term;
+	txn_limbo_term_unlock(limbo);
+	return res;
 }
 
 /**
@@ -302,6 +335,11 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
 
 /** Execute a synchronous replication request. */
 void
+txn_limbo_process_locked(struct txn_limbo *limbo,
+			 const struct synchro_request *req);
+
+/** Lock limbo terms and execute a synchronous replication request. */
+void
 txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req);
 
 /**
-- 
2.31.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Tarantool-patches] [PATCH v10 3/4] limbo: filter incoming synchro requests
  2021-08-04 19:07 [Tarantool-patches] [PATCH v10 0/4] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
  2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 1/4] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
  2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
@ 2021-08-04 19:07 ` Cyrill Gorcunov via Tarantool-patches
  2021-08-05 23:33   ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 4/4] test: add replication/gh-6036-rollback-confirm Cyrill Gorcunov via Tarantool-patches
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-04 19:07 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

When we receive synchro requests we can't just apply
them blindly because in worse case they may come from
split-brain configuration (where a cluster split into
several clusters and each one has own leader elected,
then clusters are trying to merge back into the original
one). We need to do our best to detect such disunity
and force these nodes to rejoin from the scratch for
data consistency sake.

Thus when we're processing requests we pass them to the
packet filter first which validates their contents and
refuse to apply if they are not matched.

Depending on request type each packet traverse an
appropriate chain(s)

                                          +---> FILTER_PROMOTE
                                          |
                                          +---> FILTER_CONFIRM
 reader -> FILTER_IN -> request:{type} -> |
              |                           +---> FILTER_ROLLBACK
              | error                     |
              V                           +---> FILTER_DEMOTE
             exit

FILTER_IN
 Common chain for any synchro packet.
 1) Zero LSN allowed for PROMOTE | DEMOTE packets, since
    CONFIRM | ROLLBACK has to proceed some real data with
    LSN already assigned.
 2) request:replica_id = 0 allowed for PROMOTE request only.
 3) request:replica_id should match limbo:owner_id, iow the
    limbo migration should be noticed by all instances in the
    cluster.

FILTER_CONFIRM
FILTER_ROLLBACK
 1) Both confirm and rollback requests can't be considered
    if limbo is already empty, ie there is no data in a
    local queue and everything is processed already. The
    request is obviously from the node which is out of date.

FILTER_PROMOTE
FILTER_DEMOTE
 1) The requests should come in with nonzero term, otherwise
    the packet is corrupted.
 2) The request's term should not be less than maximal known
    one, iow it should not come in from nodes which didn't notice
    raft epoch changes and living in the past.
 3) If LSN of the request matches current confirmed LSN the packet
    is obviously correct to process.
 4) If LSN is less than confirmed LSN then the request is wrong,
    we have processed the requested LSN already.
 5) If LSN is less than confirmed LSN then
    a) If limbo is empty we can't do anything, since data is already
       processed and should issue an error;
    b) If there is some data in the limbo then requested LSN should
       be in range of limbo's [first; last] LSNs, thus the request
       will be able to commit and rollback limbo queue.

Closes #6036

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/applier.cc     |  23 ++-
 src/box/box.cc         |  11 +-
 src/box/memtx_engine.c |   3 +-
 src/box/txn_limbo.c    | 354 ++++++++++++++++++++++++++++++++++++-----
 src/box/txn_limbo.h    |  33 +++-
 5 files changed, 378 insertions(+), 46 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 9db286ae2..f64b6fa35 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -458,7 +458,8 @@ applier_wait_snapshot(struct applier *applier)
 				struct synchro_request req;
 				if (xrow_decode_synchro(&row, &req) != 0)
 					diag_raise();
-				txn_limbo_process(&txn_limbo, &req);
+				if (txn_limbo_process(&txn_limbo, &req) != 0)
+					diag_raise();
 			} else if (iproto_type_is_raft_request(row.type)) {
 				struct raft_request req;
 				if (xrow_decode_raft(&row, &req, NULL) != 0)
@@ -514,6 +515,11 @@ applier_fetch_snapshot(struct applier *applier)
 	struct ev_io *coio = &applier->io;
 	struct xrow_header row;
 
+	txn_limbo_filter_disable(&txn_limbo);
+	auto filter_guard = make_scoped_guard([&]{
+		txn_limbo_filter_enable(&txn_limbo);
+	});
+
 	memset(&row, 0, sizeof(row));
 	row.type = IPROTO_FETCH_SNAPSHOT;
 	coio_write_xrow(coio, &row);
@@ -587,6 +593,11 @@ applier_register(struct applier *applier, bool was_anon)
 	struct ev_io *coio = &applier->io;
 	struct xrow_header row;
 
+	txn_limbo_filter_disable(&txn_limbo);
+	auto filter_guard = make_scoped_guard([&]{
+		txn_limbo_filter_enable(&txn_limbo);
+	});
+
 	memset(&row, 0, sizeof(row));
 	/*
 	 * Send this instance's current vclock together
@@ -620,6 +631,11 @@ applier_join(struct applier *applier)
 	struct xrow_header row;
 	uint64_t row_count;
 
+	txn_limbo_filter_disable(&txn_limbo);
+	auto filter_guard = make_scoped_guard([&]{
+		txn_limbo_filter_enable(&txn_limbo);
+	});
+
 	xrow_encode_join_xc(&row, &INSTANCE_UUID);
 	coio_write_xrow(coio, &row);
 
@@ -874,6 +890,11 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 		goto err;
 
 	txn_limbo_term_lock(&txn_limbo);
+	if (txn_limbo_filter_locked(&txn_limbo, &req) != 0) {
+		txn_limbo_term_unlock(&txn_limbo);
+		goto err;
+	}
+
 	struct replica_cb_data rcb_data;
 	struct synchro_entry entry;
 	/*
diff --git a/src/box/box.cc b/src/box/box.cc
index 8dc3b130b..c3516b7a4 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1675,7 +1675,8 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
 		.lsn = promote_lsn,
 		.term = raft->term,
 	};
-	txn_limbo_process(&txn_limbo, &req);
+	if (txn_limbo_process(&txn_limbo, &req) != 0)
+		diag_raise();
 	assert(txn_limbo_is_empty(&txn_limbo));
 }
 
@@ -1697,7 +1698,8 @@ box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)
 		.lsn = promote_lsn,
 		.term = box_raft()->term,
 	};
-	txn_limbo_process(&txn_limbo, &req);
+	if (txn_limbo_process(&txn_limbo, &req) != 0)
+		diag_raise();
 	assert(txn_limbo_is_empty(&txn_limbo));
 }
 
@@ -3284,6 +3286,11 @@ local_recovery(const struct tt_uuid *instance_uuid,
 
 	say_info("instance uuid %s", tt_uuid_str(&INSTANCE_UUID));
 
+	txn_limbo_filter_disable(&txn_limbo);
+	auto filter_guard = make_scoped_guard([&]{
+		txn_limbo_filter_enable(&txn_limbo);
+	});
+
 	struct wal_stream wal_stream;
 	wal_stream_create(&wal_stream);
 	auto stream_guard = make_scoped_guard([&]{
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 0b06e5e63..4aed24fe3 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -238,7 +238,8 @@ memtx_engine_recover_synchro(const struct xrow_header *row)
 	 * because all its rows have a zero replica_id.
 	 */
 	req.origin_id = req.replica_id;
-	txn_limbo_process(&txn_limbo, &req);
+	if (txn_limbo_process(&txn_limbo, &req) != 0)
+		return -1;
 	return 0;
 }
 
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index a718c55a2..59fb51fac 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -51,6 +51,7 @@ txn_limbo_create(struct txn_limbo *limbo)
 	limbo->confirmed_lsn = 0;
 	limbo->rollback_count = 0;
 	limbo->is_in_rollback = false;
+	limbo->is_filtering = true;
 }
 
 bool
@@ -724,6 +725,302 @@ txn_limbo_wait_empty(struct txn_limbo *limbo, double timeout)
 	return 0;
 }
 
+enum filter_chain {
+	FILTER_IN,
+	FILTER_CONFIRM,
+	FILTER_ROLLBACK,
+	FILTER_PROMOTE,
+	FILTER_DEMOTE,
+	FILTER_MAX,
+};
+
+/**
+ * Fill the reject reason with request data.
+ * The function is not reenterable, use with caution.
+ */
+static char *
+reject_str(const struct synchro_request *req)
+{
+	static char prefix[128];
+
+	snprintf(prefix, sizeof(prefix), "RAFT: rejecting %s (%d) "
+		 "request from origin_id %u replica_id %u term %llu",
+		 iproto_type_name(req->type), req->type,
+		 req->origin_id, req->replica_id,
+		 (long long)req->term);
+
+	return prefix;
+}
+
+/**
+ * Common chain for any incoming packet.
+ */
+static int
+filter_in(struct txn_limbo *limbo, const struct synchro_request *req)
+{
+	(void)limbo;
+
+	/*
+	 * Zero LSN are allowed for PROMOTE
+	 * and DEMOTE requests only.
+	 */
+	if (req->lsn == 0) {
+		if (!iproto_type_is_promote_request(req->type)) {
+			say_info("%s. Zero lsn detected",
+				 reject_str(req));
+
+			diag_set(ClientError, ER_UNSUPPORTED,
+				 "Replication",
+				 "zero LSN on promote/demote");
+			return -1;
+		}
+	}
+
+	/*
+	 * Zero @a replica_id is allowed for PROMOTE packets only.
+	 */
+	if (req->replica_id == REPLICA_ID_NIL) {
+		if (req->type != IPROTO_PROMOTE) {
+			say_info("%s. Zero replica_id detected",
+				 reject_str(req));
+
+			diag_set(ClientError, ER_UNSUPPORTED,
+				 "Replication",
+				 "zero replica_id");
+			return -1;
+
+		}
+	}
+
+	/*
+	 * Incoming packets should esteem limbo owner,
+	 * if it doesn't match it means the sender
+	 * missed limbo owner migrations and out of date.
+	 */
+	if (req->replica_id != limbo->owner_id) {
+		say_info("%s. Limbo owner mismatch, owner_id %u",
+			 reject_str(req), limbo->owner_id);
+
+		diag_set(ClientError, ER_UNSUPPORTED,
+			 "Replication",
+			 "sync queue silent owner migration");
+		return -1;
+	}
+
+	return 0;
+}
+
+/**
+ * Filter CONFIRM and ROLLBACK packets.
+ */
+static int
+filter_confirm_rollback(struct txn_limbo *limbo,
+			const struct synchro_request *req)
+{
+	/*
+	 * When limbo is empty we have nothing to
+	 * confirm/commit and if this request comes
+	 * in it means the split brain has happened.
+	 */
+	if (!txn_limbo_is_empty(limbo))
+		return 0;
+
+	say_info("%s. Empty limbo detected", reject_str(req));
+
+	diag_set(ClientError, ER_UNSUPPORTED,
+		 "Replication",
+		 "confirm/rollback with empty limbo");
+	return -1;
+}
+
+/**
+ * Filter PROMOTE and DEMOTE packets.
+ */
+static int
+filter_promote_demote(struct txn_limbo *limbo,
+		      const struct synchro_request *req)
+{
+	int64_t promote_lsn = req->lsn;
+
+	/*
+	 * PROMOTE and DEMOTE packets must not have zero
+	 * term supplied, otherwise it is a broken packet.
+	 */
+	if (req->term == 0) {
+		say_info("%s. Zero term detected", reject_str(req));
+
+		diag_set(ClientError, ER_UNSUPPORTED,
+			 "Replication", "zero term");
+		return -1;
+	}
+
+	/*
+	 * If the term is already seen it means it comes
+	 * from a node which didn't notice new elections,
+	 * thus been living in subdomain and its data is
+	 * no longer consistent.
+	 */
+	if (limbo->promote_greatest_term > req->term) {
+		say_info("%s. Max term seen is %llu", reject_str(req),
+			 (long long)limbo->promote_greatest_term);
+
+		diag_set(ClientError, ER_UNSUPPORTED,
+			 "Replication", "obsolete terms");
+		return -1;
+	}
+
+	/*
+	 * Easy case -- processed LSN matches the new
+	 * one which comes inside request, everything
+	 * is consistent.
+	 */
+	if (limbo->confirmed_lsn == promote_lsn)
+		return 0;
+
+	/*
+	 * Explicit split brain situation. Promote
+	 * comes in with an old LSN which we've already
+	 * processed.
+	 */
+	if (limbo->confirmed_lsn > promote_lsn) {
+		say_info("%s. confirmed_lsn %lld > promote_lsn %lld",
+			 reject_str(req),
+			 (long long)limbo->confirmed_lsn,
+			 (long long)promote_lsn);
+
+		diag_set(ClientError, ER_UNSUPPORTED,
+			 "Replication",
+			 "backward promote LSN (split brain)");
+		return -1;
+	}
+
+	/*
+	 * The last case requires a few subcases.
+	 */
+	assert(limbo->confirmed_lsn < promote_lsn);
+
+	if (txn_limbo_is_empty(limbo)) {
+		/*
+		 * Transactions are rolled back already,
+		 * since the limbo is empty.
+		 */
+		say_info("%s. confirmed_lsn %lld < promote_lsn %lld "
+			 "and empty limbo", reject_str(req),
+			 (long long)limbo->confirmed_lsn,
+			 (long long)promote_lsn);
+
+		diag_set(ClientError, ER_UNSUPPORTED,
+			 "Replication",
+			 "forward promote LSN "
+			 "(empty limbo, split brain)");
+		return -1;
+	} else {
+		/*
+		 * Some entries are present in the limbo,
+		 * we need to make sure the @a promote_lsn
+		 * lays inside limbo [first; last] range.
+		 * So that the promote request has some
+		 * queued data to process, otherwise it
+		 * means the request comes from split
+		 * brained node.
+		 */
+		struct txn_limbo_entry *first, *last;
+
+		first = txn_limbo_first_entry(limbo);
+		last = txn_limbo_last_entry(limbo);
+
+		if (first->lsn < promote_lsn ||
+		    last->lsn > promote_lsn) {
+			say_info("%s. promote_lsn %lld out of "
+				 "range [%lld; %lld]",
+				 reject_str(req),
+				 (long long)promote_lsn,
+				 (long long)first->lsn,
+				 (long long)last->lsn);
+
+			diag_set(ClientError, ER_UNSUPPORTED,
+				 "Replication",
+				 "promote LSN out of queue range "
+				 "(split brain)");
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+static int (*filter_req[FILTER_MAX])
+(struct txn_limbo *limbo, const struct synchro_request *req) = {
+	[FILTER_IN]		= filter_in,
+	[FILTER_CONFIRM]	= filter_confirm_rollback,
+	[FILTER_ROLLBACK]	= filter_confirm_rollback,
+	[FILTER_PROMOTE]	= filter_promote_demote,
+	[FILTER_DEMOTE]		= filter_promote_demote,
+};
+
+int
+txn_limbo_filter_locked(struct txn_limbo *limbo,
+			const struct synchro_request *req)
+{
+	unsigned int mask = (1u << FILTER_IN);
+	unsigned int pos = 0;
+
+	assert(latch_is_locked(&limbo->promote_latch));
+
+#ifndef NDEBUG
+	say_info("limbo: filter %s replica_id %u origin_id %u "
+		 "term %lld lsn %lld, queue owner_id %u len %lld "
+		 "confirmed_lsn %lld (%s)",
+		 iproto_type_name(req->type),
+		 req->replica_id, req->origin_id,
+		 (long long)req->term, (long long)req->lsn,
+		 limbo->owner_id, (long long)limbo->len,
+		 (long long)limbo->confirmed_lsn,
+		 limbo->is_filtering ? "on" : "off");
+#endif
+
+	if (!limbo->is_filtering)
+		return 0;
+
+	switch (req->type) {
+	case IPROTO_CONFIRM:
+		mask |= (1u << FILTER_CONFIRM);
+		break;
+	case IPROTO_ROLLBACK:
+		mask |= (1u << FILTER_ROLLBACK);
+		break;
+	case IPROTO_PROMOTE:
+		mask |= (1u << FILTER_PROMOTE);
+		break;
+	case IPROTO_DEMOTE:
+		mask |= (1u << FILTER_DEMOTE);
+		break;
+	default:
+		say_info("RAFT: rejecting unexpected %d "
+			 "request from instance id %u "
+			 "for term %llu.",
+			 req->type, req->origin_id,
+			 (long long)req->term);
+		diag_set(ClientError, ER_UNSUPPORTED,
+			 "Replication",
+			 "unexpected request type");
+		return -1;
+	}
+
+	while (mask != 0) {
+		if ((mask & 1) != 0) {
+			assert(pos < lengthof(filter_req));
+			assert(filter_req[pos] != NULL);
+			if (filter_req[pos](limbo, req) != 0)
+				return -1;
+		}
+		pos++;
+		mask >>= 1;
+	};
+
+	return 0;
+}
+
 void
 txn_limbo_process_locked(struct txn_limbo *limbo,
 			 const struct synchro_request *req)
@@ -732,71 +1029,46 @@ txn_limbo_process_locked(struct txn_limbo *limbo,
 
 	uint64_t term = req->term;
 	uint32_t origin = req->origin_id;
+
 	if (txn_limbo_replica_term_locked(limbo, origin) < term) {
 		vclock_follow(&limbo->promote_term_map, origin, term);
 		if (term > limbo->promote_greatest_term)
 			limbo->promote_greatest_term = term;
-	} else if (iproto_type_is_promote_request(req->type) &&
-		   limbo->promote_greatest_term > 1) {
-		/* PROMOTE for outdated term. Ignore. */
-		say_info("RAFT: ignoring %s request from instance "
-			 "id %u for term %llu. Greatest term seen "
-			 "before (%llu) is bigger.",
-			 iproto_type_name(req->type), origin, (long long)term,
-			 (long long)limbo->promote_greatest_term);
-		return;
 	}
 
-	int64_t lsn = req->lsn;
-	if (req->replica_id == REPLICA_ID_NIL) {
-		/*
-		 * The limbo was empty on the instance issuing the request.
-		 * This means this instance must empty its limbo as well.
-		 */
-		assert(lsn == 0 && iproto_type_is_promote_request(req->type));
-	} else if (req->replica_id != limbo->owner_id) {
-		/*
-		 * Ignore CONFIRM/ROLLBACK messages for a foreign master.
-		 * These are most likely outdated messages for already confirmed
-		 * data from an old leader, who has just started and written
-		 * confirm right on synchronous transaction recovery.
-		 */
-		if (!iproto_type_is_promote_request(req->type))
-			return;
-		/*
-		 * Promote has a bigger term, and tries to steal the limbo. It
-		 * means it probably was elected with a quorum, and it makes no
-		 * sense to wait here for confirmations. The other nodes already
-		 * elected a new leader. Rollback all the local txns.
-		 */
-		lsn = 0;
-	}
 	switch (req->type) {
 	case IPROTO_CONFIRM:
-		txn_limbo_read_confirm(limbo, lsn);
+		txn_limbo_read_confirm(limbo, req->lsn);
 		break;
 	case IPROTO_ROLLBACK:
-		txn_limbo_read_rollback(limbo, lsn);
+		txn_limbo_read_rollback(limbo, req->lsn);
 		break;
 	case IPROTO_PROMOTE:
-		txn_limbo_read_promote(limbo, req->origin_id, lsn);
+		txn_limbo_read_promote(limbo, req->origin_id, req->lsn);
 		break;
 	case IPROTO_DEMOTE:
-		txn_limbo_read_demote(limbo, lsn);
+		txn_limbo_read_demote(limbo, req->lsn);
 		break;
 	default:
-		unreachable();
+		panic("limbo: unexpected request type %d",
+		      req->type);
+		break;
 	}
-	return;
 }
 
-void
+int
 txn_limbo_process(struct txn_limbo *limbo,
 		  const struct synchro_request *req)
 {
+	int rc;
+
 	txn_limbo_term_lock(limbo);
-	txn_limbo_process_locked(limbo, req);
+	rc = txn_limbo_filter_locked(limbo, req);
+	if (rc == 0)
+		txn_limbo_process_locked(limbo, req);
 	txn_limbo_term_unlock(limbo);
+
+	return rc;
 }
 
 void
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index c77c501e9..de33037f5 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -184,6 +184,14 @@ struct txn_limbo {
 	 * by the 'reversed rollback order' rule - contradiction.
 	 */
 	bool is_in_rollback;
+	/**
+	 * Whether the limbo should filter incoming requests.
+	 * The phases of local recovery from WAL file and on applier's
+	 * join phase we are in complete trust of incoming data because
+	 * this data forms an initial limbo state and should not
+	 * filter out requests.
+	 */
+	bool is_filtering;
 };
 
 /**
@@ -333,15 +341,38 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn);
 int
 txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
 
+/**
+ * Verify if the request is valid for processing.
+ */
+int
+txn_limbo_filter_locked(struct txn_limbo *limbo,
+			const struct synchro_request *req);
+
 /** Execute a synchronous replication request. */
 void
 txn_limbo_process_locked(struct txn_limbo *limbo,
 			 const struct synchro_request *req);
 
 /** Lock limbo terms and execute a synchronous replication request. */
-void
+int
 txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req);
 
+/** Enable filtering of synchro requests. */
+static inline void
+txn_limbo_filter_enable(struct txn_limbo *limbo)
+{
+	limbo->is_filtering = true;
+	say_info("limbo: filter enabled");
+}
+
+/** Disable filtering of synchro requests. */
+static inline void
+txn_limbo_filter_disable(struct txn_limbo *limbo)
+{
+	limbo->is_filtering = false;
+	say_info("limbo: filter disabled");
+}
+
 /**
  * Waiting for confirmation of all "sync" transactions
  * during confirm timeout or fail.
-- 
2.31.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Tarantool-patches] [PATCH v10 4/4] test: add replication/gh-6036-rollback-confirm
  2021-08-04 19:07 [Tarantool-patches] [PATCH v10 0/4] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
                   ` (2 preceding siblings ...)
  2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 3/4] limbo: filter incoming synchro requests Cyrill Gorcunov via Tarantool-patches
@ 2021-08-04 19:07 ` Cyrill Gorcunov via Tarantool-patches
  2021-08-05  9:38 ` [Tarantool-patches] [PATCH v10 0/4] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
  2021-08-05 23:29 ` Vladislav Shpilevoy via Tarantool-patches
  5 siblings, 0 replies; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-04 19:07 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Follow-up #6036

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 test/replication/gh-6036-master.lua           |   1 +
 test/replication/gh-6036-node.lua             |  33 ++++
 test/replication/gh-6036-replica.lua          |   1 +
 .../gh-6036-rollback-confirm.result           | 180 ++++++++++++++++++
 .../gh-6036-rollback-confirm.test.lua         |  92 +++++++++
 5 files changed, 307 insertions(+)
 create mode 120000 test/replication/gh-6036-master.lua
 create mode 100644 test/replication/gh-6036-node.lua
 create mode 120000 test/replication/gh-6036-replica.lua
 create mode 100644 test/replication/gh-6036-rollback-confirm.result
 create mode 100644 test/replication/gh-6036-rollback-confirm.test.lua

diff --git a/test/replication/gh-6036-master.lua b/test/replication/gh-6036-master.lua
new file mode 120000
index 000000000..65baed5de
--- /dev/null
+++ b/test/replication/gh-6036-master.lua
@@ -0,0 +1 @@
+gh-6036-node.lua
\ No newline at end of file
diff --git a/test/replication/gh-6036-node.lua b/test/replication/gh-6036-node.lua
new file mode 100644
index 000000000..ac701b7a2
--- /dev/null
+++ b/test/replication/gh-6036-node.lua
@@ -0,0 +1,33 @@
+local INSTANCE_ID = string.match(arg[0], "gh%-6036%-(.+)%.lua")
+
+local function unix_socket(name)
+    return "unix/:./" .. name .. '.sock';
+end
+
+require('console').listen(os.getenv('ADMIN'))
+
+if INSTANCE_ID == "master" then
+    box.cfg({
+        listen = unix_socket("master"),
+        replication_connect_quorum = 0,
+        election_mode = 'candidate',
+        replication_synchro_quorum = 3,
+        replication_synchro_timeout = 1000,
+    })
+elseif INSTANCE_ID == "replica" then
+    box.cfg({
+        listen = unix_socket("replica"),
+        replication = {
+            unix_socket("master"),
+            unix_socket("replica")
+        },
+        read_only = true,
+        election_mode = 'voter',
+        replication_synchro_quorum = 2,
+        replication_synchro_timeout = 1000,
+    })
+end
+
+box.once("bootstrap", function()
+    box.schema.user.grant('guest', 'super')
+end)
diff --git a/test/replication/gh-6036-replica.lua b/test/replication/gh-6036-replica.lua
new file mode 120000
index 000000000..65baed5de
--- /dev/null
+++ b/test/replication/gh-6036-replica.lua
@@ -0,0 +1 @@
+gh-6036-node.lua
\ No newline at end of file
diff --git a/test/replication/gh-6036-rollback-confirm.result b/test/replication/gh-6036-rollback-confirm.result
new file mode 100644
index 000000000..27419ce21
--- /dev/null
+++ b/test/replication/gh-6036-rollback-confirm.result
@@ -0,0 +1,180 @@
+-- test-run result file version 2
+--
+-- gh-6036: Test for record collision detection. We have a cluster
+-- of two nodes: master and replica. The master initiates syncho write
+-- but fails to gather a quorum. Before it rolls back the record the
+-- network breakage occurs and replica lives with dirty data while
+-- master node goes offline. The replica becomes a new raft leader
+-- and commits the dirty data, same time master node rolls back this
+-- record and tries to connect to the new raft leader back. Such
+-- connection should be refused because old master node is not longer
+-- consistent.
+--
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+test_run:cmd('create server master with script="replication/gh-6036-master.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica with script="replication/gh-6036-replica.lua"')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('start server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica')
+ | ---
+ | - true
+ | ...
+
+--
+-- Connect master to the replica and write a record. Since the quorum
+-- value is bigger than number of nodes in a cluster it will be rolled
+-- back later.
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.cfg({                                       \
+    replication = {                             \
+            "unix/:./master.sock",              \
+            "unix/:./replica.sock",             \
+    },                                          \
+})
+ | ---
+ | ...
+_ = box.schema.create_space('sync', {is_sync = true})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+
+--
+-- Wait the record to appear on the master.
+f = require('fiber').create(function() box.space.sync:replace{1} end)
+ | ---
+ | ...
+test_run:wait_cond(function() return box.space.sync:get({1}) ~= nil end, 100)
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | ...
+
+--
+-- Wait the record from master get written and then
+-- drop the replication.
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.space.sync:get({1}) ~= nil end, 100)
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | ...
+box.cfg{replication = {}}
+ | ---
+ | ...
+
+--
+-- Then we jump back to the master and drop the replication,
+-- thus unconfirmed record get rolled back.
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.cfg({                                       \
+    replication = {},                           \
+    replication_synchro_timeout = 0.001,        \
+    election_mode = 'manual',                   \
+})
+ | ---
+ | ...
+while f:status() ~= 'dead' do require('fiber').sleep(0.1) end
+ | ---
+ | ...
+test_run:wait_cond(function() return box.space.sync:get({1}) == nil end, 100)
+ | ---
+ | - true
+ | ...
+
+--
+-- Force the replica to become a RAFT leader and
+-- commit this new record.
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.cfg({                                       \
+    replication_synchro_quorum = 1,             \
+    election_mode = 'manual'                    \
+})
+ | ---
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ | ...
+
+--
+-- Connect master back to the replica, it should
+-- be refused.
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.cfg({                                       \
+    replication = {                             \
+            "unix/:./replica.sock",             \
+    },                                          \
+})
+ | ---
+ | ...
+box.space.sync:select{}
+ | ---
+ | - []
+ | ...
+test_run:wait_cond(function() return            \
+    test_run:grep_log('master',                 \
+        'rejecting PROMOTE') ~= nil end, 100)   \
+test_run:wait_cond(function() return            \
+    test_run:grep_log('master',                 \
+        'ER_UNSUPPORTED') ~= nil end, 100)
+ | ---
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-6036-rollback-confirm.test.lua b/test/replication/gh-6036-rollback-confirm.test.lua
new file mode 100644
index 000000000..f84f3dd45
--- /dev/null
+++ b/test/replication/gh-6036-rollback-confirm.test.lua
@@ -0,0 +1,92 @@
+--
+-- gh-6036: Test for record collision detection. We have a cluster
+-- of two nodes: master and replica. The master initiates syncho write
+-- but fails to gather a quorum. Before it rolls back the record the
+-- network breakage occurs and replica lives with dirty data while
+-- master node goes offline. The replica becomes a new raft leader
+-- and commits the dirty data, same time master node rolls back this
+-- record and tries to connect to the new raft leader back. Such
+-- connection should be refused because old master node is not longer
+-- consistent.
+--
+test_run = require('test_run').new()
+
+test_run:cmd('create server master with script="replication/gh-6036-master.lua"')
+test_run:cmd('create server replica with script="replication/gh-6036-replica.lua"')
+
+test_run:cmd('start server master')
+test_run:cmd('start server replica')
+
+--
+-- Connect master to the replica and write a record. Since the quorum
+-- value is bigger than number of nodes in a cluster it will be rolled
+-- back later.
+test_run:switch('master')
+box.cfg({                                       \
+    replication = {                             \
+            "unix/:./master.sock",              \
+            "unix/:./replica.sock",             \
+    },                                          \
+})
+_ = box.schema.create_space('sync', {is_sync = true})
+_ = box.space.sync:create_index('pk')
+
+--
+-- Wait the record to appear on the master.
+f = require('fiber').create(function() box.space.sync:replace{1} end)
+test_run:wait_cond(function() return box.space.sync:get({1}) ~= nil end, 100)
+box.space.sync:select{}
+
+--
+-- Wait the record from master get written and then
+-- drop the replication.
+test_run:switch('replica')
+test_run:wait_cond(function() return box.space.sync:get({1}) ~= nil end, 100)
+box.space.sync:select{}
+box.cfg{replication = {}}
+
+--
+-- Then we jump back to the master and drop the replication,
+-- thus unconfirmed record get rolled back.
+test_run:switch('master')
+box.cfg({                                       \
+    replication = {},                           \
+    replication_synchro_timeout = 0.001,        \
+    election_mode = 'manual',                   \
+})
+while f:status() ~= 'dead' do require('fiber').sleep(0.1) end
+test_run:wait_cond(function() return box.space.sync:get({1}) == nil end, 100)
+
+--
+-- Force the replica to become a RAFT leader and
+-- commit this new record.
+test_run:switch('replica')
+box.cfg({                                       \
+    replication_synchro_quorum = 1,             \
+    election_mode = 'manual'                    \
+})
+box.ctl.promote()
+box.space.sync:select{}
+
+--
+-- Connect master back to the replica, it should
+-- be refused.
+test_run:switch('master')
+box.cfg({                                       \
+    replication = {                             \
+            "unix/:./replica.sock",             \
+    },                                          \
+})
+box.space.sync:select{}
+test_run:wait_cond(function() return            \
+    test_run:grep_log('master',                 \
+        'rejecting PROMOTE') ~= nil end, 100)   \
+test_run:wait_cond(function() return            \
+    test_run:grep_log('master',                 \
+        'ER_UNSUPPORTED') ~= nil end, 100)
+
+test_run:switch('default')
+test_run:cmd('stop server master')
+test_run:cmd('delete server master')
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
-- 
2.31.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 0/4] limbo: implement packets filtering
  2021-08-04 19:07 [Tarantool-patches] [PATCH v10 0/4] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
                   ` (3 preceding siblings ...)
  2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 4/4] test: add replication/gh-6036-rollback-confirm Cyrill Gorcunov via Tarantool-patches
@ 2021-08-05  9:38 ` Cyrill Gorcunov via Tarantool-patches
  2021-08-05 23:29 ` Vladislav Shpilevoy via Tarantool-patches
  5 siblings, 0 replies; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-05  9:38 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

On Wed, Aug 04, 2021 at 10:07:48PM +0300, Cyrill Gorcunov wrote:
> 
> We've a long discussion with Serge about current packet validation
> in PROMOTE/DEMOTE filter so I reworked the validation a lot, please
> take a look.

Also I left current error code, ie ER_UNSUPPORTED as is for now,
I'll update it once we manage to agree on everything else design.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 0/4] limbo: implement packets filtering
  2021-08-04 19:07 [Tarantool-patches] [PATCH v10 0/4] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
                   ` (4 preceding siblings ...)
  2021-08-05  9:38 ` [Tarantool-patches] [PATCH v10 0/4] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
@ 2021-08-05 23:29 ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-08 22:03   ` Cyrill Gorcunov via Tarantool-patches
  5 siblings, 1 reply; 24+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-05 23:29 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Hi! Thanks for the patchset!

On top of the branch I tried the test I pasted in the
ticket's description.

I see the connection now breaks in one direction. But the
new leader still follows the old leader somewhy.

This is the old leader (id = 1) which did ROLLBACK:

tarantool> box.info.replication
---
- 1:
    id: 1
    uuid: bf51f188-cb5d-4471-85ee-2a2be4f24052
    lsn: 8
  2:
    id: 2
    uuid: ff1d81af-fae2-4e29-989a-4eb20f06eeb5
    lsn: 0
    upstream:
      peer: localhost:3314
      lag: 4.9944338798523
      status: stopped
      idle: 10.290937999991
      message: Replication does not support forward promote LSN (empty limbo, split
        brain)
    downstream:
      status: follow
      idle: 0.65130299999146
      vclock: {2: 1, 1: 8}
      lag: 14.389625072479
...

This is the new leader (id = 2) which did PROMOTE on the same data:

tarantool> box.info.replication
---
- 1:
    id: 1
    uuid: bf51f188-cb5d-4471-85ee-2a2be4f24052
    lsn: 8
    upstream:
      status: follow
      idle: 0.46603999999934
      peer: localhost:3313
      lag: 0.00013971328735352
    downstream:
      status: stopped
      message: unexpected EOF when reading from socket, called on fd 18, aka [::1]:3314
      system_message: Broken pipe
  2:
    id: 2
    uuid: ff1d81af-fae2-4e29-989a-4eb20f06eeb5
    lsn: 1
...

Why is there still downstream from id 1 to id 2? The new leader
should have received the ROLLBACK which it should have seen can't
be properly applied. Why didn't it cut the connection?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms
  2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
@ 2021-08-05 23:29   ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-06 15:20     ` Cyrill Gorcunov via Tarantool-patches
  2021-08-23 11:32     ` Serge Petrenko via Tarantool-patches
  0 siblings, 2 replies; 24+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-05 23:29 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Thanks for the patch!

> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index f621fa657..9db286ae2 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -856,7 +856,7 @@ apply_synchro_row_cb(struct journal_entry *entry)
>  		applier_rollback_by_wal_io(entry->res);
>  	} else {
>  		replica_txn_wal_write_cb(synchro_entry->rcb);
> -		txn_limbo_process(&txn_limbo, synchro_entry->req);
> +		txn_limbo_process_locked(&txn_limbo, synchro_entry->req);
>  		trigger_run(&replicaset.applier.on_wal_write, NULL);
>  	}
>  	fiber_wakeup(synchro_entry->owner);
> @@ -867,11 +867,13 @@ static int
>  apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
>  {
>  	assert(iproto_type_is_synchro_request(row->type));
> +	int rc = 0;
>  
>  	struct synchro_request req;
>  	if (xrow_decode_synchro(row, &req) != 0)
>  		goto err;
>  
> +	txn_limbo_term_lock(&txn_limbo);

Maybe you should hide the lock from the API. Instead, do similar to
what transactions do:

	int txn_limbo_process_begin(limbo *);
	void txn_limbo_process_commit(limbo *, request *);
	void txn_limbo_process_rollback(limbo *);

begin would take the lock, commit would do the request and
unlock, rollback would only unlock. Commit and rollback you
call from apply_synchro_row_cb depend in on the WAL write
result.

Then the locks would disappear from the API, right?

In the next patch you would make txn_limbo_process_begin()
also take the request to validate it. Then the 'filtering'
would become internal to the limbo.

>  	struct replica_cb_data rcb_data;
>  	struct synchro_entry entry;

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 3/4] limbo: filter incoming synchro requests
  2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 3/4] limbo: filter incoming synchro requests Cyrill Gorcunov via Tarantool-patches
@ 2021-08-05 23:33   ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-06 19:01     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 24+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-05 23:33 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Thanks for the patch!

See 14 comments below.

> FILTER_PROMOTE
> FILTER_DEMOTE
>  1) The requests should come in with nonzero term, otherwise
>     the packet is corrupted.
>  2) The request's term should not be less than maximal known
>     one, iow it should not come in from nodes which didn't notice
>     raft epoch changes and living in the past.
>  3) If LSN of the request matches current confirmed LSN the packet
>     is obviously correct to process.
>  4) If LSN is less than confirmed LSN then the request is wrong,
>     we have processed the requested LSN already.
>  5) If LSN is less than confirmed LSN then

1. You already said "If LSN is less than confirmed LSN then" in (4).
In (4) it must be 'greater'.

>     a) If limbo is empty we can't do anything, since data is already
>        processed and should issue an error;
>     b) If there is some data in the limbo then requested LSN should
>        be in range of limbo's [first; last] LSNs, thus the request
>        will be able to commit and rollback limbo queue.
> 
> Closes #6036

2. You need to add a changelog file.

> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 9db286ae2..f64b6fa35 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -514,6 +515,11 @@ applier_fetch_snapshot(struct applier *applier)
>  	struct ev_io *coio = &applier->io;
>  	struct xrow_header row;
>  
> +	txn_limbo_filter_disable(&txn_limbo);
> +	auto filter_guard = make_scoped_guard([&]{
> +		txn_limbo_filter_enable(&txn_limbo);
> +	});

3. Why do you need to enable/disabled the filter here? Shouldn't snapshot
contain only valid data? Moreover, AFAIU it can't contain any limbo
rows at all. The limbo snapshot is sent separately, but the data flow
does not have anything except pure data. The same for the
join.

And how is it related to applier_register below? It does not download
any data at all, does it?

> diff --git a/src/box/box.cc b/src/box/box.cc
> index 8dc3b130b..c3516b7a4 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1675,7 +1675,8 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
>  		.lsn = promote_lsn,
>  		.term = raft->term,
>  	};
> -	txn_limbo_process(&txn_limbo, &req);
> +	if (txn_limbo_process(&txn_limbo, &req) != 0)
> +		diag_raise();

4. box_issue_promote() is used without try-catches anywhere. Please,
don't use exceptions in the new code. The same for demote.

>  	assert(txn_limbo_is_empty(&txn_limbo));
>  }
> @@ -3284,6 +3286,11 @@ local_recovery(const struct tt_uuid *instance_uuid,
>  
>  	say_info("instance uuid %s", tt_uuid_str(&INSTANCE_UUID));
>  
> +	txn_limbo_filter_disable(&txn_limbo);

5. Why do you need it turned off for recovery? If the data was
able to be applied in the first place, why can't it be replayed
in the same way during recovery?

> +	auto filter_guard = make_scoped_guard([&]{
> +		txn_limbo_filter_enable(&txn_limbo);
> +	});
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index a718c55a2..59fb51fac 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c

<...>

> +
> +/**
> + * Common chain for any incoming packet.
> + */
> +static int
> +filter_in(struct txn_limbo *limbo, const struct synchro_request *req)
> +{
> +	(void)limbo;

6. So you have the filtering enabled dynamically in the limbo, but
you do not use the limbo here? Why? Maybe at least add an assertion
that the filter is enabled?

> +
> +	/*
> +	 * Zero LSN are allowed for PROMOTE
> +	 * and DEMOTE requests only.
> +	 */
> +	if (req->lsn == 0) {
> +		if (!iproto_type_is_promote_request(req->type)) {
> +			say_info("%s. Zero lsn detected",
> +				 reject_str(req));

7. It should be say_error(). The same in the other places.

> +
> +			diag_set(ClientError, ER_UNSUPPORTED,
> +				 "Replication",
> +				 "zero LSN on promote/demote");
> +			return -1;

8. Please, try to be more compact. Even with the current indetation
level you don't need to wrap the lines so early. But the indentation
can be reduced even further easily:

====================
@@ -764,16 +764,11 @@ filter_in(struct txn_limbo *limbo, const struct synchro_request *req)
 	 * Zero LSN are allowed for PROMOTE
 	 * and DEMOTE requests only.
 	 */
-	if (req->lsn == 0) {
-		if (!iproto_type_is_promote_request(req->type)) {
-			say_info("%s. Zero lsn detected",
-				 reject_str(req));
-
-			diag_set(ClientError, ER_UNSUPPORTED,
-				 "Replication",
-				 "zero LSN on promote/demote");
-			return -1;
-		}
+	if (req->lsn == 0 && !iproto_type_is_promote_request(req->type)) {
+		say_info("%s. Zero lsn detected", reject_str(req));
+		diag_set(ClientError, ER_UNSUPPORTED, "Replication zero LSN "
+			 "on promote/demote");
+		return -1;
 	}
====================

The same in 2 places below. More compact.

> +		}
> +	}

<...>

> +
> +/**
> + * Filter CONFIRM and ROLLBACK packets.
> + */
> +static int
> +filter_confirm_rollback(struct txn_limbo *limbo,
> +			const struct synchro_request *req)
> +{
> +	/*
> +	 * When limbo is empty we have nothing to
> +	 * confirm/commit and if this request comes
> +	 * in it means the split brain has happened.
> +	 */
> +	if (!txn_limbo_is_empty(limbo))
> +		return 0;

9. What if rollback is for LSN > limbo's last LSN? It
also means nothing to do. The same for confirm LSN < limbo's
first LSN.

> +
> +	say_info("%s. Empty limbo detected", reject_str(req));
> +
> +	diag_set(ClientError, ER_UNSUPPORTED,
> +		 "Replication",
> +		 "confirm/rollback with empty limbo");
> +	return -1;
> +}
> +
> +/**
> + * Filter PROMOTE and DEMOTE packets.
> + */
> +static int
> +filter_promote_demote(struct txn_limbo *limbo,
> +		      const struct synchro_request *req)
> +{
> +	int64_t promote_lsn = req->lsn;
> +
> +	/*
> +	 * PROMOTE and DEMOTE packets must not have zero
> +	 * term supplied, otherwise it is a broken packet.
> +	 */
> +	if (req->term == 0) {
> +		say_info("%s. Zero term detected", reject_str(req));
> +
> +		diag_set(ClientError, ER_UNSUPPORTED,
> +			 "Replication", "zero term");
> +		return -1;
> +	}
> +
> +	/*
> +	 * If the term is already seen it means it comes
> +	 * from a node which didn't notice new elections,
> +	 * thus been living in subdomain and its data is
> +	 * no longer consistent.
> +	 */
> +	if (limbo->promote_greatest_term > req->term) {
> +		say_info("%s. Max term seen is %llu", reject_str(req),
> +			 (long long)limbo->promote_greatest_term);
> +
> +		diag_set(ClientError, ER_UNSUPPORTED,
> +			 "Replication", "obsolete terms");
> +		return -1;
> +	}
> +
> +	/*
> +	 * Easy case -- processed LSN matches the new
> +	 * one which comes inside request, everything
> +	 * is consistent.
> +	 */
> +	if (limbo->confirmed_lsn == promote_lsn)
> +		return 0;
> +
> +	/*
> +	 * Explicit split brain situation. Promote
> +	 * comes in with an old LSN which we've already
> +	 * processed.
> +	 */
> +	if (limbo->confirmed_lsn > promote_lsn) {
> +		say_info("%s. confirmed_lsn %lld > promote_lsn %lld",
> +			 reject_str(req),
> +			 (long long)limbo->confirmed_lsn,
> +			 (long long)promote_lsn);
> +
> +		diag_set(ClientError, ER_UNSUPPORTED,
> +			 "Replication",
> +			 "backward promote LSN (split brain)");
> +		return -1;
> +	}
> +
> +	/*
> +	 * The last case requires a few subcases.
> +	 */
> +	assert(limbo->confirmed_lsn < promote_lsn);
> +
> +	if (txn_limbo_is_empty(limbo)) {
> +		/*
> +		 * Transactions are rolled back already,
> +		 * since the limbo is empty.
> +		 */
> +		say_info("%s. confirmed_lsn %lld < promote_lsn %lld "
> +			 "and empty limbo", reject_str(req),
> +			 (long long)limbo->confirmed_lsn,
> +			 (long long)promote_lsn);
> +
> +		diag_set(ClientError, ER_UNSUPPORTED,
> +			 "Replication",
> +			 "forward promote LSN "
> +			 "(empty limbo, split brain)");
> +		return -1;
> +	} else {

10. You don't need 'else' - the main branch already made 'return'.
This should help to reduce the indentation below.

> +		/*
> +		 * Some entries are present in the limbo,
> +		 * we need to make sure the @a promote_lsn
> +		 * lays inside limbo [first; last] range.
> +		 * So that the promote request has some
> +		 * queued data to process, otherwise it
> +		 * means the request comes from split
> +		 * brained node.
> +		 */
> +		struct txn_limbo_entry *first, *last;
> +
> +		first = txn_limbo_first_entry(limbo);
> +		last = txn_limbo_last_entry(limbo);
> +
> +		if (first->lsn < promote_lsn ||
> +		    last->lsn > promote_lsn) {

11. This seems to be broken. In the comment you said the
error is when

	promote < first or promote > last

And here in the condition you return an error when

	promote > first or promote < last

Why?

> +			say_info("%s. promote_lsn %lld out of "
> +				 "range [%lld; %lld]",
> +				 reject_str(req),
> +				 (long long)promote_lsn,
> +				 (long long)first->lsn,
> +				 (long long)last->lsn);
> +
> +			diag_set(ClientError, ER_UNSUPPORTED,
> +				 "Replication",
> +				 "promote LSN out of queue range "
> +				 "(split brain)");
> +			return -1;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int (*filter_req[FILTER_MAX])
> +(struct txn_limbo *limbo, const struct synchro_request *req) = {
> +	[FILTER_IN]		= filter_in,
> +	[FILTER_CONFIRM]	= filter_confirm_rollback,
> +	[FILTER_ROLLBACK]	= filter_confirm_rollback,
> +	[FILTER_PROMOTE]	= filter_promote_demote,
> +	[FILTER_DEMOTE]		= filter_promote_demote,

12. What is this? Wouldn't it be much much much simpler to just
call filter_in() always + make a switch case for the request type +
call the needed functions?

What is worse, you already have the switch-case anyway, but you
also added some strange loop, masks, and virtual functions ... .
I don't think I could make it more complex even if I wanted to,
sorry. Why so complicated?

> +};
> +
> +int
> +txn_limbo_filter_locked(struct txn_limbo *limbo,
> +			const struct synchro_request *req)
> +{
> +	unsigned int mask = (1u << FILTER_IN);
> +	unsigned int pos = 0;
> +
> +	assert(latch_is_locked(&limbo->promote_latch));
> +
> +#ifndef NDEBUG
> +	say_info("limbo: filter %s replica_id %u origin_id %u "
> +		 "term %lld lsn %lld, queue owner_id %u len %lld "
> +		 "confirmed_lsn %lld (%s)",
> +		 iproto_type_name(req->type),
> +		 req->replica_id, req->origin_id,
> +		 (long long)req->term, (long long)req->lsn,
> +		 limbo->owner_id, (long long)limbo->len,
> +		 (long long)limbo->confirmed_lsn,
> +		 limbo->is_filtering ? "on" : "off");
> +#endif
> +
> +	if (!limbo->is_filtering)
> +		return 0;
> +
> +	switch (req->type) {
> +	case IPROTO_CONFIRM:
> +		mask |= (1u << FILTER_CONFIRM);
> +		break;
> +	case IPROTO_ROLLBACK:
> +		mask |= (1u << FILTER_ROLLBACK);
> +		break;
> +	case IPROTO_PROMOTE:
> +		mask |= (1u << FILTER_PROMOTE);
> +		break;
> +	case IPROTO_DEMOTE:
> +		mask |= (1u << FILTER_DEMOTE);
> +		break;
> +	default:
> +		say_info("RAFT: rejecting unexpected %d "
> +			 "request from instance id %u "
> +			 "for term %llu.",
> +			 req->type, req->origin_id,
> +			 (long long)req->term);
> +		diag_set(ClientError, ER_UNSUPPORTED,
> +			 "Replication",
> +			 "unexpected request type");
> +		return -1;
> +	}
> +
> +	while (mask != 0) {
> +		if ((mask & 1) != 0) {
> +			assert(pos < lengthof(filter_req));
> +			assert(filter_req[pos] != NULL);
> +			if (filter_req[pos](limbo, req) != 0)
> +				return -1;
> +		}
> +		pos++;
> +		mask >>= 1;
> +	};
> +
> +	return 0;
> +}
> +
>  void
>  txn_limbo_process_locked(struct txn_limbo *limbo,
>  			 const struct synchro_request *req)

<...>

> -void
> +int
>  txn_limbo_process(struct txn_limbo *limbo,
>  		  const struct synchro_request *req)
>  {
> +	int rc;
> +
>  	txn_limbo_term_lock(limbo);
> -	txn_limbo_process_locked(limbo, req);
> +	rc = txn_limbo_filter_locked(limbo, req);

13. We use relatively modern C. You can make int rc = ...;

14. Process can fail now, but still its result is ignored in
wal_stream_apply_synchro_row().

> +	if (rc == 0)
> +		txn_limbo_process_locked(limbo, req);
>  	txn_limbo_term_unlock(limbo);
> +
> +	return rc;
>  }

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms
  2021-08-05 23:29   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-06 15:20     ` Cyrill Gorcunov via Tarantool-patches
  2021-08-08 14:34       ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-23 11:32     ` Serge Petrenko via Tarantool-patches
  1 sibling, 1 reply; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-06 15:20 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Fri, Aug 06, 2021 at 01:29:57AM +0200, Vladislav Shpilevoy wrote:
> >  apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
> >  {
> >  	assert(iproto_type_is_synchro_request(row->type));
> > +	int rc = 0;
> >  
> >  	struct synchro_request req;
> >  	if (xrow_decode_synchro(row, &req) != 0)
> >  		goto err;
> >  
> > +	txn_limbo_term_lock(&txn_limbo);
> 
> Maybe you should hide the lock from the API. Instead, do similar to
> what transactions do:
> 
> 	int txn_limbo_process_begin(limbo *);
> 	void txn_limbo_process_commit(limbo *, request *);
> 	void txn_limbo_process_rollback(limbo *);
> 
> begin would take the lock, commit would do the request and
> unlock, rollback would only unlock. Commit and rollback you
> call from apply_synchro_row_cb depend in on the WAL write
> result.
> 
> Then the locks would disappear from the API, right?

Unfortunatelly locking is needed not only for processing but
for reading terms as well. We have a few helpers more which
are waiting the other fibers to complete before reading terms.

applier_apply_tx
  applier_synchro_filter_tx
    txn_limbo_is_replica_outdated
      txn_limbo_term_lock
        txn_limbo_replica_term_locked
      txn_limbo_term_unlock

And a number of calls for txn_limbo_replica_term which reads
term in a locked way because we need to eliminate potential
race here and fetch only last written data.

So no, locking won't disappear. Another option may be to
introduce preemption disabling (just like kernel does for
tasks which should not be rescheduled on a core while
they are wating for some action to complete). Then our
write for synchro packets would look like

	preempt_disable();
	rc = journal_write();
	preempt_enable();

which would guarantee us that while we're waiting the journal
to finish its write no other fibers from the cord will be
executed and we gotta be woken up once write is complete.

This way I think we will be allowed to drop locking at all
because main problem is exactly because of other fibers get
running while we're writing synchro data.

> In the next patch you would make txn_limbo_process_begin()
> also take the request to validate it. Then the 'filtering'
> would become internal to the limbo.
> 
> >  	struct replica_cb_data rcb_data;
> >  	struct synchro_entry entry;

	Cyrill

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 3/4] limbo: filter incoming synchro requests
  2021-08-05 23:33   ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-06 19:01     ` Cyrill Gorcunov via Tarantool-patches
  2021-08-08 11:43       ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-06 19:01 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Fri, Aug 06, 2021 at 01:33:39AM +0200, Vladislav Shpilevoy wrote:
> > FILTER_PROMOTE
> > FILTER_DEMOTE
> >  1) The requests should come in with nonzero term, otherwise
> >     the packet is corrupted.
> >  2) The request's term should not be less than maximal known
> >     one, iow it should not come in from nodes which didn't notice
> >     raft epoch changes and living in the past.
> >  3) If LSN of the request matches current confirmed LSN the packet
> >     is obviously correct to process.
> >  4) If LSN is less than confirmed LSN then the request is wrong,
> >     we have processed the requested LSN already.
> >  5) If LSN is less than confirmed LSN then
> 
> 1. You already said "If LSN is less than confirmed LSN then" in (4).
> In (4) it must be 'greater'.

Yup, typo, thanks!

> >     a) If limbo is empty we can't do anything, since data is already
> >        processed and should issue an error;
> >     b) If there is some data in the limbo then requested LSN should
> >        be in range of limbo's [first; last] LSNs, thus the request
> >        will be able to commit and rollback limbo queue.
> > 
> > Closes #6036
> 
> 2. You need to add a changelog file.

And change error code as well, as Serge suggested. I'll update
it once we settle down on overall code structure.

> > diff --git a/src/box/applier.cc b/src/box/applier.cc
> > index 9db286ae2..f64b6fa35 100644
> > --- a/src/box/applier.cc
> > +++ b/src/box/applier.cc
> > @@ -514,6 +515,11 @@ applier_fetch_snapshot(struct applier *applier)
> >  	struct ev_io *coio = &applier->io;
> >  	struct xrow_header row;
> >  
> > +	txn_limbo_filter_disable(&txn_limbo);
> > +	auto filter_guard = make_scoped_guard([&]{
> > +		txn_limbo_filter_enable(&txn_limbo);
> > +	});
> 
> 3. Why do you need to enable/disabled the filter here? Shouldn't snapshot
> contain only valid data? Moreover, AFAIU it can't contain any limbo
> rows at all. The limbo snapshot is sent separately, but the data flow
> does not have anything except pure data. The same for the
> join.

The idea is that snapshot/recovery has valid data which forms the initial
limbo state versus which we will be apply filtering.

> 
> And how is it related to applier_register below? It does not download
> any data at all, does it?

After register stage is complete we catch up with lates not yet downloaded
data (final join stage) where we still assume that the data received is
valid and do not verify it.

Actually this is a good question. I've to recheck this moment because in
previous series when I ran join/recovery with filtering enabled sometime
I've an issues where filter didnt pass. Gimme some time, maybe we will
all this and manage to keep filtering all the time.

> > diff --git a/src/box/box.cc b/src/box/box.cc
> > index 8dc3b130b..c3516b7a4 100644
> > --- a/src/box/box.cc
> > +++ b/src/box/box.cc
> > @@ -1675,7 +1675,8 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
> >  		.lsn = promote_lsn,
> >  		.term = raft->term,
> >  	};
> > -	txn_limbo_process(&txn_limbo, &req);
> > +	if (txn_limbo_process(&txn_limbo, &req) != 0)
> > +		diag_raise();
> 
> 4. box_issue_promote() is used without try-catches anywhere. Please,
> don't use exceptions in the new code. The same for demote.

OK

> 
> >  	assert(txn_limbo_is_empty(&txn_limbo));
> >  }
> > @@ -3284,6 +3286,11 @@ local_recovery(const struct tt_uuid *instance_uuid,
> >  
> >  	say_info("instance uuid %s", tt_uuid_str(&INSTANCE_UUID));
> >  
> > +	txn_limbo_filter_disable(&txn_limbo);
> 
> 5. Why do you need it turned off for recovery? If the data was
> able to be applied in the first place, why can't it be replayed
> in the same way during recovery?

I'll retest this moment, thanks!

> 
> > +
> > +/**
> > + * Common chain for any incoming packet.
> > + */
> > +static int
> > +filter_in(struct txn_limbo *limbo, const struct synchro_request *req)
> > +{
> > +	(void)limbo;
> 
> 6. So you have the filtering enabled dynamically in the limbo, but
> you do not use the limbo here? Why? Maybe at least add an assertion
> that the filter is enabled?

All chains are having same interface it is just happen that for common
filter I don't need to use limbo. I could add some operations here
but not sure if it worth it. As far as I see leave unused args is
pretty fine in our code base.

> > +
> > +	/*
> > +	 * Zero LSN are allowed for PROMOTE
> > +	 * and DEMOTE requests only.
> > +	 */
> > +	if (req->lsn == 0) {
> > +		if (!iproto_type_is_promote_request(req->type)) {
> > +			say_info("%s. Zero lsn detected",
> > +				 reject_str(req));
> 
> 7. It should be say_error(). The same in the other places.

OK, will switch to error mode.

> 
> > +
> > +			diag_set(ClientError, ER_UNSUPPORTED,
> > +				 "Replication",
> > +				 "zero LSN on promote/demote");
> > +			return -1;
> 
> 8. Please, try to be more compact. Even with the current indetation
> level you don't need to wrap the lines so early. But the indentation
> can be reduced even further easily:
...
> 
> 
> The same in 2 places below. More compact.

ok

> 
> > +/**
> > + * Filter CONFIRM and ROLLBACK packets.
> > + */
> > +static int
> > +filter_confirm_rollback(struct txn_limbo *limbo,
> > +			const struct synchro_request *req)
> > +{
> > +	/*
> > +	 * When limbo is empty we have nothing to
> > +	 * confirm/commit and if this request comes
> > +	 * in it means the split brain has happened.
> > +	 */
> > +	if (!txn_limbo_is_empty(limbo))
> > +		return 0;
> 
> 9. What if rollback is for LSN > limbo's last LSN? It
> also means nothing to do. The same for confirm LSN < limbo's
> first LSN.

static void
txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
{
-->	assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo));

txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
{
-->	assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo));

Currently we're allowed to process empty limbo if only owner is not nil,
I think I should add this case here.

...
> > +
> > +	if (txn_limbo_is_empty(limbo)) {
> > +		/*
> > +		 * Transactions are rolled back already,
> > +		 * since the limbo is empty.
> > +		 */
> > +		say_info("%s. confirmed_lsn %lld < promote_lsn %lld "
> > +			 "and empty limbo", reject_str(req),
> > +			 (long long)limbo->confirmed_lsn,
> > +			 (long long)promote_lsn);
> > +
> > +		diag_set(ClientError, ER_UNSUPPORTED,
> > +			 "Replication",
> > +			 "forward promote LSN "
> > +			 "(empty limbo, split brain)");
> > +		return -1;
> > +	} else {
> 
> 10. You don't need 'else' - the main branch already made 'return'.
> This should help to reduce the indentation below.

OK, will do, though explicit if\else here helps to gather code context,

> > +		/*
> > +		 * Some entries are present in the limbo,
> > +		 * we need to make sure the @a promote_lsn
> > +		 * lays inside limbo [first; last] range.
> > +		 * So that the promote request has some
> > +		 * queued data to process, otherwise it
> > +		 * means the request comes from split
> > +		 * brained node.
> > +		 */
> > +		struct txn_limbo_entry *first, *last;
> > +
> > +		first = txn_limbo_first_entry(limbo);
> > +		last = txn_limbo_last_entry(limbo);
> > +
> > +		if (first->lsn < promote_lsn ||
> > +		    last->lsn > promote_lsn) {
> 
> 11. This seems to be broken. In the comment you said the
> error is when
> 
> 	promote < first or promote > last
> 
> And here in the condition you return an error when
> 
> 	promote > first or promote < last
> 
> Why?

Good catch, typo. Actually I've updated this hunk locally
but didn't pushed out. We need "first <= promote <= last"

> > +
> > +static int (*filter_req[FILTER_MAX])
> > +(struct txn_limbo *limbo, const struct synchro_request *req) = {
> > +	[FILTER_IN]		= filter_in,
> > +	[FILTER_CONFIRM]	= filter_confirm_rollback,
> > +	[FILTER_ROLLBACK]	= filter_confirm_rollback,
> > +	[FILTER_PROMOTE]	= filter_promote_demote,
> > +	[FILTER_DEMOTE]		= filter_promote_demote,
> 
> 12. What is this? Wouldn't it be much much much simpler to just
> call filter_in() always + make a switch case for the request type +
> call the needed functions?
> 
> What is worse, you already have the switch-case anyway, but you
> also added some strange loop, masks, and virtual functions ... .
> I don't think I could make it more complex even if I wanted to,
> sorry. Why so complicated?

It might be look easier but won't allow to extend filtering in future
without rewritting too much. I'm pretty sure this number of packet types
is not finished and we will have more. Using bitmap routing you can easily
hook in any call sequence you need while using explicit if\elses or direct
calls via case-by-request-type won't allow to make it so. So no, this is
not complicated at all but rather close to real packet filtering code.

Anyway, which form would you prefer?

txn_limbo_filter_locked() {
	int rc = filter_in();
	if (rc != 0)
		return -1;

	swicth (req->type) {
	case IPROTO_CONFIRM:
	case IPROTO_ROLLBACK:
		rc = filter_confirm_rollback();
		break;
	...
	}

	return rc;
}

Kind of this?

> > +int
> >  txn_limbo_process(struct txn_limbo *limbo,
> >  		  const struct synchro_request *req)
> >  {
> > +	int rc;
> > +
> >  	txn_limbo_term_lock(limbo);
> > -	txn_limbo_process_locked(limbo, req);
> > +	rc = txn_limbo_filter_locked(limbo, req);
> 
> 13. We use relatively modern C. You can make int rc = ...;

ok

> 14. Process can fail now, but still its result is ignored in
> wal_stream_apply_synchro_row().

+1, thanks!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 3/4] limbo: filter incoming synchro requests
  2021-08-06 19:01     ` Cyrill Gorcunov via Tarantool-patches
@ 2021-08-08 11:43       ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-08 22:35         ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 24+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-08 11:43 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml

Hi! Thanks for working on this!

>>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>>> index 9db286ae2..f64b6fa35 100644
>>> --- a/src/box/applier.cc
>>> +++ b/src/box/applier.cc
>>> @@ -514,6 +515,11 @@ applier_fetch_snapshot(struct applier *applier)
>>>  	struct ev_io *coio = &applier->io;
>>>  	struct xrow_header row;
>>>  
>>> +	txn_limbo_filter_disable(&txn_limbo);
>>> +	auto filter_guard = make_scoped_guard([&]{
>>> +		txn_limbo_filter_enable(&txn_limbo);
>>> +	});
>>
>> 3. Why do you need to enable/disabled the filter here? Shouldn't snapshot
>> contain only valid data? Moreover, AFAIU it can't contain any limbo
>> rows at all. The limbo snapshot is sent separately, but the data flow
>> does not have anything except pure data. The same for the
>> join.
> 
> The idea is that snapshot/recovery has valid data which forms the initial
> limbo state versus which we will be apply filtering.

You didn't answer the question really. Why do you need the filtering
here if all the data is correct anyway? Will it all work if I just
drop this filter disable from here?

>> And how is it related to applier_register below? It does not download
>> any data at all, does it?
> 
> After register stage is complete we catch up with lates not yet downloaded
> data (final join stage) where we still assume that the data received is
> valid and do not verify it.

Register just makes the master give you a unique ID. It does not send
any data like joins do, AFAIR. Does it work if you drop the filter disable
from here?

> Actually this is a good question. I've to recheck this moment because in
> previous series when I ran join/recovery with filtering enabled sometime
> I've an issues where filter didnt pass. Gimme some time, maybe we will
> all this and manage to keep filtering all the time.
> 
>>> +
>>> +/**
>>> + * Common chain for any incoming packet.
>>> + */
>>> +static int
>>> +filter_in(struct txn_limbo *limbo, const struct synchro_request *req)
>>> +{
>>> +	(void)limbo;
>>
>> 6. So you have the filtering enabled dynamically in the limbo, but
>> you do not use the limbo here? Why? Maybe at least add an assertion
>> that the filter is enabled?
> 
> All chains are having same interface it is just happen that for common
> filter I don't need to use limbo. I could add some operations here
> but not sure if it worth it. As far as I see leave unused args is
> pretty fine in our code base.

You didn't answer the second question:

	Maybe at least add an assertion that the filter is enabled?

>>> +/**
>>> + * Filter CONFIRM and ROLLBACK packets.
>>> + */
>>> +static int
>>> +filter_confirm_rollback(struct txn_limbo *limbo,
>>> +			const struct synchro_request *req)
>>> +{
>>> +	/*
>>> +	 * When limbo is empty we have nothing to
>>> +	 * confirm/commit and if this request comes
>>> +	 * in it means the split brain has happened.
>>> +	 */
>>> +	if (!txn_limbo_is_empty(limbo))
>>> +		return 0;
>>
>> 9. What if rollback is for LSN > limbo's last LSN? It
>> also means nothing to do. The same for confirm LSN < limbo's
>> first LSN.
> 
> static void
> txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
> {
> -->	assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo));
> 
> txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
> {
> -->	assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo));
> 
> Currently we're allowed to process empty limbo if only owner is not nil,
> I think I should add this case here.

My question is not about the owner ID. I asked what if rollback/confirm
try to cover a range not present in the limbo while it is not empty. If
it is not empty, it has an owner obviously. But it does not matter.
What if it has an owner, has transactions, but you got ROLLBACK/CONFIRM
for data out of the LSN range present in the limbo?

>>> +		/*
>>> +		 * Some entries are present in the limbo,
>>> +		 * we need to make sure the @a promote_lsn
>>> +		 * lays inside limbo [first; last] range.
>>> +		 * So that the promote request has some
>>> +		 * queued data to process, otherwise it
>>> +		 * means the request comes from split
>>> +		 * brained node.
>>> +		 */
>>> +		struct txn_limbo_entry *first, *last;
>>> +
>>> +		first = txn_limbo_first_entry(limbo);
>>> +		last = txn_limbo_last_entry(limbo);
>>> +
>>> +		if (first->lsn < promote_lsn ||
>>> +		    last->lsn > promote_lsn) {
>>
>> 11. This seems to be broken. In the comment you said the
>> error is when
>>
>> 	promote < first or promote > last
>>
>> And here in the condition you return an error when
>>
>> 	promote > first or promote < last
>>
>> Why?
> 
> Good catch, typo. Actually I've updated this hunk locally
> but didn't pushed out. We need "first <= promote <= last"

Is it covered with a test?

>>> +static int (*filter_req[FILTER_MAX])
>>> +(struct txn_limbo *limbo, const struct synchro_request *req) = {
>>> +	[FILTER_IN]		= filter_in,
>>> +	[FILTER_CONFIRM]	= filter_confirm_rollback,
>>> +	[FILTER_ROLLBACK]	= filter_confirm_rollback,
>>> +	[FILTER_PROMOTE]	= filter_promote_demote,
>>> +	[FILTER_DEMOTE]		= filter_promote_demote,
>>
>> 12. What is this? Wouldn't it be much much much simpler to just
>> call filter_in() always + make a switch case for the request type +
>> call the needed functions?
>>
>> What is worse, you already have the switch-case anyway, but you
>> also added some strange loop, masks, and virtual functions ... .
>> I don't think I could make it more complex even if I wanted to,
>> sorry. Why so complicated?
> 
> It might be look easier but won't allow to extend filtering in future
> without rewritting too much.

I propose to think about that when we add more packet types. Also, in your
code you will need to extend both switch-case and your masks. While if we
had only the switch-case, you would only need to update the switch-case.
So it means less work. 'Switch-case' vs 'switch-case + virtual functions and
the loop'.

> I'm pretty sure this number of packet types
> is not finished and we will have more. Using bitmap routing you can easily
> hook in any call sequence you need while using explicit if\elses or direct
> calls via case-by-request-type won't allow to make it so. So no, this is
> not complicated at all but rather close to real packet filtering code.

You literally filter 4 packet types. Please, do not overcomplicate it. It
is not kernel, not some network filter for all kinds of protocols. It is
just 4 packet types. Which you **already** walk in switch-case + call the
virtual functions in a loop. While I propose to keep only the switch-case.
Even its extension will look simpler than what you have now. Because you
will also need to patch the switch-case.

Just compare:

	filters = [
		...
	]

	switch {
	case ...:
	case ...:
	}

	while (...) {
		...
	}

vs

	switch {
	case ...:
	case ...:
	}

You have the first version, and will need to update
the masks, the switch-case and still have the loop
with a fullscan.

In the second version you only would have a
switch-case. Doesn't it look simpler?

> Anyway, which form would you prefer?
> 
> txn_limbo_filter_locked() {
> 	int rc = filter_in();
> 	if (rc != 0)
> 		return -1;
> 
> 	swicth (req->type) {
> 	case IPROTO_CONFIRM:
> 	case IPROTO_ROLLBACK:
> 		rc = filter_confirm_rollback();
> 		break;
> 	...
> 	}
> 
> 	return rc;
> }
> 
> Kind of this?

Yes! It is much simpler and still easy to extend. Please,
just try and you will see how much simpler it is.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms
  2021-08-06 15:20     ` Cyrill Gorcunov via Tarantool-patches
@ 2021-08-08 14:34       ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-09 16:24         ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 24+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-08 14:34 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml

On 06.08.2021 18:20, Cyrill Gorcunov wrote:
> On Fri, Aug 06, 2021 at 01:29:57AM +0200, Vladislav Shpilevoy wrote:
>>>  apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
>>>  {
>>>  	assert(iproto_type_is_synchro_request(row->type));
>>> +	int rc = 0;
>>>  
>>>  	struct synchro_request req;
>>>  	if (xrow_decode_synchro(row, &req) != 0)
>>>  		goto err;
>>>  
>>> +	txn_limbo_term_lock(&txn_limbo);
>>
>> Maybe you should hide the lock from the API. Instead, do similar to
>> what transactions do:
>>
>> 	int txn_limbo_process_begin(limbo *);
>> 	void txn_limbo_process_commit(limbo *, request *);
>> 	void txn_limbo_process_rollback(limbo *);
>>
>> begin would take the lock, commit would do the request and
>> unlock, rollback would only unlock. Commit and rollback you
>> call from apply_synchro_row_cb depend in on the WAL write
>> result.
>>
>> Then the locks would disappear from the API, right?
> 
> Unfortunatelly locking is needed not only for processing but
> for reading terms as well. We have a few helpers more which
> are waiting the other fibers to complete before reading terms.
> 
> applier_apply_tx
>   applier_synchro_filter_tx
>     txn_limbo_is_replica_outdated
>       txn_limbo_term_lock
>         txn_limbo_replica_term_locked
>       txn_limbo_term_unlock
> 
> And a number of calls for txn_limbo_replica_term which reads
> term in a locked way because we need to eliminate potential
> race here and fetch only last written data.
> 
> So no, locking won't disappear. Another option may be to
> introduce preemption disabling (just like kernel does for
> tasks which should not be rescheduled on a core while
> they are wating for some action to complete). Then our
> write for synchro packets would look like
> 
> 	preempt_disable();
> 	rc = journal_write();
> 	preempt_enable();
> 
> which would guarantee us that while we're waiting the journal
> to finish its write no other fibers from the cord will be
> executed and we gotta be woken up once write is complete.
> 
> This way I think we will be allowed to drop locking at all
> because main problem is exactly because of other fibers get
> running while we're writing synchro data.
> 
>> In the next patch you would make txn_limbo_process_begin()
>> also take the request to validate it. Then the 'filtering'
>> would become internal to the limbo.

I didn't propose to drop the locking. I said it could be hidden
inside of the limbo's API. In the only example above you show:

>       txn_limbo_term_lock
>         txn_limbo_replica_term_locked
>       txn_limbo_term_unlock

Here the lock is done inside of the limbo's API too. It is
not exposed on the limbo's API level. So the questions is the
same - can it be hidden inside of the API? Are there any usages
of the lock done explicitly out of the limo?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 0/4] limbo: implement packets filtering
  2021-08-05 23:29 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-08 22:03   ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-08 22:03 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Fri, Aug 06, 2021 at 01:29:17AM +0200, Vladislav Shpilevoy wrote:
> 
> Why is there still downstream from id 1 to id 2? The new leader
> should have received the ROLLBACK which it should have seen can't
> be properly applied. Why didn't it cut the connection?

Hi! I saw this as well, I think it is due to our error handling.
I'll take more precise look, thanks!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 3/4] limbo: filter incoming synchro requests
  2021-08-08 11:43       ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-08 22:35         ` Cyrill Gorcunov via Tarantool-patches
  2021-08-10 12:31           ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-08 22:35 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Sun, Aug 08, 2021 at 02:43:14PM +0300, Vladislav Shpilevoy wrote:
> >>> +	txn_limbo_filter_disable(&txn_limbo);
> >>> +	auto filter_guard = make_scoped_guard([&]{
> >>> +		txn_limbo_filter_enable(&txn_limbo);
> >>> +	});
> >>
> >> 3. Why do you need to enable/disabled the filter here? Shouldn't snapshot
> >> contain only valid data? Moreover, AFAIU it can't contain any limbo
> >> rows at all. The limbo snapshot is sent separately, but the data flow
> >> does not have anything except pure data. The same for the
> >> join.
> > 
> > The idea is that snapshot/recovery has valid data which forms the initial
> > limbo state versus which we will be apply filtering.
> 
> You didn't answer the question really. Why do you need the filtering
> here if all the data is correct anyway? Will it all work if I just
> drop this filter disable from here?

Vlad, I answered to this but a bit later in the reply

  | Actually this is a good question. I've to recheck this moment because in
  | previous series when I ran join/recovery with filtering enabled sometime
  | I've an issues where filter didnt pass. Gimme some time, maybe we will
  | all this and manage to keep filtering all the time.

I saw errors in test (not in our new test but in rpelication/ tests). And
I need to figure out with more attention about the stages where I must
disable the filtering and where I can leave filtering turned on. In all
stages (local recovery, initial and final joins). Once I recheck everything
again I'll come back with precise reply.

> >> And how is it related to applier_register below? It does not download
> >> any data at all, does it?
> > 
> > After register stage is complete we catch up with lates not yet downloaded
> > data (final join stage) where we still assume that the data received is
> > valid and do not verify it.
> 
> Register just makes the master give you a unique ID. It does not send
> any data like joins do, AFAIR. Does it work if you drop the filter disable
> from here?

I suspect I overdid in paranoia, will recheck.

> > Actually this is a good question. I've to recheck this moment because in
> > previous series when I ran join/recovery with filtering enabled sometime
> > I've an issues where filter didnt pass. Gimme some time, maybe we will
> > all this and manage to keep filtering all the time.
> > 
> >>> +
> >>> +/**
> >>> + * Common chain for any incoming packet.
> >>> + */
> >>> +static int
> >>> +filter_in(struct txn_limbo *limbo, const struct synchro_request *req)
> >>> +{
> >>> +	(void)limbo;
> >>
> >> 6. So you have the filtering enabled dynamically in the limbo, but
> >> you do not use the limbo here? Why? Maybe at least add an assertion
> >> that the filter is enabled?
> > 
> > All chains are having same interface it is just happen that for common
> > filter I don't need to use limbo. I could add some operations here
> > but not sure if it worth it. As far as I see leave unused args is
> > pretty fine in our code base.
> 
> You didn't answer the second question:
> 
> 	Maybe at least add an assertion that the filter is enabled?

I did

  | I could add some operations here but not sure if it worth it.

Letme state it clear then - I could add this assert() if you insist
but I think we aready spread too many assertions all over the code,
and if it is possible I would be glad not to add new ones. After all
either we should add this assert() to each filter chain or not add
at all, otherwise there will be kind of code imbalance.

> >>> +/**
> >>> + * Filter CONFIRM and ROLLBACK packets.
> >>> + */
> >>> +static int
> >>> +filter_confirm_rollback(struct txn_limbo *limbo,
> >>> +			const struct synchro_request *req)
> >>> +{
> >>> +	/*
> >>> +	 * When limbo is empty we have nothing to
> >>> +	 * confirm/commit and if this request comes
> >>> +	 * in it means the split brain has happened.
> >>> +	 */
> >>> +	if (!txn_limbo_is_empty(limbo))
> >>> +		return 0;
> >>
> >> 9. What if rollback is for LSN > limbo's last LSN? It
> >> also means nothing to do. The same for confirm LSN < limbo's
> >> first LSN.
> > 
> > static void
> > txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
> > {
> > -->	assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo));
> > 
> > txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
> > {
> > -->	assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo));
> > 
> > Currently we're allowed to process empty limbo if only owner is not nil,
> > I think I should add this case here.
> 
> My question is not about the owner ID. I asked what if rollback/confirm
> try to cover a range not present in the limbo while it is not empty. If
> it is not empty, it has an owner obviously. But it does not matter.
> What if it has an owner, has transactions, but you got ROLLBACK/CONFIRM
> for data out of the LSN range present in the limbo?

Since the terms are matching I think such scenarion should be fine, right?
IOW, some old replica has been stopped for some reason and been living out
of quorum for some time thus such requests should be considered as OK to
pass and when filter accepts them the will reach txn_limbo_read_confirm
or txn_limbo_read_rollback where they will be simply ignored as far as I
unrestand. IOW, such requests are valid, no?

> > 
> > Good catch, typo. Actually I've updated this hunk locally
> > but didn't pushed out. We need "first <= promote <= last"
> 
> Is it covered with a test?

Not yet, to test _each_ filter condition we've a separate ticket
which I didn't implement yet (split-brain common bug which all
about tests).

...

> > 
> > Kind of this?
> 
> Yes! It is much simpler and still easy to extend. Please,
> just try and you will see how much simpler it is.

OK

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms
  2021-08-08 14:34       ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-09 16:24         ` Cyrill Gorcunov via Tarantool-patches
  2021-08-10 12:27           ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-09 16:24 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Sun, Aug 08, 2021 at 05:34:54PM +0300, Vladislav Shpilevoy wrote:
> >> In the next patch you would make txn_limbo_process_begin()
> >> also take the request to validate it. Then the 'filtering'
> >> would become internal to the limbo.
> 
> I didn't propose to drop the locking. I said it could be hidden
> inside of the limbo's API. In the only example above you show:
> 
> >       txn_limbo_term_lock
> >         txn_limbo_replica_term_locked
> >       txn_limbo_term_unlock
> 
> Here the lock is done inside of the limbo's API too. It is
> not exposed on the limbo's API level. So the questions is the
> same - can it be hidden inside of the API? Are there any usages
> of the lock done explicitly out of the limo?

Actually, everything start looking a way more unattractive I think.
Lets gather the current API from the patchset.

applier_synchro_filter_tx
  txn_limbo_is_replica_outdated
    txn_limbo_term_lock
      txn_limbo_replica_term_locked
    txn_limbo_term_unlock

box_demote | box_promote_qsync | box_promote
  txn_limbo_replica_term
    txn_limbo_term_lock
      txn_limbo_replica_term_locked
    txn_limbo_term_unlock


wal_stream_apply_synchro_row | box_issue_promote | box_issue_demote | memtx_engine_recover_synchro
  txn_limbo_process
    txn_limbo_term_lock
      txn_limbo_filter_locked
      txn_limbo_process_locked
    txn_limbo_term_unlock

apply_synchro_row
  txn_limbo_term_lock
    txn_limbo_filter_locked
    ** in-callback apply_synchro_row_cb -> txn_limbo_process_locked
  txn_limbo_term_unlock

Thus we have:

 - big txn_limbo_process function which operates with locked promote term
 - txn_limbo_replica_term inliner, which relies on txn_limbo_term_lock/unlock
   being present in header file
 - txn_limbo_is_replica_outdated inliner, which relies on lock/unlock being
   exported as well

and apply_synchro_row as a special one which uses txn_limbo_process_locked
internally when commit happens. Note that all the functions above use explicit
lock/unlock inside single function, and even apply_synchro_row calls lock at
start and unlock at exit.

Now if I gonna hide locking completely from usage ouside of limbo code I
have to:

1) Move txn_limbo_term_lock/txn_limbo_term_unlock into .c file, in result
   txn_limbo_is_replica_outdated and txn_limbo_replica_term won't be
   inliner anymore. Which is not critical I think but better to point out.
2) We inroduce txn_txn_limbo_process_begin/complete/rollback which are basically
   the wrappers arount txn_limbo_process_locked (because txn_limbo_process
   will remain as is). Thus we will have

txn_txn_limbo_process_begin()
  txn_limbo_term_lock()
  txn_limbo_filter_locked();

txn_txn_limbo_process_complete()
  txn_limbo_process_locked()
  txn_limbo_term_unlock

txn_txn_limbo_process_rollback
  txn_limbo_term_unlock

And these three helpers looks very ugly. First of all they hide locking
unlocking between functions, since there is no explicit lock/unlock
in apply_synchro_row anymore. Do you really prefer this kind of
design, or I miss something obvious?

	Cyrill

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms
  2021-08-09 16:24         ` Cyrill Gorcunov via Tarantool-patches
@ 2021-08-10 12:27           ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-10 12:57             ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 24+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-10 12:27 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml



On 09.08.2021 19:24, Cyrill Gorcunov wrote:
> On Sun, Aug 08, 2021 at 05:34:54PM +0300, Vladislav Shpilevoy wrote:
>>>> In the next patch you would make txn_limbo_process_begin()
>>>> also take the request to validate it. Then the 'filtering'
>>>> would become internal to the limbo.
>>
>> I didn't propose to drop the locking. I said it could be hidden
>> inside of the limbo's API. In the only example above you show:
>>
>>>       txn_limbo_term_lock
>>>         txn_limbo_replica_term_locked
>>>       txn_limbo_term_unlock
>>
>> Here the lock is done inside of the limbo's API too. It is
>> not exposed on the limbo's API level. So the questions is the
>> same - can it be hidden inside of the API? Are there any usages
>> of the lock done explicitly out of the limo?
> 
> Actually, everything start looking a way more unattractive I think.
> Lets gather the current API from the patchset.
> 
> applier_synchro_filter_tx
>   txn_limbo_is_replica_outdated
>     txn_limbo_term_lock
>       txn_limbo_replica_term_locked
>     txn_limbo_term_unlock
> 
> box_demote | box_promote_qsync | box_promote
>   txn_limbo_replica_term
>     txn_limbo_term_lock
>       txn_limbo_replica_term_locked
>     txn_limbo_term_unlock
> 
> 
> wal_stream_apply_synchro_row | box_issue_promote | box_issue_demote | memtx_engine_recover_synchro
>   txn_limbo_process
>     txn_limbo_term_lock
>       txn_limbo_filter_locked
>       txn_limbo_process_locked
>     txn_limbo_term_unlock
> 
> apply_synchro_row
>   txn_limbo_term_lock
>     txn_limbo_filter_locked
>     ** in-callback apply_synchro_row_cb -> txn_limbo_process_locked
>   txn_limbo_term_unlock
> 
> Thus we have:
> 
>  - big txn_limbo_process function which operates with locked promote term
>  - txn_limbo_replica_term inliner, which relies on txn_limbo_term_lock/unlock
>    being present in header file
>  - txn_limbo_is_replica_outdated inliner, which relies on lock/unlock being
>    exported as well
> 
> and apply_synchro_row as a special one which uses txn_limbo_process_locked
> internally when commit happens. Note that all the functions above use explicit
> lock/unlock inside single function, and even apply_synchro_row calls lock at
> start and unlock at exit.
> 
> Now if I gonna hide locking completely from usage ouside of limbo code I
> have to:
> 
> 1) Move txn_limbo_term_lock/txn_limbo_term_unlock into .c file, in result
>    txn_limbo_is_replica_outdated and txn_limbo_replica_term won't be
>    inliner anymore. Which is not critical I think but better to point out.
> 2) We inroduce txn_txn_limbo_process_begin/complete/rollback which are basically
>    the wrappers arount txn_limbo_process_locked (because txn_limbo_process
>    will remain as is). Thus we will have
> 
> txn_txn_limbo_process_begin()
>   txn_limbo_term_lock()
>   txn_limbo_filter_locked();
> 
> txn_txn_limbo_process_complete()
>   txn_limbo_process_locked()
>   txn_limbo_term_unlock
> 
> txn_txn_limbo_process_rollback
>   txn_limbo_term_unlock
> 
> And these three helpers looks very ugly. First of all they hide locking
> unlocking between functions, since there is no explicit lock/unlock
> in apply_synchro_row anymore. Do you really prefer this kind of
> design, or I miss something obvious?

They look consistent with txn_begin/commit/rollback. They hide the locking,
exactly. This is what I wanted to achieve, because I don't like that
the applier interferes into the limbo so hard. Yes, I would prefer this API.
Lets wait for Sergey's opinion too.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 3/4] limbo: filter incoming synchro requests
  2021-08-08 22:35         ` Cyrill Gorcunov via Tarantool-patches
@ 2021-08-10 12:31           ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-10 14:36             ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 24+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-10 12:31 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml

>>>>> +/**
>>>>> + * Common chain for any incoming packet.
>>>>> + */
>>>>> +static int
>>>>> +filter_in(struct txn_limbo *limbo, const struct synchro_request *req)
>>>>> +{
>>>>> +	(void)limbo;
>>>>
>>>> 6. So you have the filtering enabled dynamically in the limbo, but
>>>> you do not use the limbo here? Why? Maybe at least add an assertion
>>>> that the filter is enabled?
>>>
>>> All chains are having same interface it is just happen that for common
>>> filter I don't need to use limbo. I could add some operations here
>>> but not sure if it worth it. As far as I see leave unused args is
>>> pretty fine in our code base.
>>
>> You didn't answer the second question:
>>
>> 	Maybe at least add an assertion that the filter is enabled?
> 
> I did
> 
>   | I could add some operations here but not sure if it worth it.
> 
> Letme state it clear then - I could add this assert() if you insist
> but I think we aready spread too many assertions all over the code,
> and if it is possible I would be glad not to add new ones. After all
> either we should add this assert() to each filter chain or not add
> at all, otherwise there will be kind of code imbalance.

What is wrong with the assertions that you don't like adding them?
You add panics quite often, and they cost some perf. But asserts
just help to catch bugs and cost nothing in Release build.

>>>>> +/**
>>>>> + * Filter CONFIRM and ROLLBACK packets.
>>>>> + */
>>>>> +static int
>>>>> +filter_confirm_rollback(struct txn_limbo *limbo,
>>>>> +			const struct synchro_request *req)
>>>>> +{
>>>>> +	/*
>>>>> +	 * When limbo is empty we have nothing to
>>>>> +	 * confirm/commit and if this request comes
>>>>> +	 * in it means the split brain has happened.
>>>>> +	 */
>>>>> +	if (!txn_limbo_is_empty(limbo))
>>>>> +		return 0;
>>>>
>>>> 9. What if rollback is for LSN > limbo's last LSN? It
>>>> also means nothing to do. The same for confirm LSN < limbo's
>>>> first LSN.
>>>
>>> static void
>>> txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
>>> {
>>> -->	assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo));
>>>
>>> txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
>>> {
>>> -->	assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo));
>>>
>>> Currently we're allowed to process empty limbo if only owner is not nil,
>>> I think I should add this case here.
>>
>> My question is not about the owner ID. I asked what if rollback/confirm
>> try to cover a range not present in the limbo while it is not empty. If
>> it is not empty, it has an owner obviously. But it does not matter.
>> What if it has an owner, has transactions, but you got ROLLBACK/CONFIRM
>> for data out of the LSN range present in the limbo?
> 
> Since the terms are matching I think such scenarion should be fine, right?
> IOW, some old replica has been stopped for some reason and been living out
> of quorum for some time thus such requests should be considered as OK to
> pass and when filter accepts them the will reach txn_limbo_read_confirm
> or txn_limbo_read_rollback where they will be simply ignored as far as I
> unrestand. IOW, such requests are valid, no?

If a replica is outdated, it should not matter. It will receive the needed
data in order anyway. Like if the data was just sent. Hence, it seems
irrelevant whether it is outdated. And still looks the same as the thing
you are trying to filter out (when the limbo is empty = confirm/rollback
do not cover anything too).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms
  2021-08-10 12:27           ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-10 12:57             ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-10 12:57 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Tue, Aug 10, 2021 at 03:27:23PM +0300, Vladislav Shpilevoy wrote:
...
> > 
> > And these three helpers looks very ugly. First of all they hide locking
> > unlocking between functions, since there is no explicit lock/unlock
> > in apply_synchro_row anymore. Do you really prefer this kind of
> > design, or I miss something obvious?
> 
> They look consistent with txn_begin/commit/rollback. They hide the locking,
> exactly. This is what I wanted to achieve, because I don't like that
> the applier interferes into the limbo so hard. Yes, I would prefer this API.
> Lets wait for Sergey's opinion too.

OK, I can make it so. While I still think this is a bad choise, because
the key difference is the locking, where the rule of thumb is never spread
locks over different functions, they must be released in the same function
they have been taken (with rare exceptions). But I won't insist. If you
and Serge are agree I'll rework.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 3/4] limbo: filter incoming synchro requests
  2021-08-10 12:31           ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-10 14:36             ` Cyrill Gorcunov via Tarantool-patches
  2021-08-12 16:59               ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-10 14:36 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Tue, Aug 10, 2021 at 03:31:04PM +0300, Vladislav Shpilevoy wrote:
> > 
> >   | I could add some operations here but not sure if it worth it.
> > 
> > Letme state it clear then - I could add this assert() if you insist
> > but I think we aready spread too many assertions all over the code,
> > and if it is possible I would be glad not to add new ones. After all
> > either we should add this assert() to each filter chain or not add
> > at all, otherwise there will be kind of code imbalance.
> 
> What is wrong with the assertions that you don't like adding them?
> You add panics quite often, and they cost some perf. But asserts
> just help to catch bugs and cost nothing in Release build.

I personally think that either some particular condition is critical
so that you can't continue execution if it failed and because of this
it must be tested even in release builds. And here panic() is needed.
Or it is not critical and we don't need assert(). In particular for filtering
case if we ocasionally called it where should not then it might trigger a
false positive error breaking the replication but not corrupting data,
and in such case it is ok and no assertion is needed. In reverse case,
say enabling filtering in wrong place would cause data corruption then
we need a panic not assert. So I don't see much point in assert calls
at all. Surely I can add it if you prefer. Simply don't like.

You know, we've been talking with Serge today about enabling filtering
all the time because this looks pretty fishy that I do turn it on/off.
So I'm working on removing this code and the question with assert will
disappear on its own.

> >>>>> +static int
> >>>>> +filter_confirm_rollback(struct txn_limbo *limbo,
> >>>>> +			const struct synchro_request *req)
> >>>>> +{
> >>>>> +	/*
> >>>>> +	 * When limbo is empty we have nothing to
> >>>>> +	 * confirm/commit and if this request comes
> >>>>> +	 * in it means the split brain has happened.
> >>>>> +	 */
> >>>>> +	if (!txn_limbo_is_empty(limbo))
> >>>>> +		return 0;
> >>>>
> >>>> 9. What if rollback is for LSN > limbo's last LSN? It
> >>>> also means nothing to do. The same for confirm LSN < limbo's
> >>>> first LSN.
> >>>
> >>> static void
> >>> txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
> >>> {
> >>> -->	assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo));
> >>>
> >>> txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
> >>> {
> >>> -->	assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo));
> >>>
> >>> Currently we're allowed to process empty limbo if only owner is not nil,
> >>> I think I should add this case here.
> >>
> >> My question is not about the owner ID. I asked what if rollback/confirm
> >> try to cover a range not present in the limbo while it is not empty. If
> >> it is not empty, it has an owner obviously. But it does not matter.
> >> What if it has an owner, has transactions, but you got ROLLBACK/CONFIRM
> >> for data out of the LSN range present in the limbo?
> > 
> > Since the terms are matching I think such scenarion should be fine, right?
> > IOW, some old replica has been stopped for some reason and been living out
> > of quorum for some time thus such requests should be considered as OK to
> > pass and when filter accepts them the will reach txn_limbo_read_confirm
> > or txn_limbo_read_rollback where they will be simply ignored as far as I
> > unrestand. IOW, such requests are valid, no?
> 
> If a replica is outdated, it should not matter. It will receive the needed
> data in order anyway. Like if the data was just sent. Hence, it seems
> irrelevant whether it is outdated. And still looks the same as the thing
> you are trying to filter out (when the limbo is empty = confirm/rollback
> do not cover anything too).

Wait, Vlad, I don't understand. When packet comes in we verify for terms
matching, if it doesn't match then we drop the request with error. Now
assume the term is valid and we get confirm/rollback over already processed
entry. Initially I though it is an error due to split-brain because there
is no data in limbo which we can compare against. Then I looked into
txn_limbo_read_confirm and the code silently passes if queue is empty
so I presumed that I simply need to convert the assert() above into
the real verification condition. And after your reply I confused again.

Assume I'm a replica and have no data in limbo, if I obtain some
confirm/rollback it means the master node did some transactions behind my
back so I should refuse to proceed and refetch all data again, right?

Another scenario is that I'm the leader node sent some transactions
then gathered the quorum and make limbo empty, at some moment the
replica will send me confirm packet back and I should simply advance
the vclock and ignore this packet, correct?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 3/4] limbo: filter incoming synchro requests
  2021-08-10 14:36             ` Cyrill Gorcunov via Tarantool-patches
@ 2021-08-12 16:59               ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 0 replies; 24+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-12 16:59 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml

On 10.08.2021 17:36, Cyrill Gorcunov wrote:
> On Tue, Aug 10, 2021 at 03:31:04PM +0300, Vladislav Shpilevoy wrote:
>>>
>>>   | I could add some operations here but not sure if it worth it.
>>>
>>> Letme state it clear then - I could add this assert() if you insist
>>> but I think we aready spread too many assertions all over the code,
>>> and if it is possible I would be glad not to add new ones. After all
>>> either we should add this assert() to each filter chain or not add
>>> at all, otherwise there will be kind of code imbalance.
>>
>> What is wrong with the assertions that you don't like adding them?
>> You add panics quite often, and they cost some perf. But asserts
>> just help to catch bugs and cost nothing in Release build.
> 
> I personally think that either some particular condition is critical
> so that you can't continue execution if it failed and because of this
> it must be tested even in release builds. And here panic() is needed.
> Or it is not critical and we don't need assert(). In particular for filtering
> case if we ocasionally called it where should not then it might trigger a
> false positive error breaking the replication but not corrupting data,
> and in such case it is ok and no assertion is needed. In reverse case,
> say enabling filtering in wrong place would cause data corruption then
> we need a panic not assert. So I don't see much point in assert calls
> at all. Surely I can add it if you prefer. Simply don't like.
> 
> You know, we've been talking with Serge today about enabling filtering
> all the time because this looks pretty fishy that I do turn it on/off.
> So I'm working on removing this code and the question with assert will
> disappear on its own.

Assertions help to catch tons of rubbish during tests. Like they
do quite often. Just grep by 'assert' in our github tickets. So please,
add them in all the non-trivial places. It is easier to drop trivial
ones on a review than try to spot places lacking the asserts.

>>>>>>> +static int
>>>>>>> +filter_confirm_rollback(struct txn_limbo *limbo,
>>>>>>> +			const struct synchro_request *req)
>>>>>>> +{
>>>>>>> +	/*
>>>>>>> +	 * When limbo is empty we have nothing to
>>>>>>> +	 * confirm/commit and if this request comes
>>>>>>> +	 * in it means the split brain has happened.
>>>>>>> +	 */
>>>>>>> +	if (!txn_limbo_is_empty(limbo))
>>>>>>> +		return 0;
>>>>>>
>>>>>> 9. What if rollback is for LSN > limbo's last LSN? It
>>>>>> also means nothing to do. The same for confirm LSN < limbo's
>>>>>> first LSN.
>>>>>
>>>>> static void
>>>>> txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
>>>>> {
>>>>> -->	assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo));
>>>>>
>>>>> txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
>>>>> {
>>>>> -->	assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo));
>>>>>
>>>>> Currently we're allowed to process empty limbo if only owner is not nil,
>>>>> I think I should add this case here.
>>>>
>>>> My question is not about the owner ID. I asked what if rollback/confirm
>>>> try to cover a range not present in the limbo while it is not empty. If
>>>> it is not empty, it has an owner obviously. But it does not matter.
>>>> What if it has an owner, has transactions, but you got ROLLBACK/CONFIRM
>>>> for data out of the LSN range present in the limbo?
>>>
>>> Since the terms are matching I think such scenarion should be fine, right?
>>> IOW, some old replica has been stopped for some reason and been living out
>>> of quorum for some time thus such requests should be considered as OK to
>>> pass and when filter accepts them the will reach txn_limbo_read_confirm
>>> or txn_limbo_read_rollback where they will be simply ignored as far as I
>>> unrestand. IOW, such requests are valid, no?
>>
>> If a replica is outdated, it should not matter. It will receive the needed
>> data in order anyway. Like if the data was just sent. Hence, it seems
>> irrelevant whether it is outdated. And still looks the same as the thing
>> you are trying to filter out (when the limbo is empty = confirm/rollback
>> do not cover anything too).
> 
> Wait, Vlad, I don't understand. When packet comes in we verify for terms
> matching, if it doesn't match then we drop the request with error. Now
> assume the term is valid and we get confirm/rollback over already processed
> entry. Initially I though it is an error due to split-brain because there
> is no data in limbo which we can compare against. Then I looked into
> txn_limbo_read_confirm and the code silently passes if queue is empty
> so I presumed that I simply need to convert the assert() above into
> the real verification condition. And after your reply I confused again.
> 
> Assume I'm a replica and have no data in limbo, if I obtain some
> confirm/rollback it means the master node did some transactions behind my
> back so I should refuse to proceed and refetch all data again, right?
> 
> Another scenario is that I'm the leader node sent some transactions
> then gathered the quorum and make limbo empty, at some moment the
> replica will send me confirm packet back and I should simply advance
> the vclock and ignore this packet, correct?

You have an answer in your question - why a valid replica would send
you a confirm on its own? Only your own confirms are valid since you
are the leader from now on.

If you are talking about the replica sending you your own confirm -
it can't happen. Your own data is not sent back to you. Sometimes it
can be delivered indirectly, but it is simply filtered out as already
applied in the applier and never reaches spaces, limbo, wal or anything
else.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms
  2021-08-05 23:29   ` Vladislav Shpilevoy via Tarantool-patches
  2021-08-06 15:20     ` Cyrill Gorcunov via Tarantool-patches
@ 2021-08-23 11:32     ` Serge Petrenko via Tarantool-patches
  2021-08-23 11:41       ` Cyrill Gorcunov via Tarantool-patches
  1 sibling, 1 reply; 24+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-08-23 11:32 UTC (permalink / raw)
  To: Vladislav Shpilevoy, Cyrill Gorcunov, tml



06.08.2021 02:29, Vladislav Shpilevoy via Tarantool-patches пишет:
> Thanks for the patch!
>
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index f621fa657..9db286ae2 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -856,7 +856,7 @@ apply_synchro_row_cb(struct journal_entry *entry)
>>   		applier_rollback_by_wal_io(entry->res);
>>   	} else {
>>   		replica_txn_wal_write_cb(synchro_entry->rcb);
>> -		txn_limbo_process(&txn_limbo, synchro_entry->req);
>> +		txn_limbo_process_locked(&txn_limbo, synchro_entry->req);
>>   		trigger_run(&replicaset.applier.on_wal_write, NULL);
>>   	}
>>   	fiber_wakeup(synchro_entry->owner);
>> @@ -867,11 +867,13 @@ static int
>>   apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
>>   {
>>   	assert(iproto_type_is_synchro_request(row->type));
>> +	int rc = 0;
>>   
>>   	struct synchro_request req;
>>   	if (xrow_decode_synchro(row, &req) != 0)
>>   		goto err;
>>   
>> +	txn_limbo_term_lock(&txn_limbo);
> Maybe you should hide the lock from the API. Instead, do similar to
> what transactions do:
>
> 	int txn_limbo_process_begin(limbo *);
> 	void txn_limbo_process_commit(limbo *, request *);
> 	void txn_limbo_process_rollback(limbo *);
>
> begin would take the lock, commit would do the request and
> unlock, rollback would only unlock. Commit and rollback you
> call from apply_synchro_row_cb depend in on the WAL write
> result.
>
> Then the locks would disappear from the API, right?
>
> In the next patch you would make txn_limbo_process_begin()
> also take the request to validate it. Then the 'filtering'
> would become internal to the limbo.

I agree with Vlad here.

txn_limbo_process_begin()/commit()/rollback

looks more clean than calling lock()/unlock() manually.

Let's stick with Vlad's proposal then.

>
>>   	struct replica_cb_data rcb_data;
>>   	struct synchro_entry entry;

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms
  2021-08-23 11:32     ` Serge Petrenko via Tarantool-patches
@ 2021-08-23 11:41       ` Cyrill Gorcunov via Tarantool-patches
  2021-09-01 16:04         ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-08-23 11:41 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: Vladislav Shpilevoy, tml

On Mon, Aug 23, 2021 at 02:32:27PM +0300, Serge Petrenko wrote:
> > 
> > In the next patch you would make txn_limbo_process_begin()
> > also take the request to validate it. Then the 'filtering'
> > would become internal to the limbo.
> 
> I agree with Vlad here.
> 
> txn_limbo_process_begin()/commit()/rollback
> 
> looks more clean than calling lock()/unlock() manually.
> 
> Let's stick with Vlad's proposal then.

Sure thing, thanks!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms
  2021-08-23 11:41       ` Cyrill Gorcunov via Tarantool-patches
@ 2021-09-01 16:04         ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 24+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-09-01 16:04 UTC (permalink / raw)
  To: Serge Petrenko, Vladislav Shpilevoy; +Cc: tml

On Mon, Aug 23, 2021 at 02:41:51PM +0300, Cyrill Gorcunov wrote:
> On Mon, Aug 23, 2021 at 02:32:27PM +0300, Serge Petrenko wrote:
> > > 
> > > In the next patch you would make txn_limbo_process_begin()
> > > also take the request to validate it. Then the 'filtering'
> > > would become internal to the limbo.
> > 
> > I agree with Vlad here.
> > 
> > txn_limbo_process_begin()/commit()/rollback
> > 
> > looks more clean than calling lock()/unlock() manually.
> > 
> > Let's stick with Vlad's proposal then.
> 
> Sure thing, thanks!

Guys, this scheme doesn't work, it makes code even more complex.
The problem is that the completion is called from another fiber.
Here is a log from replica

[cyrill@grain 001_replication] grep promote_latch replica.log 
2021-09-01 18:41:42.065 [2061868] main/112/applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i I> limbo: lock promote_latch 112: applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i
2021-09-01 18:41:42.065 [2061868] main/112/applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i I> limbo: unlock promote_latch 112: applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i
2021-09-01 18:41:42.224 [2061868] main/112/applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i I> limbo: lock promote_latch 112: applier/unix/:/home/cyrill/d1/projects/tarantool/tarantool.git/test/var/001_replication/master.i
2021-09-01 18:41:42.225 [2061868] main I> limbo: unlock promote_latch 1: sched
tarantool: /home/cyrill/projects/tarantool/tarantool.git/src/lib/core/latch.h:175: latch_unlock: Assertion `l->owner == fiber()' failed.

Initially the limbo get latched by applier fiber but the completion called from
sched fiber then which cause assertion to trigger. Actually we could add another
helper which would unlock the promote_latch from applier but as to me this all
become an extreme mess, and I'm pretty sure this is a bad design: locks always
should be taken and released from the same function body, until there are some
very serious reasons to not do so.

I used

int
txn_limbo_process_begin(struct txn_limbo *limbo,
			const struct synchro_request *req)
{
	...
	latch_lock(&limbo->promote_latch);
	...
}

txn_limbo_process_commit(struct txn_limbo *limbo,
			 const struct synchro_request *req)
{
	assert(latch_is_locked(&limbo->promote_latch));
	...
	latch_unlock(&limbo->promote_latch);
}

void
txn_limbo_process_rollback(struct txn_limbo *limbo)
{
	assert(latch_is_locked(&limbo->promote_latch));
	latch_unlock(&limbo->promote_latch);
}

and it triggers from

static void
apply_synchro_row_cb(struct journal_entry *entry)
{
	assert(entry->complete_data != NULL);
	struct synchro_entry *synchro_entry =
		(struct synchro_entry *)entry->complete_data;
	if (entry->res < 0) {
		applier_rollback_by_wal_io(entry->res);
-->		txn_limbo_process_rollback(&txn_limbo);
	} else {
		replica_txn_wal_write_cb(synchro_entry->rcb);
-->		txn_limbo_process_commit(&txn_limbo, synchro_entry->req);
		trigger_run(&replicaset.applier.on_wal_write, NULL);
	}
	fiber_wakeup(synchro_entry->owner);
}

I tend to step back to explicit locking/unlocking to not mess with
triggers and completion contexts.

	Cyrill

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-09-01 16:04 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-04 19:07 [Tarantool-patches] [PATCH v10 0/4] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 1/4] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 2/4] limbo: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
2021-08-05 23:29   ` Vladislav Shpilevoy via Tarantool-patches
2021-08-06 15:20     ` Cyrill Gorcunov via Tarantool-patches
2021-08-08 14:34       ` Vladislav Shpilevoy via Tarantool-patches
2021-08-09 16:24         ` Cyrill Gorcunov via Tarantool-patches
2021-08-10 12:27           ` Vladislav Shpilevoy via Tarantool-patches
2021-08-10 12:57             ` Cyrill Gorcunov via Tarantool-patches
2021-08-23 11:32     ` Serge Petrenko via Tarantool-patches
2021-08-23 11:41       ` Cyrill Gorcunov via Tarantool-patches
2021-09-01 16:04         ` Cyrill Gorcunov via Tarantool-patches
2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 3/4] limbo: filter incoming synchro requests Cyrill Gorcunov via Tarantool-patches
2021-08-05 23:33   ` Vladislav Shpilevoy via Tarantool-patches
2021-08-06 19:01     ` Cyrill Gorcunov via Tarantool-patches
2021-08-08 11:43       ` Vladislav Shpilevoy via Tarantool-patches
2021-08-08 22:35         ` Cyrill Gorcunov via Tarantool-patches
2021-08-10 12:31           ` Vladislav Shpilevoy via Tarantool-patches
2021-08-10 14:36             ` Cyrill Gorcunov via Tarantool-patches
2021-08-12 16:59               ` Vladislav Shpilevoy via Tarantool-patches
2021-08-04 19:07 ` [Tarantool-patches] [PATCH v10 4/4] test: add replication/gh-6036-rollback-confirm Cyrill Gorcunov via Tarantool-patches
2021-08-05  9:38 ` [Tarantool-patches] [PATCH v10 0/4] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
2021-08-05 23:29 ` Vladislav Shpilevoy via Tarantool-patches
2021-08-08 22:03   ` Cyrill Gorcunov via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox