[Tarantool-patches] [PATCH v14 5/6] qsync: filter incoming synchro requests

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Sun Sep 12 18:44:15 MSK 2021


Thanks for the patch!

See 8 comments below.

On 10.09.2021 17:29, Cyrill Gorcunov wrote:
> When we receive synchro requests we can't just apply
> them blindly because in the worst case they may come from
> a split-brain configuration (where a cluster has split into
> several clusters, each one has elected its own leader, and
> then the clusters are trying to merge back into the original
> one). We need to do our best to detect such disunity
> and force these nodes to rejoin from scratch for the
> sake of data consistency.
> 
> Thus when we process requests we pass them through a
> packet filter first, which validates their contents and
> refuses to apply them if they do not match.
> 
> Filter logic depends on request type.
> 
> First there is a common chain for any synchro packet, which
> is a kind of general pre-validation:
>  1) Zero LSN is allowed for PROMOTE | DEMOTE packets, since
>     CONFIRM | ROLLBACK have to refer to some real data with
>     an LSN already assigned.
>  2) request:replica_id = 0 is allowed for PROMOTE requests only.
>  3) request:replica_id should match limbo:owner_id, in other
>     words the limbo migration should be noticed by all instances
>     in the cluster.
> 
> For CONFIRM and ROLLBACK packets:
>  1) Neither of them can be considered if the limbo is already
>     empty, i.e. there is no data in the local queue and everything
>     is processed already. Such a request obviously comes from a
>     node which is out of date.

1. It is not just about being empty. They might try to touch a range
of transactions outside of the LSN range waiting in the limbo. Then
their effect is also void. The question remains from the previous
review. What is the resolution here?

Besides, I don't know how such requests could happen, but I don't
know how to get the ones in your example either TBH. A theoretical
example? A test?
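
For reference, the void-effect check I would expect looks roughly
like this (just a sketch, assuming the limbo entries already have
their LSNs assigned; txn_limbo_first_entry()/txn_limbo_last_entry()
are the existing helpers):

	if (!txn_limbo_is_empty(limbo)) {
		int64_t first = txn_limbo_first_entry(limbo)->lsn;
		int64_t last = txn_limbo_last_entry(limbo)->lsn;
		if (req->lsn < first || req->lsn > last) {
			/*
			 * The request does not cover any transaction
			 * waiting in the limbo - its effect is void.
			 */
			return -1;
		}
	}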

> For PROMOTE and DEMOTE packets:
>  1) The requests should come in with nonzero term, otherwise
>     the packet is corrupted.
>  2) The request's term should not be less than the maximal known
>     one, in other words it should not come from nodes which didn't
>     notice the raft epoch changes and are living in the past.
>  3) If LSN of the request matches current confirmed LSN the packet
>     is obviously correct to process.
>  4) If LSN is less than confirmed LSN then the request is wrong,
>     we have processed the requested LSN already.
>  5) If LSN is less than confirmed LSN then

2. You didn't fix the typo from the previous review. Still two
points say "less than confirmed LSN". Please, re-read the comments
of the previous review and address them all. As I have said more
than once, it would be best if you responded to the comments
individually and with a diff if possible. Otherwise you will keep
missing them.

>     a) If the limbo is empty we can't do anything, since the data
>        is already processed, and we should issue an error;
>     b) If there is some data in the limbo then the requested LSN
>        should be in the range of the limbo's [first; last] LSNs, so
>        that the request is able to commit or roll back the limbo
>        queue.
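
By the way, assuming point 5 above meant "greater than the confirmed
LSN", the whole decision chain reads like the sketch below (my
interpretation only; confirmed_lsn is the field of struct txn_limbo):

	if (req->lsn == limbo->confirmed_lsn) {
		/* 3) Matches the confirmed LSN - fine to process. */
	} else if (req->lsn < limbo->confirmed_lsn) {
		/* 4) Already processed - the sender is out of date. */
		return -1;
	} else if (txn_limbo_is_empty(limbo)) {
		/* 5a) Nothing left to act on - an error. */
		return -1;
	} else {
		/* 5b) Must fall into [first; last] of the queue. */
	}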
> 
> Because a snapshot contains a PROMOTE packet we disable filtering
> at the moment of joining a leader node, and similarly during
> recovery. The thing is that our filtration procedure implies that
> the limbo is already initialized to some valid state, otherwise we
> would have to distinguish initial states from working ones; this
> can actually be done but would make the code more complex.

3. How 'more complex'? And why do you distinguish between 'initial' and
'working' states? All states should work. Initial only means the limbo
does not belong to anybody.

Currently I only see complexity coming from the filtering being turned
on/off.

> Thus for now let's keep the ability to turn filtration on and off.

Please, find the real reason why it is needed. All states should be
working. The 'initial' state is not any different from, for example,
the state after DEMOTE was called.

> diff --git a/changelogs/unreleased/gh-6036-qsync-filter-packets.md b/changelogs/unreleased/gh-6036-qsync-filter-packets.md
> new file mode 100644
> index 000000000..0db629e83
> --- /dev/null
> +++ b/changelogs/unreleased/gh-6036-qsync-filter-packets.md
> @@ -0,0 +1,9 @@
> +## feature/replication

4. It is a bugfix, not feature.

> +
> +* Implemented filtration of incoming synchronous packets to discard
> +  requests from outdated cluster nodes. This can happen when a
> +  replication cluster is partitioned at the transport level and
> +  two or more sub-clusters are running simultaneously for some
> +  time, and then they try to merge back. Since the sub-clusters had
> +  their own leaders, they should not be able to form the original
> +  cluster because the data is no longer consistent (gh-6036).
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 7b11d56d6..f134dc8bb 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1686,15 +1685,17 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
>  		.lsn = promote_lsn,
>  		.term = raft->term,
>  	};
> -	txn_limbo_process(&txn_limbo, &req);
> +	if (txn_limbo_process(&txn_limbo, &req) != 0)
> +		return -1;

5. txn_limbo_write_promote() was already called above. If you
bail out now, you have an inconsistent state: in the WAL the promote
is written, but in the limbo it is not applied. What will happen on
recovery?

It seems you need to lock box_promote(), box_promote_qsync(),
and box_demote(). Otherwise you have the exact same problem with
local promotions vs those coming from the applier as the one you
tried to fix for applier vs applier.
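
Something like a plain latch around the whole path should do, for
instance (a sketch only; box_promote_locked() is a hypothetical
helper holding the current body, latch_* come from lib/core/latch.h):

	static struct latch promote_latch;

	/* Once, during box init: latch_create(&promote_latch); */

	int
	box_promote(void)
	{
		latch_lock(&promote_latch);
		int rc = box_promote_locked();
		latch_unlock(&promote_latch);
		return rc;
	}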

>  	assert(txn_limbo_is_empty(&txn_limbo));
> +	return 0;
>  }
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index 65fbd0cac..925f401e7 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -737,6 +738,261 @@ txn_limbo_wait_empty(struct txn_limbo *limbo, double timeout)
>  	return 0;
>  }
>  
> +/**
> + * Fill the reject reason with request data.
> + * The function is not reentrant, use with caution.
> + */
> +static char *
> +reject_str(const struct synchro_request *req)
> +{
> +	static char prefix[128];

6. Please, don't try to re-invent the static buffer. Just use it.
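
I.e. with tt_sprintf() from tt_static.h the whole function shrinks
to something like:

	static const char *
	reject_str(const struct synchro_request *req)
	{
		return tt_sprintf("RAFT: rejecting %s (%d) request from "
				  "origin_id %u replica_id %u term %llu",
				  iproto_type_name(req->type), req->type,
				  req->origin_id, req->replica_id,
				  (long long)req->term);
	}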

> +
> +	snprintf(prefix, sizeof(prefix), "RAFT: rejecting %s (%d) "
> +		 "request from origin_id %u replica_id %u term %llu",
> +		 iproto_type_name(req->type), req->type,
> +		 req->origin_id, req->replica_id,
> +		 (long long)req->term);
> +
> +	return prefix;
> +}
> +
> +/**
> + * Common chain for any incoming packet.
> + */
> +static int
> +filter_in(struct txn_limbo *limbo, const struct synchro_request *req)
> +{
> +	assert(limbo->is_filtering);
> +
> +	/*
> +	 * A zero LSN is allowed for PROMOTE
> +	 * and DEMOTE requests only.
> +	 */
> +	if (req->lsn == 0 && !iproto_type_is_promote_request(req->type)) {
> +		say_error("%s. Zero lsn detected", reject_str(req));
> +		diag_set(ClientError, ER_CLUSTER_SPLIT,
> +			 "Zero LSN on promote/demote");
> +		return -1;
> +	}
> +
> +	/*
> +	 * Zero @a replica_id is allowed for PROMOTE packets only.

7. Why not for DEMOTE?
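
If DEMOTE is supposed to be allowed here too, the check presumably
should use the same type-class helper as in the LSN check above:

	if (req->replica_id == REPLICA_ID_NIL &&
	    !iproto_type_is_promote_request(req->type)) {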

> +	 */
> +	if (req->replica_id == REPLICA_ID_NIL &&
> +	    req->type != IPROTO_RAFT_PROMOTE) {
> +		say_error("%s. Zero replica_id detected",
> +			  reject_str(req));
> +		diag_set(ClientError, ER_CLUSTER_SPLIT,
> +			 "Zero replica_id");
> +		return -1;
> +	}
> +
> +	/*
> +	 * Incoming packets should respect the limbo owner;
> +	 * if it doesn't match, it means the sender missed
> +	 * the limbo owner migrations and is out of date.
> +	 */
> +	if (req->replica_id != limbo->owner_id) {
> +		say_error("%s. Limbo owner mismatch, owner_id %u",
> +			  reject_str(req), limbo->owner_id);
> +		diag_set(ClientError, ER_CLUSTER_SPLIT,
> +			 "Sync queue silent owner migration");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * Filter CONFIRM and ROLLBACK packets.
> + */
> +static int
> +filter_confirm_rollback(struct txn_limbo *limbo,

8. The limbo can be const in all filter functions.
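
I.e.:

	static int
	filter_confirm_rollback(const struct txn_limbo *limbo,
				const struct synchro_request *req)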

