Tarantool development patches archive
 help / color / mirror / Atom feed
From: Cyrill Gorcunov via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Serge Petrenko <sergepetrenko@tarantool.org>
Cc: tml <tarantool-patches@dev.tarantool.org>,
	Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests
Date: Tue, 3 Aug 2021 16:49:59 +0300	[thread overview]
Message-ID: <YQlJh5lgO7T10c5e@grain> (raw)
In-Reply-To: <6e48b22a-aedc-c670-3050-29bd4e07b48e@tarantool.org>

On Tue, Aug 03, 2021 at 01:51:44PM +0300, Serge Petrenko wrote:
> 
> > +		if (req->lsn != 0 ||
> > +		    !iproto_type_is_promote_request(req->type)) {
> > +			say_info("RAFT: rejecting %s request from "
> > +				 "instance id %u for term %llu. "
> > +				 "req->replica_id = 0 but lsn %lld.",
> > +				 iproto_type_name(req->type),
> > +				 req->origin_id, (long long)req->term,
> > +				 (long long)req->lsn);
> > +
> > +			diag_set(ClientError, ER_UNSUPPORTED,
> > +				 "Replication",
> > +				 "empty replica_id with nonzero LSN");
> > +			return -1;
> > +		}
> > +	}
> 
> I agree with Vlad. This may be moved to filter_confirm_rollback.

You know, as I replied to Vlad, I think we should do a common
attibutes verification in early stage, because not only due to
network error but for security reasons as well, ie say some fuzzer
in  network could feed us with screwed data, and when we sort packets
via chains to process we could do an early verification first.

But I won't insist, surely I can move it to promote chain.

> > +
> > +	/*
> > +	 * Explicit split brain situation. Promote
> > +	 * comes in with an old LSN which we've already
> > +	 * processed.
> > +	 */
> > +	if (limbo->confirmed_lsn > promote_lsn) {
> > +		/*
> > +		 * If limbo is empty we're migrating
> > +		 * the owner.
> > +		 */
> > +		if (txn_limbo_is_empty(limbo))
> > +			return 0;
> 
> I don't understand this part. Are you sure this check is needed?
> We're always migrating the owner with a promote.

Serge, I think this code hunk is due to my misunderstanding of
owner migration, should remove it I think, thanks!

> > +
> > +	if (txn_limbo_is_empty(limbo)) {
> > +		/*
> > +		 * Transactions are already rolled back
> > +		 * since the limbo is empty.
> > +		 */
> > +		say_info("RAFT: rejecting %s request from "
> > +			 "instance id %u for term %llu. "
> > +			 "confirmed_lsn %lld < promote_lsn %lld "
> > +			 "and empty limbo.",
> > +			 iproto_type_name(req->type),
> > +			 req->origin_id, (long long)req->term,
> > +			 (long long)limbo->confirmed_lsn,
> > +			 (long long)promote_lsn);
> > +
> > +		diag_set(ClientError, ER_UNSUPPORTED,
> > +			 "Replication",
> > +			 "forward promote LSN "
> > +			 "(empty limbo, split brain)");
> > +		return -1;
> 
> I think it'd be better to have a separate error code for this purpose.
> Say, ER_SPLITBRAIN or something.
> Then applier would have more control over what to do when such an error
> is raised. Say, never reconnect. (I doesn't reconnect on ER_UNSUPPORTED,
> I believe, but a distinct error is still better).

Split-bain is one particular case, since more cases why we refuse to proceed
may appear in future maybe ER_REJECT with rejection error (just like netfilter
does?)

> > +	} else {
> > +		/*
> > +		 * Some entries are present in the limbo,
> > +		 * and if first entry's LSN is greater than
> > +		 * requested then old data either commited
> > +		 * or rolled back, so can't continue.
> > +		 */
> > +		struct txn_limbo_entry *first;
> > +
> > +		first = txn_limbo_first_entry(limbo);
> > +		if (first->lsn > promote_lsn) {
> 
> This should only happen when confirmed_lsn > promote_lsn, shouldn't it?
> If yes, than you've already handled it above.

Nope. This is the case where the data is still in limbo and we've not
yet commited it but master already commited/rolledback it already.
If only I'm not misssing something obvious.

> > +
> > +static int (*filter_req[FILTER_MAX])
> > +(struct txn_limbo *limbo, const struct synchro_request *req) = {
> > +	[FILTER_IN]		= filter_in,
> > +	[FILTER_CONFIRM]	= filter_confirm_rollback,
> > +	[FILTER_ROLLBACK]	= filter_confirm_rollback,
> > +	[FILTER_PROMOTE]	= filter_promote,
> > +	[FILTER_DEMOTE]		= filter_demote,
> 
> Demote should be filtered the same way as promote, they are
> basically the same requests with same meaning.
> 
> demote is just a promote for replica id 0, because we couldn't
> do promote replica id 0.

Aha, thanks, will update.

> > -	if (req->replica_id == REPLICA_ID_NIL) {
> > -		/*
> > -		 * The limbo was empty on the instance issuing the request.
> > -		 * This means this instance must empty its limbo as well.
> > -		 */
> > -		assert(lsn == 0 && iproto_type_is_promote_request(req->type));
> > -	} else if (req->replica_id != limbo->owner_id) {
> > +	if (req->replica_id != limbo->owner_id) {
> >   		/*
> >   		 * Ignore CONFIRM/ROLLBACK messages for a foreign master.
> >   		 * These are most likely outdated messages for already confirmed
> 
> We should error when request -> replica_id != limbo->owner_id.
> For every entry type: promote/demote/confirm/rollback.
> 
> req->replica_id != limbo->owner_id means that the remote instance has
> taken some actions in the past (say, confirmed something) of which we
> didn't know until now. This is basically a splitbrain again.

Good point. I think we should put it into filter-in chain then.

	Cyrill

  reply	other threads:[~2021-08-03 13:50 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-30 11:35 [Tarantool-patches] [PATCH v9 0/5] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches
2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 1/5] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 2/5] say: introduce panic_on helper Cyrill Gorcunov via Tarantool-patches
2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 3/5] limbo: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
2021-08-02 23:48   ` Vladislav Shpilevoy via Tarantool-patches
2021-08-03 11:23     ` Cyrill Gorcunov via Tarantool-patches
2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests Cyrill Gorcunov via Tarantool-patches
2021-08-02 23:50   ` Vladislav Shpilevoy via Tarantool-patches
2021-08-03 13:25     ` Cyrill Gorcunov via Tarantool-patches
2021-08-03 10:51   ` Serge Petrenko via Tarantool-patches
2021-08-03 13:49     ` Cyrill Gorcunov via Tarantool-patches [this message]
2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 5/5] test: add replication/gh-6036-rollback-confirm Cyrill Gorcunov via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YQlJh5lgO7T10c5e@grain \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=gorcunov@gmail.com \
    --cc=sergepetrenko@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox