From: Cyrill Gorcunov via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Serge Petrenko <sergepetrenko@tarantool.org> Cc: tml <tarantool-patches@dev.tarantool.org>, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests Date: Tue, 3 Aug 2021 16:49:59 +0300 [thread overview] Message-ID: <YQlJh5lgO7T10c5e@grain> (raw) In-Reply-To: <6e48b22a-aedc-c670-3050-29bd4e07b48e@tarantool.org> On Tue, Aug 03, 2021 at 01:51:44PM +0300, Serge Petrenko wrote: > > > + if (req->lsn != 0 || > > + !iproto_type_is_promote_request(req->type)) { > > + say_info("RAFT: rejecting %s request from " > > + "instance id %u for term %llu. " > > + "req->replica_id = 0 but lsn %lld.", > > + iproto_type_name(req->type), > > + req->origin_id, (long long)req->term, > > + (long long)req->lsn); > > + > > + diag_set(ClientError, ER_UNSUPPORTED, > > + "Replication", > > + "empty replica_id with nonzero LSN"); > > + return -1; > > + } > > + } > > I agree with Vlad. This may be moved to filter_confirm_rollback. You know, as I replied to Vlad, I think we should do a common attibutes verification in early stage, because not only due to network error but for security reasons as well, ie say some fuzzer in network could feed us with screwed data, and when we sort packets via chains to process we could do an early verification first. But I won't insist, surely I can move it to promote chain. > > + > > + /* > > + * Explicit split brain situation. Promote > > + * comes in with an old LSN which we've already > > + * processed. > > + */ > > + if (limbo->confirmed_lsn > promote_lsn) { > > + /* > > + * If limbo is empty we're migrating > > + * the owner. > > + */ > > + if (txn_limbo_is_empty(limbo)) > > + return 0; > > I don't understand this part. Are you sure this check is needed? > We're always migrating the owner with a promote. Serge, I think this code hunk is due to my misunderstanding of owner migration, should remove it I think, thanks! > > + > > + if (txn_limbo_is_empty(limbo)) { > > + /* > > + * Transactions are already rolled back > > + * since the limbo is empty. > > + */ > > + say_info("RAFT: rejecting %s request from " > > + "instance id %u for term %llu. " > > + "confirmed_lsn %lld < promote_lsn %lld " > > + "and empty limbo.", > > + iproto_type_name(req->type), > > + req->origin_id, (long long)req->term, > > + (long long)limbo->confirmed_lsn, > > + (long long)promote_lsn); > > + > > + diag_set(ClientError, ER_UNSUPPORTED, > > + "Replication", > > + "forward promote LSN " > > + "(empty limbo, split brain)"); > > + return -1; > > I think it'd be better to have a separate error code for this purpose. > Say, ER_SPLITBRAIN or something. > Then applier would have more control over what to do when such an error > is raised. Say, never reconnect. (I doesn't reconnect on ER_UNSUPPORTED, > I believe, but a distinct error is still better). Split-bain is one particular case, since more cases why we refuse to proceed may appear in future maybe ER_REJECT with rejection error (just like netfilter does?) > > + } else { > > + /* > > + * Some entries are present in the limbo, > > + * and if first entry's LSN is greater than > > + * requested then old data either commited > > + * or rolled back, so can't continue. > > + */ > > + struct txn_limbo_entry *first; > > + > > + first = txn_limbo_first_entry(limbo); > > + if (first->lsn > promote_lsn) { > > This should only happen when confirmed_lsn > promote_lsn, shouldn't it? > If yes, than you've already handled it above. Nope. This is the case where the data is still in limbo and we've not yet commited it but master already commited/rolledback it already. If only I'm not misssing something obvious. > > + > > +static int (*filter_req[FILTER_MAX]) > > +(struct txn_limbo *limbo, const struct synchro_request *req) = { > > + [FILTER_IN] = filter_in, > > + [FILTER_CONFIRM] = filter_confirm_rollback, > > + [FILTER_ROLLBACK] = filter_confirm_rollback, > > + [FILTER_PROMOTE] = filter_promote, > > + [FILTER_DEMOTE] = filter_demote, > > Demote should be filtered the same way as promote, they are > basically the same requests with same meaning. > > demote is just a promote for replica id 0, because we couldn't > do promote replica id 0. Aha, thanks, will update. > > - if (req->replica_id == REPLICA_ID_NIL) { > > - /* > > - * The limbo was empty on the instance issuing the request. > > - * This means this instance must empty its limbo as well. > > - */ > > - assert(lsn == 0 && iproto_type_is_promote_request(req->type)); > > - } else if (req->replica_id != limbo->owner_id) { > > + if (req->replica_id != limbo->owner_id) { > > /* > > * Ignore CONFIRM/ROLLBACK messages for a foreign master. > > * These are most likely outdated messages for already confirmed > > We should error when request -> replica_id != limbo->owner_id. > For every entry type: promote/demote/confirm/rollback. > > req->replica_id != limbo->owner_id means that the remote instance has > taken some actions in the past (say, confirmed something) of which we > didn't know until now. This is basically a splitbrain again. Good point. I think we should put it into filter-in chain then. Cyrill
next prev parent reply other threads:[~2021-08-03 13:50 UTC|newest] Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-07-30 11:35 [Tarantool-patches] [PATCH v9 0/5] limbo: implement packets filtering Cyrill Gorcunov via Tarantool-patches 2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 1/5] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches 2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 2/5] say: introduce panic_on helper Cyrill Gorcunov via Tarantool-patches 2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 3/5] limbo: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches 2021-08-02 23:48 ` Vladislav Shpilevoy via Tarantool-patches 2021-08-03 11:23 ` Cyrill Gorcunov via Tarantool-patches 2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests Cyrill Gorcunov via Tarantool-patches 2021-08-02 23:50 ` Vladislav Shpilevoy via Tarantool-patches 2021-08-03 13:25 ` Cyrill Gorcunov via Tarantool-patches 2021-08-03 10:51 ` Serge Petrenko via Tarantool-patches 2021-08-03 13:49 ` Cyrill Gorcunov via Tarantool-patches [this message] 2021-07-30 11:35 ` [Tarantool-patches] [PATCH v9 5/5] test: add replication/gh-6036-rollback-confirm Cyrill Gorcunov via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=YQlJh5lgO7T10c5e@grain \ --to=tarantool-patches@dev.tarantool.org \ --cc=gorcunov@gmail.com \ --cc=sergepetrenko@tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox