From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 134B26EC58; Tue, 3 Aug 2021 13:51:48 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 134B26EC58 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1627987908; bh=r57a5lOeyd3JkSoMxsL0wvgmfybI/peVvTuNmt0SCI4=; h=To:Cc:References:Date:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=mwLuWRjX73wFsLmz6JF7NStYpnwumToXC9yH95+EJLYSHts7Ufe3sPWwfoDq0XXgZ HOg4UOz99805YoHZJcMu7EfShC61izRoCandC487TUs7avpkKhouXCGvRH2UAi/x89 gLa7olyEvQW5LajXDbDXRyI+W7Q52bB1jqcwfZCo= Received: from smtp17.mail.ru (smtp17.mail.ru [94.100.176.154]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 874DD6EC58 for ; Tue, 3 Aug 2021 13:51:46 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 874DD6EC58 Received: by smtp17.mail.ru with esmtpa (envelope-from ) id 1mAs1Q-0002lj-SJ; Tue, 03 Aug 2021 13:51:46 +0300 To: Cyrill Gorcunov , tml Cc: Vladislav Shpilevoy References: <20210730113539.563318-1-gorcunov@gmail.com> <20210730113539.563318-5-gorcunov@gmail.com> Message-ID: <6e48b22a-aedc-c670-3050-29bd4e07b48e@tarantool.org> Date: Tue, 3 Aug 2021 13:51:44 +0300 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.12.0 MIME-Version: 1.0 In-Reply-To: <20210730113539.563318-5-gorcunov@gmail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-GB X-4EC0790: 10 X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD941C43E597735A9C36A98DBA789EBB6AE26DB9A6C1D9BF7E0182A05F538085040494C19C69BFE201132B6C432C18BDE7B90BA57CDDEED0AB8BB3E5050F83A8A0E X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE711269A7C2F827F16EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637A8691684BB8CCFB18638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D87BA8652E746CFE573AEBB49B5259648A117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCAA867293B0326636D2E47CDBA5A96583BD4B6F7A4D31EC0BC014FD901B82EE079FA2833FD35BB23D27C277FBC8AE2E8BF1175FABE1C0F9B6A471835C12D1D977C4224003CC8364762BB6847A3DEAEFB0F43C7A68FF6260569E8FC8737B5C2249EC8D19AE6D49635B68655334FD4449CB9ECD01F8117BC8BEAAAE862A0553A39223F8577A6DFFEA7CB59C7783CC88FA9643847C11F186F3C59DAA53EE0834AAEE X-C1DE0DAB: 0D63561A33F958A546E1031B66E27031E098B50B52DE1E2FA331A2F2EBA879B6D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA751B940EDA0DFB0535410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34B7CBFF60649FF2669E4F64708AAB9DCC1EAC436359814F2DCCCBE213D9A911A88353092B0A7DB5A01D7E09C32AA3244CCF9496A0161F3387455AB37103127DE5B4DF56057A86259FFACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2bioj9N286KAyvN7NUpFiZWTaow== X-Mailru-Sender: 3B9A0136629DC9125D61937A2360A4468E0423ED33EBE234C1617516F94C9B97A475549227A3A2E2424AE0EB1F3D1D21E2978F233C3FAE6EE63DB1732555E4A8EE80603BA4A5B0BC112434F685709FCF0DA7A0AF5A3A8387 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Serge Petrenko via Tarantool-patches Reply-To: Serge Petrenko Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" 30.07.2021 14:35, Cyrill Gorcunov пишет: > When we receive synchro requests we can't just apply > them blindly because in worse case they may come from > split-brain configuration (where a cluster splitted into > several subclusters and each one has own leader elected, > then subclisters are trying to merge back into original > cluster). We need to do our best to detect such configs > and force these nodes to rejoin from the scratch for > data consistency sake. > > Thus when we're processing requests we pass them to the > packet filter first which validates their contents and > refuse to apply if they are not matched. > > Depending on request type each packet traverse an > appropriate chain(s) > > FILTER_IN > - Common chain for any synchro packet. We verify > that if replica_id is nil then it shall be > PROMOTE request with lsn 0 to migrate limbo owner > > FILTER_CONFIRM > FILTER_ROLLBACK > - Both confirm and rollback requests shall not come > with empty limbo since it measn the synchro queue > is already processed and the peer didn't notice > that > > FILTER_PROMOTE > - Promote request should come in with new terms only, > otherwise it means the peer didn't notice election > > - If limbo's confirmed_lsn is equal to promote LSN then > it is a valid request to process > > - If limbo's confirmed_lsn is bigger than requested then > it is valid in one case only -- limbo migration so the > queue shall be empty > > - If limbo's confirmed_lsn is less than promote LSN then > - If queue is empty then it means the transactions are > already rolled back and request is invalid > - If queue is not empty then its first entry might be > greater than promote LSN and it means that old data > either committed or rolled back already and request > is invalid > > FILTER_DEMOTE > - NOP, reserved for future use > > Closes #6036 Thanks for the patch! You're definitely moving in the right direction here. Please, find a couple of comments below. > > Signed-off-by: Cyrill Gorcunov > --- > src/box/applier.cc | 21 ++- > src/box/box.cc | 11 +- > src/box/memtx_engine.c | 3 +- > src/box/txn_limbo.c | 304 ++++++++++++++++++++++++++++++++++++++--- > src/box/txn_limbo.h | 33 ++++- > 5 files changed, 347 insertions(+), 25 deletions(-) > > diff --git a/src/box/applier.cc b/src/box/applier.cc > index a7f472714..fefaa4ced 100644 > --- a/src/box/applier.cc > +++ b/src/box/applier.cc ... > diff --git a/src/box/box.cc b/src/box/box.cc > index 5ca617e32..000887aa6 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc ... > diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c > index be5e0adf5..b01b5a572 100644 > --- a/src/box/txn_limbo.c > +++ b/src/box/txn_limbo.c > @@ -51,6 +51,7 @@ txn_limbo_create(struct txn_limbo *limbo) > limbo->confirmed_lsn = 0; > limbo->rollback_count = 0; > limbo->is_in_rollback = false; > + limbo->is_filtering = true; > } > > bool > @@ -724,35 +725,291 @@ txn_limbo_wait_empty(struct txn_limbo *limbo, double timeout) > return 0; > } > > +enum filter_chain { > + FILTER_IN, > + FILTER_CONFIRM, > + FILTER_ROLLBACK, > + FILTER_PROMOTE, > + FILTER_DEMOTE, > + FILTER_MAX, > +}; > + > +/** > + * Common chain for any incoming packet. > + */ > +static int > +filter_in(struct txn_limbo *limbo, const struct synchro_request *req) > +{ > + (void)limbo; > + > + if (req->replica_id == REPLICA_ID_NIL) { > + /* > + * The limbo was empty on the instance issuing > + * the request. This means this instance must > + * empty its limbo as well. > + */ > + if (req->lsn != 0 || > + !iproto_type_is_promote_request(req->type)) { > + say_info("RAFT: rejecting %s request from " > + "instance id %u for term %llu. " > + "req->replica_id = 0 but lsn %lld.", > + iproto_type_name(req->type), > + req->origin_id, (long long)req->term, > + (long long)req->lsn); > + > + diag_set(ClientError, ER_UNSUPPORTED, > + "Replication", > + "empty replica_id with nonzero LSN"); > + return -1; > + } > + } I agree with Vlad. This may be moved to filter_confirm_rollback. > + > + return 0; > +} > + > +/** > + * Filter CONFIRM and ROLLBACK packets. > + */ > +static int > +filter_confirm_rollback(struct txn_limbo *limbo, > + const struct synchro_request *req) > +{ > + /* > + * When limbo is empty we have nothing to > + * confirm/commit and if this request comes > + * in it means the split brain has happened. > + */ > + if (!txn_limbo_is_empty(limbo)) > + return 0; > + > + say_info("RAFT: rejecting %s request from " > + "instance id %u for term %llu. " > + "Empty limbo detected.", > + iproto_type_name(req->type), > + req->origin_id, > + (long long)req->term); > + > + diag_set(ClientError, ER_UNSUPPORTED, > + "Replication", > + "confirm/rollback with empty limbo"); > + return -1; > +} > + > +/** > + * Filter PROMOTE packets. > + */ > +static int > +filter_promote(struct txn_limbo *limbo, const struct synchro_request *req) > +{ > + int64_t promote_lsn = req->lsn; > + > + /* > + * If the term is already seen it means it comes > + * from a node which didn't notice new elections, > + * thus been living in subdomain and its data is > + * no longer consistent. > + */ > + if (limbo->promote_term_max > 1 && > + limbo->promote_term_max > req->term) { > + say_info("RAFT: rejecting %s request from " > + "instance id %u for term %llu. " > + "Max term seen is %llu.", > + iproto_type_name(req->type), > + req->origin_id, > + (long long)req->term, > + (long long)limbo->promote_term_max); > + > + diag_set(ClientError, ER_UNSUPPORTED, > + "Replication", "obsolete terms"); > + return -1; > + } > + > + /* > + * Either the limbo is empty or new promote will > + * rollback all waiting transactions. Which > + * is fine. > + */ > + if (limbo->confirmed_lsn == promote_lsn) > + return 0; > + > + /* > + * Explicit split brain situation. Promote > + * comes in with an old LSN which we've already > + * processed. > + */ > + if (limbo->confirmed_lsn > promote_lsn) { > + /* > + * If limbo is empty we're migrating > + * the owner. > + */ > + if (txn_limbo_is_empty(limbo)) > + return 0; I don't understand this part. Are you sure this check is needed? We're always migrating the owner with a promote. > + > + say_info("RAFT: rejecting %s request from " > + "instance id %u for term %llu. " > + "confirmed_lsn %lld > promote_lsn %lld.", > + iproto_type_name(req->type), > + req->origin_id, (long long)req->term, > + (long long)limbo->confirmed_lsn, > + (long long)promote_lsn); > + > + diag_set(ClientError, ER_UNSUPPORTED, > + "Replication", > + "backward promote LSN (split brain)"); > + return -1; > + } > + > + /* > + * The last case requires a few subcases. > + */ > + assert(limbo->confirmed_lsn < promote_lsn); > + > + if (txn_limbo_is_empty(limbo)) { > + /* > + * Transactions are already rolled back > + * since the limbo is empty. > + */ > + say_info("RAFT: rejecting %s request from " > + "instance id %u for term %llu. " > + "confirmed_lsn %lld < promote_lsn %lld " > + "and empty limbo.", > + iproto_type_name(req->type), > + req->origin_id, (long long)req->term, > + (long long)limbo->confirmed_lsn, > + (long long)promote_lsn); > + > + diag_set(ClientError, ER_UNSUPPORTED, > + "Replication", > + "forward promote LSN " > + "(empty limbo, split brain)"); > + return -1; I think it'd be better to have a separate error code for this purpose. Say, ER_SPLITBRAIN or something. Then applier would have more control over what to do when such an error is raised. Say, never reconnect. (I doesn't reconnect on ER_UNSUPPORTED, I believe, but a distinct error is still better). > + } else { > + /* > + * Some entries are present in the limbo, > + * and if first entry's LSN is greater than > + * requested then old data either commited > + * or rolled back, so can't continue. > + */ > + struct txn_limbo_entry *first; > + > + first = txn_limbo_first_entry(limbo); > + if (first->lsn > promote_lsn) { This should only happen when confirmed_lsn > promote_lsn, shouldn't it? If yes, than you've already handled it above. > + say_info("RAFT: rejecting %s request from " > + "instance id %u for term %llu. " > + "confirmed_lsn %lld < promote_lsn %lld " > + "and limbo first lsn %lld.", > + iproto_type_name(req->type), > + req->origin_id, (long long)req->term, > + (long long)limbo->confirmed_lsn, > + (long long)promote_lsn, > + (long long)first->lsn); > + > + diag_set(ClientError, ER_UNSUPPORTED, > + "Replication", > + "promote LSN confilict " > + "(limbo LSN ahead, split brain)"); > + return -1; > + } > + } > + > + return 0; > +} > + > +/** > + * Filter DEMOTE packets. > + */ > +static int > +filter_demote(struct txn_limbo *limbo, const struct synchro_request *req) > +{ > + (void)limbo; > + (void)req; > + return 0; > +} > + > +static int (*filter_req[FILTER_MAX]) > +(struct txn_limbo *limbo, const struct synchro_request *req) = { > + [FILTER_IN] = filter_in, > + [FILTER_CONFIRM] = filter_confirm_rollback, > + [FILTER_ROLLBACK] = filter_confirm_rollback, > + [FILTER_PROMOTE] = filter_promote, > + [FILTER_DEMOTE] = filter_demote, Demote should be filtered the same way as promote, they are basically the same requests with same meaning. demote is just a promote for replica id 0, because we couldn't do promote replica id 0. > +}; > + > +int > +txn_limbo_filter_locked(struct txn_limbo *limbo, > + const struct synchro_request *req) > +{ > + unsigned int mask = (1u << FILTER_IN); > + unsigned int pos = 0; > + > +#ifndef NDEBUG > + say_info("limbo: filter %s replica_id %u origin_id %u " > + "term %lld lsn %lld, queue owner_id %u len %lld " > + "confirmed_lsn %lld (%s)", > + iproto_type_name(req->type), > + req->replica_id, req->origin_id, > + (long long)req->term, (long long)req->lsn, > + limbo->owner_id, (long long)limbo->len, > + (long long)limbo->confirmed_lsn, > + limbo->is_filtering ? "on" : "off"); > +#endif > + > + if (!limbo->is_filtering) > + return 0; > + > + switch (req->type) { > + case IPROTO_CONFIRM: > + mask |= (1u << FILTER_CONFIRM); > + break; > + case IPROTO_ROLLBACK: > + mask |= (1u << FILTER_ROLLBACK); > + break; > + case IPROTO_PROMOTE: > + mask |= (1u << FILTER_PROMOTE); > + break; > + case IPROTO_DEMOTE: > + mask |= (1u << FILTER_DEMOTE); > + break; > + default: > + say_info("RAFT: rejecting unexpected %d " > + "request from instance id %u " > + "for term %llu.", > + req->type, req->origin_id, > + (long long)req->term); > + diag_set(ClientError, ER_UNSUPPORTED, > + "Replication", > + "unexpected request type"); > + return -1; > + } > + > + while (mask != 0) { > + if ((mask & 1) != 0) { > + assert(pos < lengthof(filter_req)); > + if (filter_req[pos](limbo, req) != 0) > + return -1; > + } > + pos++; > + mask >>= 1; > + }; > + > + return 0; > +} > + > void > txn_limbo_process_locked(struct txn_limbo *limbo, > const struct synchro_request *req) > { > uint64_t term = req->term; > uint32_t origin = req->origin_id; > + > if (txn_limbo_term_locked(limbo, origin) < term) { > vclock_follow(&limbo->promote_term_map, origin, term); > if (term > limbo->promote_term_max) > limbo->promote_term_max = term; > - } else if (iproto_type_is_promote_request(req->type) && > - limbo->promote_term_max > 1) { > - /* PROMOTE for outdated term. Ignore. */ > - say_info("RAFT: ignoring %s request from instance " > - "id %u for term %llu. Greatest term seen " > - "before (%llu) is bigger.", > - iproto_type_name(req->type), origin, (long long)term, > - (long long)limbo->promote_term_max); > - return; > } > > int64_t lsn = req->lsn; > - if (req->replica_id == REPLICA_ID_NIL) { > - /* > - * The limbo was empty on the instance issuing the request. > - * This means this instance must empty its limbo as well. > - */ > - assert(lsn == 0 && iproto_type_is_promote_request(req->type)); > - } else if (req->replica_id != limbo->owner_id) { > + if (req->replica_id != limbo->owner_id) { > /* > * Ignore CONFIRM/ROLLBACK messages for a foreign master. > * These are most likely outdated messages for already confirmed We should error when request -> replica_id != limbo->owner_id. For every entry type: promote/demote/confirm/rollback. req->replica_id != limbo->owner_id means that the remote instance has taken some actions in the past (say, confirmed something) of which we didn't know until now. This is basically a splitbrain again. > @@ -783,18 +1040,25 @@ txn_limbo_process_locked(struct txn_limbo *limbo, > txn_limbo_read_demote(limbo, lsn); > break; > default: > - unreachable(); > + panic("limbo: unexpected request type %d", > + req->type); > + break; > } > - return; > } > > -void > +int > txn_limbo_process(struct txn_limbo *limbo, > const struct synchro_request *req) > { > + int rc; > + > txn_limbo_term_lock(limbo); > - txn_limbo_process_locked(limbo, req); > + rc = txn_limbo_filter_locked(limbo, req); > + if (rc == 0) > + txn_limbo_process_locked(limbo, req); > txn_limbo_term_unlock(limbo); > + > + return rc; > } > > void > diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h > index 25faffd2b..eb74dda00 100644 > --- a/src/box/txn_limbo.h > +++ b/src/box/txn_limbo.h > @@ -184,6 +184,14 @@ struct txn_limbo { > * by the 'reversed rollback order' rule - contradiction. > */ > bool is_in_rollback; > + /** > + * Whether the limbo should filter incoming requests. > + * The phases of local recovery from WAL file and on applier's > + * join phase we are in complete trust of incoming data because > + * this data forms an initial limbo state and should not > + * filter out requests. > + */ > + bool is_filtering; > }; > > /** > @@ -355,15 +363,38 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn); > int > txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry); > > +/** > + * Verify if the request is valid for processing. > + */ > +int > +txn_limbo_filter_locked(struct txn_limbo *limbo, > + const struct synchro_request *req); > + > /** Execute a synchronous replication request. */ > void > txn_limbo_process_locked(struct txn_limbo *limbo, > const struct synchro_request *req); > > /** Lock limbo terms and execute a synchronous replication request. */ > -void > +int > txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req); > > +/** Enable filtering of synchro requests. */ > +static inline void > +txn_limbo_filter_enable(struct txn_limbo *limbo) > +{ > + limbo->is_filtering = true; > + say_info("limbo: filter enabled"); > +} > + > +/** Disable filtering of synchro requests. */ > +static inline void > +txn_limbo_filter_disable(struct txn_limbo *limbo) > +{ > + limbo->is_filtering = false; > + say_info("limbo: filter disabled"); > +} > + > /** > * Waiting for confirmation of all "sync" transactions > * during confirm timeout or fail. -- Serge Petrenko