From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy
To: Cyrill Gorcunov, tml
Date: Sun, 12 Sep 2021 17:44:15 +0200
In-Reply-To: <20210910152910.607398-6-gorcunov@gmail.com>
References: <20210910152910.607398-1-gorcunov@gmail.com>
 <20210910152910.607398-6-gorcunov@gmail.com>
Subject: Re: [Tarantool-patches] [PATCH v14 5/6] qsync: filter incoming
 synchro requests
List-Id: Tarantool development patches
Thanks for the patch! See 8 comments below.

On 10.09.2021 17:29, Cyrill Gorcunov wrote:
> When we receive synchro requests, we can't just apply them
> blindly, because in the worst case they may come from a
> split-brain configuration (where a cluster has split into
> several clusters, each of which has elected its own leader,
> and then the clusters try to merge back into the original
> one). We need to do our best to detect such disunity and
> force these nodes to rejoin from scratch for the sake of
> data consistency.
>
> Thus when we process requests, we pass them to a packet
> filter first, which validates their contents and refuses to
> apply them if they do not match.
>
> The filter logic depends on the request type.
>
> First, there is a common chain for any synchro packet, a kind
> of general pre-validation:
> 1) Zero LSN is allowed for PROMOTE | DEMOTE packets only,
>    since CONFIRM | ROLLBACK have to refer to some real data
>    with an LSN already assigned.
> 2) request:replica_id = 0 is allowed for PROMOTE requests only.
> 3) request:replica_id should match limbo:owner_id, iow the
>    limbo migration should be noticed by all instances in the
>    cluster.
>
> For CONFIRM and ROLLBACK packets:
> 1) Neither of them can be considered if the limbo is already
>    empty, i.e. there is no data in the local queue and
>    everything is already processed. Such a request is
>    obviously from a node which is out of date.

1. It is not just about being empty. They might also try to touch a
range of transactions outside of the LSN range waiting in the limbo.
Then their effect is void as well. The question remains from the
previous review: what is the resolution here? Besides, I don't know
how such requests could happen, but TBH I don't know how to get the
ones from your example either. A theoretical example? A test?

> For PROMOTE and DEMOTE packets:
> 1) The requests should come in with a nonzero term, otherwise
>    the packet is corrupted.
> 2) The request's term should not be less than the maximal
>    known one, iow it should not come from nodes which didn't
>    notice the raft epoch changes and are living in the past.
> 3) If the LSN of the request matches the current confirmed
>    LSN, the packet is obviously correct to process.
> 4) If the LSN is less than the confirmed LSN, then the request
>    is wrong, we have already processed the requested LSN.
> 5) If the LSN is less than the confirmed LSN, then

2. You didn't fix the typo from the previous review. Two points still
say "less than confirmed LSN". Please re-read the comments of the
previous review and address them all. As I have already said more
than once, it would be best if you responded to the comments
individually, and with a diff if possible. Otherwise you will keep
missing them.

>    a) if the limbo is empty, we can't do anything, since the
>       data is already processed, and should issue an error;
>    b) if there is some data in the limbo, then the requested
>       LSN should be in the range of the limbo's [first; last]
>       LSNs, so that the request is able to commit and roll
>       back the limbo queue.
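To make sure we are reading points 3)-5) the same way (taking the typo
in 5) as 'greater than'), this is the check I understand them to
describe. A sketch only - filter_promote_lsn() is a hypothetical name,
and I'm assuming the queue accessors txn_limbo_first_entry() /
txn_limbo_last_entry() and limbo->confirmed_lsn from txn_limbo.h:

static int
filter_promote_lsn(struct txn_limbo *limbo,
		   const struct synchro_request *req)
{
	int64_t lsn = req->lsn;
	/* 3) The LSN matches the confirmed one - fine to process. */
	if (lsn == limbo->confirmed_lsn)
		return 0;
	/* 4) Older than the confirmed LSN - already processed. */
	if (lsn < limbo->confirmed_lsn)
		return -1;
	/* 5.a) Newer than the confirmed LSN, but the queue is empty. */
	if (txn_limbo_is_empty(limbo))
		return -1;
	/* 5.b) Must fall into the queue's [first; last] LSN range. */
	if (lsn < txn_limbo_first_entry(limbo)->lsn ||
	    lsn > txn_limbo_last_entry(limbo)->lsn)
		return -1;
	return 0;
}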
> Because the snapshot contains a PROMOTE packet, we disable
> filtering at the moment of joining to the leader node, and
> similarly during recovery. The thing is that our filtration
> procedure implies that the limbo is already initialized to
> some valid state, otherwise we would have to distinguish
> initial states from working ones. This can actually be done,
> but it would make the code more complex.

3. How 'more complex'? And why do you distinguish between 'initial'
and 'working' states? All states should work. Initial only means the
limbo does not belong to anybody yet. Currently I only see complexity
coming from the filtering being turned on and off.

> Thus for now let's leave filtration on and off.

Please find the real reason why it is needed. All states should be
working. The 'initial' state is no different from, for example, the
state after DEMOTE was called.

> diff --git a/changelogs/unreleased/gh-6036-qsync-filter-packets.md b/changelogs/unreleased/gh-6036-qsync-filter-packets.md
> new file mode 100644
> index 000000000..0db629e83
> --- /dev/null
> +++ b/changelogs/unreleased/gh-6036-qsync-filter-packets.md
> @@ -0,0 +1,9 @@
> +## feature/replication

4. It is a bugfix, not a feature.

> +
> +* Implemented filtering of incoming synchronous packets to
> +  discard requests from outdated cluster nodes. This can happen
> +  when a replication cluster is partitioned on the transport
> +  level and two or more sub-clusters are running simultaneously
> +  for some time, then try to merge back. Since the sub-clusters
> +  had their own leaders, they should not be able to form the
> +  original cluster, because the data is no longer consistent (gh-6036).

> diff --git a/src/box/box.cc b/src/box/box.cc
> index 7b11d56d6..f134dc8bb 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1686,15 +1685,17 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
>  		.lsn = promote_lsn,
>  		.term = raft->term,
>  	};
> -	txn_limbo_process(&txn_limbo, &req);
> +	if (txn_limbo_process(&txn_limbo, &req) != 0)
> +		return -1;

5. txn_limbo_write_promote() was already done above. If you bail out
now, you have an inconsistent state - the promote is written to the
WAL, but not applied to the limbo. What will happen on recovery?

It seems you need to lock box_promote(), box_promote_qsync(), and
box_demote() (see the sketch below). Otherwise you have exactly the
same problem with local promotions vs the ones coming from the
applier as the one you tried to fix for applier vs applier.

>  	assert(txn_limbo_is_empty(&txn_limbo));
> +	return 0;
>  }
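Re comment 5: to show the kind of locking I mean - a sketch only,
assuming a latch from lib/core/latch.h and a hypothetical
box_promote_impl() carrying the current body of box_promote():

static struct latch promote_latch = LATCH_INITIALIZER(promote_latch);

int
box_promote(void)
{
	/* Serialize with box_promote_qsync() and box_demote(). */
	latch_lock(&promote_latch);
	int rc = box_promote_impl();
	latch_unlock(&promote_latch);
	return rc;
}

box_promote_qsync() and box_demote() would take the same latch, so a
local promotion and one applied from a remote node can't interleave.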
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index 65fbd0cac..925f401e7 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -737,6 +738,261 @@ txn_limbo_wait_empty(struct txn_limbo *limbo, double timeout)
>  	return 0;
>  }
>
> +/**
> + * Fill the reject reason with the request data.
> + * The function is not reenterable, use with caution.
> + */
> +static char *
> +reject_str(const struct synchro_request *req)
> +{
> +	static char prefix[128];

6. Please don't try to re-invent the static buffer. Just use it.

> +
> +	snprintf(prefix, sizeof(prefix), "RAFT: rejecting %s (%d) "
> +		 "request from origin_id %u replica_id %u term %llu",
> +		 iproto_type_name(req->type), req->type,
> +		 req->origin_id, req->replica_id,
> +		 (long long)req->term);
> +
> +	return prefix;
> +}
> +
> +/**
> + * Common chain for any incoming packet.
> + */
> +static int
> +filter_in(struct txn_limbo *limbo, const struct synchro_request *req)
> +{
> +	assert(limbo->is_filtering);
> +
> +	/*
> +	 * Zero LSN is allowed for PROMOTE
> +	 * and DEMOTE requests only.
> +	 */
> +	if (req->lsn == 0 && !iproto_type_is_promote_request(req->type)) {
> +		say_error("%s. Zero lsn detected", reject_str(req));
> +		diag_set(ClientError, ER_CLUSTER_SPLIT,
> +			 "Zero LSN on promote/demote");
> +		return -1;
> +	}
> +
> +	/*
> +	 * Zero @a replica_id is allowed for PROMOTE packets only.

7. Why not for DEMOTE?

> +	 */
> +	if (req->replica_id == REPLICA_ID_NIL &&
> +	    req->type != IPROTO_RAFT_PROMOTE) {
> +		say_error("%s. Zero replica_id detected",
> +			  reject_str(req));
> +		diag_set(ClientError, ER_CLUSTER_SPLIT,
> +			 "Zero replica_id");
> +		return -1;
> +	}
> +
> +	/*
> +	 * Incoming packets should esteem the limbo owner; if it
> +	 * doesn't match, the sender missed the limbo owner
> +	 * migrations and is out of date.
> +	 */
> +	if (req->replica_id != limbo->owner_id) {
> +		say_error("%s. Limbo owner mismatch, owner_id %u",
> +			  reject_str(req), limbo->owner_id);
> +		diag_set(ClientError, ER_CLUSTER_SPLIT,
> +			 "Sync queue silent owner migration");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * Filter CONFIRM and ROLLBACK packets.
> + */
> +static int
> +filter_confirm_rollback(struct txn_limbo *limbo,

8. The limbo can be const in all filter functions.
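I.e., something like this (the same change applies to filter_in()
and the rest of the filter functions):

static int
filter_confirm_rollback(const struct txn_limbo *limbo,
			const struct synchro_request *req)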