From: Vladislav Shpilevoy via Tarantool-patches
To: Cyrill Gorcunov, tml
Date: Tue, 3 Aug 2021 01:50:49 +0200
Subject: Re: [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests
In-Reply-To: <20210730113539.563318-5-gorcunov@gmail.com>
References: <20210730113539.563318-1-gorcunov@gmail.com> <20210730113539.563318-5-gorcunov@gmail.com>

Thanks for the patch!

On 30.07.2021 13:35, Cyrill Gorcunov wrote:
> When we receive synchro requests we can't just apply
> them blindly because in worse case they may come from
> split-brain configuration (where a cluster splitted into

splitted -> split.

> several subclusters and each one has own leader elected,
> then subclisters are trying to merge back into original

subclisters -> subclusters.

> cluster). We need to do our best to detect such configs
> and force these nodes to rejoin from the scratch for
> data consistency sake.
>
> Thus when we're processing requests we pass them to the
> packet filter first which validates their contents and
> refuse to apply if they are not matched.
>
> Depending on request type each packet traverse an
> appropriate chain(s)
>
> FILTER_IN
> - Common chain for any synchro packet. We verify
>   that if replica_id is nil then it shall be
>   PROMOTE request with lsn 0 to migrate limbo owner

How can it be 0 for non-PROMOTE/DEMOTE requests? Do we
ever encode such rows at all? Why isn't this a part of
FILTER_PROMOTE?

> FILTER_CONFIRM
> FILTER_ROLLBACK
> - Both confirm and rollback requests shall not come
>   with empty limbo since it measn the synchro queue

measn -> means.

>   is already processed and the peer didn't notice
>   that

Is that the only issue? What about a ROLLBACK coming to an
instance which has already applied a PROMOTE on the rolled
back data? That is a part of the original problem in the
ticket.

> FILTER_PROMOTE
> - Promote request should come in with new terms only,
>   otherwise it means the peer didn't notice election
>
> - If limbo's confirmed_lsn is equal to promote LSN then
>   it is a valid request to process
>
> - If limbo's confirmed_lsn is bigger than requested then
>   it is valid in one case only -- limbo migration so the
>   queue shall be empty

I don't understand. How is it valid?
PROMOTE(lsn) rolls back everything > lsn. If the local
confirmed_lsn > lsn, it means that data can't be rolled
back now and the data becomes inconsistent.

> - If limbo's confirmed_lsn is less than promote LSN then
>   - If queue is empty then it means the transactions are
>     already rolled back and request is invalid
>   - If queue is not empty then its first entry might be
>     greater than promote LSN and it means that old data
>     either committed or rolled back already and request
>     is invalid

If the first entry's LSN in the limbo is > promote LSN, it
means the entry wasn't committed yet. The PROMOTE will roll
it back, and that is fine: it makes the data consistent.

The problem appears if some other sync txns were rolled
back, or even committed with quorum=1, before this hanging
txn. And I don't remember whether we figured out a way to
distinguish between these situations. Did we?

I haven't got to the code yet. Will do later.

> FILTER_DEMOTE
> - NOP, reserved for future use
>
> Closes #6036
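To make the PROMOTE questions above concrete, here is a rough standalone sketch of the filter chains exactly as the commit message words them, taken literally. This is not the code from the patch. Everything in it is hypothetical: the struct layouts, field and function names, and SKETCH_REPLICA_ID_NIL (assumed to be 0, standing in for a nil replica_id). "New terms only" is read as "strictly greater than the greatest PROMOTE term seen by the limbo".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a nil replica_id (assumed to be 0). */
#define SKETCH_REPLICA_ID_NIL 0

/* Hypothetical request types, named after the commit message. */
enum req_type { REQ_CONFIRM, REQ_ROLLBACK, REQ_PROMOTE, REQ_DEMOTE };

/* Hypothetical, simplified incoming synchro request. */
struct synchro_req {
	enum req_type type;
	uint32_t replica_id;
	int64_t lsn;
	uint64_t term;
};

/* Hypothetical, simplified local limbo state. */
struct limbo_state {
	uint64_t promote_greatest_term;
	int64_t confirmed_lsn;
	size_t queue_len;        /* number of pending sync txns */
	int64_t first_entry_lsn; /* oldest pending txn, valid if queue_len > 0 */
};

/* FILTER_IN: a nil replica_id is allowed only for PROMOTE with lsn 0. */
static bool
filter_in(const struct synchro_req *req)
{
	if (req->replica_id == SKETCH_REPLICA_ID_NIL)
		return req->type == REQ_PROMOTE && req->lsn == 0;
	return true;
}

/* FILTER_CONFIRM / FILTER_ROLLBACK: reject when the limbo is empty. */
static bool
filter_confirm_rollback(const struct limbo_state *l)
{
	return l->queue_len > 0;
}

/* FILTER_PROMOTE, rule by rule as quoted in the commit message. */
static bool
filter_promote(const struct limbo_state *l, const struct synchro_req *req)
{
	if (req->term <= l->promote_greatest_term)
		return false; /* the peer didn't notice the election */
	if (l->confirmed_lsn == req->lsn)
		return true;
	if (l->confirmed_lsn > req->lsn)
		return l->queue_len == 0; /* "limbo migration" case */
	/* confirmed_lsn < req->lsn from here on. */
	if (l->queue_len == 0)
		return false;
	return l->first_entry_lsn <= req->lsn;
}

/* Returns true if the request would be applied, false if filtered out. */
bool
synchro_filter(const struct limbo_state *l, const struct synchro_req *req)
{
	assert(l != NULL && req != NULL);
	if (!filter_in(req))
		return false;
	switch (req->type) {
	case REQ_CONFIRM:
	case REQ_ROLLBACK:
		return filter_confirm_rollback(l);
	case REQ_PROMOTE:
		return filter_promote(l, req);
	case REQ_DEMOTE:
		return true; /* FILTER_DEMOTE: NOP, reserved */
	}
	return false;
}
```

Read literally, the sketch accepts PROMOTE(lsn = 5) in a newer term when confirmed_lsn = 10 and the queue is empty, even though that PROMOTE would have to roll back the already confirmed LSNs 6..10. Conversely, it rejects PROMOTE(lsn = 6) when confirmed_lsn = 5 and the first pending entry is at LSN 7, although rolling that entry back would be safe.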