From: Cyrill Gorcunov via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tml <tarantool-patches@dev.tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests
Date: Tue, 3 Aug 2021 16:25:59 +0300
Message-ID: <YQlD5zn0iavTv4kK@grain>
In-Reply-To: <ff186dbc-1da2-7654-0159-cca5645c63b9@tarantool.org>

On Tue, Aug 03, 2021 at 01:50:49AM +0200, Vladislav Shpilevoy wrote:
> Thanks for the patch!
>
> On 30.07.2021 13:35, Cyrill Gorcunov wrote:
> > When we receive synchro requests we can't just apply
> > them blindly because in worse case they may come from
> > split-brain configuration (where a cluster splitted into
>
> splitted -> split.
>
> > several subclusters and each one has own leader elected,
> > then subclisters are trying to merge back into original
>
> subclisters -> subclusters.

Thanks!

> > cluster). We need to do our best to detect such configs
> > and force these nodes to rejoin from the scratch for
> > data consistency sake.
> >
> > Thus when we're processing requests we pass them to the
> > packet filter first which validates their contents and
> > refuse to apply if they are not matched.
> >
> > Depending on request type each packet traverse an
> > appropriate chain(s)
> >
> > FILTER_IN
> >  - Common chain for any synchro packet. We verify
> >    that if replica_id is nil then it shall be
> >    PROMOTE request with lsn 0 to migrate limbo owner
>
> How can it be 0 for non PROMOTE/DEMOTE requests?
> Do we ever encode such rows at all? Why isn't this
> a part of FILTER_PROMOTE?

There could be network errors, for example, so when we see a synchro
packet we need to verify its common attributes first, before passing
it to the next chain. These attributes do not depend on the limbo
state, I think. In this particular case, if we see a packet whose
replica_id is nil then it must be a promote/demote request.
Otherwise I'll have to add these tests to every non promote/demote
packet. For example, imagine a packet {rollback | lsn = 0}: it is
obviously wrong because we don't have lsn = 0 records at all. Thus
either I should put this test inside the confirm/rollback chains or
make a common helper which does a general validation of incoming
synchro packets. This is why the filter-in chain has been introduced.

>
> > FILTER_CONFIRM
> > FILTER_ROLLBACK
> >  - Both confirm and rollback requests shall not come
> >    with empty limbo since it measn the synchro queue
>
> measn -> means.

Thanks.

> >    is already processed and the peer didn't notice
> >    that
>
> Is it the only issue? What about ROLLBACK coming to
> an instance, which already made PROMOTE on the rolled back
> data? That is a part of the original problem in the ticket.

Then it is an error, as far as I understand. There is no more queued
data, and the promote request basically dropped any information we had
in memory related to the limbo state. The promote request implies that
the node where it is executed is the raft leader and its data is the
only valid reference point for all other nodes in the cluster. Thus if
we receive a confirm/rollback request for rows which are already
committed (or rolled back) via a promote request, then the other peer
should exit with an error. Shouldn't it? Or am I missing something?

> > FILTER_PROMOTE
> >  - Promote request should come in with new terms only,
> >    otherwise it means the peer didn't notice election
> >
> >  - If limbo's confirmed_lsn is equal to promote LSN then
> >    it is a valid request to process
> >
> >  - If limbo's confirmed_lsn is bigger than requested then
> >    it is valid in one case only -- limbo migration so the
> >    queue shall be empty
>
> I don't understand. How is it valid? PROMOTE(lsn) rolls
> back everything > lsn. If the local confirmed_lsn > lsn, it
> means that data can't be rolled back now and the data becomes
> inconsistent.

IIRC this was the scenario where we're migrating the limbo owner. The
owner migration scenario is still unclear to me; I need to revisit it.

>
> >  - If limbo's confirmed_lsn is less than promote LSN then
> >    - If queue is empty then it means the transactions are
> >      already rolled back and request is invalid
> >    - If queue is not empty then its first entry might be
> >      greater than promote LSN and it means that old data
> >      either committed or rolled back already and request
> >      is invalid
>
> If the first entry's LSN in the limbo > promote LSN, it
> means it wasn't committed yet. The promote will roll it back
> and it is fine. This will make the data consistent.

Quoting you (translated from Russian):

> The first limbo transaction has lsn > promote lsn. This is
> already the end, because the old master has already either
> committed or rolled back the old data, it no longer matters
> which, and this is a split brain.

Maybe I got you wrong?

>
> The problem appears if there were some other sync txns
> rolled back or even committed with quorum=1 before this
> hanging txn. And I don't remember we figured a way to
> distinguish between these situations. Did we?

Seems like not yet. I need more time to think about whether we can
have some other scenarios here...
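
To make the chains discussed above easier to follow, here is a minimal
sketch in C of the checks as the quoted commit message describes them.
It is not the code from the patch: every identifier (the sketch_*
types, fields and functions) is hypothetical, routing DEMOTE through
the promote chain is an assumption, and the two branches marked
"disputed" are exactly the points this thread has not settled yet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; not Tarantool's real identifiers. */
enum sketch_req_type {
    SKETCH_CONFIRM,
    SKETCH_ROLLBACK,
    SKETCH_PROMOTE,
    SKETCH_DEMOTE,
};

enum { SKETCH_REPLICA_ID_NIL = 0 };

struct sketch_request {
    enum sketch_req_type type;
    uint32_t replica_id;
    int64_t lsn;
    uint64_t term;
};

struct sketch_limbo {
    uint64_t promote_term;  /* greatest term seen in a promote */
    int64_t confirmed_lsn;
    bool is_empty;          /* no queued transactions */
    int64_t first_lsn;      /* LSN of the first queued entry, if any */
};

static bool
sketch_is_promote_or_demote(const struct sketch_request *req)
{
    return req->type == SKETCH_PROMOTE || req->type == SKETCH_DEMOTE;
}

/* FILTER_IN: common checks that do not depend on the limbo state. */
static bool
sketch_filter_in(const struct sketch_request *req)
{
    if (req->replica_id == SKETCH_REPLICA_ID_NIL)
        return sketch_is_promote_or_demote(req) && req->lsn == 0;
    /* E.g. {rollback | lsn = 0}: there are no lsn = 0 records. */
    if (!sketch_is_promote_or_demote(req) && req->lsn == 0)
        return false;
    return true;
}

/* FILTER_CONFIRM / FILTER_ROLLBACK: must not arrive on an empty limbo. */
static bool
sketch_filter_confirm_rollback(const struct sketch_limbo *limbo)
{
    return !limbo->is_empty;
}

/* FILTER_PROMOTE: follows the commit message as quoted. */
static bool
sketch_filter_promote(const struct sketch_limbo *limbo,
                      const struct sketch_request *req)
{
    /* Only new terms are accepted. */
    if (req->term <= limbo->promote_term)
        return false;
    if (limbo->confirmed_lsn == req->lsn)
        return true;
    /* Disputed: accepted only as owner migration with an empty queue. */
    if (limbo->confirmed_lsn > req->lsn)
        return limbo->is_empty;
    /* confirmed_lsn < promote LSN from here on. */
    if (limbo->is_empty)
        return false;
    /* Disputed: first queued entry beyond promote LSN is rejected. */
    return limbo->first_lsn <= req->lsn;
}

/* Run the common chain first, then the chain for the request type. */
static bool
sketch_filter(const struct sketch_limbo *limbo,
              const struct sketch_request *req)
{
    if (!sketch_filter_in(req))
        return false;
    switch (req->type) {
    case SKETCH_CONFIRM:
    case SKETCH_ROLLBACK:
        return sketch_filter_confirm_rollback(limbo);
    case SKETCH_PROMOTE:
    case SKETCH_DEMOTE:
        /* Assumption: DEMOTE shares the promote chain. */
        return sketch_filter_promote(limbo, req);
    }
    return false;
}
```

Keeping the state-independent checks in sketch_filter_in() means the
per-type chains only have to reason about the limbo state, which is the
argument made above for having a separate filter-in chain.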