From: Cyrill Gorcunov via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Serge Petrenko <sergepetrenko@tarantool.org>, Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tml <tarantool-patches@dev.tarantool.org>
Subject: [Tarantool-patches] [RFC] qsync: overall design
Date: Mon, 13 Sep 2021 17:20:45 +0300
Message-ID: <YT9ePZXvVuUZvW0f@grain> (raw)
In-Reply-To: <17e39694-2f8b-da02-6009-c97ec46e8609@tarantool.org>

On Mon, Sep 13, 2021 at 11:52:39AM +0300, Serge Petrenko wrote:
> >
> > Or maybe better via email. Serge could you please write the details here?
>
> It would be easier to discuss this verbally, I think.

Verbal meetings are good indeed, but let me summarize all the problems
found so far and record them here. Guys, please comment, I would really
appreciate it.

Term access ordering
--------------------

We've found that fibers can read old terms which have already been
updated but not yet written to the WAL. To address this we order term
reads so that appliers wait until the WAL write is complete.

While everyone agrees that there is an issue and that ordering solves
it, we are not yet completely clear on the internal design. I proposed
to use explicit locking via txn_limbo_term_lock/unlock calls. The calls
are used inside the applier's code:

    apply_synchro_row
        txn_limbo_term_lock
            journal_write
        txn_limbo_term_unlock

The key moment is the journal_write() call, which queues a completion
to run; the completion code is called from inside the sched() fiber,
i.e. not the fiber which took the lock (and such lock migration is
prohibited by our latch-lock engine).

The proposal was to hide the locking mechanism completely inside the
limbo internals, so that callers wouldn't know about it. When I tried
to do so I hit the problem of lock context migration and had to step
back to explicit locks as in the code above.

Still, Vlad's question remains:

| 2.
| As for your complaint about the begin/commit/rollback API
| being not working because you can't unlock from a non-owner
| fiber - well, it works in your patch somehow, doesn't it?

I already explained why it works with explicit locking:

https://lists.tarantool.org/tarantool-patches/YT8tZ0CuIDKwzcC4@grain/

In short - we take and release the lock in the same context.

|
| Why do you in your patch unlock here, but in the newly proposed
| API you only tried to unlock in the trigger?

Because our commit/rollback are called from inside the sched() fiber,
so we would have to provide some helper like a "completion of a
completion", where the second completion is called from inside the
applier context to unlock the terms. To me this is far messier than
the explicit locking scheme.

|
| You could call commit/rollback from this function, like you
| do with unlock now.

This moment I don't understand. We already have commit/rollback
helpers, so I ask Vlad to write some pseudocode to show what exactly
he has in mind.

Limbo's confirmed_lsn update upon CONFIRM request read
------------------------------------------------------

Currently we update limbo::confirmed_lsn when a node _writes_ this
request into the WAL. This is done on the limbo owner node only, i.e.
the transaction initiator. As a result, when a node which has never
been leader takes limbo ownership, it sends its own
"confirmed_lsn = 0" inside the PROMOTE request, and when this request
reaches the previous leader node we don't allow it to proceed (due to
our filtration rules, which require the LSN to be greater than the
current confirmed_lsn).

Also Serge pointed out:

a)
| Confirmed_lsn wasn't tracked during recovery and while
| following a master. So, essentially, only the living master could
| detect splitbrains by comparing confirmed_lsn to something else.

b)
| Say, a pair of CONFIRM requests is written, with lsns
| N and N+1. So you first enter write_confirm(N), then
| write_confirm(N+1).
| Now both fibers issuing the requests yield
| waiting for the write to happen, and confirmed_lsn is N+1.
|
| Once the first CONFIRM (N) is written, you reset confirmed_lsn to N
| right in read_confirm.
|
| So until the second CONFIRM (N+1) is written, there's a window
| when confirmed_lsn is N, but it should be N+1.
|
| I think read_confirm should set confirmed_lsn on replica only.
| On master this task is performed by write_confirm.
| You may split read_confirm in two parts:
| - set confirmed lsn (used only on replica) and
| - apply_confirm (everything read_confirm did before your patch)

Thus it seems I need to rework this aspect.

Update confirmed_lsn on first PROMOTE request arrival
-----------------------------------------------------

A detailed explanation of what I've seen is here:

https://lists.tarantool.org/tarantool-patches/YT5+YqCJuAh0HAQg@grain/

I must confess I don't much like this moment either, since it is still
a bit vague for me, so we are going to look into it soon at a meeting.

Filtration procedure itself (split detector)
--------------------------------------------

When a CONFIRM or ROLLBACK packet comes in, it is not enough to test
for limbo emptiness only. We should rather traverse the queue and
figure out whether the LSN inside the packet belongs to the current
queue. So the *preliminary* conclusion is the following: when a
CONFIRM or ROLLBACK comes in:

a) the queue is empty -- then such a request is invalid and we should
   exit with an error;
b) the queue is not empty -- then the LSN should belong to the range
   covered by the queue;
c) it is unclear how to test this scenario.

Filtration disabling for joining and local recovery
---------------------------------------------------

When joining or local recovery happens, the limbo is in an empty
state, and our filtration starts triggering false positives.
For example:

> autobootstrap1.sock I> limbo: filter PROMOTE replica_id 0 origin_id 0
> term 0 lsn 0, queue owner_id 0 len 0 promote_greatest_term 0 confirmed_lsn 0

This is because we require the term to be nonzero while the cluster is
running:

	/*
	 * PROMOTE and DEMOTE packets must not have zero
	 * term supplied, otherwise it is a broken packet.
	 */
	if (req->term == 0) {
		say_error("%s. Zero term detected", reject_str(req));
		diag_set(ClientError, ER_CLUSTER_SPLIT,
			 "Request with zero term");
		return -1;
	}

If we don't want to disable filtration at all, then we need to
introduce some state machine covering the initial -> working state
transition. I think it is better to start with the simpler approach,
where we don't verify data on join/recovery, and then extend the
filtration if needed.