Subject: [Tarantool-patches] [RFC] qsync: overall design
From: Cyrill Gorcunov via Tarantool-patches
To: Serge Petrenko, Vladislav Shpilevoy
Cc: tml
Date: Mon, 13 Sep 2021 17:20:45 +0300

On Mon, Sep 13, 2021 at 11:52:39AM +0300, Serge Petrenko wrote:
> > > > Or maybe better via email. Serge could you please write the details here?
> > It would be easier to discuss this verbally, I think.

Verbal meetings are good indeed, but let me summarize all the problems
found so far and put them down here. Guys, please comment, I would
really appreciate it.

Terms accessing ordering
------------------------

We've found that fibers can read old terms: the terms are already
updated but not yet written to the WAL. To address this we order term
reading so that appliers wait until the WAL write is complete.

While everyone agrees that there is an issue and that ordering solves
it, we are not yet completely clear about the internal design. I
proposed to use explicit locking via the txn_limbo_term_lock/unlock
calls. The calls are used inside the applier's code:

    apply_synchro_row
        txn_limbo_term_lock
            journal_write
        txn_limbo_term_unlock

The key moment is the journal_write() call: it queues a completion to
run, and the completion code is called from inside the sched() fiber,
i.e. not from the fiber which took the lock (and such lock migration
is prohibited by our latch-lock engine).

The proposal was to hide the locking mechanism completely inside the
limbo internals, so the callers wouldn't know about it. When I tried
to do so I hit the lock context migration problem and had to step back
to explicit locks as in the code above.

Still, Vlad's question remains:

| 2. As for your complaint about the begin/commit/rollback API
| being not working because you can't unlock from a non-owner
| fiber - well, it works in your patch somehow, doesn't it?

I already explained why it works with explicit locking:

https://lists.tarantool.org/tarantool-patches/YT8tZ0CuIDKwzcC4@grain/

In short - we take and release the lock in the same context.

|
| Why do you in your patch unlock here, but in the newly proposed
| API you only tried to unlock in the trigger?

Because our commit/rollback are called from inside the sched() fiber,
we would have to provide some helper like a completion of a
completion, where the second completion is called from inside the
applier context to unlock the terms. To me this is way messier than
the explicit locking scheme.

|
| You could call commit/rollback from this function, like you
| do with unlock now.

This moment I don't understand. We already have commit/rollback
helpers, so, Vlad, please write some pseudocode so that I can figure
out what exactly you have in mind.

Limbo's confirmed_lsn update upon CONFIRM request read
------------------------------------------------------

Currently we update limbo::confirmed_lsn when a node _writes_ this
request into the WAL. This is done on the limbo owner node only, i.e.
on the transaction initiator. As a result, when a node which has never
been a leader takes limbo ownership, it sends its own
"confirmed_lsn = 0" inside the PROMOTE request, and when this request
reaches the previous leader node we don't allow it to proceed (due to
our filtration rules, which require the LSN to be > the current
confirmed_lsn).

Also Serge pointed out:

a)
| Confirmed_lsn wasn't tracked during recovery and while
| following a master. So, essentially, only the living master could
| detect splitbrains by comparing confirmed_lsn to something else.

b)
| Say, a pair of CONFIRM requests is written, with lsns
| N and N+1. So you first enter write_confirm(N), then
| write_confirm(N+1). Now both fibers issuing the requests yield
| waiting for the write to happen, and confirmed_lsn is N+1.
|
| Once the first CONFIRM (N) is written, you reset confirmed_lsn to N
| right in read_confirm.
|
| So until the second CONFIRM (N+1) is written, there's a window
| when confirmed_lsn is N, but it should be N+1.
|
| I think read_confirm should set confirmed_lsn on replica only.
| On master this task is performed by write_confirm.
| You may split read_confirm in two parts:
| - set confirmed lsn (used only on replica) and
| - apply_confirm (everything read_confirm did before your patch)

Thus it seems I need to rework this aspect.
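If I understood the proposed split right, it could look something like
the sketch below. The structure layout and helper names here are
simplified and made up just to show the idea, this is not the real
limbo code:

    #include <stdint.h>

    /* Simplified stand-in for the real struct txn_limbo. */
    struct txn_limbo {
        uint32_t owner_id;      /* current limbo (queue) owner */
        int64_t confirmed_lsn;  /* highest confirmed LSN */
    };

    /* This instance's id; in reality it comes from replication state. */
    static uint32_t instance_id;

    /* Part 1: remember the confirmed LSN, needed on replicas only. */
    static void
    txn_limbo_set_confirmed_lsn(struct txn_limbo *limbo, int64_t lsn)
    {
        if (lsn > limbo->confirmed_lsn)
            limbo->confirmed_lsn = lsn;
    }

    /* Part 2: what read_confirm did before -- complete queued entries. */
    static void
    txn_limbo_apply_confirm(struct txn_limbo *limbo, int64_t lsn)
    {
        (void)limbo;
        (void)lsn;
        /* ... confirm and complete entries up to lsn ... */
    }

    static void
    txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
    {
        /*
         * On the master confirmed_lsn is already advanced by
         * write_confirm at WAL-write time, so only a replica
         * (a non-owner of the limbo) picks it up from the
         * incoming request.
         */
        if (limbo->owner_id != instance_id)
            txn_limbo_set_confirmed_lsn(limbo, lsn);
        txn_limbo_apply_confirm(limbo, lsn);
    }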
Update confirmed_lsn on first PROMOTE request arrival
-----------------------------------------------------

A detailed explanation of what I've seen is here:

https://lists.tarantool.org/tarantool-patches/YT5+YqCJuAh0HAQg@grain/

I must confess I don't like this moment much either, since it is still
a bit vague for me, so we are going to look into it soon at a meeting.

Filtration procedure itself (split detector)
--------------------------------------------

When a CONFIRM or ROLLBACK packet comes in, it is not enough to test
for limbo emptiness only. We should rather traverse the queue and
figure out whether the LSN inside the packet belongs to the current
queue. So the *preliminary* conclusion is the following: when a
CONFIRM or ROLLBACK comes in:

a) the queue is empty -- then such a request is invalid and we should
   exit with an error;
b) the queue is not empty -- then the LSN should belong to a range
   covered by the queue (see the sketch at the end of this mail);
c) it is unclear how to test this scenario.

Filtration disabling for joining and local recovery
---------------------------------------------------

When joining or recovery happens, the limbo is in an empty state, so
our filtration starts triggering false positives. For example:

> autobootstrap1.sock I> limbo: filter PROMOTE replica_id 0 origin_id 0
> term 0 lsn 0, queue owner_id 0 len 0 promote_greatest_term 0 confirmed_lsn 0

This is because we require the term to be nonzero when the cluster is
running:

    /*
     * PROMOTE and DEMOTE packets must not have zero
     * term supplied, otherwise it is a broken packet.
     */
    if (req->term == 0) {
        say_error("%s. Zero term detected", reject_str(req));
        diag_set(ClientError, ER_CLUSTER_SPLIT,
                 "Request with zero term");
        return -1;
    }

If we don't want to disable the filtration at all, then we need to
introduce some state machine which would cover the initial -> working
state transition. I think it is better to start with the simpler
approach, where we don't verify data on join/recovery, and then extend
the filtration if needed.
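For the simpler approach even a single flag inside the limbo code
could do; something along these lines (again, just a sketch with
made-up names, not the actual patch):

    #include <stdbool.h>

    struct txn_limbo;
    struct synchro_request;

    /* Hypothetical flag, raised once join/local recovery is finished. */
    static bool filter_enabled;

    void
    txn_limbo_filter_enable(void)
    {
        filter_enabled = true;
    }

    int
    txn_limbo_filter(struct txn_limbo *limbo,
                     const struct synchro_request *req)
    {
        (void)limbo;
        (void)req;
        /*
         * During join and local recovery the limbo is empty and
         * terms/LSNs are all zeros, so the strict checks would only
         * produce false positives -- skip the verification entirely
         * until the filter is enabled.
         */
        if (!filter_enabled)
            return 0;
        /* ... the zero-term check and the other rules go here ... */
        return 0;
    }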
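And getting back to the split-detector section above, this is the
sketch promised there: the queue range check I have in mind is roughly
the following. The structure is a made-up simplification, the real
queue is the list of limbo entries:

    #include <stdbool.h>
    #include <stdint.h>

    /* Simplified view of the limbo queue: only what the check needs. */
    struct limbo_queue {
        int64_t first_lsn;  /* LSN of the oldest queued entry */
        int64_t last_lsn;   /* LSN of the newest queued entry */
        int64_t len;        /* number of queued entries */
    };

    /*
     * A CONFIRM/ROLLBACK may be applied only when the queue is not
     * empty and its LSN lands inside the range currently covered by
     * the queue; otherwise the request does not match our history.
     */
    static bool
    limbo_lsn_covered(const struct limbo_queue *q, int64_t lsn)
    {
        if (q->len == 0)
            return false;  /* case a): invalid request */
        return lsn >= q->first_lsn && lsn <= q->last_lsn;  /* case b) */
    }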