From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 215486EC40; Mon, 9 Aug 2021 01:35:59 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 215486EC40 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1628462159; bh=KrV9p5SdHP3bZniK80ROpS7aZQKS7UAe9xnlXBy74ss=; h=Date:To:References:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=q1Qf6QA2+nRCmbaxizTAqpgHC7nZe83t7wZbitp4hMraofl2LpkuqxxGSIWF6H5nA 1H1EwJQlMOUBgBU8UFGr/Bfq3qCQS0IQdt4bRc6i8cTgF8P8rrTvqJxF0lSrKH2H7y wtQpP3OAtdQ2MivcjWsPtSkpXzQyyITPYWrffAbg= Received: from mail-lj1-f169.google.com (mail-lj1-f169.google.com [209.85.208.169]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 0EE7E6EC40 for ; Mon, 9 Aug 2021 01:35:58 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 0EE7E6EC40 Received: by mail-lj1-f169.google.com with SMTP id h2so8682679lji.6 for ; Sun, 08 Aug 2021 15:35:58 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=BtJKd398ShCJEwoFoc7d9tD1gHuxA27nt9HvyCEpBsg=; b=oE8beybv/pazyl5ly+/pW8I3yh975au2HWwe3ndbSWObvnLXAtD5itepp0z2upWyBs bulnKKGTJhALXmGSIxPG8ZHCGEYgsJkXhKUbajj7hYDncvK/WjhvDXI1Cp43aIqpycH2 5rKusF+vwIpN5XNfeO6i30HkNkWgGUWvnqYuOSkxFMaSoNKIy+f+AGXLrYtf6mMSh64g Ww0Ujh8RJJuOfRwno7BOv3L+TFb0CZ9Aa0OPSU4N5RZ1rvjTKn+gq7IHecchzLzb+dOU zBB8A9hOQUFaVxvIipN2gOzMcyFOOwRIvYVDLeiKof0xLZtyFX8eBujsyR8q2xAGsimT 7TQA== X-Gm-Message-State: AOAM531/7eJFH+PxEBR80KWaS4nmO48iqdakbNtm17tSBXKTHjKLO4oN +3LaiwZ5mqfHvw/F4T5NsmmEEPSJL2AALw== X-Google-Smtp-Source: ABdhPJzPIK4B2NXVYUQJIPDCJowHB1pmGqo0icZVYV7Jw0U2b6cQcZ4HCLv7zmKvO6fjYasvNSCwgw== X-Received: by 2002:a2e:a4d5:: with SMTP id p21mr1934986ljm.214.1628462157084; Sun, 08 Aug 2021 15:35:57 -0700 (PDT) Received: from grain.localdomain ([5.18.253.97]) by smtp.gmail.com with ESMTPSA id h16sm1432003ljg.46.2021.08.08.15.35.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 08 Aug 2021 15:35:56 -0700 (PDT) Received: by grain.localdomain (Postfix, from userid 1000) id B5E185A001E; Mon, 9 Aug 2021 01:35:55 +0300 (MSK) Date: Mon, 9 Aug 2021 01:35:55 +0300 To: Vladislav Shpilevoy Message-ID: References: <20210804190752.488147-1-gorcunov@gmail.com> <20210804190752.488147-4-gorcunov@gmail.com> <7e881b83-478c-6bc0-615d-ade811da8471@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <7e881b83-478c-6bc0-615d-ade811da8471@tarantool.org> User-Agent: Mutt/2.0.7 (2021-05-04) Subject: Re: [Tarantool-patches] [PATCH v10 3/4] limbo: filter incoming synchro requests X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Cyrill Gorcunov via Tarantool-patches Reply-To: Cyrill Gorcunov Cc: tml Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" On Sun, Aug 08, 2021 at 02:43:14PM +0300, Vladislav Shpilevoy wrote: > >>> + txn_limbo_filter_disable(&txn_limbo); > >>> + auto filter_guard = make_scoped_guard([&]{ > >>> + txn_limbo_filter_enable(&txn_limbo); > >>> + }); > >> > >> 3. Why do you need to enable/disabled the filter here? Shouldn't snapshot > >> contain only valid data? Moreover, AFAIU it can't contain any limbo > >> rows at all. The limbo snapshot is sent separately, but the data flow > >> does not have anything except pure data. The same for the > >> join. > > > > The idea is that snapshot/recovery has valid data which forms the initial > > limbo state versus which we will be apply filtering. > > You didn't answer the question really. Why do you need the filtering > here if all the data is correct anyway? Will it all work if I just > drop this filter disable from here? Vlad, I answered to this but a bit later in the reply | Actually this is a good question. I've to recheck this moment because in | previous series when I ran join/recovery with filtering enabled sometime | I've an issues where filter didnt pass. Gimme some time, maybe we will | all this and manage to keep filtering all the time. I saw errors in test (not in our new test but in rpelication/ tests). And I need to figure out with more attention about the stages where I must disable the filtering and where I can leave filtering turned on. In all stages (local recovery, initial and final joins). Once I recheck everything again I'll come back with precise reply. > >> And how is it related to applier_register below? It does not download > >> any data at all, does it? > > > > After register stage is complete we catch up with lates not yet downloaded > > data (final join stage) where we still assume that the data received is > > valid and do not verify it. > > Register just makes the master give you a unique ID. It does not send > any data like joins do, AFAIR. Does it work if you drop the filter disable > from here? I suspect I overdid in paranoia, will recheck. > > Actually this is a good question. I've to recheck this moment because in > > previous series when I ran join/recovery with filtering enabled sometime > > I've an issues where filter didnt pass. Gimme some time, maybe we will > > all this and manage to keep filtering all the time. > > > >>> + > >>> +/** > >>> + * Common chain for any incoming packet. > >>> + */ > >>> +static int > >>> +filter_in(struct txn_limbo *limbo, const struct synchro_request *req) > >>> +{ > >>> + (void)limbo; > >> > >> 6. So you have the filtering enabled dynamically in the limbo, but > >> you do not use the limbo here? Why? Maybe at least add an assertion > >> that the filter is enabled? > > > > All chains are having same interface it is just happen that for common > > filter I don't need to use limbo. I could add some operations here > > but not sure if it worth it. As far as I see leave unused args is > > pretty fine in our code base. > > You didn't answer the second question: > > Maybe at least add an assertion that the filter is enabled? I did | I could add some operations here but not sure if it worth it. Letme state it clear then - I could add this assert() if you insist but I think we aready spread too many assertions all over the code, and if it is possible I would be glad not to add new ones. After all either we should add this assert() to each filter chain or not add at all, otherwise there will be kind of code imbalance. > >>> +/** > >>> + * Filter CONFIRM and ROLLBACK packets. > >>> + */ > >>> +static int > >>> +filter_confirm_rollback(struct txn_limbo *limbo, > >>> + const struct synchro_request *req) > >>> +{ > >>> + /* > >>> + * When limbo is empty we have nothing to > >>> + * confirm/commit and if this request comes > >>> + * in it means the split brain has happened. > >>> + */ > >>> + if (!txn_limbo_is_empty(limbo)) > >>> + return 0; > >> > >> 9. What if rollback is for LSN > limbo's last LSN? It > >> also means nothing to do. The same for confirm LSN < limbo's > >> first LSN. > > > > static void > > txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn) > > { > > --> assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo)); > > > > txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn) > > { > > --> assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo)); > > > > Currently we're allowed to process empty limbo if only owner is not nil, > > I think I should add this case here. > > My question is not about the owner ID. I asked what if rollback/confirm > try to cover a range not present in the limbo while it is not empty. If > it is not empty, it has an owner obviously. But it does not matter. > What if it has an owner, has transactions, but you got ROLLBACK/CONFIRM > for data out of the LSN range present in the limbo? Since the terms are matching I think such scenarion should be fine, right? IOW, some old replica has been stopped for some reason and been living out of quorum for some time thus such requests should be considered as OK to pass and when filter accepts them the will reach txn_limbo_read_confirm or txn_limbo_read_rollback where they will be simply ignored as far as I unrestand. IOW, such requests are valid, no? > > > > Good catch, typo. Actually I've updated this hunk locally > > but didn't pushed out. We need "first <= promote <= last" > > Is it covered with a test? Not yet, to test _each_ filter condition we've a separate ticket which I didn't implement yet (split-brain common bug which all about tests). ... > > > > Kind of this? > > Yes! It is much simpler and still easy to extend. Please, > just try and you will see how much simpler it is. OK