Date: Tue, 10 Aug 2021 17:36:02 +0300
From: Cyrill Gorcunov via Tarantool-patches
To: Vladislav Shpilevoy
Cc: tml
Subject: Re: [Tarantool-patches] [PATCH v10 3/4] limbo: filter incoming synchro requests
In-Reply-To: <2f3274e6-80c4-1d07-b739-c8d758447fe1@tarantool.org>

On Tue, Aug 10, 2021 at 03:31:04PM +0300, Vladislav Shpilevoy wrote:
> >
> > | I could add some operations here but not sure if it worth it.
> >
> > Let me state it clear then - I could add this assert() if you insist
> > but I think we already spread too many assertions all over the code,
> > and if it is possible I would be glad not to add new ones. After all
> > either we should add this assert() to each filter chain or not add
> > at all, otherwise there will be kind of code imbalance.
>
> What is wrong with the assertions that you don't like adding them?
> You add panics quite often, and they cost some perf. But asserts
> just help to catch bugs and cost nothing in Release build.

I personally think a condition is either critical, so that you can't
continue execution if it fails, and then it must be tested even in
release builds; here panic() is needed. Or it is not critical, and then
we don't need assert() at all.

In particular, for the filtering case: if we occasionally call it where
we should not, it might trigger a false positive error that breaks
replication but does not corrupt data. In such a case that is OK and no
assertion is needed. In the reverse case, say enabling filtering in the
wrong place would cause data corruption, we need a panic, not an assert.
So I don't see much point in assert calls at all. Surely I can add it if
you prefer; I simply don't like them.

You know, Serge and I talked today about keeping filtering enabled all
the time, because it looks pretty fishy that I turn it on and off. So
I'm working on removing this code, and the question about the assert
will disappear on its own.

> >>>>> +static int
> >>>>> +filter_confirm_rollback(struct txn_limbo *limbo,
> >>>>> +			const struct synchro_request *req)
> >>>>> +{
> >>>>> +	/*
> >>>>> +	 * When limbo is empty we have nothing to
> >>>>> +	 * confirm/commit and if this request comes
> >>>>> +	 * in it means the split brain has happened.
> >>>>> +	 */
> >>>>> +	if (!txn_limbo_is_empty(limbo))
> >>>>> +		return 0;
> >>>>
> >>>> 9. What if rollback is for LSN > limbo's last LSN? It
> >>>> also means nothing to do. The same for confirm LSN < limbo's
> >>>> first LSN.
> >>>
> >>> static void
> >>> txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
> >>> {
> >>> -->	assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo));
> >>>
> >>> txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
> >>> {
> >>> -->	assert(limbo->owner_id != REPLICA_ID_NIL || txn_limbo_is_empty(limbo));
> >>>
> >>> Currently we're allowed to process empty limbo if only owner is not nil,
> >>> I think I should add this case here.
> >>
> >> My question is not about the owner ID. I asked what if rollback/confirm
> >> try to cover a range not present in the limbo while it is not empty. If
> >> it is not empty, it has an owner obviously. But it does not matter.
> >> What if it has an owner, has transactions, but you got ROLLBACK/CONFIRM
> >> for data out of the LSN range present in the limbo?
> >
> > Since the terms are matching I think such scenario should be fine, right?
> > IOW, some old replica has been stopped for some reason and been living out
> > of quorum for some time thus such requests should be considered as OK to
> > pass and when filter accepts them they will reach txn_limbo_read_confirm
> > or txn_limbo_read_rollback where they will be simply ignored as far as I
> > understand. IOW, such requests are valid, no?
>
> If a replica is outdated, it should not matter. It will receive the needed
> data in order anyway. Like if the data was just sent. Hence, it seems
> irrelevant whether it is outdated. And still looks the same as the thing
> you are trying to filter out (when the limbo is empty = confirm/rollback
> do not cover anything too).

Wait, Vlad, I don't understand. When a packet comes in we verify that
the terms match; if they don't, we drop the request with an error. Now
assume the term is valid and we get a confirm/rollback over an already
processed entry. Initially I thought it is an error due to split-brain,
because there is no data in the limbo which we can compare against.
Then I looked into txn_limbo_read_confirm() and the code silently
passes if the queue is empty, so I presumed that I simply need to
convert the assert() above into a real verification condition. And
after your reply I am confused again.

Assume I'm a replica and have no data in the limbo. If I obtain some
confirm/rollback, it means the master node did some transactions behind
my back, so I should refuse to proceed and refetch all data again,
right?

Another scenario: I'm the leader node, I sent some transactions, then
gathered the quorum and made the limbo empty. At some moment the replica
will send me a confirm packet back, and I should simply advance the
vclock and ignore this packet, correct?
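
To make the two scenarios above concrete, here they are written down as
a small decision function. This is only a sketch under assumptions: it
takes the answer to both questions to be "yes", which nobody in this
thread has confirmed, and every name in it (toy_limbo, toy_request,
toy_verdict, toy_filter_confirm_rollback) is hypothetical and is not
part of the patch or of the existing limbo code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real limbo structures. */
enum toy_verdict {
	TOY_APPLY,	/* pass the request on to the limbo */
	TOY_IGNORE,	/* nothing to cover, drop the packet silently */
	TOY_REJECT,	/* refuse: term mismatch or suspected split brain */
};

struct toy_limbo {
	uint64_t term;	/* term this node currently accepts */
	bool is_owner;	/* true when this node is the leader */
	int pending;	/* number of queued entries, 0 means empty */
};

struct toy_request {
	uint64_t term;	/* term carried by the CONFIRM/ROLLBACK */
};

static enum toy_verdict
toy_filter_confirm_rollback(const struct toy_limbo *limbo,
			    const struct toy_request *req)
{
	assert(limbo != NULL && req != NULL);
	/* Terms are verified first; a mismatch is dropped with an error. */
	if (req->term != limbo->term)
		return TOY_REJECT;
	/* Non-empty limbo: leave the decision to the read path. */
	if (limbo->pending > 0)
		return TOY_APPLY;
	/* Assumption: an empty limbo on the leader means a late packet. */
	if (limbo->is_owner)
		return TOY_IGNORE;
	/* Assumption: an empty limbo on a replica means it fell behind. */
	return TOY_REJECT;
}
```

Returning a verdict instead of asserting keeps the check active in
release builds, which is the behaviour argued for earlier in this mail.
The LSN range case Vlad asks about is deliberately left out, since it is
still an open question here.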