From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id C43CD6EC58; Tue, 3 Aug 2021 16:50:04 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org C43CD6EC58 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1627998604; bh=twQOUJff9gAPLv6PAvxc5zj6XFgkiLg6YMnz4VzTdWo=; h=Date:To:Cc:References:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=d5eZOtWJ4qEoD3zO1GKXG/4QMTIun+7jhFJqtslzkzMBEpaEu39YCswvftpDxpJZU IrojgvgZ7gG/EcNllaUryh6dVf2r25s/Sss6PHYQFcGo90I4XXWdn123I7xcIaj/Nt MAwJ7DyCe/Vz7xbRMYxBKq/l10TnHrpR8Yqf+Mvs= Received: from mail-lf1-f46.google.com (mail-lf1-f46.google.com [209.85.167.46]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 946946EC58 for ; Tue, 3 Aug 2021 16:50:02 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 946946EC58 Received: by mail-lf1-f46.google.com with SMTP id a26so39776126lfr.11 for ; Tue, 03 Aug 2021 06:50:02 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=kQh+LXTzOI7lQjh+SLw3547z/rJfOnxI5YMkQGF8dHc=; b=XP6SZihHgW2sHPDCs1IFcrNgPrboUxfOao7qb5wQLvRWuuar1WDcutA7+qZwEZ4dgE TDh8pVkVWe8gI/adEW6vAhQf6iXjKKSuPie9z5ORkB7MqCDIcW6wNotE/DPjD9Dyxcym xHREmXA495pQ0MT8S2T8HBdCzsX3uj5XiiJ9XUjcb13gdF6qgA5u8psbh4yImFpiLciX qxNPZmSdiZqxiF9YFfX+3gaL+8qnIpeJvz8AAAdzYDMgEUVy3pn+TJJ9i3gCNUOQyCZn ypaB3QpKQ7GAhfsG3qm9p92bU4uy/PpfvCJzp9KxSl4IThwkFq5+RxJA8aJm31CsqG0/ 2ogg== X-Gm-Message-State: AOAM533jbc7rqV3JnxVUGikEo0uNh3LqjOne1LIE+HzaLXc+gqGQj63P z42/zdru2+2wQ3L2x70Pm2Ta/8d4AL0= X-Google-Smtp-Source: ABdhPJzKBJnizU7v4zd2Zy+Non9Oh+X9gI1LIJxJXMolNjmJ8spVsCj/TgGygK83DkjFmmxBmGF9vw== X-Received: by 2002:ac2:446d:: with SMTP id y13mr10913727lfl.632.1627998601248; Tue, 03 Aug 2021 06:50:01 -0700 (PDT) Received: from grain.localdomain ([5.18.255.97]) by smtp.gmail.com with ESMTPSA id e21sm1371120lfq.240.2021.08.03.06.50.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 03 Aug 2021 06:50:00 -0700 (PDT) Received: by grain.localdomain (Postfix, from userid 1000) id D95505A001E; Tue, 3 Aug 2021 16:49:59 +0300 (MSK) Date: Tue, 3 Aug 2021 16:49:59 +0300 To: Serge Petrenko Cc: tml , Vladislav Shpilevoy Message-ID: References: <20210730113539.563318-1-gorcunov@gmail.com> <20210730113539.563318-5-gorcunov@gmail.com> <6e48b22a-aedc-c670-3050-29bd4e07b48e@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <6e48b22a-aedc-c670-3050-29bd4e07b48e@tarantool.org> User-Agent: Mutt/2.0.7 (2021-05-04) Subject: Re: [Tarantool-patches] [PATCH v9 4/5] limbo: filter incoming synchro requests X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Cyrill Gorcunov via Tarantool-patches Reply-To: Cyrill Gorcunov Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" On Tue, Aug 03, 2021 at 01:51:44PM +0300, Serge Petrenko wrote: > > > + if (req->lsn != 0 || > > + !iproto_type_is_promote_request(req->type)) { > > + say_info("RAFT: rejecting %s request from " > > + "instance id %u for term %llu. " > > + "req->replica_id = 0 but lsn %lld.", > > + iproto_type_name(req->type), > > + req->origin_id, (long long)req->term, > > + (long long)req->lsn); > > + > > + diag_set(ClientError, ER_UNSUPPORTED, > > + "Replication", > > + "empty replica_id with nonzero LSN"); > > + return -1; > > + } > > + } > > I agree with Vlad. This may be moved to filter_confirm_rollback. You know, as I replied to Vlad, I think we should do a common attibutes verification in early stage, because not only due to network error but for security reasons as well, ie say some fuzzer in network could feed us with screwed data, and when we sort packets via chains to process we could do an early verification first. But I won't insist, surely I can move it to promote chain. > > + > > + /* > > + * Explicit split brain situation. Promote > > + * comes in with an old LSN which we've already > > + * processed. > > + */ > > + if (limbo->confirmed_lsn > promote_lsn) { > > + /* > > + * If limbo is empty we're migrating > > + * the owner. > > + */ > > + if (txn_limbo_is_empty(limbo)) > > + return 0; > > I don't understand this part. Are you sure this check is needed? > We're always migrating the owner with a promote. Serge, I think this code hunk is due to my misunderstanding of owner migration, should remove it I think, thanks! > > + > > + if (txn_limbo_is_empty(limbo)) { > > + /* > > + * Transactions are already rolled back > > + * since the limbo is empty. > > + */ > > + say_info("RAFT: rejecting %s request from " > > + "instance id %u for term %llu. " > > + "confirmed_lsn %lld < promote_lsn %lld " > > + "and empty limbo.", > > + iproto_type_name(req->type), > > + req->origin_id, (long long)req->term, > > + (long long)limbo->confirmed_lsn, > > + (long long)promote_lsn); > > + > > + diag_set(ClientError, ER_UNSUPPORTED, > > + "Replication", > > + "forward promote LSN " > > + "(empty limbo, split brain)"); > > + return -1; > > I think it'd be better to have a separate error code for this purpose. > Say, ER_SPLITBRAIN or something. > Then applier would have more control over what to do when such an error > is raised. Say, never reconnect. (I doesn't reconnect on ER_UNSUPPORTED, > I believe, but a distinct error is still better). Split-bain is one particular case, since more cases why we refuse to proceed may appear in future maybe ER_REJECT with rejection error (just like netfilter does?) > > + } else { > > + /* > > + * Some entries are present in the limbo, > > + * and if first entry's LSN is greater than > > + * requested then old data either commited > > + * or rolled back, so can't continue. > > + */ > > + struct txn_limbo_entry *first; > > + > > + first = txn_limbo_first_entry(limbo); > > + if (first->lsn > promote_lsn) { > > This should only happen when confirmed_lsn > promote_lsn, shouldn't it? > If yes, than you've already handled it above. Nope. This is the case where the data is still in limbo and we've not yet commited it but master already commited/rolledback it already. If only I'm not misssing something obvious. > > + > > +static int (*filter_req[FILTER_MAX]) > > +(struct txn_limbo *limbo, const struct synchro_request *req) = { > > + [FILTER_IN] = filter_in, > > + [FILTER_CONFIRM] = filter_confirm_rollback, > > + [FILTER_ROLLBACK] = filter_confirm_rollback, > > + [FILTER_PROMOTE] = filter_promote, > > + [FILTER_DEMOTE] = filter_demote, > > Demote should be filtered the same way as promote, they are > basically the same requests with same meaning. > > demote is just a promote for replica id 0, because we couldn't > do promote replica id 0. Aha, thanks, will update. > > - if (req->replica_id == REPLICA_ID_NIL) { > > - /* > > - * The limbo was empty on the instance issuing the request. > > - * This means this instance must empty its limbo as well. > > - */ > > - assert(lsn == 0 && iproto_type_is_promote_request(req->type)); > > - } else if (req->replica_id != limbo->owner_id) { > > + if (req->replica_id != limbo->owner_id) { > > /* > > * Ignore CONFIRM/ROLLBACK messages for a foreign master. > > * These are most likely outdated messages for already confirmed > > We should error when request -> replica_id != limbo->owner_id. > For every entry type: promote/demote/confirm/rollback. > > req->replica_id != limbo->owner_id means that the remote instance has > taken some actions in the past (say, confirmed something) of which we > didn't know until now. This is basically a splitbrain again. Good point. I think we should put it into filter-in chain then. Cyrill