Date: Mon, 14 Sep 2020 15:27:27 +0300
From: Cyrill Gorcunov
Message-ID: <20200914122727.GC991329@grain>
In-Reply-To: <4a6849b67f690d2df3a09911110809149d67f15c.1599931123.git.v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH 3/4] replication: retry in case of XlogGapError
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org

On Sat, Sep 12, 2020 at 07:25:55PM +0200, Vladislav Shpilevoy wrote:
> Previously XlogGapError was considered a critical error that stops
> the replication. That may not be as good as it looks.
>
> XlogGapError is a perfectly fine error, which should not kill the
> replication connection. It should be retried instead.
>
> Here is an example where the gap can be recovered on its own.
> Consider the case: node1 is a leader, booted with vclock {1: 3}.
> Node2 connects and fetches node1's snapshot, so it also gets
> vclock {1: 3}. Then node1 writes something and its vclock becomes
> {1: 4}. Now node3 boots from node1 and gets the same vclock. The
> vclocks now look like this:
>
> - node1: {1: 4}, leader, has a {1: 3} snap.
> - node2: {1: 3}, booted from node1, has only the snap.
> - node3: {1: 4}, booted from node1, has only the snap.
>
> If the cluster is a full mesh, node2 will send subscribe requests
> with vclock {1: 3}. When node3 receives one, it will respond with
> an xlog gap error, because all it has is a snap with {1: 4},
> nothing else. In that case node2 should retry connecting to node3
> and in the meantime fetch the newer changes from node1.
>
> The example is perfectly valid. However, it is unreachable now,
> because the master registers all replicas in _cluster before
> allowing them to join. So they all bootstrap from a snapshot
> containing all their IDs. This is a bug, because such
> auto-registration also registers anonymous replicas if they are
> present during bootstrap. It also blocks Raft, which can't work
> while there are registered but not yet joined nodes.
>
> Once the registration problem is solved in the next commit,
> XlogGapError will strike quite often during bootstrap. This patch
> won't allow that to happen.
>
> Needed for #5287

While this indeed looks like a sane thing to do (i.e. retry the
connection in a loop) when such an error happens, it breaks backward
compatibility, because the state of the node changes. To me the new
behaviour is more sane and valid, but I'm a bit nervous that it could
cause problems for third-party tools which might rely on the former
behaviour...
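
To make the retry scenario above concrete, here is a toy, self-contained
C model of the three-node case from the commit message. It is not the
actual Tarantool code: struct node, subscribe(), and the single-counter
vclock are simplifications invented for illustration; the real applier
operates on full vclocks, and the real reconnect logic lives in the
server's applier machinery.

#include <stdio.h>

/*
 * Toy model of the XlogGapError retry scenario, not Tarantool code.
 * All names here are hypothetical; the vclock is reduced to the single
 * counter {1: N} used in the example above.
 */

struct node {
	const char *name;
	long vclock;      /* highest LSN the node has applied */
	long snap_vclock; /* oldest point it can still serve rows from */
};

enum subscribe_rc {
	SUBSCRIBE_OK,
	SUBSCRIBE_XLOG_GAP, /* peer lacks xlogs after the given vclock */
};

/* A replica asks `master` to stream everything after `from`. */
static enum subscribe_rc
subscribe(const struct node *master, long from)
{
	/*
	 * If the replica is behind the oldest point the master still
	 * has on disk, the request hits a gap - the XlogGapError case.
	 */
	if (from < master->snap_vclock)
		return SUBSCRIBE_XLOG_GAP;
	return SUBSCRIBE_OK;
}

int
main(void)
{
	struct node node1 = {"node1", 4, 3}; /* leader: {1: 3} snap + xlogs */
	struct node node2 = {"node2", 3, 3}; /* lags at {1: 3} */
	struct node node3 = {"node3", 4, 4}; /* snap only, at {1: 4} */

	/*
	 * Old behaviour: the first gap error would kill the
	 * node2 -> node3 connection for good. New behaviour: retry,
	 * while the gap is filled from another master.
	 */
	for (int attempt = 1; attempt <= 10; attempt++) {
		if (subscribe(&node3, node2.vclock) == SUBSCRIBE_OK) {
			printf("attempt %d: node2 subscribed to %s "
			       "at vclock {1: %ld}\n",
			       attempt, node3.name, node2.vclock);
			return 0;
		}
		printf("attempt %d: XlogGapError from %s, retrying\n",
		       attempt, node3.name);
		/* Meanwhile node2 keeps pulling rows from node1. */
		if (node2.vclock < node1.vclock)
			node2.vclock++;
	}
	printf("node2 never caught up, giving up\n");
	return 1;
}

Running the model, the first subscribe attempt fails with a gap error,
node2 then catches up to {1: 4} from node1, and the second attempt
succeeds - which is exactly why retrying is preferable to tearing the
connection down permanently.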