Tarantool development patches archive
From: Cyrill Gorcunov <gorcunov@gmail.com>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 3/4] replication: retry in case of XlogGapError
Date: Mon, 14 Sep 2020 15:27:27 +0300
Message-ID: <20200914122727.GC991329@grain>
In-Reply-To: <4a6849b67f690d2df3a09911110809149d67f15c.1599931123.git.v.shpilevoy@tarantool.org>

On Sat, Sep 12, 2020 at 07:25:55PM +0200, Vladislav Shpilevoy wrote:
> Previously XlogGapError was considered a critical error stopping
> the replication. That may not be as good as it looks.
> 
> XlogGapError is a perfectly fine error, which should not kill the
> replication connection. It should be retried instead.
> 
> Here is an example in which the gap can be recovered on its
> own. Consider the case: node1 is a leader, it is booted with
> vclock {1: 3}. Node2 connects and fetches snapshot of node1, it
> also gets vclock {1: 3}. Then node1 writes something and its
> vclock becomes {1: 4}. Now node3 boots from node1, and gets the
> same vclock. Vclocks now look like this:
> 
>   - node1: {1: 4}, leader, has {1: 3} snap.
>   - node2: {1: 3}, booted from node1, has only snap.
>   - node3: {1: 4}, booted from node1, has only snap.
> 
> If the cluster is a fullmesh, node2 will send subscribe requests
> with vclock {1: 3}. If node3 receives it, it will respond with
> xlog gap error, because it only has a snap with {1: 4}, nothing
> else. In that case node2 should retry connecting to node3, and in
> the meantime try to get newer changes from node1.
> 
> The example is totally valid. However it is unreachable now
> because master registers all replicas in _cluster before allowing
> them to make a join. So they all bootstrap from a snapshot
> containing all their IDs. This is a bug, because such
> auto-registration leads to registration of anonymous replicas, if
> they are present during bootstrap. Also it blocks Raft, which
> can't work if there are registered, but not yet joined nodes.
> 
> Once the registration problem is solved in a following commit, the
> XlogGapError will strike quite often during bootstrap. This patch
> won't allow that to happen.
> 
> Needed for #5287

While this does look like a sane thing to do (i.e. try to reconnect
in a loop when such an error happens), it breaks backward
compatibility, because the state of the node changes.

To me the new behaviour is more sane and valid, but I'm a bit nervous
that it could cause problems for third-party tools which might rely on
the former behaviour...
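
To make the scenario from the quoted commit message concrete, here is a
toy model of it. This is only a sketch under stated assumptions, not
Tarantool's applier code: the names Node, log_start, subscribe() and
sync_with_retry() are hypothetical, and a vclock is modelled as a plain
dict. It shows node2 getting a gap error from node3, catching up from
node1, and then succeeding on the retry.

```python
class XlogGapError(Exception):
    """The peer has no rows starting from the requested vclock."""


class Node:
    # Hypothetical model: `vclock` is what the node has applied,
    # `log_start` is the oldest vclock it can still serve rows from.
    def __init__(self, name, vclock, log_start):
        self.name = name
        self.vclock = dict(vclock)
        self.log_start = dict(log_start)

    def subscribe(self, replica_vclock):
        # Refuse if the replica is behind the oldest point we can serve.
        for replica_id, start_lsn in self.log_start.items():
            if replica_vclock.get(replica_id, 0) < start_lsn:
                raise XlogGapError(
                    f"{self.name}: gap between {replica_vclock} "
                    f"and {self.log_start}")
        return dict(self.vclock)


def sync_with_retry(replica, peers, max_rounds=3):
    """Treat XlogGapError as retryable instead of fatal."""
    pending = list(peers)
    history = []
    for _ in range(max_rounds):
        still_pending = []
        for peer in pending:
            try:
                peer_vclock = peer.subscribe(replica.vclock)
            except XlogGapError:
                # Old behaviour (per the commit message): stop replication.
                # New behaviour: remember the peer and retry it later.
                history.append((peer.name, "gap"))
                still_pending.append(peer)
                continue
            # Apply everything the peer has that the replica lacks.
            for replica_id, lsn in peer_vclock.items():
                current = replica.vclock.get(replica_id, 0)
                replica.vclock[replica_id] = max(current, lsn)
            history.append((peer.name, "ok"))
        pending = still_pending
        if not pending:
            break
    return history


# The three nodes from the commit message.
node1 = Node("node1", {1: 4}, {1: 3})  # leader, has the {1: 3} snap
node2 = Node("node2", {1: 3}, {1: 3})  # booted from node1, snap only
node3 = Node("node3", {1: 4}, {1: 4})  # booted from node1, snap only

history = sync_with_retry(node2, [node3, node1])
print(history)
print(node2.vclock)
```

The change in node state discussed above is visible here too: with
retries, node2 keeps the connection to node3 pending instead of
dropping it, which is what an external tool watching the old "stopped"
state might not expect.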


Thread overview: 10+ messages
2020-09-12 17:25 [Tarantool-patches] [PATCH 0/4] Boot with anon Vladislav Shpilevoy
2020-09-12 17:25 ` [Tarantool-patches] [PATCH 1/4] replication: replace anon flag with enum Vladislav Shpilevoy
2020-09-14 10:09   ` Cyrill Gorcunov
2020-09-12 17:25 ` [Tarantool-patches] [PATCH 2/4] xlog: introduce an error code for XlogGapError Vladislav Shpilevoy
2020-09-14 10:18   ` Cyrill Gorcunov
2020-09-12 17:25 ` [Tarantool-patches] [PATCH 3/4] replication: retry in case of XlogGapError Vladislav Shpilevoy
2020-09-14 12:27   ` Cyrill Gorcunov [this message]
2020-09-12 17:25 ` [Tarantool-patches] [PATCH 4/4] replication: do not register outgoing connections Vladislav Shpilevoy
2020-09-12 17:32 ` [Tarantool-patches] [PATCH 0/4] Boot with anon Vladislav Shpilevoy
2020-09-13 16:03   ` Vladislav Shpilevoy
