Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: Konstantin Osipov <kostja@tarantool.org>
Cc: tarantool-patches@freelists.org
Subject: Re: [tarantool-patches] Re: [PATCH 2/5] relay: do not try to scan xlog if exiting
Date: Sat, 29 Dec 2018 12:53:26 +0300	[thread overview]
Message-ID: <20181229095325.oojkito2hmr25wkl@esperanza> (raw)
In-Reply-To: <20181229091450.GE17043@chai>

On Sat, Dec 29, 2018 at 12:14:50PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [18/12/29 10:00]:
> > relay_process_wal_event() may be called if the relay fiber is already
> > exiting, e.g. by wal_clear_watcher(). We must not try to scan xlogs in
> > this case, because we could have written an incomplete packet fragment
> > to the replication socket, as described in the previous commit message,
> > so that writing another row would lead to corrupted replication stream
> > and, as a result, permanent replication breakdown.
> > 
> > Actually, there was a check for this case in relay_process_wal_event(),
> > but it was broken by commit adc28591f77f ("replication: do not delete
> > relay on applier disconnect"), which replaced it with a relay->status
> > check, which is completely wrong, because relay->status is reset only
> > after the relay thread exits.
> > 
> > Part of #3910
> > ---
> >  src/box/relay.cc | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/src/box/relay.cc b/src/box/relay.cc
> > index a01c2a2e..3d9703ea 100644
> > --- a/src/box/relay.cc
> > +++ b/src/box/relay.cc
> > @@ -409,10 +409,15 @@ static void
> >  relay_process_wal_event(struct wal_watcher *watcher, unsigned events)
> >  {
> >  	struct relay *relay = container_of(watcher, struct relay, wal_watcher);
> > -	if (relay->state != RELAY_FOLLOW) {
> > +	if (fiber_is_cancelled()) {
> 
>  When a relay is exiting, it's state is changed. Why would you
>  need to look at fiber_is_cancelled() *instead of* a more explicit
>  RELAY_FOLLOW state change? Why not fix the invariant that
>  whenever relay is exiting it's state is not RELAY_FOLLOW?

For the record. Discussed f2f.

relay->state isn't used by the relay thread, only by the tx thread for
reporting box.info. Relay thread uses fiber_is_cancelled() instead. This
looks ugly, but this particular fix doesn't make things worse so it's OK
to push it as is for now. In future we should rework relay machinery to
make it more straightforward and use fewer callbacks.

> 
> >  		/*
> > -		 * Do not try to send anything to the replica
> > -		 * if it already closed its socket.
> > +		 * The relay is exiting. Rescanning the WAL at this
> > +		 * point would be pointless and even dangerous,
> > +		 * because the relay could have written a packet
> > +		 * fragment to the socket before being cancelled
> > +		 * so that writing another row to the socket would
> > +		 * lead to corrupted replication stream and, as
> > +		 * a result, permanent replication breakdown.
> >  		 */
> >  		return;
> >  	}

  reply	other threads:[~2018-12-29  9:53 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-28 21:21 [PATCH 0/5] Fix a couple of replication breakdown issues Vladimir Davydov
2018-12-28 21:21 ` [PATCH 1/5] recovery: stop writing to xstream on system error Vladimir Davydov
2018-12-29  9:09   ` [tarantool-patches] " Konstantin Osipov
2018-12-29  9:50     ` Vladimir Davydov
2018-12-29 10:57       ` Vladimir Davydov
2018-12-29 12:08         ` Konstantin Osipov
2018-12-28 21:21 ` [PATCH 2/5] relay: do not try to scan xlog if exiting Vladimir Davydov
2018-12-29  9:14   ` [tarantool-patches] " Konstantin Osipov
2018-12-29  9:53     ` Vladimir Davydov [this message]
2018-12-28 21:21 ` [PATCH 3/5] relay: cleanup error handling Vladimir Davydov
2018-12-28 21:21 ` [PATCH 4/5] relay: close xlog cursor in relay thread Vladimir Davydov
2018-12-28 21:21 ` [PATCH 5/5] xlog: assure xlog is opened and closed in the same thread Vladimir Davydov
2018-12-29 11:40 ` [PATCH 0/5] Fix a couple of replication breakdown issues Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181229095325.oojkito2hmr25wkl@esperanza \
    --to=vdavydov.dev@gmail.com \
    --cc=kostja@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [tarantool-patches] Re: [PATCH 2/5] relay: do not try to scan xlog if exiting' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox