Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: tarantool-patches@freelists.org
Subject: [PATCH 1/5] recovery: stop writing to xstream on system error
Date: Sat, 29 Dec 2018 00:21:47 +0300	[thread overview]
Message-ID: <dc22e4cc81ce87fc0d852dafa96464f135bebe0a.1546030880.git.vdavydov.dev@gmail.com> (raw)
In-Reply-To: <cover.1546030880.git.vdavydov.dev@gmail.com>
In-Reply-To: <cover.1546030880.git.vdavydov.dev@gmail.com>

In case force_recovery flag is set, recover_xlog() ignores any errors
returned by xstream_write(), even SocketError or FiberIsCancelled. This
may result in permanent replication breakdown as described in the next
paragraph.

Suppose there's a master and a replica and the master has force_recovery
flag set. The replica gets stalled on WAL while applying a row fetched
from the master. As a result, it stops sending ACKs. In the meantime,
the master writes a lot of new rows to its WAL so that the relay thread
sending changes to the replica fills up all the space available in the
network buffer and blocks on the replication socket. Note, at this
moment it may occur that a packet fragment has been written to the
socket. The WAL delay on the replica takes long enough for replication
to break on timeout: the relay reader fiber on the master doesn't
receive an ACK from the replica in time and cancels the relay writer
fiber. The relay writer fiber wakes up and returns to recover_xlog(),
which happily continues to scan the xlog attempting to send more rows
(force_recovery is set), failing, and complaining to the log. While the
relay thread is still scanning the log, the replica finishes the long
WAL write and reads more data from the socket, freeing up some space in
the network buffer for the relay to write more rows. The relay thread,
which happens to be still in recover_xlog(), writes a new row to the
socket after the packet fragment it had written when it was cancelled,
effectively corrupting the stream and breaking a replication with an
unrecoverable error, e.g.

  xrow.c:99 E> ER_INVALID_MSGPACK: Invalid MsgPack - packet header

Actually, it's pointless to continue scanning an xlog if xstream_write()
returned any error different from ClientError - this means that the xlog
is scanned by a relay thread (not local recovery) and the connection is
broken, in which case there isn't much we can do but stop the relay and
wait for the replica to reconnect. So let's fix this issue by ignoring
force_recovery option for any error that doesn't have type ClientError.

It's difficult to write a test for this case, since too many conditions
have to be satisfied simultaneously for the issue to occur. Injecting
errors doesn't really help here and would look artificial, because it'd
rely too much on the implementation. So I'm committing this one without
a test case.

Part of #3910
---
 src/box/recovery.cc | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/box/recovery.cc b/src/box/recovery.cc
index 64d50989..c3cc7454 100644
--- a/src/box/recovery.cc
+++ b/src/box/recovery.cc
@@ -279,7 +279,17 @@ recover_xlog(struct recovery *r, struct xstream *stream,
 		} else {
 			say_error("can't apply row: ");
 			diag_log();
-			if (!r->wal_dir.force_recovery)
+			/*
+			 * Stop recovery if a system error occurred,
+			 * no matter if force_recovery is set or not,
+			 * because in this case we could have written
+			 * a packet fragment to the stream so that
+			 * the next write would corrupt data at the
+			 * receiving end.
+			 */
+			struct error *e = diag_last_error(diag_get());
+			if (!r->wal_dir.force_recovery ||
+			    !type_assignable(&type_ClientError, e->type))
 				diag_raise();
 		}
 	}
-- 
2.11.0

  reply	other threads:[~2018-12-28 21:21 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-28 21:21 [PATCH 0/5] Fix a couple of replication breakdown issues Vladimir Davydov
2018-12-28 21:21 ` Vladimir Davydov [this message]
2018-12-29  9:09   ` [tarantool-patches] Re: [PATCH 1/5] recovery: stop writing to xstream on system error Konstantin Osipov
2018-12-29  9:50     ` Vladimir Davydov
2018-12-29 10:57       ` Vladimir Davydov
2018-12-29 12:08         ` Konstantin Osipov
2018-12-28 21:21 ` [PATCH 2/5] relay: do not try to scan xlog if exiting Vladimir Davydov
2018-12-29  9:14   ` [tarantool-patches] " Konstantin Osipov
2018-12-29  9:53     ` Vladimir Davydov
2018-12-28 21:21 ` [PATCH 3/5] relay: cleanup error handling Vladimir Davydov
2018-12-28 21:21 ` [PATCH 4/5] relay: close xlog cursor in relay thread Vladimir Davydov
2018-12-28 21:21 ` [PATCH 5/5] xlog: assure xlog is opened and closed in the same thread Vladimir Davydov
2018-12-29 11:40 ` [PATCH 0/5] Fix a couple of replication breakdown issues Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dc22e4cc81ce87fc0d852dafa96464f135bebe0a.1546030880.git.vdavydov.dev@gmail.com \
    --to=vdavydov.dev@gmail.com \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [PATCH 1/5] recovery: stop writing to xstream on system error' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox