From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: Vladimir Davydov Subject: [PATCH 1/5] recovery: stop writing to xstream on system error Date: Sat, 29 Dec 2018 00:21:47 +0300 Message-Id: In-Reply-To: References: In-Reply-To: References: To: tarantool-patches@freelists.org List-ID: In case force_recovery flag is set, recover_xlog() ignores any errors returned by xstream_write(), even SocketError or FiberIsCancelled. This may result in permanent replication breakdown as described in the next paragraph. Suppose there's a master and a replica and the master has force_recovery flag set. The replica gets stalled on WAL while applying a row fetched from the master. As a result, it stops sending ACKs. In the meantime, the master writes a lot of new rows to its WAL so that the relay thread sending changes to the replica fills up all the space available in the network buffer and blocks on the replication socket. Note, at this moment it may occur that a packet fragment has been written to the socket. The WAL delay on the replica takes long enough for replication to break on timeout: the relay reader fiber on the master doesn't receive an ACK from the replica in time and cancels the relay writer fiber. The relay writer fiber wakes up and returns to recover_xlog(), which happily continues to scan the xlog attempting to send more rows (force_recovery is set), failing, and complaining to the log. While the relay thread is still scanning the log, the replica finishes the long WAL write and reads more data from the socket, freeing up some space in the network buffer for the relay to write more rows. The relay thread, which happens to be still in recover_xlog(), writes a new row to the socket after the packet fragment it had written when it was cancelled, effectively corrupting the stream and breaking a replication with an unrecoverable error, e.g. xrow.c:99 E> ER_INVALID_MSGPACK: Invalid MsgPack - packet header Actually, it's pointless to continue scanning an xlog if xstream_write() returned any error different from ClientError - this means that the xlog is scanned by a relay thread (not local recovery) and the connection is broken, in which case there isn't much we can do but stop the relay and wait for the replica to reconnect. So let's fix this issue by ignoring force_recovery option for any error that doesn't have type ClientError. It's difficult to write a test for this case, since too many conditions have to be satisfied simultaneously for the issue to occur. Injecting errors doesn't really help here and would look artificial, because it'd rely too much on the implementation. So I'm committing this one without a test case. Part of #3910 --- src/box/recovery.cc | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/src/box/recovery.cc b/src/box/recovery.cc index 64d50989..c3cc7454 100644 --- a/src/box/recovery.cc +++ b/src/box/recovery.cc @@ -279,7 +279,17 @@ recover_xlog(struct recovery *r, struct xstream *stream, } else { say_error("can't apply row: "); diag_log(); - if (!r->wal_dir.force_recovery) + /* + * Stop recovery if a system error occurred, + * no matter if force_recovery is set or not, + * because in this case we could have written + * a packet fragment to the stream so that + * the next write would corrupt data at the + * receiving end. + */ + struct error *e = diag_last_error(diag_get()); + if (!r->wal_dir.force_recovery || + !type_assignable(&type_ClientError, e->type)) diag_raise(); } } -- 2.11.0