From: Vladimir Davydov
Subject: [PATCH 2/5] relay: do not try to scan xlog if exiting
Date: Sat, 29 Dec 2018 00:21:48 +0300
To: tarantool-patches@freelists.org

relay_process_wal_event() may be called when the relay fiber is already
exiting, e.g. by wal_clear_watcher(). We must not try to scan xlogs in
this case: as described in the previous commit message, we could have
already written an incomplete packet fragment to the replication socket,
so writing another row would lead to a corrupted replication stream and,
as a result, a permanent replication breakdown.

Actually, there was a check for this case in relay_process_wal_event(),
but it was broken by commit adc28591f77f ("replication: do not delete
relay on applier disconnect"), which replaced it with a relay->state
check. That is completely wrong, because relay->state is reset only
after the relay thread exits.

Part of #3910
---
 src/box/relay.cc | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/box/relay.cc b/src/box/relay.cc
index a01c2a2e..3d9703ea 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -409,10 +409,15 @@ static void
 relay_process_wal_event(struct wal_watcher *watcher, unsigned events)
 {
 	struct relay *relay = container_of(watcher, struct relay, wal_watcher);
-	if (relay->state != RELAY_FOLLOW) {
+	if (fiber_is_cancelled()) {
 		/*
-		 * Do not try to send anything to the replica
-		 * if it already closed its socket.
+		 * The relay is exiting. Rescanning the WAL at this
+		 * point would be pointless and even dangerous,
+		 * because the relay could have written a packet
+		 * fragment to the socket before being cancelled
+		 * so that writing another row to the socket would
+		 * lead to corrupted replication stream and, as
+		 * a result, permanent replication breakdown.
 		 */
 		return;
 	}
-- 
2.11.0
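To make the ordering problem described in the commit message concrete, here is a toy single-threaded C model. It is not Tarantool code: struct model_relay, model_write(), wal_event_old(), wal_event_new() and model_run() are all hypothetical. It assumes, as the message says, that the state field stays at its "follow" value until the relay thread has exited, while the cancellation flag (standing in for fiber_is_cancelled()) is set immediately. Under those assumptions only the cancellation check prevents a full row from being appended right after a packet fragment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the relay exit race; none of these names exist
 * in Tarantool. "state" mimics relay->state, which is reset only after
 * the relay thread has exited; "cancelled" mimics fiber_is_cancelled().
 */
enum model_state { MODEL_FOLLOW, MODEL_STOPPED };

struct model_relay {
	enum model_state state;
	bool cancelled;
	char sock[64];		/* bytes written to the replication socket */
	size_t sock_len;
};

static void
model_write(struct model_relay *r, const char *data, size_t len)
{
	assert(r->sock_len + len <= sizeof(r->sock));
	memcpy(r->sock + r->sock_len, data, len);
	r->sock_len += len;
}

/* Pre-patch style guard: still passes while the relay is exiting. */
static void
wal_event_old(struct model_relay *r)
{
	if (r->state != MODEL_FOLLOW)
		return;
	model_write(r, "<ROW>", 5);
}

/* Post-patch style guard: stops as soon as cancellation is requested. */
static void
wal_event_new(struct model_relay *r)
{
	if (r->cancelled)
		return;
	model_write(r, "<ROW>", 5);
}

/*
 * Cancel the relay right after it wrote half a packet, then deliver a
 * WAL event (as wal_clear_watcher() may do) and copy the socket
 * contents, NUL-terminated, into out.
 */
static void
model_run(bool use_new_guard, char *out, size_t out_size)
{
	struct model_relay r = { MODEL_FOLLOW, false, {0}, 0 };

	model_write(&r, "<RO", 3);	/* incomplete packet fragment */
	r.cancelled = true;		/* state is still MODEL_FOLLOW here */
	if (use_new_guard)
		wal_event_new(&r);
	else
		wal_event_old(&r);
	assert(r.sock_len < out_size);
	memcpy(out, r.sock, r.sock_len);
	out[r.sock_len] = '\0';
}
```

With the old guard the modelled socket ends up holding "<RO<ROW>", i.e. a fragment immediately followed by a complete row, which a reader of the stream cannot decode. With the new guard it holds only "<RO", and the connection can be torn down without anything further being appended.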