From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tarantool-patches-bounce@freelists.org>
Received: from localhost (localhost [127.0.0.1])
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 19C14225A7
	for <tarantool-patches@freelists.org>; Sat, 29 Dec 2018 04:14:53 -0500 (EST)
Received: from turing.freelists.org ([127.0.0.1])
	by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id biBsJxm_-zge for <tarantool-patches@freelists.org>;
	Sat, 29 Dec 2018 04:14:53 -0500 (EST)
Received: from smtp14.mail.ru (smtp14.mail.ru [94.100.181.95])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id CA59022581
	for <tarantool-patches@freelists.org>; Sat, 29 Dec 2018 04:14:52 -0500 (EST)
Received: by smtp14.mail.ru with esmtpa (envelope-from <kostja@tarantool.org>)
	id 1gdAhm-0003d8-Qu
	for tarantool-patches@freelists.org; Sat, 29 Dec 2018 12:14:51 +0300
Date: Sat, 29 Dec 2018 12:14:50 +0300
From: Konstantin Osipov <kostja@tarantool.org>
Subject: [tarantool-patches] Re: [PATCH 2/5] relay: do not try to scan xlog
 if exiting
Message-ID: <20181229091450.GE17043@chai>
References: <cover.1546030880.git.vdavydov.dev@gmail.com>
 <d9fe4682e9a19da9d6ea77defc3c4096bff00e0e.1546030880.git.vdavydov.dev@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <d9fe4682e9a19da9d6ea77defc3c4096bff00e0e.1546030880.git.vdavydov.dev@gmail.com>
Sender: tarantool-patches-bounce@freelists.org
Errors-to: tarantool-patches-bounce@freelists.org
Reply-To: tarantool-patches@freelists.org
List-help: <mailto:ecartis@freelists.org?Subject=help>
List-unsubscribe: <tarantool-patches-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: tarantool-patches <tarantool-patches.freelists.org>
List-subscribe: <tarantool-patches-request@freelists.org?Subject=subscribe>
List-owner: <mailto:>
List-post: <mailto:tarantool-patches@freelists.org>
List-archive: <http://www.freelists.org/archives/tarantool-patches>
To: tarantool-patches@freelists.org

* Vladimir Davydov <vdavydov.dev@gmail.com> [18/12/29 10:00]:
> relay_process_wal_event() may be called if the relay fiber is already
> exiting, e.g. by wal_clear_watcher(). We must not try to scan xlogs in
> this case, because we could have written an incomplete packet fragment
> to the replication socket, as described in the previous commit message,
> so that writing another row would lead to corrupted replication stream
> and, as a result, permanent replication breakdown.
> 
> Actually, there was a check for this case in relay_process_wal_event(),
> but it was broken by commit adc28591f77f ("replication: do not delete
> relay on applier disconnect"), which replaced it with a relay->status
> check, which is completely wrong, because relay->status is reset only
> after the relay thread exits.
> 
> Part of #3910
> ---
>  src/box/relay.cc | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index a01c2a2e..3d9703ea 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -409,10 +409,15 @@ static void
>  relay_process_wal_event(struct wal_watcher *watcher, unsigned events)
>  {
>  	struct relay *relay = container_of(watcher, struct relay, wal_watcher);
> -	if (relay->state != RELAY_FOLLOW) {
> +	if (fiber_is_cancelled()) {

 When a relay is exiting, its state is changed. Why would you
 need to look at fiber_is_cancelled() *instead of* a more explicit
 RELAY_FOLLOW state change? Why not fix the invariant that
 whenever a relay is exiting, its state is not RELAY_FOLLOW?
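
 To illustrate the invariant, here is a minimal standalone sketch.
 It is not the real relay.cc code: the struct layout, the
 rows_scanned counter and the relay_begin_exit() helper are
 hypothetical, and the enum is a simplified stand-in. The idea is
 that every exit path leaves RELAY_FOLLOW first, so the existing
 state check in the WAL event handler stays sufficient.

```cpp
#include <cassert>

/* Simplified stand-in for the relay states; an assumption for this sketch. */
enum relay_state { RELAY_OFF, RELAY_FOLLOW, RELAY_STOPPED };

/* Hypothetical, reduced relay object: only what the sketch needs. */
struct relay {
	relay_state state = RELAY_OFF;
	/* Counts how many times the WAL would have been rescanned. */
	int rows_scanned = 0;
};

/*
 * Hypothetical helper: the single entry point of every exit path.
 * It leaves RELAY_FOLLOW before any other teardown happens, so no
 * later callback can observe an exiting relay in RELAY_FOLLOW.
 */
void
relay_begin_exit(struct relay *relay)
{
	relay->state = RELAY_STOPPED;
	assert(relay->state != RELAY_FOLLOW);
}

/*
 * Simplified WAL event handler: with the invariant above, checking
 * the state is enough to avoid writing to the socket while exiting.
 * Returns true if the WAL was rescanned.
 */
bool
relay_process_wal_event(struct relay *relay)
{
	if (relay->state != RELAY_FOLLOW)
		return false;
	relay->rows_scanned++;
	return true;
}
```

 With something like this, a late call from wal_clear_watcher()
 becomes a no-op as long as the exit path has already run.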

>  		/*
> -		 * Do not try to send anything to the replica
> -		 * if it already closed its socket.
> +		 * The relay is exiting. Rescanning the WAL at this
> +		 * point would be pointless and even dangerous,
> +		 * because the relay could have written a packet
> +		 * fragment to the socket before being cancelled
> +		 * so that writing another row to the socket would
> +		 * lead to corrupted replication stream and, as
> +		 * a result, permanent replication breakdown.
>  		 */
>  		return;
>  	}
> -- 
> 2.11.0
> 

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov