From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng1.m.smailru.net (smtpng1.m.smailru.net [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 4FE5946971A for ; Thu, 5 Dec 2019 10:22:54 +0300 (MSK) References: <73ebdf94c8f03fca216de9141c6541870b1ed938.1575390549.git.lvasiliev@tarantool.org> <20191203180257.GA4364@atlas> From: Leonid Vasiliev Message-ID: <65b9d108-fe36-1c9b-82c0-60e98efce658@tarantool.org> Date: Thu, 5 Dec 2019 10:22:53 +0300 MIME-Version: 1.0 In-Reply-To: <20191203180257.GA4364@atlas> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Tarantool-patches] [PATCH] Add a cancellation guard to cpipe flush callback List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Konstantin Osipov , alexander.turenko@tarantool.org, tarantool-patches@dev.tarantool.org On 12/3/19 9:02 PM, Konstantin Osipov wrote: > * Leonid Vasiliev [19/12/03 19:36]: >> https://github.com/tarantool/tarantool/issues/4127 >> https://github.com/tarantool/tarantool/tree/lvasiliev/gh-4127-WAL-thread-stucks > > Looks like a great catch. > >> We need to set a thread cancellation guard, because >> another thread may cancel the current thread >> (write() is a cancellation point in ev_async_send) >> and the activation of the ev_async watcher >> through ev_async_send will fail. > > I still don't get from the explanation why it is relevant that > ev_async_send mustn't fail? The cause of why the ev_async_send mustn't fail is unwanted behavior of the tarantool instance. For example: first thread flush cpipe input to a endpoint output and go away while trying to call ev_async_send (write() - cancellation point). Now stailq_empty(&endpoint->output) is false. After that, another thread flush cpipe input to the same endpoint, but it didn't try to call ev_async_send, because output_was_empty is false. As result: a thread of endpoint->consumer didn't wake-up (blocked on epoll_wait). The same situation described in https://github.com/tarantool/tarantool/issues/4127: at main thread: from void wal_free(void): cbus_stop_loop(&writer->wal_pipe); if (cord_join(&writer->cord)) {...} // wait the "wal" thread at "wal" thread: don't try to call cbus_stop_loop_f (for the reasons described above) blocked at epoll_wait() > >