From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-f66.google.com (mail-lf1-f66.google.com [209.85.167.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 9E68E46971A for ; Thu, 5 Dec 2019 10:27:08 +0300 (MSK) Received: by mail-lf1-f66.google.com with SMTP id y19so1654667lfl.9 for ; Wed, 04 Dec 2019 23:27:08 -0800 (PST) Date: Thu, 5 Dec 2019 10:27:06 +0300 From: Konstantin Osipov Message-ID: <20191205072706.GA16690@atlas> References: <73ebdf94c8f03fca216de9141c6541870b1ed938.1575390549.git.lvasiliev@tarantool.org> <20191203180257.GA4364@atlas> <65b9d108-fe36-1c9b-82c0-60e98efce658@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <65b9d108-fe36-1c9b-82c0-60e98efce658@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH] Add a cancellation guard to cpipe flush callback List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Leonid Vasiliev Cc: tarantool-patches@dev.tarantool.org * Leonid Vasiliev [19/12/05 10:24]: > On 12/3/19 9:02 PM, Konstantin Osipov wrote: > > * Leonid Vasiliev [19/12/03 19:36]: > > > https://github.com/tarantool/tarantool/issues/4127 > > > https://github.com/tarantool/tarantool/tree/lvasiliev/gh-4127-WAL-thread-stucks > > > > Looks like a great catch. > > > > > We need to set a thread cancellation guard, because > > > another thread may cancel the current thread > > > (write() is a cancellation point in ev_async_send) > > > and the activation of the ev_async watcher > > > through ev_async_send will fail. > > > > I still don't get from the explanation why it is relevant that > > ev_async_send mustn't fail? > > The cause of why the ev_async_send mustn't fail is unwanted behavior of the > tarantool instance. For example: first thread flush cpipe input to a > endpoint output and go away while trying to call ev_async_send (write() - > cancellation point). Now stailq_empty(&endpoint->output) is false. After > that, another thread flush cpipe input to the same endpoint, but it didn't > try to call ev_async_send, because output_was_empty is false. As result: a > thread of endpoint->consumer didn't wake-up (blocked on epoll_wait). The > same situation described in > https://github.com/tarantool/tarantool/issues/4127: > Looks like an explanation that deserves to be in a comment prior to pthread_setcancelstate(). The issue is, however, if a thread is cancelled and disappears before deregistering from cbus, a lot of other bad things will happen - because all of its registration entries will sit there. The memory, luckily, will not go away, but I am not sure anyone but the creating thread can deregister the memory structures safaly. Looks like this could be covered with a unit test, what do you think? > at main thread: > from void wal_free(void): > cbus_stop_loop(&writer->wal_pipe); > if (cord_join(&writer->cord)) {...} // wait the "wal" thread > > at "wal" thread: > don't try to call cbus_stop_loop_f (for the reasons described above) > blocked at epoll_wait() > > > > > -- Konstantin Osipov, Moscow, Russia