From: Vladislav Shpilevoy
Date: Mon, 16 Nov 2020 23:12:46 +0100
Subject: Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
To: Sergey Ostanevich, Oleg Babin
Cc: tarantool-patches@dev.tarantool.org, alexander.turenko@tarantool.org
References: <20201031162911.61876-1-sergos@tarantool.org> <20201103102018.GC517@tarantool.org>
In-Reply-To: <20201103102018.GC517@tarantool.org>

On 03.11.2020 11:20, Sergey Ostanevich wrote:
> Hi Oleg!
>
> I believe the point about 'consistency' is not valid here. I put a
> simple check that if diag is already set, then print it out. For the
> fiber_cond_wait_timeout() it happened multiple times with various
> reports, including this one:
>
> 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not
> set the DIAG to FiberIsCancelled, original diag: Missing .xlog file
> between LSN 5 {1: 5} and 6 {1: 6}
>
> that is used in the test system:
>
> test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status =
> 'loading'})
>
> So, my resolution will be: it is wrong to set a diag in an arbitrary
> place, without clear understanding of the reason. This is the case for
> the cond_wait machinery, since it doesn't know _why_ the fiber is
> cancelled.

It is a wrong resolution, IMO. You just hacked cond wait to avoid
changing the other places. It is not about tests. Tests only show what
is provided by the internal subsystems.
And if they depend on fiber cond not setting a diag on failure, then
that looks wrong.

I suggest you fix the call sites where the caller code assumes that
cond_wait never sets a diag on cancellation. If a function fails, we set
a diag. It is not something we do optionally. Otherwise you make this
patch a bit simpler, but make the cond harder to work with in the
future.

Talking of your statement:

> I believe the stack diag also is not supported there yet.

It is supported at the level of lib/core, i.e. everywhere, but it is not
present on 1.10. However, that is not the point. The point is that it is
not needed here.
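
To make the convention concrete, here is a rough, self-contained sketch.
It is not the Tarantool code: `mock_diag`, `mock_diag_set()`,
`mock_cond_wait()` and `mock_relay_step()` are hypothetical stand-ins
invented for illustration, and saving and restoring the caller's error
is only one possible way to fix a call site. The wait always sets a diag
when it fails, and the caller that wants to report its own error does so
explicitly instead of assuming the wait leaves the diag untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a per-fiber diagnostics area. */
struct mock_diag {
	bool is_set;
	char msg[128];
};

static struct mock_diag diag;

/* Record an error message as the current diag. */
static void
mock_diag_set(const char *msg)
{
	assert(msg != NULL);
	diag.is_set = true;
	snprintf(diag.msg, sizeof(diag.msg), "%s", msg);
}

/* Drop the current diag. */
static void
mock_diag_clear(void)
{
	memset(&diag, 0, sizeof(diag));
}

/*
 * Mock of a cond wait. The rule being illustrated: if the function
 * fails (here: the fiber was cancelled), it sets a diag. Always.
 */
static int
mock_cond_wait(bool is_cancelled)
{
	if (is_cancelled) {
		mock_diag_set("fiber is cancelled");
		return -1;
	}
	return 0;
}

/*
 * Mock of a caller that already has its own error to report.
 * It does not rely on the wait keeping the diag intact: it saves
 * its error first and restores it if the wait fails.
 */
static int
mock_relay_step(bool is_cancelled)
{
	mock_diag_set("Missing .xlog file");
	struct mock_diag saved = diag;
	if (mock_cond_wait(is_cancelled) != 0) {
		/* The caller decides which error is the relevant one. */
		diag = saved;
		return -1;
	}
	return 0;
}
```

With this shape the decision about which error to surface lives in the
caller, which knows why it was waiting, and the wait itself stays
consistent with every other function that can fail.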