From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Wed, 6 Feb 2019 12:55:55 +0300 From: Vladimir Davydov Subject: Re: [PATCH v2] Fix fiber_join() hang in case fiber_cancel() was called Message-ID: <20190206095555.j5krlav7eawshkkh@esperanza> References: <20190205150111.63224-1-sergepetrenko@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190205150111.63224-1-sergepetrenko@tarantool.org> To: Serge Petrenko Cc: tarantool-patches@freelists.org List-ID: On Tue, Feb 05, 2019 at 06:01:11PM +0300, Serge Petrenko wrote: > In case a fiber joining another fiber gets cancelled, it stays suspended > forever and never finishes joining. This happens because fiber_cancel() > wakes the fiber and removes it from all execution queues. > Fix this by handling possible cancellation in fiber_join(). > > Closes #3948 > --- > Changes in v2: > - rewrote the test completely. > - instead of continuing to join if the fiber > is cancelled make the fiber to be joined > non-joinable and exit. This solution was > discussed verbally. > - revert comment changes for fiber_yield(). > It really isn't a cancellation point. > > src/fiber.c | 25 ++++++++++++++++--------- > src/lua/fiber.c | 11 ++++++++++- > test/app/fiber.result | 33 +++++++++++++++++++++++++++++++++ > test/app/fiber.test.lua | 17 +++++++++++++++++ > 4 files changed, 76 insertions(+), 10 deletions(-) > > diff --git a/src/fiber.c b/src/fiber.c > index 6f3d0ab78..7e9a3d38d 100644 > --- a/src/fiber.c > +++ b/src/fiber.c > @@ -396,18 +396,25 @@ fiber_join(struct fiber *fiber) > > do { > fiber_yield(); > + if (fiber_is_cancelled()) > + break; > } while (! fiber_is_dead(fiber)); > } > - > - /* Move exception to the caller */ > - int ret = fiber->f_ret; > - if (ret != 0) { > - assert(!diag_is_empty(&fiber->diag)); > - diag_move(&fiber->diag, &fiber()->diag); > + if (! fiber_is_dead(fiber)) { > + fiber_set_joinable(fiber, false); > + diag_set(FiberIsCancelled); > + return -1; > + } else { > + /* Move exception to the caller */ > + int ret = fiber->f_ret; > + if (ret != 0) { > + assert(!diag_is_empty(&fiber->diag)); > + diag_move(&fiber->diag, &fiber()->diag); > + } > + /* The fiber is already dead. */ > + fiber_recycle(fiber); > + return ret; > } > - /* The fiber is already dead. */ > - fiber_recycle(fiber); > - return ret; > } > > /** > diff --git a/src/lua/fiber.c b/src/lua/fiber.c > index 8b17d6475..c655c5258 100644 > --- a/src/lua/fiber.c > +++ b/src/lua/fiber.c > @@ -33,6 +33,7 @@ > #include > #include "lua/utils.h" > #include "backtrace.h" > +#include "exception.h" > > #include > #include > @@ -676,10 +677,18 @@ lbox_fiber_join(struct lua_State *L) > { > struct fiber *fiber = lbox_checkfiber(L, 1); > struct lua_State *child_L = fiber->storage.lua.stack; > - fiber_join(fiber); > + int f_ret = fiber_join(fiber); > struct error *e = NULL; > int num_ret = 0; > int coro_ref = 0; > + > + if (f_ret != 0 && diag_last_error(&fiber()->diag)->type == > + &type_FiberIsCancelled) { The check looks fragile - what if both our fiber and the fiber we've been waiting for are cancelled? I'm afraid we'd leak the fiber we are supposed to join then. > + /* The fiber was cancelled before we joined. */ > + luaL_testcancel(L); > + } > + luaL_testcancel(L); > + luaL_testcancel() is called either... Anyway, now I seem to understand why you wanted to ignore fiber_cancel in fiber_join - making it cancellable complicates the function protocol, rendering the function barely usable. Guess I was wrong when asked you to rework the patch, sorry. Let's revert to v1. > if (child_L != NULL) { > coro_ref = lua_tointeger(child_L, -1); > lua_pop(child_L, 1); > diff --git a/test/app/fiber.result b/test/app/fiber.result > index ab7c1941b..f73d32671 100644 > --- a/test/app/fiber.result > +++ b/test/app/fiber.result > @@ -1411,6 +1411,39 @@ l = nil > l1 = nil > --- > ... > +-- gh-3948 fiber.join() blocks if fiber is cancelled. > +function another_func() fiber.yield() end > +--- > +... > +test_run:cmd("setopt delimiter ';'") > +--- > +- true > +... > +function func() > + local fib = fiber.create(another_func) > + fib:set_joinable(true) > + fib:join() > +end; > +--- > +... > +f = fiber.create(func) > +f:cancel() > +while f:status() ~= 'dead' do fiber.sleep(0.01) end; AFAICS the test highly depends on the scheduler algorithm. Let's rewrite it using fiber.channel please. > +--- > +... > +test_run:cmd("setopt delimiter ''"); > +--- > +- true > +... > +f = nil > +--- > +... > +func = nil > +--- > +... > +another_func = nil > +--- > +... These assignments aren't necessary.