From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Mon, 4 Feb 2019 20:44:04 +0300 From: Vladimir Davydov Subject: Re: [PATCH] Fix fiber_join() hang in case fiber_cancel() was called Message-ID: <20190204174404.qyalziyrhz6use5g@esperanza> References: <20190204135138.51960-1-sergepetrenko@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190204135138.51960-1-sergepetrenko@tarantool.org> To: Serge Petrenko Cc: georgy@tarantool.org, tarantool-patches@freelists.org List-ID: On Mon, Feb 04, 2019 at 04:51:38PM +0300, Serge Petrenko wrote: > In case a fiber joining another fiber gets cancelled, it stays suspended > forever and never finishes joining. This happens because fiber_cancel() > wakes the fiber and removes it from all execution queues. > Fix this by adding ourselves to the joined fibers wake queue on every > iteration. While joining the fiber ignores cancelation. > > Also, while we're at it, fix comment for fiber_yield(): it obviously is > a cancellation point. > > Closes #3948 > --- > https://github.com/tarantool/tarantool/tree/sp/gh-3948-fiber-cancel-during-join > https://github.com/tarantool/tarantool/issues/3948 > > src/fiber.c | 14 +++++++++++--- > test/unit/fiber.cc | 37 +++++++++++++++++++++++++++++++++++++ > test/unit/fiber.result | 2 ++ > 3 files changed, 50 insertions(+), 3 deletions(-) The test passes with and without this patch on my laptop. Please make sure it fails unless this patch is applied. > > diff --git a/src/fiber.c b/src/fiber.c > index 6f3d0ab78..4dc6b3c5a 100644 > --- a/src/fiber.c > +++ b/src/fiber.c > @@ -392,9 +392,17 @@ fiber_join(struct fiber *fiber) > assert(fiber->flags & FIBER_IS_JOINABLE); > > if (! fiber_is_dead(fiber)) { > - rlist_add_tail_entry(&fiber->wake, fiber(), state); > > do { > + /* > + * In case fiber is cancelled during yield > + * it will be removed from wake queue by a > + * wakeup following the cancel. > + * Having multiple entries for the same fiber > + * doesn't hurt, since wakeup is executed only > + * once per fiber. > + */ > + rlist_add_tail_entry(&fiber->wake, fiber(), state); I don't quite like the idea that cancelling a fiber that is joining another fiber will have no effect until the other fiber has exited. Can't we break the loop if fiber_is_cancelled()? > fiber_yield(); > } while (! fiber_is_dead(fiber)); > } > @@ -411,8 +419,8 @@ fiber_join(struct fiber *fiber) > } > > /** > - * @note: this is not a cancellation point (@sa fiber_testcancel()) > - * but it is considered good practice to call testcancel() > + * @note: this is a cancellation point (@sa fiber_testcancel()) > + * It is considered good practice to call testcancel() > * after each yield. > */ > void I dig through the history and found commit 3e2adffb7a918b8a6e956e7eb47da3083432d154 Author: Konstantin Osipov Date: Tue Oct 18 18:34:32 2011 +0400 Lua: make it possible to create new fibers. Add box.fiber.create(), box.fiber.resume(), fiber.yield(), box.fiber.detach(). Add tests. Makce fiber cancellation implementation more robust and quick, so that it is actually usable from Lua. Debug. which among other things did the following: > @@ -208,7 +234,9 @@ void fiber_setcancelstate(bool enable) > } > > /** > - * @note: this is a cancellation point (@sa fiber_testcancel()) > + * @note: this is not a cancellation point (@sa fiber_testcancel()) > + * but it is considered good practice to call testcancel() > + * after each yield. > */ > > void > @@ -222,8 +250,6 @@ fiber_yield(void) > > callee->csw++; > coro_transfer(&caller->coro.ctx, &callee->coro.ctx); > - > - fiber_testcancel(); > } So I guess the comment is correct after all: by a cancellation point the author meant a function that won't return in case the current fiber is cancelled. After that patch removed fiber_testcancel from fiber_yield, the latter ceased to be a cancellation point.