Tarantool development patches archive
 help / color / mirror / Atom feed
* [PATCH] Fix fiber_join() hang in case fiber_cancel() was called
@ 2019-02-04 13:51 Serge Petrenko
  2019-02-04 17:44 ` Vladimir Davydov
  0 siblings, 1 reply; 5+ messages in thread
From: Serge Petrenko @ 2019-02-04 13:51 UTC (permalink / raw)
  To: georgy, vdavydov.dev; +Cc: tarantool-patches, Serge Petrenko

In case a fiber joining another fiber gets cancelled, it stays suspended
forever and never finishes joining. This happens because fiber_cancel()
wakes the fiber and removes it from all execution queues.
Fix this by adding ourselves to the joined fibers wake queue on every
iteration. While joining the fiber ignores cancelation.

Also, while we're at it, fix comment for fiber_yield(): it obviously is
a cancellation point.

Closes #3948
---
https://github.com/tarantool/tarantool/tree/sp/gh-3948-fiber-cancel-during-join
https://github.com/tarantool/tarantool/issues/3948

 src/fiber.c            | 14 +++++++++++---
 test/unit/fiber.cc     | 37 +++++++++++++++++++++++++++++++++++++
 test/unit/fiber.result |  2 ++
 3 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/src/fiber.c b/src/fiber.c
index 6f3d0ab78..4dc6b3c5a 100644
--- a/src/fiber.c
+++ b/src/fiber.c
@@ -392,9 +392,17 @@ fiber_join(struct fiber *fiber)
 	assert(fiber->flags & FIBER_IS_JOINABLE);
 
 	if (! fiber_is_dead(fiber)) {
-		rlist_add_tail_entry(&fiber->wake, fiber(), state);
 
 		do {
+			/*
+			 * In case fiber is cancelled during yield
+			 * it will be removed from wake queue by a
+			 * wakeup following the cancel.
+			 * Having multiple entries for the same fiber
+			 * doesn't hurt, since wakeup is executed only
+			 * once per fiber.
+			 */
+			rlist_add_tail_entry(&fiber->wake, fiber(), state);
 			fiber_yield();
 		} while (! fiber_is_dead(fiber));
 	}
@@ -411,8 +419,8 @@ fiber_join(struct fiber *fiber)
 }
 
 /**
- * @note: this is not a cancellation point (@sa fiber_testcancel())
- * but it is considered good practice to call testcancel()
+ * @note: this is a cancellation point (@sa fiber_testcancel())
+ * It is considered good practice to call testcancel()
  * after each yield.
  */
 void
diff --git a/test/unit/fiber.cc b/test/unit/fiber.cc
index 3e9c479c5..5c2a645f6 100644
--- a/test/unit/fiber.cc
+++ b/test/unit/fiber.cc
@@ -47,6 +47,29 @@ cancel_dead_f(va_list ap)
 	return 0;
 }
 
+static int
+yield_f(va_list ap)
+{
+	/** Let the fiber schedule. */
+	fiber_wakeup(fiber());
+	fiber_yield();
+	return 0;
+}
+
+static int
+cancel_joining_f(va_list ap)
+{
+	fiber_set_cancellable(true);
+	struct fiber *fiber = fiber_new("yield", yield_f);
+	fiber_set_joinable(fiber, true);
+	fiber_wakeup(fiber);
+	fiber_yield();
+	note("joining");
+	fiber_join(fiber);
+	note("joined");
+	return 0;
+}
+
 static size_t fiber_stack_size_default;
 
 static void NOINLINE
@@ -138,6 +161,20 @@ fiber_join_test()
 	note("big-stack fiber not crashed");
 	fiber_join(fiber);
 
+	/*
+	 * Cancelling a fiber which tries to join another fiber
+	 * lead to the joining fiber never getting invoked.
+	 * Make suse fiber joins successfully and that it doesn't
+	 * interrupt join due to a cancel.
+	 */
+	fiber = fiber_new_xc("cancel_joining", cancel_joining_f);
+	fiber_set_joinable(fiber, true);
+	fiber_wakeup(fiber);
+	fiber_wakeup(fiber());
+	fiber_yield();
+	fiber_cancel(fiber);
+	fiber_join(fiber);
+
 	footer();
 }
 
diff --git a/test/unit/fiber.result b/test/unit/fiber.result
index 4f7108ffc..e08409243 100644
--- a/test/unit/fiber.result
+++ b/test/unit/fiber.result
@@ -13,4 +13,6 @@ SystemError Failed to allocate 42 bytes in allocator for exception: Cannot alloc
 # cancel dead has started
 # by this time the fiber should be dead already
 # big-stack fiber not crashed
+# joining
+# joined
 	*** fiber_join_test: done ***
-- 
2.17.2 (Apple Git-113)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Fix fiber_join() hang in case fiber_cancel() was called
  2019-02-04 13:51 [PATCH] Fix fiber_join() hang in case fiber_cancel() was called Serge Petrenko
@ 2019-02-04 17:44 ` Vladimir Davydov
  2019-02-05  6:28   ` [tarantool-patches] " Serge Petrenko
  0 siblings, 1 reply; 5+ messages in thread
From: Vladimir Davydov @ 2019-02-04 17:44 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: georgy, tarantool-patches

On Mon, Feb 04, 2019 at 04:51:38PM +0300, Serge Petrenko wrote:
> In case a fiber joining another fiber gets cancelled, it stays suspended
> forever and never finishes joining. This happens because fiber_cancel()
> wakes the fiber and removes it from all execution queues.
> Fix this by adding ourselves to the joined fibers wake queue on every
> iteration. While joining the fiber ignores cancelation.
> 
> Also, while we're at it, fix comment for fiber_yield(): it obviously is
> a cancellation point.
> 
> Closes #3948
> ---
> https://github.com/tarantool/tarantool/tree/sp/gh-3948-fiber-cancel-during-join
> https://github.com/tarantool/tarantool/issues/3948
> 
>  src/fiber.c            | 14 +++++++++++---
>  test/unit/fiber.cc     | 37 +++++++++++++++++++++++++++++++++++++
>  test/unit/fiber.result |  2 ++
>  3 files changed, 50 insertions(+), 3 deletions(-)

The test passes with and without this patch on my laptop. Please make
sure it fails unless this patch is applied.

> 
> diff --git a/src/fiber.c b/src/fiber.c
> index 6f3d0ab78..4dc6b3c5a 100644
> --- a/src/fiber.c
> +++ b/src/fiber.c
> @@ -392,9 +392,17 @@ fiber_join(struct fiber *fiber)
>  	assert(fiber->flags & FIBER_IS_JOINABLE);
>  
>  	if (! fiber_is_dead(fiber)) {
> -		rlist_add_tail_entry(&fiber->wake, fiber(), state);
>  
>  		do {
> +			/*
> +			 * In case fiber is cancelled during yield
> +			 * it will be removed from wake queue by a
> +			 * wakeup following the cancel.
> +			 * Having multiple entries for the same fiber
> +			 * doesn't hurt, since wakeup is executed only
> +			 * once per fiber.
> +			 */
> +			rlist_add_tail_entry(&fiber->wake, fiber(), state);

I don't quite like the idea that cancelling a fiber that is joining
another fiber will have no effect until the other fiber has exited.
Can't we break the loop if fiber_is_cancelled()?

>  			fiber_yield();
>  		} while (! fiber_is_dead(fiber));
>  	}
> @@ -411,8 +419,8 @@ fiber_join(struct fiber *fiber)
>  }
>  
>  /**
> - * @note: this is not a cancellation point (@sa fiber_testcancel())
> - * but it is considered good practice to call testcancel()
> + * @note: this is a cancellation point (@sa fiber_testcancel())
> + * It is considered good practice to call testcancel()
>   * after each yield.
>   */
>  void

I dig through the history and found

commit 3e2adffb7a918b8a6e956e7eb47da3083432d154
Author: Konstantin Osipov <kostja.osipov@gmail.com>
Date:   Tue Oct 18 18:34:32 2011 +0400

    Lua: make it possible to create new fibers.

    Add box.fiber.create(), box.fiber.resume(),
    fiber.yield(), box.fiber.detach().

    Add tests.

    Makce fiber cancellation implementation
    more robust and quick, so that it
    is actually usable from Lua.
    Debug.

which among other things did the following:

> @@ -208,7 +234,9 @@ void fiber_setcancelstate(bool enable)
>  }
> 
>  /**
> - * @note: this is a cancellation point (@sa fiber_testcancel())
> + * @note: this is not a cancellation point (@sa fiber_testcancel())
> + * but it is considered good practice to call testcancel()
> + * after each yield.
>   */
> 
>  void
> @@ -222,8 +250,6 @@ fiber_yield(void)
> 
>         callee->csw++;
>         coro_transfer(&caller->coro.ctx, &callee->coro.ctx);
> -
> -       fiber_testcancel();
>  }

So I guess the comment is correct after all: by a cancellation point the
author meant a function that won't return in case the current fiber is
cancelled. After that patch removed fiber_testcancel from fiber_yield,
the latter ceased to be a cancellation point.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [tarantool-patches] Re: [PATCH] Fix fiber_join() hang in case fiber_cancel() was called
  2019-02-04 17:44 ` Vladimir Davydov
@ 2019-02-05  6:28   ` Serge Petrenko
  2019-02-05  9:00     ` Vladimir Davydov
  0 siblings, 1 reply; 5+ messages in thread
From: Serge Petrenko @ 2019-02-05  6:28 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: Georgy Kirichenko, tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 1175 bytes --]



> 4 февр. 2019 г., в 20:44, Vladimir Davydov <vdavydov.dev@gmail.com> написал(а):
> 
>> 
>> diff --git a/src/fiber.c b/src/fiber.c
>> index 6f3d0ab78..4dc6b3c5a 100644
>> --- a/src/fiber.c
>> +++ b/src/fiber.c
>> @@ -392,9 +392,17 @@ fiber_join(struct fiber *fiber)
>> 	assert(fiber->flags & FIBER_IS_JOINABLE);
>> 
>> 	if (! fiber_is_dead(fiber)) {
>> -		rlist_add_tail_entry(&fiber->wake, fiber(), state);
>> 
>> 		do {
>> +			/*
>> +			 * In case fiber is cancelled during yield
>> +			 * it will be removed from wake queue by a
>> +			 * wakeup following the cancel.
>> +			 * Having multiple entries for the same fiber
>> +			 * doesn't hurt, since wakeup is executed only
>> +			 * once per fiber.
>> +			 */
>> +			rlist_add_tail_entry(&fiber->wake, fiber(), state);
> 
> I don't quite like the idea that cancelling a fiber that is joining
> another fiber will have no effect until the other fiber has exited.
> Can't we break the loop if fiber_is_cancelled()?

We can do that. But then we have to set FIBER_IS_JOINABLE to false
for the joined fiber so that it executes fiber_recycle(). Otherwise it will leak.
Is it ok?

[-- Attachment #2: Type: text/html, Size: 7386 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [tarantool-patches] Re: [PATCH] Fix fiber_join() hang in case fiber_cancel() was called
  2019-02-05  6:28   ` [tarantool-patches] " Serge Petrenko
@ 2019-02-05  9:00     ` Vladimir Davydov
  2019-02-05 15:02       ` Serge Petrenko
  0 siblings, 1 reply; 5+ messages in thread
From: Vladimir Davydov @ 2019-02-05  9:00 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: Georgy Kirichenko, tarantool-patches

On Tue, Feb 05, 2019 at 09:28:01AM +0300, Serge Petrenko wrote:
> 
> 
> > 4 февр. 2019 г., в 20:44, Vladimir Davydov <vdavydov.dev@gmail.com> написал(а):
> > 
> >> 
> >> diff --git a/src/fiber.c b/src/fiber.c
> >> index 6f3d0ab78..4dc6b3c5a 100644
> >> --- a/src/fiber.c
> >> +++ b/src/fiber.c
> >> @@ -392,9 +392,17 @@ fiber_join(struct fiber *fiber)
> >> 	assert(fiber->flags & FIBER_IS_JOINABLE);
> >> 
> >> 	if (! fiber_is_dead(fiber)) {
> >> -		rlist_add_tail_entry(&fiber->wake, fiber(), state);
> >> 
> >> 		do {
> >> +			/*
> >> +			 * In case fiber is cancelled during yield
> >> +			 * it will be removed from wake queue by a
> >> +			 * wakeup following the cancel.
> >> +			 * Having multiple entries for the same fiber
> >> +			 * doesn't hurt, since wakeup is executed only
> >> +			 * once per fiber.
> >> +			 */
> >> +			rlist_add_tail_entry(&fiber->wake, fiber(), state);
> > 
> > I don't quite like the idea that cancelling a fiber that is joining
> > another fiber will have no effect until the other fiber has exited.
> > Can't we break the loop if fiber_is_cancelled()?
> 
> We can do that. But then we have to set FIBER_IS_JOINABLE to false
> for the joined fiber so that it executes fiber_recycle().

Why should we? IMO the user should be free to kill a fiber executing
fiber_join. If that happens, the joinable fiber shouldn't be collected
until another fiber joins it successfully. This would be consistent with
pthread_join behavior.

> Otherwise it will leak.  Is it ok?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [tarantool-patches] Re: [PATCH] Fix fiber_join() hang in case fiber_cancel() was called
  2019-02-05  9:00     ` Vladimir Davydov
@ 2019-02-05 15:02       ` Serge Petrenko
  0 siblings, 0 replies; 5+ messages in thread
From: Serge Petrenko @ 2019-02-05 15:02 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: Georgy Kirichenko, tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 1826 bytes --]



> 5 февр. 2019 г., в 12:00, Vladimir Davydov <vdavydov.dev@gmail.com> написал(а):
> 
> On Tue, Feb 05, 2019 at 09:28:01AM +0300, Serge Petrenko wrote:
>> 
>> 
>>> 4 февр. 2019 г., в 20:44, Vladimir Davydov <vdavydov.dev@gmail.com> написал(а):
>>> 
>>>> 
>>>> diff --git a/src/fiber.c b/src/fiber.c
>>>> index 6f3d0ab78..4dc6b3c5a 100644
>>>> --- a/src/fiber.c
>>>> +++ b/src/fiber.c
>>>> @@ -392,9 +392,17 @@ fiber_join(struct fiber *fiber)
>>>> 	assert(fiber->flags & FIBER_IS_JOINABLE);
>>>> 
>>>> 	if (! fiber_is_dead(fiber)) {
>>>> -		rlist_add_tail_entry(&fiber->wake, fiber(), state);
>>>> 
>>>> 		do {
>>>> +			/*
>>>> +			 * In case fiber is cancelled during yield
>>>> +			 * it will be removed from wake queue by a
>>>> +			 * wakeup following the cancel.
>>>> +			 * Having multiple entries for the same fiber
>>>> +			 * doesn't hurt, since wakeup is executed only
>>>> +			 * once per fiber.
>>>> +			 */
>>>> +			rlist_add_tail_entry(&fiber->wake, fiber(), state);
>>> 
>>> I don't quite like the idea that cancelling a fiber that is joining
>>> another fiber will have no effect until the other fiber has exited.
>>> Can't we break the loop if fiber_is_cancelled()?
>> 
>> We can do that. But then we have to set FIBER_IS_JOINABLE to false
>> for the joined fiber so that it executes fiber_recycle().
> 
> Why should we? IMO the user should be free to kill a fiber executing
> fiber_join. If that happens, the joinable fiber shouldn't be collected
> until another fiber joins it successfully. This would be consistent with
> pthread_join behavior.
> 
>> Otherwise it will leak.  Is it ok?

As discussed verbally, on cancellation lets make the fiber non-joinable and exit.
I addressed your other comments in v2. Please, take a look.


[-- Attachment #2: Type: text/html, Size: 10085 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-02-05 15:02 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-04 13:51 [PATCH] Fix fiber_join() hang in case fiber_cancel() was called Serge Petrenko
2019-02-04 17:44 ` Vladimir Davydov
2019-02-05  6:28   ` [tarantool-patches] " Serge Petrenko
2019-02-05  9:00     ` Vladimir Davydov
2019-02-05 15:02       ` Serge Petrenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox