* [PATCH v2] Fix fiber_join() hang in case fiber_cancel() was called
@ 2019-02-05 15:01 Serge Petrenko
From: Serge Petrenko @ 2019-02-05 15:01 UTC
To: vdavydov.dev; +Cc: tarantool-patches, Serge Petrenko
In case a fiber joining another fiber gets cancelled, it stays suspended
forever and never finishes joining. This happens because fiber_cancel()
wakes the fiber and removes it from all execution queues.
Fix this by handling possible cancellation in fiber_join().
Closes #3948
---
Changes in v2:
 - rewrote the test completely.
 - instead of continuing to join when the joining
   fiber is cancelled, make the fiber being joined
   non-joinable and return. This solution was
   discussed verbally.
 - reverted the comment changes for fiber_yield().
   It really isn't a cancellation point.
src/fiber.c | 25 ++++++++++++++++---------
src/lua/fiber.c | 11 ++++++++++-
test/app/fiber.result | 33 +++++++++++++++++++++++++++++++++
test/app/fiber.test.lua | 17 +++++++++++++++++
4 files changed, 76 insertions(+), 10 deletions(-)
diff --git a/src/fiber.c b/src/fiber.c
index 6f3d0ab78..7e9a3d38d 100644
--- a/src/fiber.c
+++ b/src/fiber.c
@@ -396,18 +396,25 @@ fiber_join(struct fiber *fiber)
 
 		do {
 			fiber_yield();
+			if (fiber_is_cancelled())
+				break;
 		} while (! fiber_is_dead(fiber));
 	}
-
-	/* Move exception to the caller */
-	int ret = fiber->f_ret;
-	if (ret != 0) {
-		assert(!diag_is_empty(&fiber->diag));
-		diag_move(&fiber->diag, &fiber()->diag);
+	if (! fiber_is_dead(fiber)) {
+		fiber_set_joinable(fiber, false);
+		diag_set(FiberIsCancelled);
+		return -1;
+	} else {
+		/* Move exception to the caller */
+		int ret = fiber->f_ret;
+		if (ret != 0) {
+			assert(!diag_is_empty(&fiber->diag));
+			diag_move(&fiber->diag, &fiber()->diag);
+		}
+		/* The fiber is already dead. */
+		fiber_recycle(fiber);
+		return ret;
 	}
-	/* The fiber is already dead. */
-	fiber_recycle(fiber);
-	return ret;
 }
 
 /**
diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index 8b17d6475..c655c5258 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -33,6 +33,7 @@
 #include <fiber.h>
 #include "lua/utils.h"
 #include "backtrace.h"
+#include "exception.h"
 
 #include <lua.h>
 #include <lauxlib.h>
@@ -676,10 +677,18 @@ lbox_fiber_join(struct lua_State *L)
 {
 	struct fiber *fiber = lbox_checkfiber(L, 1);
 	struct lua_State *child_L = fiber->storage.lua.stack;
-	fiber_join(fiber);
+	int f_ret = fiber_join(fiber);
 	struct error *e = NULL;
 	int num_ret = 0;
 	int coro_ref = 0;
+
+	if (f_ret != 0 && diag_last_error(&fiber()->diag)->type ==
+	    &type_FiberIsCancelled) {
+		/* The fiber was cancelled before we joined. */
+		luaL_testcancel(L);
+	}
+	luaL_testcancel(L);
+
 	if (child_L != NULL) {
 		coro_ref = lua_tointeger(child_L, -1);
 		lua_pop(child_L, 1);
diff --git a/test/app/fiber.result b/test/app/fiber.result
index ab7c1941b..f73d32671 100644
--- a/test/app/fiber.result
+++ b/test/app/fiber.result
@@ -1411,6 +1411,39 @@ l = nil
l1 = nil
---
...
+-- gh-3948 fiber.join() blocks if fiber is cancelled.
+function another_func() fiber.yield() end
+---
+...
+test_run:cmd("setopt delimiter ';'")
+---
+- true
+...
+function func()
+ local fib = fiber.create(another_func)
+ fib:set_joinable(true)
+ fib:join()
+end;
+---
+...
+f = fiber.create(func)
+f:cancel()
+while f:status() ~= 'dead' do fiber.sleep(0.01) end;
+---
+...
+test_run:cmd("setopt delimiter ''");
+---
+- true
+...
+f = nil
+---
+...
+func = nil
+---
+...
+another_func = nil
+---
+...
-- cleanup
test_run:cmd("clear filter")
---
diff --git a/test/app/fiber.test.lua b/test/app/fiber.test.lua
index 2762047e4..98bb8f3e1 100644
--- a/test/app/fiber.test.lua
+++ b/test/app/fiber.test.lua
@@ -602,6 +602,23 @@ f = nil
l = nil
l1 = nil
+-- gh-3948 fiber.join() blocks if fiber is cancelled.
+function another_func() fiber.yield() end
+test_run:cmd("setopt delimiter ';'")
+function func()
+ local fib = fiber.create(another_func)
+ fib:set_joinable(true)
+ fib:join()
+end;
+f = fiber.create(func)
+f:cancel()
+while f:status() ~= 'dead' do fiber.sleep(0.01) end;
+
+test_run:cmd("setopt delimiter ''");
+f = nil
+func = nil
+another_func = nil
+
-- cleanup
test_run:cmd("clear filter")
--
2.17.2 (Apple Git-113)
* Re: [PATCH v2] Fix fiber_join() hang in case fiber_cancel() was called
@ 2019-02-06 9:55 ` Vladimir Davydov
From: Vladimir Davydov @ 2019-02-06 9:55 UTC
To: Serge Petrenko; +Cc: tarantool-patches
On Tue, Feb 05, 2019 at 06:01:11PM +0300, Serge Petrenko wrote:
> In case a fiber joining another fiber gets cancelled, it stays suspended
> forever and never finishes joining. This happens because fiber_cancel()
> wakes the fiber and removes it from all execution queues.
> Fix this by handling possible cancellation in fiber_join().
>
> Closes #3948
> ---
> Changes in v2:
>  - rewrote the test completely.
>  - instead of continuing to join when the joining
>    fiber is cancelled, make the fiber being joined
>    non-joinable and return. This solution was
>    discussed verbally.
>  - reverted the comment changes for fiber_yield().
>    It really isn't a cancellation point.
>
> src/fiber.c | 25 ++++++++++++++++---------
> src/lua/fiber.c | 11 ++++++++++-
> test/app/fiber.result | 33 +++++++++++++++++++++++++++++++++
> test/app/fiber.test.lua | 17 +++++++++++++++++
> 4 files changed, 76 insertions(+), 10 deletions(-)
>
> diff --git a/src/fiber.c b/src/fiber.c
> index 6f3d0ab78..7e9a3d38d 100644
> --- a/src/fiber.c
> +++ b/src/fiber.c
> @@ -396,18 +396,25 @@ fiber_join(struct fiber *fiber)
>
> do {
> fiber_yield();
> + if (fiber_is_cancelled())
> + break;
> } while (! fiber_is_dead(fiber));
> }
> -
> - /* Move exception to the caller */
> - int ret = fiber->f_ret;
> - if (ret != 0) {
> - assert(!diag_is_empty(&fiber->diag));
> - diag_move(&fiber->diag, &fiber()->diag);
> + if (! fiber_is_dead(fiber)) {
> + fiber_set_joinable(fiber, false);
> + diag_set(FiberIsCancelled);
> + return -1;
> + } else {
> + /* Move exception to the caller */
> + int ret = fiber->f_ret;
> + if (ret != 0) {
> + assert(!diag_is_empty(&fiber->diag));
> + diag_move(&fiber->diag, &fiber()->diag);
> + }
> + /* The fiber is already dead. */
> + fiber_recycle(fiber);
> + return ret;
> }
> - /* The fiber is already dead. */
> - fiber_recycle(fiber);
> - return ret;
> }
>
> /**
> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
> index 8b17d6475..c655c5258 100644
> --- a/src/lua/fiber.c
> +++ b/src/lua/fiber.c
> @@ -33,6 +33,7 @@
> #include <fiber.h>
> #include "lua/utils.h"
> #include "backtrace.h"
> +#include "exception.h"
>
> #include <lua.h>
> #include <lauxlib.h>
> @@ -676,10 +677,18 @@ lbox_fiber_join(struct lua_State *L)
> {
> struct fiber *fiber = lbox_checkfiber(L, 1);
> struct lua_State *child_L = fiber->storage.lua.stack;
> - fiber_join(fiber);
> + int f_ret = fiber_join(fiber);
> struct error *e = NULL;
> int num_ret = 0;
> int coro_ref = 0;
> +
> + if (f_ret != 0 && diag_last_error(&fiber()->diag)->type ==
> + &type_FiberIsCancelled) {
The check looks fragile - what if both our fiber and the fiber we've
been waiting for are cancelled? I'm afraid we'd leak the fiber we are
supposed to join then.
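A minimal toy model of the concern, for illustration only. Every name below (`toy_fiber`, `toy_err`, `toy_join_v2`, `toy_outcomes_are_ambiguous`) is hypothetical and stands in for the real structures; only the branch structure is taken from the src/fiber.c hunk above. It suggests that a caller looking only at the return code and the error type cannot tell an interrupted join from a target that itself died of cancellation:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not Tarantool's real types. */
enum toy_err { TOY_ERR_NONE, TOY_ERR_CANCELLED, TOY_ERR_OTHER };

struct toy_fiber {
	bool dead;		/* has the fiber function returned? */
	bool joinable;
	int f_ret;		/* return code of the fiber function */
	enum toy_err diag;	/* last error, reduced to its type */
};

/*
 * Models only the tail of the v2 fiber_join(), i.e. the state the
 * joiner observes once it has left the yield loop.
 */
static int
toy_join_v2(struct toy_fiber *self, struct toy_fiber *target)
{
	if (!target->dead) {
		/* The joiner was cancelled; the target is still running. */
		target->joinable = false;
		self->diag = TOY_ERR_CANCELLED;
		return -1;
	}
	int ret = target->f_ret;
	if (ret != 0) {
		/* Move the target's error to the joiner. */
		self->diag = target->diag;
		target->diag = TOY_ERR_NONE;
	}
	return ret;
}

/*
 * Returns true if "join interrupted" and "target died of cancellation"
 * give the caller the same (return code, error type) pair.
 */
static bool
toy_outcomes_are_ambiguous(void)
{
	/* Case 1: the joiner is cancelled while the target is alive. */
	struct toy_fiber self1 = { false, false, 0, TOY_ERR_NONE };
	struct toy_fiber alive = { false, true, 0, TOY_ERR_NONE };
	int ret1 = toy_join_v2(&self1, &alive);

	/* Case 2: the target was cancelled and has already died. */
	struct toy_fiber self2 = { false, false, 0, TOY_ERR_NONE };
	struct toy_fiber dead = { true, true, -1, TOY_ERR_CANCELLED };
	int ret2 = toy_join_v2(&self2, &dead);

	assert(!alive.dead && dead.dead);
	return ret1 == ret2 && self1.diag == self2.diag;
}
```

Calling `toy_outcomes_are_ambiguous()` from a `main()` returns true in this model: both cases hand the caller -1 plus a cancellation error, yet only in the second one has the target actually finished.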
> + /* The fiber was cancelled before we joined. */
> + luaL_testcancel(L);
> + }
> + luaL_testcancel(L);
> +
luaL_testcancel() is called either...
Anyway, now I seem to understand why you wanted to ignore fiber_cancel
in fiber_join - making it cancellable complicates the function protocol,
rendering the function barely usable. I guess I was wrong when I asked
you to rework the patch, sorry. Let's revert to v1.
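A minimal sketch of the alternative, assuming v1 (not shown in this thread) simply kept waiting for the target after a cancellation wake-up, as "continuing to join" in the v2 changelog suggests. The names (`toy_joiner`, `toy_target`, `toy_wakeup`, `toy_join_ignore_cancel`) are hypothetical, and a scripted array of wake-ups stands in for real yields:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wake-up reasons for a toy joiner; not Tarantool's API. */
enum toy_wakeup { TOY_WAKE_CANCEL, TOY_WAKE_TARGET_DEAD };

struct toy_joiner {
	bool cancelled;
};

struct toy_target {
	bool dead;
	int f_ret;
};

/*
 * Replays a scripted sequence of wake-ups in place of real yields.
 * Cancellation is only recorded; the loop exits solely when the target
 * is dead, so on a normal return the target can always be recycled.
 * Returns the target's return code, or -2 if the script ends while the
 * target is still alive (a real fiber would keep waiting instead).
 */
static int
toy_join_ignore_cancel(struct toy_joiner *self, struct toy_target *target,
		       const enum toy_wakeup *script, size_t len)
{
	assert(self != NULL && target != NULL);
	size_t i = 0;
	while (!target->dead) {
		if (i == len)
			return -2;
		switch (script[i++]) {
		case TOY_WAKE_CANCEL:
			/* Remember the request, but keep joining. */
			self->cancelled = true;
			break;
		case TOY_WAKE_TARGET_DEAD:
			target->dead = true;
			break;
		}
	}
	return target->f_ret;
}
```

With the script `{ TOY_WAKE_CANCEL, TOY_WAKE_TARGET_DEAD }` the function returns the target's return code with the target dead, and the joiner's `cancelled` flag stays set for the caller to test afterwards, so in this model the join itself has a single outcome to document.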
> if (child_L != NULL) {
> coro_ref = lua_tointeger(child_L, -1);
> lua_pop(child_L, 1);
> diff --git a/test/app/fiber.result b/test/app/fiber.result
> index ab7c1941b..f73d32671 100644
> --- a/test/app/fiber.result
> +++ b/test/app/fiber.result
> @@ -1411,6 +1411,39 @@ l = nil
> l1 = nil
> ---
> ...
> +-- gh-3948 fiber.join() blocks if fiber is cancelled.
> +function another_func() fiber.yield() end
> +---
> +...
> +test_run:cmd("setopt delimiter ';'")
> +---
> +- true
> +...
> +function func()
> + local fib = fiber.create(another_func)
> + fib:set_joinable(true)
> + fib:join()
> +end;
> +---
> +...
> +f = fiber.create(func)
> +f:cancel()
> +while f:status() ~= 'dead' do fiber.sleep(0.01) end;
AFAICS the test highly depends on the scheduler algorithm. Let's rewrite
it using fiber.channel please.
> +---
> +...
> +test_run:cmd("setopt delimiter ''");
> +---
> +- true
> +...
> +f = nil
> +---
> +...
> +func = nil
> +---
> +...
> +another_func = nil
> +---
> +...
These assignments aren't necessary.
* Re: [tarantool-patches] Re: [PATCH v2] Fix fiber_join() hang in case fiber_cancel() was called
@ 2019-02-06 12:58 ` Serge Petrenko
From: Serge Petrenko @ 2019-02-06 12:58 UTC
To: Vladimir Davydov; +Cc: tarantool-patches
Hi, thank you for the review.
All fixed. Please see v3.
> On 6 Feb 2019, at 12:55, Vladimir Davydov <vdavydov.dev@gmail.com> wrote:
>
> Anyway, now I seem to understand why you wanted to ignore fiber_cancel
> in fiber_join - making it cancellable complicates the function protocol,
> rendering the function barely usable. I guess I was wrong when I asked
> you to rework the patch, sorry. Let's revert to v1.
>
>> if (child_L != NULL) {
>> coro_ref = lua_tointeger(child_L, -1);
>> lua_pop(child_L, 1);
>> diff --git a/test/app/fiber.result b/test/app/fiber.result
>> index ab7c1941b..f73d32671 100644
>> --- a/test/app/fiber.result
>> +++ b/test/app/fiber.result
>> @@ -1411,6 +1411,39 @@ l = nil
>> l1 = nil
>> ---
>> ...
>> +-- gh-3948 fiber.join() blocks if fiber is cancelled.
>> +function another_func() fiber.yield() end
>> +---
>> +...
>> +test_run:cmd("setopt delimiter ';'")
>> +---
>> +- true
>> +...
>> +function func()
>> + local fib = fiber.create(another_func)
>> + fib:set_joinable(true)
>> + fib:join()
>> +end;
>> +---
>> +...
>> +f = fiber.create(func)
>> +f:cancel()
>> +while f:status() ~= 'dead' do fiber.sleep(0.01) end;
>
> AFAICS the test highly depends on the scheduler algorithm. Let's rewrite
> it using fiber.channel please.
>
>> +---
>> +...
>> +test_run:cmd("setopt delimiter ''");
>> +---
>> +- true
>> +...
>
>> +f = nil
>> +---
>> +...
>> +func = nil
>> +---
>> +...
>> +another_func = nil
>> +---
>> +...
>
> These assignments aren't necessary.