Hi! Thanks for the changes. See two comments below.
From: Sergey Ostanevich <sergos@tarantool.org>
Hi!
Thanks to Oleg Babin's comment I found there's no need to update any Lua
interfaces, since the cause was in the C implementation. Also, there is one
place affected by the change, so after I fixed it I got a complete test
pass.
Force-pushed branch, v2 patch attached.
Before this patch fiber.cond():wait() just returned for a cancelled
fiber. In contrast, fiber.channel():get() threw a "fiber is
canceled" error.
This patch unifies the behaviour of channels and condvars and also fixes
a related net.box module problem: it was impossible to interrupt a
net.box call with fiber.cancel because it used fiber.cond under
the hood. Test cases for both bugs are added.
Closes #4834
Closes #5013
Co-authored-by: Oleg Babin <olegrok@tarantool.org>
@TarantoolBot document
Title: fiber.cond():wait() throws if fiber is cancelled
Currently fiber.cond():wait() throws an error if the waiting fiber is
cancelled, as is the case with fiber.channel():get().
---
Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
Issue: https://github.com/tarantool/tarantool/issues/5013
src/box/box.cc | 6 +-
src/lib/core/fiber_cond.c | 1 +
test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++
test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++
.../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
5 files changed, 120 insertions(+), 4 deletions(-)
create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
diff --git a/src/box/box.cc b/src/box/box.cc
index 18568df3b..bfa1051f9 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
{
double deadline = ev_monotonic_now(loop()) + timeout;
while (is_box_configured == false || box_is_ro() != ro) {
- if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
- return -1;
- if (fiber_is_cancelled()) {
- diag_set(FiberIsCancelled);
+ if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
+ if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
Here you use spaces instead of tabs.
return -1;
}
}
diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
index 904a350d9..b0645069e 100644
--- a/src/lib/core/fiber_cond.c
+++ b/src/lib/core/fiber_cond.c
@@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
diag_set(TimedOut);
return -1;
}
+ if (fiber_is_cancelled()) return -1;
It's quite strange to return -1 here without setting a reason in the diagnostics area. Look at how it is done for channels.
There is some inconsistency without it.
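To illustrate the point, here is a minimal self-contained sketch, not Tarantool code. The names `model_fiber`, `model_diag` and `model_cond_wait` are hypothetical stand-ins for the fiber state, the diagnostics area and the wait function. The idea is only that every -1 return is paired with a stored reason:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the diagnostics area: one error slot. */
enum model_diag {
	MODEL_DIAG_NONE,
	MODEL_DIAG_TIMED_OUT,
	MODEL_DIAG_CANCELLED,
};

/* Hypothetical fiber state as seen right after a wakeup. */
struct model_fiber {
	bool timed_out;
	bool is_cancelled;
	enum model_diag diag;
};

/*
 * Model of the tail of a condition wait: each failure path stores
 * a reason before returning -1, so the caller never sees a bare -1.
 */
int
model_cond_wait(struct model_fiber *f)
{
	assert(f != NULL);
	if (f->timed_out) {
		f->diag = MODEL_DIAG_TIMED_OUT;
		return -1;
	}
	if (f->is_cancelled) {
		/* Set the reason first, then fail. */
		f->diag = MODEL_DIAG_CANCELLED;
		return -1;
	}
	return 0;
}
```

With this shape a caller can tell a timeout from a cancellation just by looking at the stored reason.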
I've looked a bit deeper at the failure I reported before. It seems the problem is in the "cbus_unpair" function.
The problem appears only if FiberIsCancelled is set in the diag area in the "fiber_cond_wait" function.
This is where my expertise ends, as I'm not familiar with "cbus". However, I have some ideas about how it could be eliminated.
Let's make the fiber non-cancellable inside cbus_unpair and stop reporting the cancelled flag for non-cancellable fibers. See a PoC below:
diff --git a/src/lib/core/cbus.c b/src/lib/core/cbus.c
index 5d91fb948..4167c756a 100644
--- a/src/lib/core/cbus.c
+++ b/src/lib/core/cbus.c
@@ -630,6 +630,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe *src_pipe,
msg.unpair_arg = unpair_arg;
msg.src_pipe = src_pipe;
msg.complete = false;
+ fiber_set_cancellable(false);
fiber_cond_create(&msg.cond);
cpipe_push(dest_pipe, &msg.cmsg);
@@ -643,6 +644,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe *src_pipe,
fiber_cond_wait(&msg.cond);
}
+ fiber_set_cancellable(true);
cpipe_destroy(dest_pipe);
}
diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3ce1..8100c9da6 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -553,6 +553,9 @@ fiber_set_cancellable(bool yesno)
bool
fiber_is_cancelled(void)
{
+ if ((fiber()->flags & FIBER_IS_CANCELLABLE) == 0) {
+ return false;
+ }
return fiber()->flags & FIBER_IS_CANCELLED;
}
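The flag logic of the fiber.c hunk above can be tried in isolation with a small model. This is only a sketch under assumptions: `poc_fiber` and the `POC_*` bit values are hypothetical, and the real flag values are not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, chosen only for this model. */
enum {
	POC_FIBER_IS_CANCELLABLE = 1 << 0,
	POC_FIBER_IS_CANCELLED = 1 << 1,
};

struct poc_fiber {
	unsigned flags;
};

/* Record a cancellation request; the bit is set unconditionally. */
void
poc_fiber_cancel(struct poc_fiber *f)
{
	assert(f != NULL);
	f->flags |= POC_FIBER_IS_CANCELLED;
}

/* Toggle cancellability and return the previous setting. */
bool
poc_fiber_set_cancellable(struct poc_fiber *f, bool yesno)
{
	assert(f != NULL);
	bool prev = (f->flags & POC_FIBER_IS_CANCELLABLE) != 0;
	if (yesno)
		f->flags |= POC_FIBER_IS_CANCELLABLE;
	else
		f->flags &= ~(unsigned)POC_FIBER_IS_CANCELLABLE;
	return prev;
}

/* As in the PoC: a non-cancellable fiber never reports cancellation. */
bool
poc_fiber_is_cancelled(const struct poc_fiber *f)
{
	assert(f != NULL);
	if ((f->flags & POC_FIBER_IS_CANCELLABLE) == 0)
		return false;
	return (f->flags & POC_FIBER_IS_CANCELLED) != 0;
}
```

In this model the cancellation request is hidden rather than lost: the cancelled bit stays set, so it is reported again as soon as the fiber is made cancellable.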
To be honest, I haven't checked this change carefully, and I also get a segfault in replication/gc.test.lua for the "memtx" engine.
Finally, feel free to ignore this comment. I hope Vlad or Sasha
can give you more accurate and correct advice.