[Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond

Sergey Ostanevich sergos at tarantool.org
Tue Nov 3 13:20:18 MSK 2020


Hi Oleg!

I believe the point about 'consistency' is not valid here. I added a
simple check that prints the diag if it is already set. For
fiber_cond_wait_timeout() it fired multiple times with various
reports, including this one:

2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not
set the DIAG to FiberIsCancelled, original diag: Missing .xlog file
between LSN 5 {1: 5} and 6 {1: 6}

This message is matched in the test system:

test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status =
'loading'})

So, my resolution is: it is wrong to set a diag in an arbitrary
place without a clear understanding of the reason. This is the case for
the cond_wait machinery, since it doesn't know _why_ the fiber is
cancelled.

I propose to set FiberIsCancelled only when the diag is empty and call
it a day.
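
To illustrate the rule outside of the Tarantool sources, here is a
minimal standalone sketch in C. Everything in it is hypothetical:
struct mock_diag, mock_diag_is_empty(), mock_wait_tail() and
mock_example() are stand-ins I made up for the example, not the real
diag or fiber_cond API. It only models the decision taken at the end of
a wait: report the cancellation, but do not overwrite an error that is
already in the diag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the fiber diagnostics area. */
struct mock_diag {
	const char *msg; /* NULL means no error is set */
};

static bool
mock_diag_is_empty(const struct mock_diag *d)
{
	return d->msg == NULL;
}

/* Model of the end of a cond wait that has been woken up. */
static int
mock_wait_tail(struct mock_diag *d, bool is_cancelled)
{
	if (!is_cancelled)
		return 0;
	/* Keep an error that was set before the wait returned. */
	if (mock_diag_is_empty(d))
		d->msg = "fiber is cancelled";
	return -1;
}

/* Walks through the two cancellation cases; returns 0 when both hold. */
static int
mock_example(void)
{
	struct mock_diag empty = { NULL };
	struct mock_diag relay = { "Missing .xlog file" };

	/* Empty diag: the wait fills in the cancellation error. */
	assert(mock_wait_tail(&empty, true) == -1);
	assert(strcmp(empty.msg, "fiber is cancelled") == 0);

	/* Diag already set: the original error survives. */
	assert(mock_wait_tail(&relay, true) == -1);
	assert(strcmp(relay.msg, "Missing .xlog file") == 0);
	return 0;
}
```

The second case in mock_example() corresponds to the relay report
above: the wait still fails with -1, but the 'Missing .xlog file'
message stays in place for whoever inspects the diag afterwards.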

regards,
Sergos

=====

diff --git a/src/box/box.cc b/src/box/box.cc
index 18568df3b..9e824453d 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
 {
 	double deadline = ev_monotonic_now(loop()) + timeout;
 	while (is_box_configured == false || box_is_ro() != ro) {
-		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
-			return -1;
-		if (fiber_is_cancelled()) {
-			diag_set(FiberIsCancelled);
+		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
+			if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
 			return -1;
 		}
 	}
diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
index 904a350d9..0c93c5842 100644
--- a/src/lib/core/fiber_cond.c
+++ b/src/lib/core/fiber_cond.c
@@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
 		diag_set(TimedOut);
 		return -1;
 	}
+	if (fiber_is_cancelled()) {
+		if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled);
+		return -1;
+	}
 	return 0;
 }
 
diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
new file mode 100755
index 000000000..ae805c5bf
--- /dev/null
+++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
@@ -0,0 +1,23 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+local fiber = require('fiber')
+local test = tap.test("gh-5013-fiber-cancel")
+
+test:plan(2)
+
+local result = {}
+
+function test_f()
+    local cond = fiber.cond()
+    local res, err = pcall(cond.wait, cond)
+    result.res = res
+    result.err = err
+end
+
+local f = fiber.create(test_f)
+f:cancel()
+fiber.yield()
+
+test:ok(result.res == false, tostring(result.res))
+test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
new file mode 100644
index 000000000..eab0a5e4d
--- /dev/null
+++ b/test/box/net.box_fiber_cancel_gh-4834.result
@@ -0,0 +1,65 @@
+-- test-run result file version 2
+remote = require 'net.box'
+ | ---
+ | ...
+fiber = require 'fiber'
+ | ---
+ | ...
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+ | ---
+ | ...
+box.schema.func.create('infinite_call')
+ | ---
+ | ...
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+ | ---
+ | ...
+
+error_msg = nil
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function gh4834()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+gh4834()
+ | ---
+ | ...
+error_msg
+ | ---
+ | - fiber is cancelled
+ | ...
+box.schema.func.drop('infinite_call')
+ | ---
+ | ...
+infinite_call = nil
+ | ---
+ | ...
+channel = nil
+ | ---
+ | ...
+error_msg = nil
+ | ---
+ | ...
diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
new file mode 100644
index 000000000..06fb3ceac
--- /dev/null
+++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
@@ -0,0 +1,29 @@
+remote = require 'net.box'
+fiber = require 'fiber'
+test_run = require('test_run').new()
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+box.schema.func.create('infinite_call')
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+
+error_msg = nil
+test_run:cmd("setopt delimiter ';'")
+function gh4834()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+test_run:cmd("setopt delimiter ''");
+gh4834()
+error_msg
+box.schema.func.drop('infinite_call')
+infinite_call = nil
+channel = nil
+error_msg = nil


On 01 Nov 13:13, Oleg Babin wrote:
> Hi! Thanks for changes. See two comments below.
> 
> On 31/10/2020 19:29, sergos at tarantool.org wrote:
> > From: Sergey Ostanevich <sergos at tarantool.org>
> > 
> > Hi!
> > 
> > Thanks to Oleg Babin's comment I found there's no need to update any Lua
> > interfaces, since the cause was in the C implementation. Also, there is
> > only one place where the change takes effect, so after I fixed it the
> > whole test suite passed.
> > Force-pushed branch, v2 patch attached.
> > 
> > 
> > 
> > Before this patch fiber.cond():wait() just returned for a cancelled
> > fiber. In contrast, fiber.channel():get() threw a "fiber is
> > cancelled" error.
> > This patch unifies the behaviour of channels and condvars and also fixes
> > a related net.box problem: it was impossible to interrupt a
> > net.box call with fiber.cancel because it used fiber.cond under
> > the hood. Test cases for both bugs are added.
> > 
> > Closes #4834
> > Closes #5013
> > 
> > Co-authored-by: Oleg Babin <olegrok at tarantool.org>
> > 
> > @TarantoolBot document
> > Title: fiber.cond():wait() throws if fiber is cancelled
> > 
> > Now fiber.cond():wait() throws an error if the waiting fiber is
> > cancelled, just like fiber.channel():get().
> > ---
> > 
> > Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
> > Issue: https://github.com/tarantool/tarantool/issues/5013
> > 
> >   src/box/box.cc                                |  6 +-
> >   src/lib/core/fiber_cond.c                     |  1 +
> >   test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
> >   test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
> >   .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
> >   5 files changed, 120 insertions(+), 4 deletions(-)
> >   create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
> >   create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
> >   create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
> > 
> > diff --git a/src/box/box.cc b/src/box/box.cc
> > index 18568df3b..bfa1051f9 100644
> > --- a/src/box/box.cc
> > +++ b/src/box/box.cc
> > @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
> >   {
> >   	double deadline = ev_monotonic_now(loop()) + timeout;
> >   	while (is_box_configured == false || box_is_ro() != ro) {
> > -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
> > -			return -1;
> > -		if (fiber_is_cancelled()) {
> > -			diag_set(FiberIsCancelled);
> > +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
> > +                        if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
> 
> Here you use spaces instead of tabs.
> 

Fixed.

> >   			return -1;
> >   		}
> >   	}
> > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
> > index 904a350d9..b0645069e 100644
> > --- a/src/lib/core/fiber_cond.c
> > +++ b/src/lib/core/fiber_cond.c
> > @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
> >   		diag_set(TimedOut);
> >   		return -1;
> >   	}
> > +	if (fiber_is_cancelled()) return -1;
> 
> It's quite strange to return -1 here without setting a reason in the
> diagnostics area. Look at how it is done for channels
> 
> (https://github.com/tarantool/tarantool/blob/42c64d06d5d1a3ec937b3c596af083a672a68ad8/src/lib/core/fiber_channel.c#L180).
> 
> There is some inconsistency without it.
> 
> I've looked a bit deeper at the failure I reported before. It seems the
> problem is in the "cbus_unpair" function.
> 
> The problem appears only if FiberIsCancelled is set in the diag area in
> the "fiber_cond_wait" function.
> 
> This is where my expertise ends, as I'm not familiar with "cbus". However, I
> have some ideas on how it could be eliminated.
> 
> Let's declare the cbus_unpair fiber non-cancellable and stop reporting
> the cancelled flag for non-cancellable fibers. See a PoC below:
> 
> 
> diff --git a/src/lib/core/cbus.c b/src/lib/core/cbus.c
> index 5d91fb948..4167c756a 100644
> --- a/src/lib/core/cbus.c
> +++ b/src/lib/core/cbus.c
> @@ -630,6 +630,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe *src_pipe,
>      msg.unpair_arg = unpair_arg;
>      msg.src_pipe = src_pipe;
>      msg.complete = false;
> +    fiber_set_cancellable(false);
>      fiber_cond_create(&msg.cond);
> 
>      cpipe_push(dest_pipe, &msg.cmsg);
> @@ -643,6 +644,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe *src_pipe,
>          fiber_cond_wait(&msg.cond);
>      }
> 
> +    fiber_set_cancellable(true);
>      cpipe_destroy(dest_pipe);
>  }
> 
> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> index 483ae3ce1..8100c9da6 100644
> --- a/src/lib/core/fiber.c
> +++ b/src/lib/core/fiber.c
> @@ -553,6 +553,9 @@ fiber_set_cancellable(bool yesno)
>  bool
>  fiber_is_cancelled(void)
>  {
> +    if ((fiber()->flags & FIBER_IS_CANCELLABLE) == 0) {
> +        return false;
> +    }
>      return fiber()->flags & FIBER_IS_CANCELLED;
>  }
> 
> 
> To be honest, I haven't checked this change carefully, and I also get a
> segfault in replication/gc.test.lua for the "memtx" engine.
> 
> Finally, feel free to ignore this comment. I hope Vlad or Sasha can give you
> more accurate and correct advice.
> 

