Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
@ 2020-10-31 16:29 sergos
  2020-11-01 10:13 ` Oleg Babin
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: sergos @ 2020-10-31 16:29 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, alexander.turenko

From: Sergey Ostanevich <sergos@tarantool.org>

Hi!

Thanks to Oleg Babin's comment I found there's no need to update any lua
interfaces, since the reason was in C implementation. Also, there is one
place the change is played, so after I fixed it I got complete testing 
pass. 
Force-pushed branch, v2 patch attached.



Before this patch fiber.cond():wait() just returns for cancelled
fiber. In contrast fiber.channel():get() threw "fiber is
canceled" error.
This patch unify behaviour of channels and condvars and also fixes
related net.box module problem - it was impossible to interrupt
net.box call with fiber.cancel because it used fiber.cond under
the hood. Test cases for both bugs are added.

Closes #4834
Closes #5013

Co-authored-by: Oleg Babin <olegrok@tarantool.org>

@TarantoolBot document
Title: fiber.cond():wait() throws if fiber is cancelled

Currently fiber.cond():wait() throws an error if waiting fiber is
cancelled like in case with fiber.channel():get().
---

Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond                                                                                                                          
Issue: https://github.com/tarantool/tarantool/issues/5013                                                                                                                                                   

 src/box/box.cc                                |  6 +-
 src/lib/core/fiber_cond.c                     |  1 +
 test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
 test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
 .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
 5 files changed, 120 insertions(+), 4 deletions(-)
 create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
 create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
 create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index 18568df3b..bfa1051f9 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
 {
 	double deadline = ev_monotonic_now(loop()) + timeout;
 	while (is_box_configured == false || box_is_ro() != ro) {
-		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
-			return -1;
-		if (fiber_is_cancelled()) {
-			diag_set(FiberIsCancelled);
+		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
+                        if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
 			return -1;
 		}
 	}
diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
index 904a350d9..b0645069e 100644
--- a/src/lib/core/fiber_cond.c
+++ b/src/lib/core/fiber_cond.c
@@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
 		diag_set(TimedOut);
 		return -1;
 	}
+	if (fiber_is_cancelled()) return -1;
 	return 0;
 }
 
diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
new file mode 100755
index 000000000..ae805c5bf
--- /dev/null
+++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
@@ -0,0 +1,23 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+local fiber = require('fiber')
+local test = tap.test("gh-5013-fiber-cancel")
+
+test:plan(2)
+
+local result = {}
+
+function test_f()
+    local cond = fiber.cond() 
+    local res, err = pcall(cond.wait, cond) 
+    result.res = res
+    result.err = err
+end
+
+local f = fiber.create(test_f)
+f:cancel()
+fiber.yield()
+
+test:ok(result.res == false, tostring(result.res))
+test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
new file mode 100644
index 000000000..eab0a5e4d
--- /dev/null
+++ b/test/box/net.box_fiber_cancel_gh-4834.result
@@ -0,0 +1,65 @@
+-- test-run result file version 2
+remote = require 'net.box'
+ | ---
+ | ...
+fiber = require 'fiber'
+ | ---
+ | ...
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+ | ---
+ | ...
+box.schema.func.create('infinite_call')
+ | ---
+ | ...
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+ | ---
+ | ...
+
+error_msg = nil
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function gh4834()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+gh4834()
+ | ---
+ | ...
+error_msg
+ | ---
+ | - fiber is cancelled
+ | ...
+box.schema.func.drop('infinite_call')
+ | ---
+ | ...
+infinite_call = nil
+ | ---
+ | ...
+channel = nil
+ | ---
+ | ...
+error_msg = nil
+ | ---
+ | ...
diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
new file mode 100644
index 000000000..06fb3ceac
--- /dev/null
+++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
@@ -0,0 +1,29 @@
+remote = require 'net.box'
+fiber = require 'fiber'
+test_run = require('test_run').new()
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+box.schema.func.create('infinite_call')
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+
+error_msg = nil
+test_run:cmd("setopt delimiter ';'")
+function gh4834()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+test_run:cmd("setopt delimiter ''");
+gh4834()
+error_msg
+box.schema.func.drop('infinite_call')
+infinite_call = nil
+channel = nil
+error_msg = nil
-- 
2.17.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos
@ 2020-11-01 10:13 ` Oleg Babin
  2020-11-03 10:20   ` Sergey Ostanevich
  2020-11-04 10:00 ` Leonid Vasiliev
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Oleg Babin @ 2020-11-01 10:13 UTC (permalink / raw)
  To: sergos, tarantool-patches; +Cc: v.shpilevoy, alexander.turenko

[-- Attachment #1: Type: text/plain, Size: 4943 bytes --]

Hi! Thanks for changes. See two comments below.

On 31/10/2020 19:29, sergos@tarantool.org wrote:
> From: Sergey Ostanevich <sergos@tarantool.org>
>
> Hi!
>
> Thanks to Oleg Babin's comment I found there's no need to update any lua
> interfaces, since the reason was in C implementation. Also, there is one
> place the change is played, so after I fixed it I got complete testing
> pass.
> Force-pushed branch, v2 patch attached.
>
>
>
> Before this patch fiber.cond():wait() just returns for cancelled
> fiber. In contrast fiber.channel():get() threw "fiber is
> canceled" error.
> This patch unify behaviour of channels and condvars and also fixes
> related net.box module problem - it was impossible to interrupt
> net.box call with fiber.cancel because it used fiber.cond under
> the hood. Test cases for both bugs are added.
>
> Closes #4834
> Closes #5013
>
> Co-authored-by: Oleg Babin <olegrok@tarantool.org>
>
> @TarantoolBot document
> Title: fiber.cond():wait() throws if fiber is cancelled
>
> Currently fiber.cond():wait() throws an error if waiting fiber is
> cancelled like in case with fiber.channel():get().
> ---
>
> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
> Issue: https://github.com/tarantool/tarantool/issues/5013
>
>   src/box/box.cc                                |  6 +-
>   src/lib/core/fiber_cond.c                     |  1 +
>   test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
>   test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
>   .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
>   5 files changed, 120 insertions(+), 4 deletions(-)
>   create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
>   create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
>   create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
>
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 18568df3b..bfa1051f9 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
>   {
>   	double deadline = ev_monotonic_now(loop()) + timeout;
>   	while (is_box_configured == false || box_is_ro() != ro) {
> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
> -			return -1;
> -		if (fiber_is_cancelled()) {
> -			diag_set(FiberIsCancelled);
> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
> +                        if (fiber_is_cancelled()) diag_set(FiberIsCancelled);

Here you use spaces instead of tabs.

>   			return -1;
>   		}
>   	}
> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
> index 904a350d9..b0645069e 100644
> --- a/src/lib/core/fiber_cond.c
> +++ b/src/lib/core/fiber_cond.c
> @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>   		diag_set(TimedOut);
>   		return -1;
>   	}
> +	if (fiber_is_cancelled()) return -1;

It's qute strange to return -1 here but don't set a reason to diagnostic 
area. Look how it is done for channels

(https://github.com/tarantool/tarantool/blob/42c64d06d5d1a3ec937b3c596af083a672a68ad8/src/lib/core/fiber_channel.c#L180).

There is some inconsistency without it.

I've looked a bit deeper at the failure I reported before. Seems the 
problem is in "cbus_unpair" function.

The problem appears only if FiberIsCancelled is setted to diag area in 
"fiber_cond_wait" function.

This is where my expertise ends, as I'm not familiar with "cbus". 
However I have some minds how it could be eliminated.

Let's declare cbus_unpair fiber as is not cancellable and stop report 
is_cancellable flag for non-cancellable fibers. See some PoC below:


diff --git a/src/lib/core/cbus.c b/src/lib/core/cbus.c
index 5d91fb948..4167c756a 100644
--- a/src/lib/core/cbus.c
+++ b/src/lib/core/cbus.c
@@ -630,6 +630,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe 
*src_pipe,
      msg.unpair_arg = unpair_arg;
      msg.src_pipe = src_pipe;
      msg.complete = false;
+    fiber_set_cancellable(false);
      fiber_cond_create(&msg.cond);

      cpipe_push(dest_pipe, &msg.cmsg);
@@ -643,6 +644,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe 
*src_pipe,
          fiber_cond_wait(&msg.cond);
      }

+    fiber_set_cancellable(true);
      cpipe_destroy(dest_pipe);
  }

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3ce1..8100c9da6 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -553,6 +553,9 @@ fiber_set_cancellable(bool yesno)
  bool
  fiber_is_cancelled(void)
  {
+    if ((fiber()->flags & FIBER_IS_CANCELLABLE) == 0) {
+        return false;
+    }
      return fiber()->flags & FIBER_IS_CANCELLED;
  }


To be honest I've not checked such change carefully and also have 
segfault at replication/gc.test.lua for "memtx" engine.

Finally, feel free to ignore this comment I hope Vlad or Sasha can give 
you more accurate and correct advices.


[-- Attachment #2: Type: text/html, Size: 7437 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-01 10:13 ` Oleg Babin
@ 2020-11-03 10:20   ` Sergey Ostanevich
  2020-11-03 10:27     ` Oleg Babin
                       ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Sergey Ostanevich @ 2020-11-03 10:20 UTC (permalink / raw)
  To: Oleg Babin; +Cc: tarantool-patches, v.shpilevoy, alexander.turenko

Hi Oleg!

I believe the point about 'consistency' is not valid here. I put a
simple check that if diag is already set, then print it out. For the
fiber_cond_wait_timeout() it happened multiple times with various
reports, inlcuding this one:

2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not
set the DIAG to FiberIsCancelled, original diag: Missing .xlog file
between LSN 5 {1: 5} and 6 {1: 6}

that is used in the test system:

test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status =
'loading'})

So, my resolution will be: it is wrong to set a diag in an arbitrary
place, without clear understanting of the reason. This is the case for
the cond_wait machinery, since it doesn't know _why_ the fiber is
cancelled.

I propose to cover the case with condition on empty diag and call it a
day.

regards,
Sergos

=====

diff --git a/src/box/box.cc b/src/box/box.cc
index 18568df3b..9e824453d 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
 {
 	double deadline = ev_monotonic_now(loop()) + timeout;
 	while (is_box_configured == false || box_is_ro() != ro) {
-		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
-			return -1;
-		if (fiber_is_cancelled()) {
-			diag_set(FiberIsCancelled);
+		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
+			if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
 			return -1;
 		}
 	}
diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
index 904a350d9..0c93c5842 100644
--- a/src/lib/core/fiber_cond.c
+++ b/src/lib/core/fiber_cond.c
@@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
 		diag_set(TimedOut);
 		return -1;
 	}
+	if (fiber_is_cancelled()) {
+		if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled);
+		return -1;
+	}
 	return 0;
 }
 
diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
new file mode 100755
index 000000000..ae805c5bf
--- /dev/null
+++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
@@ -0,0 +1,23 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+local fiber = require('fiber')
+local test = tap.test("gh-5013-fiber-cancel")
+
+test:plan(2)
+
+local result = {}
+
+function test_f()
+    local cond = fiber.cond() 
+    local res, err = pcall(cond.wait, cond) 
+    result.res = res
+    result.err = err
+end
+
+local f = fiber.create(test_f)
+f:cancel()
+fiber.yield()
+
+test:ok(result.res == false, tostring(result.res))
+test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
new file mode 100644
index 000000000..eab0a5e4d
--- /dev/null
+++ b/test/box/net.box_fiber_cancel_gh-4834.result
@@ -0,0 +1,65 @@
+-- test-run result file version 2
+remote = require 'net.box'
+ | ---
+ | ...
+fiber = require 'fiber'
+ | ---
+ | ...
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+ | ---
+ | ...
+box.schema.func.create('infinite_call')
+ | ---
+ | ...
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+ | ---
+ | ...
+
+error_msg = nil
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function gh4834()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+gh4834()
+ | ---
+ | ...
+error_msg
+ | ---
+ | - fiber is cancelled
+ | ...
+box.schema.func.drop('infinite_call')
+ | ---
+ | ...
+infinite_call = nil
+ | ---
+ | ...
+channel = nil
+ | ---
+ | ...
+error_msg = nil
+ | ---
+ | ...
diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
new file mode 100644
index 000000000..06fb3ceac
--- /dev/null
+++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
@@ -0,0 +1,29 @@
+remote = require 'net.box'
+fiber = require 'fiber'
+test_run = require('test_run').new()
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+box.schema.func.create('infinite_call')
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+
+error_msg = nil
+test_run:cmd("setopt delimiter ';'")
+function gh4834()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+test_run:cmd("setopt delimiter ''");
+gh4834()
+error_msg
+box.schema.func.drop('infinite_call')
+infinite_call = nil
+channel = nil
+error_msg = nil


On 01 ноя 13:13, Oleg Babin wrote:
> Hi! Thanks for changes. See two comments below.
> 
> On 31/10/2020 19:29, sergos@tarantool.org wrote:
> > From: Sergey Ostanevich <sergos@tarantool.org>
> > 
> > Hi!
> > 
> > Thanks to Oleg Babin's comment I found there's no need to update any lua
> > interfaces, since the reason was in C implementation. Also, there is one
> > place the change is played, so after I fixed it I got complete testing
> > pass.
> > Force-pushed branch, v2 patch attached.
> > 
> > 
> > 
> > Before this patch fiber.cond():wait() just returns for cancelled
> > fiber. In contrast fiber.channel():get() threw "fiber is
> > canceled" error.
> > This patch unify behaviour of channels and condvars and also fixes
> > related net.box module problem - it was impossible to interrupt
> > net.box call with fiber.cancel because it used fiber.cond under
> > the hood. Test cases for both bugs are added.
> > 
> > Closes #4834
> > Closes #5013
> > 
> > Co-authored-by: Oleg Babin <olegrok@tarantool.org>
> > 
> > @TarantoolBot document
> > Title: fiber.cond():wait() throws if fiber is cancelled
> > 
> > Currently fiber.cond():wait() throws an error if waiting fiber is
> > cancelled like in case with fiber.channel():get().
> > ---
> > 
> > Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
> > Issue: https://github.com/tarantool/tarantool/issues/5013
> > 
> >   src/box/box.cc                                |  6 +-
> >   src/lib/core/fiber_cond.c                     |  1 +
> >   test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
> >   test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
> >   .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
> >   5 files changed, 120 insertions(+), 4 deletions(-)
> >   create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
> >   create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
> >   create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
> > 
> > diff --git a/src/box/box.cc b/src/box/box.cc
> > index 18568df3b..bfa1051f9 100644
> > --- a/src/box/box.cc
> > +++ b/src/box/box.cc
> > @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
> >   {
> >   	double deadline = ev_monotonic_now(loop()) + timeout;
> >   	while (is_box_configured == false || box_is_ro() != ro) {
> > -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
> > -			return -1;
> > -		if (fiber_is_cancelled()) {
> > -			diag_set(FiberIsCancelled);
> > +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
> > +                        if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
> 
> Here you use spaces instead of tabs.
> 

Fixed.

> >   			return -1;
> >   		}
> >   	}
> > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
> > index 904a350d9..b0645069e 100644
> > --- a/src/lib/core/fiber_cond.c
> > +++ b/src/lib/core/fiber_cond.c
> > @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
> >   		diag_set(TimedOut);
> >   		return -1;
> >   	}
> > +	if (fiber_is_cancelled()) return -1;
> 
> It's qute strange to return -1 here but don't set a reason to diagnostic
> area. Look how it is done for channels
> 
> (https://github.com/tarantool/tarantool/blob/42c64d06d5d1a3ec937b3c596af083a672a68ad8/src/lib/core/fiber_channel.c#L180).
> 
> There is some inconsistency without it.
> 
> I've looked a bit deeper at the failure I reported before. Seems the problem
> is in "cbus_unpair" function.
> 
> The problem appears only if FiberIsCancelled is setted to diag area in
> "fiber_cond_wait" function.
> 
> This is where my expertise ends, as I'm not familiar with "cbus". However I
> have some minds how it could be eliminated.
> 
> Let's declare cbus_unpair fiber as is not cancellable and stop report
> is_cancellable flag for non-cancellable fibers. See some PoC below:
> 
> 
> diff --git a/src/lib/core/cbus.c b/src/lib/core/cbus.c
> index 5d91fb948..4167c756a 100644
> --- a/src/lib/core/cbus.c
> +++ b/src/lib/core/cbus.c
> @@ -630,6 +630,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe
> *src_pipe,
>      msg.unpair_arg = unpair_arg;
>      msg.src_pipe = src_pipe;
>      msg.complete = false;
> +    fiber_set_cancellable(false);
>      fiber_cond_create(&msg.cond);
> 
>      cpipe_push(dest_pipe, &msg.cmsg);
> @@ -643,6 +644,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe
> *src_pipe,
>          fiber_cond_wait(&msg.cond);
>      }
> 
> +    fiber_set_cancellable(true);
>      cpipe_destroy(dest_pipe);
>  }
> 
> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> index 483ae3ce1..8100c9da6 100644
> --- a/src/lib/core/fiber.c
> +++ b/src/lib/core/fiber.c
> @@ -553,6 +553,9 @@ fiber_set_cancellable(bool yesno)
>  bool
>  fiber_is_cancelled(void)
>  {
> +    if ((fiber()->flags & FIBER_IS_CANCELLABLE) == 0) {
> +        return false;
> +    }
>      return fiber()->flags & FIBER_IS_CANCELLED;
>  }
> 
> 
> To be honest I've not checked such change carefully and also have segfault
> at replication/gc.test.lua for "memtx" engine.
> 
> Finally, feel free to ignore this comment I hope Vlad or Sasha can give you
> more accurate and correct advices.
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-03 10:20   ` Sergey Ostanevich
@ 2020-11-03 10:27     ` Oleg Babin
  2020-11-04 10:00     ` Leonid Vasiliev
  2020-11-16 22:12     ` Vladislav Shpilevoy
  2 siblings, 0 replies; 23+ messages in thread
From: Oleg Babin @ 2020-11-03 10:27 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches, v.shpilevoy, alexander.turenko

Hi! Thanks for your comment and diff.

On 03/11/2020 13:20, Sergey Ostanevich wrote:
> So, my resolution will be: it is wrong to set a diag in an arbitrary
> place, without clear understanting of the reason. This is the case for
> the cond_wait machinery, since it doesn't know_why_  the fiber is
> cancelled.
>
> I propose to cover the case with condition on empty diag and call it a
> day.

Sounds reasonable. It seems to be ok.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos
  2020-11-01 10:13 ` Oleg Babin
@ 2020-11-04 10:00 ` Leonid Vasiliev
  2020-11-05 20:42   ` Sergey Ostanevich
  2020-11-10 21:16   ` Sergey Ostanevich
  2020-11-16 22:12 ` Vladislav Shpilevoy
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 23+ messages in thread
From: Leonid Vasiliev @ 2020-11-04 10:00 UTC (permalink / raw)
  To: sergos, tarantool-patches; +Cc: v.shpilevoy, alexander.turenko

Hi! Thank you for the patch.
See some comments below:

1) The patch changes undocumented behavior, AFAIU.
So, I have a question:"Do you plan to backport the patch to
tarantool 1.10?". If the answer is "Yes" - I'm comfortable with the
changes. But if the answer is "No" - I will object, because in this case
both behaviors must be supported in all modules.

2) I think changing the behavior in C doesn't cause much of a problem,
because before when you wait without timeout, you don't need to check
the return value (it's always 0). But in Lua it will cause the problems,
because now throws an error if cancelled and all wait calls should be
wrapped to pcall.

On 31.10.2020 19:29, sergos@tarantool.org wrote:
> From: Sergey Ostanevich <sergos@tarantool.org>
> 
> Hi!
> 
> Thanks to Oleg Babin's comment I found there's no need to update any lua
> interfaces, since the reason was in C implementation. Also, there is one
> place the change is played, so after I fixed it I got complete testing
> pass.
> Force-pushed branch, v2 patch attached.
> 
> 
> 
> Before this patch fiber.cond():wait() just returns for cancelled
> fiber. In contrast fiber.channel():get() threw "fiber is
> canceled" error.
> This patch unify behaviour of channels and condvars and also fixes

3) behaviour -> behavior.

> related net.box module problem - it was impossible to interrupt
> net.box call with fiber.cancel because it used fiber.cond under
> the hood. Test cases for both bugs are added.
> 
> Closes #4834
> Closes #5013
> 
> Co-authored-by: Oleg Babin <olegrok@tarantool.org>
> 
> @TarantoolBot document
> Title: fiber.cond():wait() throws if fiber is cancelled
> 
> Currently fiber.cond():wait() throws an error if waiting fiber is
> cancelled like in case with fiber.channel():get().

4) I don't think it's a good decision adding a comparison with
fiber.channel():get() to the documentation. Up to you.

5) Document the changes in module.h.

> ---

6) Add @ChangeLog.

> 
> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
> Issue: https://github.com/tarantool/tarantool/issues/5013
> 
>   src/box/box.cc                                |  6 +-
>   src/lib/core/fiber_cond.c                     |  1 +
>   test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
>   test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
>   .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
>   5 files changed, 120 insertions(+), 4 deletions(-)
>   create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
>   create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
>   create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
> 
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 18568df3b..bfa1051f9 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
>   {
>   	double deadline = ev_monotonic_now(loop()) + timeout;
>   	while (is_box_configured == false || box_is_ro() != ro) {
> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
> -			return -1;
> -		if (fiber_is_cancelled()) {
> -			diag_set(FiberIsCancelled);
> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
> +                        if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
>   			return -1;
>   		}
>   	}
> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
> index 904a350d9..b0645069e 100644
> --- a/src/lib/core/fiber_cond.c
> +++ b/src/lib/core/fiber_cond.c
> @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>   		diag_set(TimedOut);
>   		return -1;
>   	}
> +	if (fiber_is_cancelled()) return -1;
>   	return 0;
>   }
>   
> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
> new file mode 100755
> index 000000000..ae805c5bf
> --- /dev/null
> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
> @@ -0,0 +1,23 @@
> +#!/usr/bin/env tarantool
> +
> +local tap = require('tap')
> +local fiber = require('fiber')
> +local test = tap.test("gh-5013-fiber-cancel")
> +
> +test:plan(2)
> +
> +local result = {}
> +
> +function test_f()
> +    local cond = fiber.cond()
> +    local res, err = pcall(cond.wait, cond)
> +    result.res = res
> +    result.err = err
> +end
> +
> +local f = fiber.create(test_f)
> +f:cancel()
> +fiber.yield()
> +
> +test:ok(result.res == false, tostring(result.res))
> +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
> new file mode 100644
> index 000000000..eab0a5e4d
> --- /dev/null
> +++ b/test/box/net.box_fiber_cancel_gh-4834.result
> @@ -0,0 +1,65 @@
> +-- test-run result file version 2
> +remote = require 'net.box'
> + | ---
> + | ...
> +fiber = require 'fiber'
> + | ---
> + | ...
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
> +function infinite_call() fiber.channel(1):get() end
> + | ---
> + | ...
> +box.schema.func.create('infinite_call')
> + | ---
> + | ...
> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
> + | ---
> + | ...
> +
> +error_msg = nil
> + | ---
> + | ...
> +test_run:cmd("setopt delimiter ';'")
> + | ---
> + | - true
> + | ...
> +function gh4834()
> +    local cn = remote.connect(box.cfg.listen)
> +    local f = fiber.new(function()
> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
> +    end)
> +    f:set_joinable(true)
> +    fiber.yield()
> +    f:cancel()
> +    f:join()
> +    cn:close()
> +end;
> + | ---
> + | ...
> +test_run:cmd("setopt delimiter ''");
> + | ---
> + | - true
> + | ...
> +gh4834()
> + | ---
> + | ...
> +error_msg
> + | ---
> + | - fiber is cancelled
> + | ...
> +box.schema.func.drop('infinite_call')
> + | ---
> + | ...
> +infinite_call = nil
> + | ---
> + | ...
> +channel = nil
> + | ---
> + | ...
> +error_msg = nil
> + | ---
> + | ...
> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
> new file mode 100644
> index 000000000..06fb3ceac
> --- /dev/null
> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
> @@ -0,0 +1,29 @@
> +remote = require 'net.box'
> +fiber = require 'fiber'
> +test_run = require('test_run').new()
> +
> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
> +function infinite_call() fiber.channel(1):get() end
> +box.schema.func.create('infinite_call')
> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
> +
> +error_msg = nil
> +test_run:cmd("setopt delimiter ';'")
> +function gh4834()
> +    local cn = remote.connect(box.cfg.listen)
> +    local f = fiber.new(function()
> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
> +    end)
> +    f:set_joinable(true)
> +    fiber.yield()
> +    f:cancel()
> +    f:join()
> +    cn:close()
> +end;
> +test_run:cmd("setopt delimiter ''");
> +gh4834()
> +error_msg
> +box.schema.func.drop('infinite_call')
> +infinite_call = nil
> +channel = nil
> +error_msg = nil
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-03 10:20   ` Sergey Ostanevich
  2020-11-03 10:27     ` Oleg Babin
@ 2020-11-04 10:00     ` Leonid Vasiliev
  2020-11-16 22:12     ` Vladislav Shpilevoy
  2 siblings, 0 replies; 23+ messages in thread
From: Leonid Vasiliev @ 2020-11-04 10:00 UTC (permalink / raw)
  To: Sergey Ostanevich, Oleg Babin
  Cc: tarantool-patches, v.shpilevoy, alexander.turenko


Continuation of the review.

On 03.11.2020 13:20, Sergey Ostanevich wrote:
> Hi Oleg!
> 
> I believe the point about 'consistency' is not valid here. I put a
> simple check that if diag is already set, then print it out. For the
> fiber_cond_wait_timeout() it happened multiple times with various
> reports, inlcuding this one:
> 
> 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not
> set the DIAG to FiberIsCancelled, original diag: Missing .xlog file
> between LSN 5 {1: 5} and 6 {1: 6}
> 
> that is used in the test system:
> 
> test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status =
> 'loading'})
> 
> So, my resolution will be: it is wrong to set a diag in an arbitrary
> place, without clear understanting of the reason. This is the case for
> the cond_wait machinery, since it doesn't know _why_ the fiber is
> cancelled.
> 
> I propose to cover the case with condition on empty diag and call it a
> day.
> 
> regards,
> Sergos
> 
> =====
> 
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 18568df3b..9e824453d 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
>   {
>   	double deadline = ev_monotonic_now(loop()) + timeout;
>   	while (is_box_configured == false || box_is_ro() != ro) {
> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
> -			return -1;
> -		if (fiber_is_cancelled()) {
> -			diag_set(FiberIsCancelled);
> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
> +			if (fiber_is_cancelled()) diag_set(FiberIsCancelled);

7) According to 
https://www.tarantool.io/en/doc/latest/dev_guide/c_style_guide/
it's seems like you are trying to hide something)
Use:
```
if (condition)
     action();
```
Instead of:
```
if (condition) action();
another_action();
```

>   			return -1;
>   		}
>   	}
> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
> index 904a350d9..0c93c5842 100644
> --- a/src/lib/core/fiber_cond.c
> +++ b/src/lib/core/fiber_cond.c
> @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>   		diag_set(TimedOut);
>   		return -1;
>   	}
> +	if (fiber_is_cancelled()) {
> +		if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled);

8) The same as previously:
```
if (condition)
     action();
```

9) Checking diag on empty looks strange to me. I think we should add an
error to diag without this check. If we want to save the previous error,
I suggest to use the stack diag.

> +		return -1;
> +	}
>   	return 0;
>   }
>   
> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
> new file mode 100755
> index 000000000..ae805c5bf
> --- /dev/null
> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
> @@ -0,0 +1,23 @@
> +#!/usr/bin/env tarantool
> +
> +local tap = require('tap')
> +local fiber = require('fiber')
> +local test = tap.test("gh-5013-fiber-cancel")
> +
> +test:plan(2)
> +
> +local result = {}
> +
> +function test_f()
> +    local cond = fiber.cond()
> +    local res, err = pcall(cond.wait, cond)
> +    result.res = res
> +    result.err = err
> +end
> +
> +local f = fiber.create(test_f)
> +f:cancel()
> +fiber.yield()
> +
> +test:ok(result.res == false, tostring(result.res))
> +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))

10) Use a user-frendly check message (in both checks).

> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
> new file mode 100644
> index 000000000..eab0a5e4d
> --- /dev/null
> +++ b/test/box/net.box_fiber_cancel_gh-4834.result
> @@ -0,0 +1,65 @@
> +-- test-run result file version 2
> +remote = require 'net.box'
> + | ---
> + | ...
> +fiber = require 'fiber'
> + | ---
> + | ...
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
> +function infinite_call() fiber.channel(1):get() end
> + | ---
> + | ...
> +box.schema.func.create('infinite_call')
> + | ---
> + | ...
> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
> + | ---
> + | ...
> +
> +error_msg = nil
> + | ---
> + | ...
> +test_run:cmd("setopt delimiter ';'")
> + | ---
> + | - true
> + | ...
> +function gh4834( > +    local cn = remote.connect(box.cfg.listen)
> +    local f = fiber.new(function()
> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
> +    end)
> +    f:set_joinable(true)
> +    fiber.yield()
> +    f:cancel()
> +    f:join()
> +    cn:close()
> +end;
> + | ---
> + | ...
> +test_run:cmd("setopt delimiter ''");
> + | ---
> + | - true
> + | ...
> +gh4834()
> + | ---
> + | ...
> +error_msg
> + | ---
> + | - fiber is cancelled
> + | ...
> +box.schema.func.drop('infinite_call')
> + | ---
> + | ...
> +infinite_call = nil
> + | ---
> + | ...
> +channel = nil
> + | ---
> + | ...
> +error_msg = nil
> + | ---
> + | ...
> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
> new file mode 100644
> index 000000000..06fb3ceac
> --- /dev/null
> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
> @@ -0,0 +1,29 @@
> +remote = require 'net.box'
> +fiber = require 'fiber'
> +test_run = require('test_run').new()
> +
> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
> +function infinite_call() fiber.channel(1):get() end
> +box.schema.func.create('infinite_call')
> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
> +
> +error_msg = nil
> +test_run:cmd("setopt delimiter ';'")
> +function gh4834()

11) I think this is not a self-documenting name for the function.

> +    local cn = remote.connect(box.cfg.listen)
> +    local f = fiber.new(function()
> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
> +    end)
> +    f:set_joinable(true)
> +    fiber.yield()
> +    f:cancel()
> +    f:join()
> +    cn:close()
> +end;
> +test_run:cmd("setopt delimiter ''");
> +gh4834()
> +error_msg
> +box.schema.func.drop('infinite_call')
> +infinite_call = nil
> +channel = nil

12) You didn't introduce the variable "channel". In addition, I looked
at a couple of tests and didn't see them cleaning up variables. But I
don't mind.

> +error_msg = nil
> 
> 
> On 01 ноя 13:13, Oleg Babin wrote:
>> Hi! Thanks for changes. See two comments below.
>>
>> On 31/10/2020 19:29, sergos@tarantool.org wrote:
>>> From: Sergey Ostanevich <sergos@tarantool.org>
>>>
>>> Hi!
>>>
>>> Thanks to Oleg Babin's comment I found there's no need to update any lua
>>> interfaces, since the reason was in C implementation. Also, there is one
>>> place the change is played, so after I fixed it I got complete testing
>>> pass.
>>> Force-pushed branch, v2 patch attached.
>>>
>>>
>>>
>>> Before this patch fiber.cond():wait() just returns for cancelled
>>> fiber. In contrast fiber.channel():get() threw "fiber is
>>> canceled" error.
>>> This patch unify behaviour of channels and condvars and also fixes
>>> related net.box module problem - it was impossible to interrupt
>>> net.box call with fiber.cancel because it used fiber.cond under
>>> the hood. Test cases for both bugs are added.
>>>
>>> Closes #4834
>>> Closes #5013
>>>
>>> Co-authored-by: Oleg Babin <olegrok@tarantool.org>
>>>
>>> @TarantoolBot document
>>> Title: fiber.cond():wait() throws if fiber is cancelled
>>>
>>> Currently fiber.cond():wait() throws an error if waiting fiber is
>>> cancelled like in case with fiber.channel():get().
>>> ---
>>>
>>> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
>>> Issue: https://github.com/tarantool/tarantool/issues/5013
>>>
>>>    src/box/box.cc                                |  6 +-
>>>    src/lib/core/fiber_cond.c                     |  1 +
>>>    test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
>>>    test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
>>>    .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
>>>    5 files changed, 120 insertions(+), 4 deletions(-)
>>>    create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
>>>    create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
>>>    create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
>>>
>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>> index 18568df3b..bfa1051f9 100644
>>> --- a/src/box/box.cc
>>> +++ b/src/box/box.cc
>>> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
>>>    {
>>>    	double deadline = ev_monotonic_now(loop()) + timeout;
>>>    	while (is_box_configured == false || box_is_ro() != ro) {
>>> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
>>> -			return -1;
>>> -		if (fiber_is_cancelled()) {
>>> -			diag_set(FiberIsCancelled);
>>> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
>>> +                        if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
>>
>> Here you use spaces instead of tabs.
>>
> 
> Fixed.
> 
>>>    			return -1;
>>>    		}
>>>    	}
>>> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
>>> index 904a350d9..b0645069e 100644
>>> --- a/src/lib/core/fiber_cond.c
>>> +++ b/src/lib/core/fiber_cond.c
>>> @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>>>    		diag_set(TimedOut);
>>>    		return -1;
>>>    	}
>>> +	if (fiber_is_cancelled()) return -1;
>>
>> It's qute strange to return -1 here but don't set a reason to diagnostic
>> area. Look how it is done for channels
>>
>> (https://github.com/tarantool/tarantool/blob/42c64d06d5d1a3ec937b3c596af083a672a68ad8/src/lib/core/fiber_channel.c#L180).
>>
>> There is some inconsistency without it.
>>
>> I've looked a bit deeper at the failure I reported before. Seems the problem
>> is in "cbus_unpair" function.
>>
>> The problem appears only if FiberIsCancelled is setted to diag area in
>> "fiber_cond_wait" function.
>>
>> This is where my expertise ends, as I'm not familiar with "cbus". However I
>> have some minds how it could be eliminated.
>>
>> Let's declare cbus_unpair fiber as is not cancellable and stop report
>> is_cancellable flag for non-cancellable fibers. See some PoC below:
>>
>>
>> diff --git a/src/lib/core/cbus.c b/src/lib/core/cbus.c
>> index 5d91fb948..4167c756a 100644
>> --- a/src/lib/core/cbus.c
>> +++ b/src/lib/core/cbus.c
>> @@ -630,6 +630,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe
>> *src_pipe,
>>       msg.unpair_arg = unpair_arg;
>>       msg.src_pipe = src_pipe;
>>       msg.complete = false;
>> +    fiber_set_cancellable(false);
>>       fiber_cond_create(&msg.cond);
>>
>>       cpipe_push(dest_pipe, &msg.cmsg);
>> @@ -643,6 +644,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe
>> *src_pipe,
>>           fiber_cond_wait(&msg.cond);
>>       }
>>
>> +    fiber_set_cancellable(true);
>>       cpipe_destroy(dest_pipe);
>>   }
>>
>> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
>> index 483ae3ce1..8100c9da6 100644
>> --- a/src/lib/core/fiber.c
>> +++ b/src/lib/core/fiber.c
>> @@ -553,6 +553,9 @@ fiber_set_cancellable(bool yesno)
>>   bool
>>   fiber_is_cancelled(void)
>>   {
>> +    if ((fiber()->flags & FIBER_IS_CANCELLABLE) == 0) {
>> +        return false;
>> +    }
>>       return fiber()->flags & FIBER_IS_CANCELLED;
>>   }
>>
>>
>> To be honest I've not checked such change carefully and also have segfault
>> at replication/gc.test.lua for "memtx" engine.
>>
>> Finally, feel free to ignore this comment I hope Vlad or Sasha can give you
>> more accurate and correct advices.
>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-04 10:00 ` Leonid Vasiliev
@ 2020-11-05 20:42   ` Sergey Ostanevich
  2020-11-10 21:16   ` Sergey Ostanevich
  1 sibling, 0 replies; 23+ messages in thread
From: Sergey Ostanevich @ 2020-11-05 20:42 UTC (permalink / raw)
  To: Leonid Vasiliev; +Cc: tarantool-patches, v.shpilevoy, alexander.turenko

Hi!

On 04 ноя 13:00, Leonid Vasiliev wrote:
> Hi! Thank you for the patch.
> See some comments below:
> 
> 1) The patch changes undocumented behavior, AFAIU.
> So, I have a question:"Do you plan to backport the patch to
> tarantool 1.10?". If the answer is "Yes" - I'm comfortable with the
> changes. But if the answer is "No" - I will object, because in this case
> both behaviors must be supported in all modules.

That's a very good point, Leonid! I didn't met any use of the '0' or '1'
result in C or in Lua for the case of fiber cancellation. Although it
might be not true about many projects in Tarantool ecosystem and -
what's more scary - in user's code in the multitude of installations.

Also, I have to admit that introduction of two types of behavior for
different versions of Tarantool will get things even worse.

I think we have to re-iterate the discussion of the change in the
ticket first.

Sergos


> 
> 2) I think changing the behavior in C doesn't cause much of a problem,
> because before when you wait without timeout, you don't need to check
> the return value (it's always 0). But in Lua it will cause the problems,
> because now throws an error if cancelled and all wait calls should be
> wrapped to pcall.
> 
> On 31.10.2020 19:29, sergos@tarantool.org wrote:
> > From: Sergey Ostanevich <sergos@tarantool.org>
> > 
> > Hi!
> > 
> > Thanks to Oleg Babin's comment I found there's no need to update any lua
> > interfaces, since the reason was in C implementation. Also, there is one
> > place the change is played, so after I fixed it I got complete testing
> > pass.
> > Force-pushed branch, v2 patch attached.
> > 
> > 
> > 
> > Before this patch fiber.cond():wait() just returns for cancelled
> > fiber. In contrast fiber.channel():get() threw "fiber is
> > canceled" error.
> > This patch unify behaviour of channels and condvars and also fixes
> 
> 3) behaviour -> behavior.
> 
> > related net.box module problem - it was impossible to interrupt
> > net.box call with fiber.cancel because it used fiber.cond under
> > the hood. Test cases for both bugs are added.
> > 
> > Closes #4834
> > Closes #5013
> > 
> > Co-authored-by: Oleg Babin <olegrok@tarantool.org>
> > 
> > @TarantoolBot document
> > Title: fiber.cond():wait() throws if fiber is cancelled
> > 
> > Currently fiber.cond():wait() throws an error if waiting fiber is
> > cancelled like in case with fiber.channel():get().
> 
> 4) I don't think it's a good decision adding a comparison with
> fiber.channel():get() to the documentation. Up to you.
> 
> 5) Document the changes in module.h.
> 
> > ---
> 
> 6) Add @ChangeLog.
> 
> > 
> > Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
> > Issue: https://github.com/tarantool/tarantool/issues/5013
> > 
> >   src/box/box.cc                                |  6 +-
> >   src/lib/core/fiber_cond.c                     |  1 +
> >   test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
> >   test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
> >   .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
> >   5 files changed, 120 insertions(+), 4 deletions(-)
> >   create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
> >   create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
> >   create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
> > 
> > diff --git a/src/box/box.cc b/src/box/box.cc
> > index 18568df3b..bfa1051f9 100644
> > --- a/src/box/box.cc
> > +++ b/src/box/box.cc
> > @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
> >   {
> >   	double deadline = ev_monotonic_now(loop()) + timeout;
> >   	while (is_box_configured == false || box_is_ro() != ro) {
> > -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
> > -			return -1;
> > -		if (fiber_is_cancelled()) {
> > -			diag_set(FiberIsCancelled);
> > +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
> > +                        if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
> >   			return -1;
> >   		}
> >   	}
> > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
> > index 904a350d9..b0645069e 100644
> > --- a/src/lib/core/fiber_cond.c
> > +++ b/src/lib/core/fiber_cond.c
> > @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
> >   		diag_set(TimedOut);
> >   		return -1;
> >   	}
> > +	if (fiber_is_cancelled()) return -1;
> >   	return 0;
> >   }
> > diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
> > new file mode 100755
> > index 000000000..ae805c5bf
> > --- /dev/null
> > +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
> > @@ -0,0 +1,23 @@
> > +#!/usr/bin/env tarantool
> > +
> > +local tap = require('tap')
> > +local fiber = require('fiber')
> > +local test = tap.test("gh-5013-fiber-cancel")
> > +
> > +test:plan(2)
> > +
> > +local result = {}
> > +
> > +function test_f()
> > +    local cond = fiber.cond()
> > +    local res, err = pcall(cond.wait, cond)
> > +    result.res = res
> > +    result.err = err
> > +end
> > +
> > +local f = fiber.create(test_f)
> > +f:cancel()
> > +fiber.yield()
> > +
> > +test:ok(result.res == false, tostring(result.res))
> > +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
> > diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
> > new file mode 100644
> > index 000000000..eab0a5e4d
> > --- /dev/null
> > +++ b/test/box/net.box_fiber_cancel_gh-4834.result
> > @@ -0,0 +1,65 @@
> > +-- test-run result file version 2
> > +remote = require 'net.box'
> > + | ---
> > + | ...
> > +fiber = require 'fiber'
> > + | ---
> > + | ...
> > +test_run = require('test_run').new()
> > + | ---
> > + | ...
> > +
> > +-- #4834: Cancelling fiber doesn't interrupt netbox operations
> > +function infinite_call() fiber.channel(1):get() end
> > + | ---
> > + | ...
> > +box.schema.func.create('infinite_call')
> > + | ---
> > + | ...
> > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
> > + | ---
> > + | ...
> > +
> > +error_msg = nil
> > + | ---
> > + | ...
> > +test_run:cmd("setopt delimiter ';'")
> > + | ---
> > + | - true
> > + | ...
> > +function gh4834()
> > +    local cn = remote.connect(box.cfg.listen)
> > +    local f = fiber.new(function()
> > +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
> > +    end)
> > +    f:set_joinable(true)
> > +    fiber.yield()
> > +    f:cancel()
> > +    f:join()
> > +    cn:close()
> > +end;
> > + | ---
> > + | ...
> > +test_run:cmd("setopt delimiter ''");
> > + | ---
> > + | - true
> > + | ...
> > +gh4834()
> > + | ---
> > + | ...
> > +error_msg
> > + | ---
> > + | - fiber is cancelled
> > + | ...
> > +box.schema.func.drop('infinite_call')
> > + | ---
> > + | ...
> > +infinite_call = nil
> > + | ---
> > + | ...
> > +channel = nil
> > + | ---
> > + | ...
> > +error_msg = nil
> > + | ---
> > + | ...
> > diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
> > new file mode 100644
> > index 000000000..06fb3ceac
> > --- /dev/null
> > +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
> > @@ -0,0 +1,29 @@
> > +remote = require 'net.box'
> > +fiber = require 'fiber'
> > +test_run = require('test_run').new()
> > +
> > +-- #4834: Cancelling fiber doesn't interrupt netbox operations
> > +function infinite_call() fiber.channel(1):get() end
> > +box.schema.func.create('infinite_call')
> > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
> > +
> > +error_msg = nil
> > +test_run:cmd("setopt delimiter ';'")
> > +function gh4834()
> > +    local cn = remote.connect(box.cfg.listen)
> > +    local f = fiber.new(function()
> > +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
> > +    end)
> > +    f:set_joinable(true)
> > +    fiber.yield()
> > +    f:cancel()
> > +    f:join()
> > +    cn:close()
> > +end;
> > +test_run:cmd("setopt delimiter ''");
> > +gh4834()
> > +error_msg
> > +box.schema.func.drop('infinite_call')
> > +infinite_call = nil
> > +channel = nil
> > +error_msg = nil
> > 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-04 10:00 ` Leonid Vasiliev
  2020-11-05 20:42   ` Sergey Ostanevich
@ 2020-11-10 21:16   ` Sergey Ostanevich
  2020-11-12 20:15     ` Sergey Ostanevich
  1 sibling, 1 reply; 23+ messages in thread
From: Sergey Ostanevich @ 2020-11-10 21:16 UTC (permalink / raw)
  To: Leonid Vasiliev; +Cc: tarantool-patches, v.shpilevoy, alexander.turenko

Hi Leonid, thank you for the review!

I put two parts together to handle them in one patch.

> On 4 Nov 2020, at 13:00, Leonid Vasiliev <lvasiliev@tarantool.org> wrote:
> 
> Hi! Thank you for the patch.
> See some comments below:
> 
> 1) The patch changes undocumented behavior, AFAIU.
> So, I have a question:"Do you plan to backport the patch to
> tarantool 1.10?". If the answer is "Yes" - I'm comfortable with the
> changes. But if the answer is "No" - I will object, because in this case
> both behaviors must be supported in all modules.
> 

Yes, the plan is to push it down to 1.10

> 2) I think changing the behavior in C doesn't cause much of a problem,
> because before when you wait without timeout, you don't need to check
> the return value (it's always 0). But in Lua it will cause the problems,
> because now throws an error if cancelled and all wait calls should be
> wrapped to pcall.

Exactly it is the change to Lua and we intentionally make it, since original
behaviour is wrong. Good things are: fiber.cond has no big presence in the
/tarantool sourcebase and the cancellation of fiber itself means the fiber
has to quit.
 
> 
> On 31.10.2020 19:29, sergos@tarantool.org wrote:
>> From: Sergey Ostanevich <sergos@tarantool.org>
>> Hi!
>> Thanks to Oleg Babin's comment I found there's no need to update any lua
>> interfaces, since the reason was in C implementation. Also, there is one
>> place the change is played, so after I fixed it I got complete testing
>> pass.
>> Force-pushed branch, v2 patch attached.
>> Before this patch fiber.cond():wait() just returns for cancelled
>> fiber. In contrast fiber.channel():get() threw "fiber is
>> canceled" error.
>> This patch unify behaviour of channels and condvars and also fixes
> 
> 3) behaviour -> behavior.

Depends on spellchecker - mine sets it to this British version.

> 
>> related net.box module problem - it was impossible to interrupt
>> net.box call with fiber.cancel because it used fiber.cond under
>> the hood. Test cases for both bugs are added.
>> Closes #4834
>> Closes #5013
>> Co-authored-by: Oleg Babin <olegrok@tarantool.org>
>> @TarantoolBot document
>> Title: fiber.cond():wait() throws if fiber is cancelled
>> Currently fiber.cond():wait() throws an error if waiting fiber is
>> cancelled like in case with fiber.channel():get().
> 
> 4) I don't think it's a good decision adding a comparison with
> fiber.channel():get() to the documentation. Up to you.
> 

I’d agree to you: let’s just put the new behaviour here.

> 5) Document the changes in module.h.

Done (need to rebuild, obviously).

> 
>> ---
> 
> 6) Add @ChangeLog.
> 
>> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
>> Issue: https://github.com/tarantool/tarantool/issues/5013
>>  src/box/box.cc                                |  6 +-
>>  src/lib/core/fiber_cond.c                     |  1 +
>>  test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
>>  test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
>>  .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
>>  5 files changed, 120 insertions(+), 4 deletions(-)
>>  create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
>>  create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
>>  create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
> 
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 18568df3b..9e824453d 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
>>  {
>>  	double deadline = ev_monotonic_now(loop()) + timeout;
>>  	while (is_box_configured == false || box_is_ro() != ro) {
>> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
>> -			return -1;
>> -		if (fiber_is_cancelled()) {
>> -			diag_set(FiberIsCancelled);
>> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
>> +			if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
> 
> 7) According to https://www.tarantool.io/en/doc/latest/dev_guide/c_style_guide/
> it's seems like you are trying to hide something)
> Use:
> ```
> if (condition)
>    action();
> ```
> Instead of:
> ```
> if (condition) action();
> another_action();
> ```

Fixed.

>>  			return -1;
>>  		}
>>  	}
>> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
>> index 904a350d9..0c93c5842 100644
>> --- a/src/lib/core/fiber_cond.c
>> +++ b/src/lib/core/fiber_cond.c
>> @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>>  		diag_set(TimedOut);
>>  		return -1;
>>  	}
>> +	if (fiber_is_cancelled()) {
>> +		if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled);
> 
> 8) The same as previously:
> ```
> if (condition)
>    action();
> ```
> 

Fixed.

> 9) Checking diag on empty looks strange to me. I think we should add an
> error to diag without this check. If we want to save the previous error,
> I suggest to use the stack diag.

I made this because this interface is used inside the testing machinery, where 
exact cause of e.g. replica disconnection is reported. If we put the diag here
unconditionally it will break this use of diag. I believe the stack diag also 
is not supported there yet.


>> +		return -1;
>> +	}
>>  	return 0;
>>  }
>>  diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
>> new file mode 100755
>> index 000000000..ae805c5bf
>> --- /dev/null
>> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
>> @@ -0,0 +1,23 @@
>> +#!/usr/bin/env tarantool
>> +
>> +local tap = require('tap')
>> +local fiber = require('fiber')
>> +local test = tap.test("gh-5013-fiber-cancel")
>> +
>> +test:plan(2)
>> +
>> +local result = {}
>> +
>> +function test_f()
>> +    local cond = fiber.cond()
>> +    local res, err = pcall(cond.wait, cond)
>> +    result.res = res
>> +    result.err = err
>> +end
>> +
>> +local f = fiber.create(test_f)
>> +f:cancel()
>> +fiber.yield()
>> +
>> +test:ok(result.res == false, tostring(result.res))
>> +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
> 
> 10) Use a user-frendly check message (in both checks).
> 
Fixed

>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
>> new file mode 100644
>> index 000000000..eab0a5e4d
>> --- /dev/null
>> +++ b/test/box/net.box_fiber_cancel_gh-4834.result
>> @@ -0,0 +1,65 @@
>> +-- test-run result file version 2
>> +remote = require 'net.box'
>> + | ---
>> + | ...
>> +fiber = require 'fiber'
>> + | ---
>> + | ...
>> +test_run = require('test_run').new()
>> + | ---
>> + | ...
>> +
>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
>> +function infinite_call() fiber.channel(1):get() end
>> + | ---
>> + | ...
>> +box.schema.func.create('infinite_call')
>> + | ---
>> + | ...
>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
>> + | ---
>> + | ...
>> +
>> +error_msg = nil
>> + | ---
>> + | ...
>> +test_run:cmd("setopt delimiter ';'")
>> + | ---
>> + | - true
>> + | ...
>> +function gh4834( > +    local cn = remote.connect(box.cfg.listen)
>> +    local f = fiber.new(function()
>> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
>> +    end)
>> +    f:set_joinable(true)
>> +    fiber.yield()
>> +    f:cancel()
>> +    f:join()
>> +    cn:close()
>> +end;
>> + | ---
>> + | ...
>> +test_run:cmd("setopt delimiter ''");
>> + | ---
>> + | - true
>> + | ...
>> +gh4834()
>> + | ---
>> + | ...
>> +error_msg
>> + | ---
>> + | - fiber is cancelled
>> + | ...
>> +box.schema.func.drop('infinite_call')
>> + | ---
>> + | ...
>> +infinite_call = nil
>> + | ---
>> + | ...
>> +channel = nil
>> + | ---
>> + | ...
>> +error_msg = nil
>> + | ---
>> + | ...
>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
>> new file mode 100644
>> index 000000000..06fb3ceac
>> --- /dev/null
>> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
>> @@ -0,0 +1,29 @@
>> +remote = require 'net.box'
>> +fiber = require 'fiber'
>> +test_run = require('test_run').new()
>> +
>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
>> +function infinite_call() fiber.channel(1):get() end
>> +box.schema.func.create('infinite_call')
>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
>> +
>> +error_msg = nil
>> +test_run:cmd("setopt delimiter ';'")
>> +function gh4834()
> 
> 11) I think this is not a self-documenting name for the function.
> 
Renamed.

>> +    local cn = remote.connect(box.cfg.listen)
>> +    local f = fiber.new(function()
>> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
>> +    end)
>> +    f:set_joinable(true)
>> +    fiber.yield()
>> +    f:cancel()
>> +    f:join()
>> +    cn:close()
>> +end;
>> +test_run:cmd("setopt delimiter ''");
>> +gh4834()
>> +error_msg
>> +box.schema.func.drop('infinite_call')
>> +infinite_call = nil
>> +channel = nil
> 
> 12) You didn't introduce the variable "channel". In addition, I looked
> at a couple of tests and didn't see them cleaning up variables. But I
> don't mind.
> 
Removed this one. In general, clean up prevents flaky.

>> +error_msg = nil

=================

Before this patch fiber.cond():wait() just returns for cancelled
fiber. In contrast fiber.channel():get() threw "fiber is
canceled" error.
This patch unify behaviour of channels and condvars and also fixes
related net.box module problem - it was impossible to interrupt
net.box call with fiber.cancel because it used fiber.cond under
the hood. Test cases for both bugs are added.

Closes #4834
Closes #5013

Co-authored-by: Oleg Babin <olegrok@tarantool.org>

@TarantoolBot document
Title: fiber.cond():wait() throws if fiber is cancelled

Currently fiber.cond():wait() throws an error if waiting fiber is
cancelled.
---

Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
Issue: https://github.com/tarantool/tarantool/issues/5013

@Changelog
* fiber.cond().wait() now throws if fiber is cancelled

 src/box/box.cc                                |  6 +-
 src/lib/core/fiber_cond.c                     |  4 ++
 test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
 test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
 .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
 5 files changed, 123 insertions(+), 4 deletions(-)
 create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
 create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
 create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index 18568df3b..9e824453d 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
 {
        double deadline = ev_monotonic_now(loop()) + timeout;
        while (is_box_configured == false || box_is_ro() != ro) {
-               if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
-                       return -1;
-               if (fiber_is_cancelled()) {
-                       diag_set(FiberIsCancelled);
+               if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
+                       if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
                        return -1;
                }
        }
diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
index 904a350d9..0c93c5842 100644
--- a/src/lib/core/fiber_cond.c
+++ b/src/lib/core/fiber_cond.c
@@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
                diag_set(TimedOut);
                return -1;
        }
+       if (fiber_is_cancelled()) {
+               if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled);
+               return -1;
+       }
        return 0;
 }

diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
new file mode 100755
index 000000000..34711fb31
--- /dev/null
+++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
@@ -0,0 +1,23 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+local fiber = require('fiber')
+local test = tap.test("gh-5013-fiber-cancel")
+
+test:plan(2)
+
+local result = {}
+
+function test_f()
+    local cond = fiber.cond()
+    local res, err = pcall(cond.wait, cond)
+    result.res = res
+    result.err = err
+end
+
+local f = fiber.create(test_f)
+f:cancel()
+fiber.yield()
+
+test:ok(result.res == false, tostring(result.res))
+test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
new file mode 100644
index 000000000..eab0a5e4d
--- /dev/null
+++ b/test/box/net.box_fiber_cancel_gh-4834.result
@@ -0,0 +1,65 @@
+-- test-run result file version 2
+remote = require 'net.box'
+ | ---
+ | ...
+fiber = require 'fiber'
+ | ---
+ | ...
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+ | ---
+ | ...
+box.schema.func.create('infinite_call')
+ | ---
+ | ...
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+ | ---
+ | ...
+
+error_msg = nil
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function gh4834()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+gh4834()
+ | ---
+ | ...
+error_msg
+ | ---
+ | - fiber is cancelled
+ | ...
+box.schema.func.drop('infinite_call')
+ | ---
+ | ...
+infinite_call = nil
+ | ---
+ | ...
+channel = nil
+ | ---
+ | ...
+error_msg = nil
+ | ---
+ | ...
diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
new file mode 100644
index 000000000..06fb3ceac
--- /dev/null
+++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
@@ -0,0 +1,29 @@
+remote = require 'net.box'
+fiber = require 'fiber'
+test_run = require('test_run').new()
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+box.schema.func.create('infinite_call')
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+
+error_msg = nil
+test_run:cmd("setopt delimiter ';'")
+function gh4834()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+test_run:cmd("setopt delimiter ''");
+gh4834()
+error_msg
+box.schema.func.drop('infinite_call')
+infinite_call = nil
+channel = nil
+error_msg = nil
-- 
2.29.2

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-10 21:16   ` Sergey Ostanevich
@ 2020-11-12 20:15     ` Sergey Ostanevich
  2020-11-13  8:26       ` Leonid Vasiliev
  0 siblings, 1 reply; 23+ messages in thread
From: Sergey Ostanevich @ 2020-11-12 20:15 UTC (permalink / raw)
  To: Leonid Vasiliev; +Cc: tarantool-patches, Vladislav Shpilevoy, alexander.turenko

Sorry, the patch was not updated. Here’s a new one and the branch is updated

commit 42ea06f7cf69ee6b492cefcbb201bc9bba15c686
Author: Sergey Ostanevich <sergos@tarantool.org>
Date:   Tue Nov 3 12:52:26 2020 +0300

    core: handle fiber cancellation for fiber.cond
    
    Before this patch fiber.cond():wait() just returns for cancelled
    fiber. In contrast fiber.channel():get() threw "fiber is
    canceled" error.
    This patch unify behaviour of channels and condvars and also fixes
    related net.box module problem - it was impossible to interrupt
    net.box call with fiber.cancel because it used fiber.cond under
    the hood. Test cases for both bugs are added.
    
    Closes #4834
    Closes #5013
    
    Co-authored-by: Oleg Babin <olegrok@tarantool.org>
    
    @TarantoolBot document
    Title: fiber.cond():wait() throws if fiber is cancelled
    
    Currently fiber.cond():wait() throws an error if waiting fiber is
    cancelled.

---

Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
Issue: https://github.com/tarantool/tarantool/issues/5013

@Changelog
* fiber.cond().wait() now throws if fiber is cancelled

diff --git a/src/box/box.cc b/src/box/box.cc
index 18568df3b..29f74e94b 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout)
 {
 	double deadline = ev_monotonic_now(loop()) + timeout;
 	while (is_box_configured == false || box_is_ro() != ro) {
-		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
-			return -1;
-		if (fiber_is_cancelled()) {
-			diag_set(FiberIsCancelled);
+		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
+			if (fiber_is_cancelled())
+				diag_set(FiberIsCancelled);
 			return -1;
 		}
 	}
diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
index 904a350d9..cc59eaafb 100644
--- a/src/lib/core/fiber_cond.c
+++ b/src/lib/core/fiber_cond.c
@@ -108,6 +108,11 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
 		diag_set(TimedOut);
 		return -1;
 	}
+	if (fiber_is_cancelled()) {
+		if (diag_is_empty(diag_get()))
+                        diag_set(FiberIsCancelled);
+		return -1;
+	}
 	return 0;
 }
 
diff --git a/src/lib/core/fiber_cond.h b/src/lib/core/fiber_cond.h
index 87c6f2ca2..2662e0654 100644
--- a/src/lib/core/fiber_cond.h
+++ b/src/lib/core/fiber_cond.h
@@ -114,7 +114,7 @@ fiber_cond_broadcast(struct fiber_cond *cond);
  * @param cond condition
  * @param timeout timeout in seconds
  * @retval 0 on fiber_cond_signal() call or a spurious wake up
- * @retval -1 on timeout, diag is set to TimedOut
+ * @retval -1 on timeout or fiber cancellation, diag is set
  */
 int
 fiber_cond_wait_timeout(struct fiber_cond *cond, double timeout);
diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
new file mode 100755
index 000000000..ca4ca2c90
--- /dev/null
+++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
@@ -0,0 +1,23 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+local fiber = require('fiber')
+local test = tap.test("gh-5013-fiber-cancel")
+
+test:plan(2)
+
+local result = {}
+
+function test_f()
+    local cond = fiber.cond()
+    local res, err = pcall(cond.wait, cond)
+    result.res = res
+    result.err = err
+end
+
+local f = fiber.create(test_f)
+f:cancel()
+fiber.yield()
+
+test:ok(result.res == false, 'expected result is false')
+test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported')
diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
new file mode 100644
index 000000000..4ed04bb61
--- /dev/null
+++ b/test/box/net.box_fiber_cancel_gh-4834.result
@@ -0,0 +1,62 @@
+-- test-run result file version 2
+remote = require 'net.box'
+ | ---
+ | ...
+fiber = require 'fiber'
+ | ---
+ | ...
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+ | ---
+ | ...
+box.schema.func.create('infinite_call')
+ | ---
+ | ...
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+ | ---
+ | ...
+
+error_msg = nil
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function netbox_runner()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+netbox_runner()
+ | ---
+ | ...
+error_msg
+ | ---
+ | - fiber is cancelled
+ | ...
+box.schema.func.drop('infinite_call')
+ | ---
+ | ...
+infinite_call = nil
+ | ---
+ | ...
+error_msg = nil
+ | ---
+ | ...
diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
new file mode 100644
index 000000000..bc0e5af6e
--- /dev/null
+++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
@@ -0,0 +1,28 @@
+remote = require 'net.box'
+fiber = require 'fiber'
+test_run = require('test_run').new()
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+box.schema.func.create('infinite_call')
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+
+error_msg = nil
+test_run:cmd("setopt delimiter ';'")
+function netbox_runner()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+test_run:cmd("setopt delimiter ''");
+netbox_runner()
+error_msg
+box.schema.func.drop('infinite_call')
+infinite_call = nil
+error_msg = nil

======================

> On 11 Nov 2020, at 00:16, Sergey Ostanevich <sergos@tarantool.org> wrote:
> 
> Hi Leonid, thank you for the review!
> 
> I put two parts together to handle them in one patch.
> 
>> On 4 Nov 2020, at 13:00, Leonid Vasiliev <lvasiliev@tarantool.org> wrote:
>> 
>> Hi! Thank you for the patch.
>> See some comments below:
>> 
>> 1) The patch changes undocumented behavior, AFAIU.
>> So, I have a question:"Do you plan to backport the patch to
>> tarantool 1.10?". If the answer is "Yes" - I'm comfortable with the
>> changes. But if the answer is "No" - I will object, because in this case
>> both behaviors must be supported in all modules.
>> 
> 
> Yes, the plan is to push it down to 1.10
> 
>> 2) I think changing the behavior in C doesn't cause much of a problem,
>> because before when you wait without timeout, you don't need to check
>> the return value (it's always 0). But in Lua it will cause the problems,
>> because now throws an error if cancelled and all wait calls should be
>> wrapped to pcall.
> 
> Exactly it is the change to Lua and we intentionally make it, since original
> behaviour is wrong. Good things are: fiber.cond has no big presence in the
> /tarantool sourcebase and the cancellation of fiber itself means the fiber
> has to quit.
> 
>> 
>> On 31.10.2020 19:29, sergos@tarantool.org wrote:
>>> From: Sergey Ostanevich <sergos@tarantool.org>
>>> Hi!
>>> Thanks to Oleg Babin's comment I found there's no need to update any lua
>>> interfaces, since the reason was in C implementation. Also, there is one
>>> place the change is played, so after I fixed it I got complete testing
>>> pass.
>>> Force-pushed branch, v2 patch attached.
>>> Before this patch fiber.cond():wait() just returns for cancelled
>>> fiber. In contrast fiber.channel():get() threw "fiber is
>>> canceled" error.
>>> This patch unify behaviour of channels and condvars and also fixes
>> 
>> 3) behaviour -> behavior.
> 
> Depends on spellchecker - mine sets it to this British version.
> 
>> 
>>> related net.box module problem - it was impossible to interrupt
>>> net.box call with fiber.cancel because it used fiber.cond under
>>> the hood. Test cases for both bugs are added.
>>> Closes #4834
>>> Closes #5013
>>> Co-authored-by: Oleg Babin <olegrok@tarantool.org>
>>> @TarantoolBot document
>>> Title: fiber.cond():wait() throws if fiber is cancelled
>>> Currently fiber.cond():wait() throws an error if waiting fiber is
>>> cancelled like in case with fiber.channel():get().
>> 
>> 4) I don't think it's a good decision adding a comparison with
>> fiber.channel():get() to the documentation. Up to you.
>> 
> 
> I’d agree to you: let’s just put the new behaviour here.
> 
>> 5) Document the changes in module.h.
> 
> Done (need to rebuild, obviously).
> 
>> 
>>> ---
>> 
>> 6) Add @ChangeLog.
>> 
>>> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
>>> Issue: https://github.com/tarantool/tarantool/issues/5013
>>> src/box/box.cc                                |  6 +-
>>> src/lib/core/fiber_cond.c                     |  1 +
>>> test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
>>> test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
>>> .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
>>> 5 files changed, 120 insertions(+), 4 deletions(-)
>>> create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
>>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
>>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
>> 
>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>> index 18568df3b..9e824453d 100644
>>> --- a/src/box/box.cc
>>> +++ b/src/box/box.cc
>>> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
>>> {
>>> 	double deadline = ev_monotonic_now(loop()) + timeout;
>>> 	while (is_box_configured == false || box_is_ro() != ro) {
>>> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
>>> -			return -1;
>>> -		if (fiber_is_cancelled()) {
>>> -			diag_set(FiberIsCancelled);
>>> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
>>> +			if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
>> 
>> 7) According to https://www.tarantool.io/en/doc/latest/dev_guide/c_style_guide/
>> it's seems like you are trying to hide something)
>> Use:
>> ```
>> if (condition)
>>   action();
>> ```
>> Instead of:
>> ```
>> if (condition) action();
>> another_action();
>> ```
> 
> Fixed.
> 
>>> 			return -1;
>>> 		}
>>> 	}
>>> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
>>> index 904a350d9..0c93c5842 100644
>>> --- a/src/lib/core/fiber_cond.c
>>> +++ b/src/lib/core/fiber_cond.c
>>> @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>>> 		diag_set(TimedOut);
>>> 		return -1;
>>> 	}
>>> +	if (fiber_is_cancelled()) {
>>> +		if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled);
>> 
>> 8) The same as previously:
>> ```
>> if (condition)
>>   action();
>> ```
>> 
> 
> Fixed.
> 
>> 9) Checking diag on empty looks strange to me. I think we should add an
>> error to diag without this check. If we want to save the previous error,
>> I suggest to use the stack diag.
> 
> I made this because this interface is used inside the testing machinery, where 
> exact cause of e.g. replica disconnection is reported. If we put the diag here
> unconditionally it will break this use of diag. I believe the stack diag also 
> is not supported there yet.
> 
> 
>>> +		return -1;
>>> +	}
>>> 	return 0;
>>> }
>>> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
>>> new file mode 100755
>>> index 000000000..ae805c5bf
>>> --- /dev/null
>>> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
>>> @@ -0,0 +1,23 @@
>>> +#!/usr/bin/env tarantool
>>> +
>>> +local tap = require('tap')
>>> +local fiber = require('fiber')
>>> +local test = tap.test("gh-5013-fiber-cancel")
>>> +
>>> +test:plan(2)
>>> +
>>> +local result = {}
>>> +
>>> +function test_f()
>>> +    local cond = fiber.cond()
>>> +    local res, err = pcall(cond.wait, cond)
>>> +    result.res = res
>>> +    result.err = err
>>> +end
>>> +
>>> +local f = fiber.create(test_f)
>>> +f:cancel()
>>> +fiber.yield()
>>> +
>>> +test:ok(result.res == false, tostring(result.res))
>>> +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
>> 
>> 10) Use a user-frendly check message (in both checks).
>> 
> Fixed
> 
>>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
>>> new file mode 100644
>>> index 000000000..eab0a5e4d
>>> --- /dev/null
>>> +++ b/test/box/net.box_fiber_cancel_gh-4834.result
>>> @@ -0,0 +1,65 @@
>>> +-- test-run result file version 2
>>> +remote = require 'net.box'
>>> + | ---
>>> + | ...
>>> +fiber = require 'fiber'
>>> + | ---
>>> + | ...
>>> +test_run = require('test_run').new()
>>> + | ---
>>> + | ...
>>> +
>>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
>>> +function infinite_call() fiber.channel(1):get() end
>>> + | ---
>>> + | ...
>>> +box.schema.func.create('infinite_call')
>>> + | ---
>>> + | ...
>>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
>>> + | ---
>>> + | ...
>>> +
>>> +error_msg = nil
>>> + | ---
>>> + | ...
>>> +test_run:cmd("setopt delimiter ';'")
>>> + | ---
>>> + | - true
>>> + | ...
>>> +function gh4834( > +    local cn = remote.connect(box.cfg.listen)
>>> +    local f = fiber.new(function()
>>> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
>>> +    end)
>>> +    f:set_joinable(true)
>>> +    fiber.yield()
>>> +    f:cancel()
>>> +    f:join()
>>> +    cn:close()
>>> +end;
>>> + | ---
>>> + | ...
>>> +test_run:cmd("setopt delimiter ''");
>>> + | ---
>>> + | - true
>>> + | ...
>>> +gh4834()
>>> + | ---
>>> + | ...
>>> +error_msg
>>> + | ---
>>> + | - fiber is cancelled
>>> + | ...
>>> +box.schema.func.drop('infinite_call')
>>> + | ---
>>> + | ...
>>> +infinite_call = nil
>>> + | ---
>>> + | ...
>>> +channel = nil
>>> + | ---
>>> + | ...
>>> +error_msg = nil
>>> + | ---
>>> + | ...
>>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
>>> new file mode 100644
>>> index 000000000..06fb3ceac
>>> --- /dev/null
>>> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
>>> @@ -0,0 +1,29 @@
>>> +remote = require 'net.box'
>>> +fiber = require 'fiber'
>>> +test_run = require('test_run').new()
>>> +
>>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
>>> +function infinite_call() fiber.channel(1):get() end
>>> +box.schema.func.create('infinite_call')
>>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
>>> +
>>> +error_msg = nil
>>> +test_run:cmd("setopt delimiter ';'")
>>> +function gh4834()
>> 
>> 11) I think this is not a self-documenting name for the function.
>> 
> Renamed.
> 
>>> +    local cn = remote.connect(box.cfg.listen)
>>> +    local f = fiber.new(function()
>>> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
>>> +    end)
>>> +    f:set_joinable(true)
>>> +    fiber.yield()
>>> +    f:cancel()
>>> +    f:join()
>>> +    cn:close()
>>> +end;
>>> +test_run:cmd("setopt delimiter ''");
>>> +gh4834()
>>> +error_msg
>>> +box.schema.func.drop('infinite_call')
>>> +infinite_call = nil
>>> +channel = nil
>> 
>> 12) You didn't introduce the variable "channel". In addition, I looked
>> at a couple of tests and didn't see them cleaning up variables. But I
>> don't mind.
>> 
> Removed this one. In general, clean up prevents flaky.
> 
>>> +error_msg = nil
> 
> =================
> 
> Before this patch fiber.cond():wait() just returns for cancelled
> fiber. In contrast fiber.channel():get() threw "fiber is
> canceled" error.
> This patch unify behaviour of channels and condvars and also fixes
> related net.box module problem - it was impossible to interrupt
> net.box call with fiber.cancel because it used fiber.cond under
> the hood. Test cases for both bugs are added.
> 
> Closes #4834
> Closes #5013
> 
> Co-authored-by: Oleg Babin <olegrok@tarantool.org>
> 
> @TarantoolBot document
> Title: fiber.cond():wait() throws if fiber is cancelled
> 
> Currently fiber.cond():wait() throws an error if waiting fiber is
> cancelled.
> ---
> 
> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
> Issue: https://github.com/tarantool/tarantool/issues/5013
> 
> @Changelog
> * fiber.cond().wait() now throws if fiber is cancelled
> 
> src/box/box.cc                                |  6 +-
> src/lib/core/fiber_cond.c                     |  4 ++
> test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
> test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
> .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
> 5 files changed, 123 insertions(+), 4 deletions(-)
> create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
> 
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 18568df3b..9e824453d 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
> {
>        double deadline = ev_monotonic_now(loop()) + timeout;
>        while (is_box_configured == false || box_is_ro() != ro) {
> -               if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
> -                       return -1;
> -               if (fiber_is_cancelled()) {
> -                       diag_set(FiberIsCancelled);
> +               if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
> +                       if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
>                        return -1;
>                }
>        }
> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
> index 904a350d9..0c93c5842 100644
> --- a/src/lib/core/fiber_cond.c
> +++ b/src/lib/core/fiber_cond.c
> @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>                diag_set(TimedOut);
>                return -1;
>        }
> +       if (fiber_is_cancelled()) {
> +               if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled);
> +               return -1;
> +       }
>        return 0;
> }
> 
> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
> new file mode 100755
> index 000000000..34711fb31
> --- /dev/null
> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
> @@ -0,0 +1,23 @@
> +#!/usr/bin/env tarantool
> +
> +local tap = require('tap')
> +local fiber = require('fiber')
> +local test = tap.test("gh-5013-fiber-cancel")
> +
> +test:plan(2)
> +
> +local result = {}
> +
> +function test_f()
> +    local cond = fiber.cond()
> +    local res, err = pcall(cond.wait, cond)
> +    result.res = res
> +    result.err = err
> +end
> +
> +local f = fiber.create(test_f)
> +f:cancel()
> +fiber.yield()
> +
> +test:ok(result.res == false, tostring(result.res))
> +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
> new file mode 100644
> index 000000000..eab0a5e4d
> --- /dev/null
> +++ b/test/box/net.box_fiber_cancel_gh-4834.result
> @@ -0,0 +1,65 @@
> +-- test-run result file version 2
> +remote = require 'net.box'
> + | ---
> + | ...
> +fiber = require 'fiber'
> + | ---
> + | ...
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
> +function infinite_call() fiber.channel(1):get() end
> + | ---
> + | ...
> +box.schema.func.create('infinite_call')
> + | ---
> + | ...
> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
> + | ---
> + | ...
> +
> +error_msg = nil
> + | ---
> + | ...
> +test_run:cmd("setopt delimiter ';'")
> + | ---
> + | - true
> + | ...
> +function gh4834()
> +    local cn = remote.connect(box.cfg.listen)
> +    local f = fiber.new(function()
> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
> +    end)
> +    f:set_joinable(true)
> +    fiber.yield()
> +    f:cancel()
> +    f:join()
> +    cn:close()
> +end;
> + | ---
> + | ...
> +test_run:cmd("setopt delimiter ''");
> + | ---
> + | - true
> + | ...
> +gh4834()
> + | ---
> + | ...
> +error_msg
> + | ---
> + | - fiber is cancelled
> + | ...
> +box.schema.func.drop('infinite_call')
> + | ---
> + | ...
> +infinite_call = nil
> + | ---
> + | ...
> +channel = nil
> + | ---
> + | ...
> +error_msg = nil
> + | ---
> + | ...
> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
> new file mode 100644
> index 000000000..06fb3ceac
> --- /dev/null
> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
> @@ -0,0 +1,29 @@
> +remote = require 'net.box'
> +fiber = require 'fiber'
> +test_run = require('test_run').new()
> +
> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
> +function infinite_call() fiber.channel(1):get() end
> +box.schema.func.create('infinite_call')
> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
> +
> +error_msg = nil
> +test_run:cmd("setopt delimiter ';'")
> +function gh4834()
> +    local cn = remote.connect(box.cfg.listen)
> +    local f = fiber.new(function()
> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
> +    end)
> +    f:set_joinable(true)
> +    fiber.yield()
> +    f:cancel()
> +    f:join()
> +    cn:close()
> +end;
> +test_run:cmd("setopt delimiter ''");
> +gh4834()
> +error_msg
> +box.schema.func.drop('infinite_call')
> +infinite_call = nil
> +channel = nil
> +error_msg = nil
> -- 
> 2.29.2

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-12 20:15     ` Sergey Ostanevich
@ 2020-11-13  8:26       ` Leonid Vasiliev
  2020-11-30 22:49         ` Alexander Turenko
  0 siblings, 1 reply; 23+ messages in thread
From: Leonid Vasiliev @ 2020-11-13  8:26 UTC (permalink / raw)
  To: Sergey Ostanevich
  Cc: tarantool-patches, Vladislav Shpilevoy, alexander.turenko

Hi!
Generally - LGTM.
Yet another iteration of the review with me is not needed. But please
see a comment on diag_set under the condition.

On 12.11.2020 23:15, Sergey Ostanevich wrote:
> Sorry, the patch was not updated. Here’s a new one and the branch is updated
> 
> commit 42ea06f7cf69ee6b492cefcbb201bc9bba15c686
> Author: Sergey Ostanevich <sergos@tarantool.org>
> Date:   Tue Nov 3 12:52:26 2020 +0300
> 
>      core: handle fiber cancellation for fiber.cond
>      
>      Before this patch fiber.cond():wait() just returns for cancelled
>      fiber. In contrast fiber.channel():get() threw "fiber is
>      canceled" error.
>      This patch unify behaviour of channels and condvars and also fixes
>      related net.box module problem - it was impossible to interrupt
>      net.box call with fiber.cancel because it used fiber.cond under
>      the hood. Test cases for both bugs are added.
>      
>      Closes #4834
>      Closes #5013
>      
>      Co-authored-by: Oleg Babin <olegrok@tarantool.org>
>      
>      @TarantoolBot document
>      Title: fiber.cond():wait() throws if fiber is cancelled
>      
>      Currently fiber.cond():wait() throws an error if waiting fiber is
>      cancelled.
> 
> ---
> 
> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
> Issue: https://github.com/tarantool/tarantool/issues/5013
> 
> @Changelog
> * fiber.cond().wait() now throws if fiber is cancelled

fiber.cond():wait()

> 
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 18568df3b..29f74e94b 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout)
>   {
>   	double deadline = ev_monotonic_now(loop()) + timeout;
>   	while (is_box_configured == false || box_is_ro() != ro) {
> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
> -			return -1;
> -		if (fiber_is_cancelled()) {
> -			diag_set(FiberIsCancelled);
> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
> +			if (fiber_is_cancelled())
> +				diag_set(FiberIsCancelled);
>   			return -1;
>   		}
>   	}
> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
> index 904a350d9..cc59eaafb 100644
> --- a/src/lib/core/fiber_cond.c
> +++ b/src/lib/core/fiber_cond.c
> @@ -108,6 +108,11 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>   		diag_set(TimedOut);
>   		return -1;
>   	}
> +	if (fiber_is_cancelled()) {
> +		if (diag_is_empty(diag_get()))
> +                        diag_set(FiberIsCancelled);

I read your argumentation. AFAIU sounds like:"We have a lot of tests
that check an error in the diag in such case. Also, some user
applications can check the diag in a similar way. And stack diag can be 
used only in the new versions of tarantool." So, it's true.
What I propose to think about:
- this is not simple logic. If a fiber has been cancelled, maybe an 
error will be added to the diag, maybe no - 50/50.
- this behavior should be documented.
- ask Mons or Turenko for advice: "What do they think about it?"
All this up to you.


> +		return -1;
> +	}
>   	return 0;
>   }
>   
> diff --git a/src/lib/core/fiber_cond.h b/src/lib/core/fiber_cond.h
> index 87c6f2ca2..2662e0654 100644
> --- a/src/lib/core/fiber_cond.h
> +++ b/src/lib/core/fiber_cond.h
> @@ -114,7 +114,7 @@ fiber_cond_broadcast(struct fiber_cond *cond);
>    * @param cond condition
>    * @param timeout timeout in seconds
>    * @retval 0 on fiber_cond_signal() call or a spurious wake up
> - * @retval -1 on timeout, diag is set to TimedOut
> + * @retval -1 on timeout or fiber cancellation, diag is set
>    */
>   int
>   fiber_cond_wait_timeout(struct fiber_cond *cond, double timeout);
> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
> new file mode 100755
> index 000000000..ca4ca2c90
> --- /dev/null
> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
> @@ -0,0 +1,23 @@
> +#!/usr/bin/env tarantool
> +
> +local tap = require('tap')
> +local fiber = require('fiber')
> +local test = tap.test("gh-5013-fiber-cancel")
> +
> +test:plan(2)
> +
> +local result = {}
> +
> +function test_f()
> +    local cond = fiber.cond()
> +    local res, err = pcall(cond.wait, cond)
> +    result.res = res
> +    result.err = err
> +end
> +
> +local f = fiber.create(test_f)
> +f:cancel()
> +fiber.yield()
> +
> +test:ok(result.res == false, 'expected result is false')
> +test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported')
> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
> new file mode 100644
> index 000000000..4ed04bb61
> --- /dev/null
> +++ b/test/box/net.box_fiber_cancel_gh-4834.result
> @@ -0,0 +1,62 @@
> +-- test-run result file version 2
> +remote = require 'net.box'
> + | ---
> + | ...
> +fiber = require 'fiber'
> + | ---
> + | ...
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
> +function infinite_call() fiber.channel(1):get() end
> + | ---
> + | ...
> +box.schema.func.create('infinite_call')
> + | ---
> + | ...
> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
> + | ---
> + | ...
> +
> +error_msg = nil
> + | ---
> + | ...
> +test_run:cmd("setopt delimiter ';'")
> + | ---
> + | - true
> + | ...
> +function netbox_runner()
> +    local cn = remote.connect(box.cfg.listen)
> +    local f = fiber.new(function()
> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
> +    end)
> +    f:set_joinable(true)
> +    fiber.yield()
> +    f:cancel()
> +    f:join()
> +    cn:close()
> +end;
> + | ---
> + | ...
> +test_run:cmd("setopt delimiter ''");
> + | ---
> + | - true
> + | ...
> +netbox_runner()
> + | ---
> + | ...
> +error_msg
> + | ---
> + | - fiber is cancelled
> + | ...
> +box.schema.func.drop('infinite_call')
> + | ---
> + | ...
> +infinite_call = nil
> + | ---
> + | ...
> +error_msg = nil
> + | ---
> + | ...
> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
> new file mode 100644
> index 000000000..bc0e5af6e
> --- /dev/null
> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
> @@ -0,0 +1,28 @@
> +remote = require 'net.box'
> +fiber = require 'fiber'
> +test_run = require('test_run').new()
> +
> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
> +function infinite_call() fiber.channel(1):get() end
> +box.schema.func.create('infinite_call')
> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
> +
> +error_msg = nil
> +test_run:cmd("setopt delimiter ';'")
> +function netbox_runner()
> +    local cn = remote.connect(box.cfg.listen)
> +    local f = fiber.new(function()
> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
> +    end)
> +    f:set_joinable(true)
> +    fiber.yield()
> +    f:cancel()
> +    f:join()
> +    cn:close()
> +end;
> +test_run:cmd("setopt delimiter ''");
> +netbox_runner()
> +error_msg
> +box.schema.func.drop('infinite_call')
> +infinite_call = nil
> +error_msg = nil
> 
> ======================
> 
>> On 11 Nov 2020, at 00:16, Sergey Ostanevich <sergos@tarantool.org> wrote:
>>
>> Hi Leonid, thank you for the review!
>>
>> I put two parts together to handle them in one patch.
>>
>>> On 4 Nov 2020, at 13:00, Leonid Vasiliev <lvasiliev@tarantool.org> wrote:
>>>
>>> Hi! Thank you for the patch.
>>> See some comments below:
>>>
>>> 1) The patch changes undocumented behavior, AFAIU.
>>> So, I have a question:"Do you plan to backport the patch to
>>> tarantool 1.10?". If the answer is "Yes" - I'm comfortable with the
>>> changes. But if the answer is "No" - I will object, because in this case
>>> both behaviors must be supported in all modules.
>>>
>>
>> Yes, the plan is to push it down to 1.10
>>
>>> 2) I think changing the behavior in C doesn't cause much of a problem,
>>> because before when you wait without timeout, you don't need to check
>>> the return value (it's always 0). But in Lua it will cause the problems,
>>> because now throws an error if cancelled and all wait calls should be
>>> wrapped to pcall.
>>
>> Exactly it is the change to Lua and we intentionally make it, since original
>> behaviour is wrong. Good things are: fiber.cond has no big presence in the
>> /tarantool sourcebase and the cancellation of fiber itself means the fiber
>> has to quit.
>>
>>>
>>> On 31.10.2020 19:29, sergos@tarantool.org wrote:
>>>> From: Sergey Ostanevich <sergos@tarantool.org>
>>>> Hi!
>>>> Thanks to Oleg Babin's comment I found there's no need to update any lua
>>>> interfaces, since the reason was in C implementation. Also, there is one
>>>> place the change is played, so after I fixed it I got complete testing
>>>> pass.
>>>> Force-pushed branch, v2 patch attached.
>>>> Before this patch fiber.cond():wait() just returns for cancelled
>>>> fiber. In contrast fiber.channel():get() threw "fiber is
>>>> canceled" error.
>>>> This patch unify behaviour of channels and condvars and also fixes
>>>
>>> 3) behaviour -> behavior.
>>
>> Depends on spellchecker - mine sets it to this British version.
>>
>>>
>>>> related net.box module problem - it was impossible to interrupt
>>>> net.box call with fiber.cancel because it used fiber.cond under
>>>> the hood. Test cases for both bugs are added.
>>>> Closes #4834
>>>> Closes #5013
>>>> Co-authored-by: Oleg Babin <olegrok@tarantool.org>
>>>> @TarantoolBot document
>>>> Title: fiber.cond():wait() throws if fiber is cancelled
>>>> Currently fiber.cond():wait() throws an error if waiting fiber is
>>>> cancelled like in case with fiber.channel():get().
>>>
>>> 4) I don't think it's a good decision adding a comparison with
>>> fiber.channel():get() to the documentation. Up to you.
>>>
>>
>> I’d agree to you: let’s just put the new behaviour here.
>>
>>> 5) Document the changes in module.h.
>>
>> Done (need to rebuild, obviously).
>>
>>>
>>>> ---
>>>
>>> 6) Add @ChangeLog.
>>>
>>>> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
>>>> Issue: https://github.com/tarantool/tarantool/issues/5013
>>>> src/box/box.cc                                |  6 +-
>>>> src/lib/core/fiber_cond.c                     |  1 +
>>>> test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
>>>> test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
>>>> .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
>>>> 5 files changed, 120 insertions(+), 4 deletions(-)
>>>> create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
>>>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
>>>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
>>>
>>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>>> index 18568df3b..9e824453d 100644
>>>> --- a/src/box/box.cc
>>>> +++ b/src/box/box.cc
>>>> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
>>>> {
>>>> 	double deadline = ev_monotonic_now(loop()) + timeout;
>>>> 	while (is_box_configured == false || box_is_ro() != ro) {
>>>> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
>>>> -			return -1;
>>>> -		if (fiber_is_cancelled()) {
>>>> -			diag_set(FiberIsCancelled);
>>>> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
>>>> +			if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
>>>
>>> 7) According to https://www.tarantool.io/en/doc/latest/dev_guide/c_style_guide/
>>> it's seems like you are trying to hide something)
>>> Use:
>>> ```
>>> if (condition)
>>>    action();
>>> ```
>>> Instead of:
>>> ```
>>> if (condition) action();
>>> another_action();
>>> ```
>>
>> Fixed.
>>
>>>> 			return -1;
>>>> 		}
>>>> 	}
>>>> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
>>>> index 904a350d9..0c93c5842 100644
>>>> --- a/src/lib/core/fiber_cond.c
>>>> +++ b/src/lib/core/fiber_cond.c
>>>> @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>>>> 		diag_set(TimedOut);
>>>> 		return -1;
>>>> 	}
>>>> +	if (fiber_is_cancelled()) {
>>>> +		if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled);
>>>
>>> 8) The same as previously:
>>> ```
>>> if (condition)
>>>    action();
>>> ```
>>>
>>
>> Fixed.
>>
>>> 9) Checking diag on empty looks strange to me. I think we should add an
>>> error to diag without this check. If we want to save the previous error,
>>> I suggest to use the stack diag.
>>
>> I made this because this interface is used inside the testing machinery, where
>> exact cause of e.g. replica disconnection is reported. If we put the diag here
>> unconditionally it will break this use of diag. I believe the stack diag also
>> is not supported there yet.
>>
>>
>>>> +		return -1;
>>>> +	}
>>>> 	return 0;
>>>> }
>>>> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
>>>> new file mode 100755
>>>> index 000000000..ae805c5bf
>>>> --- /dev/null
>>>> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
>>>> @@ -0,0 +1,23 @@
>>>> +#!/usr/bin/env tarantool
>>>> +
>>>> +local tap = require('tap')
>>>> +local fiber = require('fiber')
>>>> +local test = tap.test("gh-5013-fiber-cancel")
>>>> +
>>>> +test:plan(2)
>>>> +
>>>> +local result = {}
>>>> +
>>>> +function test_f()
>>>> +    local cond = fiber.cond()
>>>> +    local res, err = pcall(cond.wait, cond)
>>>> +    result.res = res
>>>> +    result.err = err
>>>> +end
>>>> +
>>>> +local f = fiber.create(test_f)
>>>> +f:cancel()
>>>> +fiber.yield()
>>>> +
>>>> +test:ok(result.res == false, tostring(result.res))
>>>> +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
>>>
>>> 10) Use a user-frendly check message (in both checks).
>>>
>> Fixed
>>
>>>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
>>>> new file mode 100644
>>>> index 000000000..eab0a5e4d
>>>> --- /dev/null
>>>> +++ b/test/box/net.box_fiber_cancel_gh-4834.result
>>>> @@ -0,0 +1,65 @@
>>>> +-- test-run result file version 2
>>>> +remote = require 'net.box'
>>>> + | ---
>>>> + | ...
>>>> +fiber = require 'fiber'
>>>> + | ---
>>>> + | ...
>>>> +test_run = require('test_run').new()
>>>> + | ---
>>>> + | ...
>>>> +
>>>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
>>>> +function infinite_call() fiber.channel(1):get() end
>>>> + | ---
>>>> + | ...
>>>> +box.schema.func.create('infinite_call')
>>>> + | ---
>>>> + | ...
>>>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
>>>> + | ---
>>>> + | ...
>>>> +
>>>> +error_msg = nil
>>>> + | ---
>>>> + | ...
>>>> +test_run:cmd("setopt delimiter ';'")
>>>> + | ---
>>>> + | - true
>>>> + | ...
>>>> +function gh4834( > +    local cn = remote.connect(box.cfg.listen)
>>>> +    local f = fiber.new(function()
>>>> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
>>>> +    end)
>>>> +    f:set_joinable(true)
>>>> +    fiber.yield()
>>>> +    f:cancel()
>>>> +    f:join()
>>>> +    cn:close()
>>>> +end;
>>>> + | ---
>>>> + | ...
>>>> +test_run:cmd("setopt delimiter ''");
>>>> + | ---
>>>> + | - true
>>>> + | ...
>>>> +gh4834()
>>>> + | ---
>>>> + | ...
>>>> +error_msg
>>>> + | ---
>>>> + | - fiber is cancelled
>>>> + | ...
>>>> +box.schema.func.drop('infinite_call')
>>>> + | ---
>>>> + | ...
>>>> +infinite_call = nil
>>>> + | ---
>>>> + | ...
>>>> +channel = nil
>>>> + | ---
>>>> + | ...
>>>> +error_msg = nil
>>>> + | ---
>>>> + | ...
>>>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
>>>> new file mode 100644
>>>> index 000000000..06fb3ceac
>>>> --- /dev/null
>>>> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
>>>> @@ -0,0 +1,29 @@
>>>> +remote = require 'net.box'
>>>> +fiber = require 'fiber'
>>>> +test_run = require('test_run').new()
>>>> +
>>>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
>>>> +function infinite_call() fiber.channel(1):get() end
>>>> +box.schema.func.create('infinite_call')
>>>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
>>>> +
>>>> +error_msg = nil
>>>> +test_run:cmd("setopt delimiter ';'")
>>>> +function gh4834()
>>>
>>> 11) I think this is not a self-documenting name for the function.
>>>
>> Renamed.
>>
>>>> +    local cn = remote.connect(box.cfg.listen)
>>>> +    local f = fiber.new(function()
>>>> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
>>>> +    end)
>>>> +    f:set_joinable(true)
>>>> +    fiber.yield()
>>>> +    f:cancel()
>>>> +    f:join()
>>>> +    cn:close()
>>>> +end;
>>>> +test_run:cmd("setopt delimiter ''");
>>>> +gh4834()
>>>> +error_msg
>>>> +box.schema.func.drop('infinite_call')
>>>> +infinite_call = nil
>>>> +channel = nil
>>>
>>> 12) You didn't introduce the variable "channel". In addition, I looked
>>> at a couple of tests and didn't see them cleaning up variables. But I
>>> don't mind.
>>>
>> Removed this one. In general, clean up prevents flaky.
>>
>>>> +error_msg = nil
>>
>> =================
>>
>> Before this patch fiber.cond():wait() just returns for cancelled
>> fiber. In contrast fiber.channel():get() threw "fiber is
>> canceled" error.
>> This patch unify behaviour of channels and condvars and also fixes
>> related net.box module problem - it was impossible to interrupt
>> net.box call with fiber.cancel because it used fiber.cond under
>> the hood. Test cases for both bugs are added.
>>
>> Closes #4834
>> Closes #5013
>>
>> Co-authored-by: Oleg Babin <olegrok@tarantool.org>
>>
>> @TarantoolBot document
>> Title: fiber.cond():wait() throws if fiber is cancelled
>>
>> Currently fiber.cond():wait() throws an error if waiting fiber is
>> cancelled.
>> ---
>>
>> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
>> Issue: https://github.com/tarantool/tarantool/issues/5013
>>
>> @Changelog
>> * fiber.cond().wait() now throws if fiber is cancelled
>>
>> src/box/box.cc                                |  6 +-
>> src/lib/core/fiber_cond.c                     |  4 ++
>> test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
>> test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
>> .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
>> 5 files changed, 123 insertions(+), 4 deletions(-)
>> create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
>>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 18568df3b..9e824453d 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
>> {
>>         double deadline = ev_monotonic_now(loop()) + timeout;
>>         while (is_box_configured == false || box_is_ro() != ro) {
>> -               if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
>> -                       return -1;
>> -               if (fiber_is_cancelled()) {
>> -                       diag_set(FiberIsCancelled);
>> +               if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
>> +                       if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
>>                         return -1;
>>                 }
>>         }
>> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
>> index 904a350d9..0c93c5842 100644
>> --- a/src/lib/core/fiber_cond.c
>> +++ b/src/lib/core/fiber_cond.c
>> @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>>                 diag_set(TimedOut);
>>                 return -1;
>>         }
>> +       if (fiber_is_cancelled()) {
>> +               if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled);
>> +               return -1;
>> +       }
>>         return 0;
>> }
>>
>> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
>> new file mode 100755
>> index 000000000..34711fb31
>> --- /dev/null
>> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
>> @@ -0,0 +1,23 @@
>> +#!/usr/bin/env tarantool
>> +
>> +local tap = require('tap')
>> +local fiber = require('fiber')
>> +local test = tap.test("gh-5013-fiber-cancel")
>> +
>> +test:plan(2)
>> +
>> +local result = {}
>> +
>> +function test_f()
>> +    local cond = fiber.cond()
>> +    local res, err = pcall(cond.wait, cond)
>> +    result.res = res
>> +    result.err = err
>> +end
>> +
>> +local f = fiber.create(test_f)
>> +f:cancel()
>> +fiber.yield()
>> +
>> +test:ok(result.res == false, tostring(result.res))
>> +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
>> new file mode 100644
>> index 000000000..eab0a5e4d
>> --- /dev/null
>> +++ b/test/box/net.box_fiber_cancel_gh-4834.result
>> @@ -0,0 +1,65 @@
>> +-- test-run result file version 2
>> +remote = require 'net.box'
>> + | ---
>> + | ...
>> +fiber = require 'fiber'
>> + | ---
>> + | ...
>> +test_run = require('test_run').new()
>> + | ---
>> + | ...
>> +
>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
>> +function infinite_call() fiber.channel(1):get() end
>> + | ---
>> + | ...
>> +box.schema.func.create('infinite_call')
>> + | ---
>> + | ...
>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
>> + | ---
>> + | ...
>> +
>> +error_msg = nil
>> + | ---
>> + | ...
>> +test_run:cmd("setopt delimiter ';'")
>> + | ---
>> + | - true
>> + | ...
>> +function gh4834()
>> +    local cn = remote.connect(box.cfg.listen)
>> +    local f = fiber.new(function()
>> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
>> +    end)
>> +    f:set_joinable(true)
>> +    fiber.yield()
>> +    f:cancel()
>> +    f:join()
>> +    cn:close()
>> +end;
>> + | ---
>> + | ...
>> +test_run:cmd("setopt delimiter ''");
>> + | ---
>> + | - true
>> + | ...
>> +gh4834()
>> + | ---
>> + | ...
>> +error_msg
>> + | ---
>> + | - fiber is cancelled
>> + | ...
>> +box.schema.func.drop('infinite_call')
>> + | ---
>> + | ...
>> +infinite_call = nil
>> + | ---
>> + | ...
>> +channel = nil
>> + | ---
>> + | ...
>> +error_msg = nil
>> + | ---
>> + | ...
>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
>> new file mode 100644
>> index 000000000..06fb3ceac
>> --- /dev/null
>> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
>> @@ -0,0 +1,29 @@
>> +remote = require 'net.box'
>> +fiber = require 'fiber'
>> +test_run = require('test_run').new()
>> +
>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
>> +function infinite_call() fiber.channel(1):get() end
>> +box.schema.func.create('infinite_call')
>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
>> +
>> +error_msg = nil
>> +test_run:cmd("setopt delimiter ';'")
>> +function gh4834()
>> +    local cn = remote.connect(box.cfg.listen)
>> +    local f = fiber.new(function()
>> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
>> +    end)
>> +    f:set_joinable(true)
>> +    fiber.yield()
>> +    f:cancel()
>> +    f:join()
>> +    cn:close()
>> +end;
>> +test_run:cmd("setopt delimiter ''");
>> +gh4834()
>> +error_msg
>> +box.schema.func.drop('infinite_call')
>> +infinite_call = nil
>> +channel = nil
>> +error_msg = nil
>> -- 
>> 2.29.2
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos
  2020-11-01 10:13 ` Oleg Babin
  2020-11-04 10:00 ` Leonid Vasiliev
@ 2020-11-16 22:12 ` Vladislav Shpilevoy
  2020-11-25 21:32 ` Vladislav Shpilevoy
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-16 22:12 UTC (permalink / raw)
  To: sergos, tarantool-patches; +Cc: alexander.turenko

Hi! Thanks for the patch!

Please, change subsystem name to 'fiber:'. 'core:' is too general.
We have tons of stuff in libcore in addition to fibers.

> Before this patch fiber.cond():wait() just returns for cancelled
> fiber. In contrast fiber.channel():get() threw "fiber is
> canceled" error.
> This patch unify behaviour of channels and condvars and also fixes
> related net.box module problem - it was impossible to interrupt
> net.box call with fiber.cancel because it used fiber.cond under
> the hood. Test cases for both bugs are added.

Netbox hangs not because of using fiber.cond. But because it never
calls testcancel(). Normally all looped fibers should do that.

Talking of compatibility - I think it always was supposed to throw.
luaT_fiber_cond_wait() calls luaL_testcancel(), but only when
fiber_cond_wait_timeout() returns not 0, which was never the case for
cancellation. So it was supposed to throw, but nobody covered it with
a test.

See 2 comments below.

> Closes #4834
> Closes #5013
> 
> Co-authored-by: Oleg Babin <olegrok@tarantool.org>
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 18568df3b..29f74e94b 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout)
>  {
>  	double deadline = ev_monotonic_now(loop()) + timeout;
>  	while (is_box_configured == false || box_is_ro() != ro) {
> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
> -			return -1;
> -		if (fiber_is_cancelled()) {
> -			diag_set(FiberIsCancelled);
> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
> +			if (fiber_is_cancelled())
> +				diag_set(FiberIsCancelled);
>  			return -1;
>  		}
>  	}
> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
> index 904a350d9..cc59eaafb 100644
> --- a/src/lib/core/fiber_cond.c
> +++ b/src/lib/core/fiber_cond.c
> @@ -108,6 +108,11 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>  		diag_set(TimedOut);
>  		return -1;
>  	}
> +	if (fiber_is_cancelled()) {
> +		if (diag_is_empty(diag_get()))
> +                        diag_set(FiberIsCancelled);
> +		return -1;

1. Wtf? Why don't you set an error, when this is an error? And why do
you do exactly the same in box_wait_ro() above?

> +	}
>  	return 0;
>  }> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
> new file mode 100644
> index 000000000..4ed04bb61
> --- /dev/null
> +++ b/test/box/net.box_fiber_cancel_gh-4834.result

2. This does not conform to our test file name pattern. Read this
carefully: https://github.com/tarantool/tarantool/wiki/Code-review-procedure#testing

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-03 10:20   ` Sergey Ostanevich
  2020-11-03 10:27     ` Oleg Babin
  2020-11-04 10:00     ` Leonid Vasiliev
@ 2020-11-16 22:12     ` Vladislav Shpilevoy
  2020-11-18 22:05       ` Sergey Ostanevich
  2 siblings, 1 reply; 23+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-16 22:12 UTC (permalink / raw)
  To: Sergey Ostanevich, Oleg Babin; +Cc: tarantool-patches, alexander.turenko

On 03.11.2020 11:20, Sergey Ostanevich wrote:
> Hi Oleg!
> 
> I believe the point about 'consistency' is not valid here. I put a
> simple check that if diag is already set, then print it out. For the
> fiber_cond_wait_timeout() it happened multiple times with various
> reports, inlcuding this one:
> 
> 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not
> set the DIAG to FiberIsCancelled, original diag: Missing .xlog file
> between LSN 5 {1: 5} and 6 {1: 6}
> 
> that is used in the test system:
> 
> test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status =
> 'loading'})
> 
> So, my resolution will be: it is wrong to set a diag in an arbitrary
> place, without clear understanting of the reason. This is the case for
> the cond_wait machinery, since it doesn't know _why_ the fiber is
> cancelled.

It is a wrong resolution, IMO. You just hacked cond wait not to change the
other places. It is not about tests. Tests only show what is provided by the
internal subsystems. And if they depend on fiber cond not setting diag in
case of a fail, then it looks wrong.

I suggest you to fix the usage places, where the caller code thinks that
cond_wait never sets a diag on cancellation.

If a function fails, we set a diag. It is not a thing we do optionally.
Otherwise you make it a bit simpler in this patch, but make it harder to
work with the cond in future.

Talking of your statement:

	I believe the stack diag also is not supported there yet.

It is supported on the level of lib/core, i.e. everywhere. But is not
present on 1.10. However it is not the point. The point is that it is not
needed here.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-16 22:12     ` Vladislav Shpilevoy
@ 2020-11-18 22:05       ` Sergey Ostanevich
  2020-11-22 16:01         ` Vladislav Shpilevoy
  0 siblings, 1 reply; 23+ messages in thread
From: Sergey Ostanevich @ 2020-11-18 22:05 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, alexander.turenko

Hi Vlad!

I put your comments and my patch from the second mail here to keep one
thread - see below.

Thanks,
Sergos


> On 17 Nov 2020, at 01:12, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
> On 03.11.2020 11:20, Sergey Ostanevich wrote:
>> Hi Oleg!
>> 
>> I believe the point about 'consistency' is not valid here. I put a
>> simple check that if diag is already set, then print it out. For the
>> fiber_cond_wait_timeout() it happened multiple times with various
>> reports, inlcuding this one:
>> 
>> 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not
>> set the DIAG to FiberIsCancelled, original diag: Missing .xlog file
>> between LSN 5 {1: 5} and 6 {1: 6}
>> 
>> that is used in the test system:
>> 
>> test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status =
>> 'loading'})
>> 
>> So, my resolution will be: it is wrong to set a diag in an arbitrary
>> place, without clear understanting of the reason. This is the case for
>> the cond_wait machinery, since it doesn't know _why_ the fiber is
>> cancelled.
> 
> It is a wrong resolution, IMO. You just hacked cond wait not to change the
> other places. It is not about tests. Tests only show what is provided by the
> internal subsystems. And if they depend on fiber cond not setting diag in
> case of a fail, then it looks wrong.
> 

Actually I didn’t make fiber_cond not setting a diag, rather preserve the
original one if it is present. As for correct or not - you’re way more savvy
in Tarantool sources than I, so it’s hard for me to oppose. Still I’d try.

For the case of the test I see the relay_process_wal_event() catches from
the recover_remaining_wals() and sets the XlogGapError for the relay. Note,
the relay_set_error() uses exactly the same semantics - sets the error in
case it was not set before. Then it happily cancels the fiber and returns
and eventually appears in relay_subscribe() where it throws (via diag_raise)
the latest diag of the fiber. It appears, that along the way there are 
multiple places the fiber diag can be reset, so the best I can propose is
to use the relay’s diag. Also, to enforce this strategy we need a follow-up
ticket to ensure the relay’s diag is always set by the relay_subscribe() 
exit.

> I suggest you to fix the usage places, where the caller code thinks that
> cond_wait never sets a diag on cancellation.
> 
> If a function fails, we set a diag. It is not a thing we do optionally.
> Otherwise you make it a bit simpler in this patch, but make it harder to
> work with the cond in future.
> 
> Talking of your statement:
> 
> 	I believe the stack diag also is not supported there yet.
> 
> It is supported on the level of lib/core, i.e. everywhere. But is not
> present on 1.10. However it is not the point. The point is that it is not
> needed here.


> On 17 Nov 2020, at 01:12, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
> Hi! Thanks for the patch!
> 
> Please, change subsystem name to 'fiber:'. 'core:' is too general.
> We have tons of stuff in libcore in addition to fibers.
> 
Done.

>> Before this patch fiber.cond():wait() just returns for cancelled
>> fiber. In contrast fiber.channel():get() threw "fiber is
>> canceled" error.
>> This patch unify behaviour of channels and condvars and also fixes
>> related net.box module problem - it was impossible to interrupt
>> net.box call with fiber.cancel because it used fiber.cond under
>> the hood. Test cases for both bugs are added.
> 
> Netbox hangs not because of using fiber.cond. But because it never
> calls testcancel(). Normally all looped fibers should do that.
> 
I believe the fix is an indirect result, since all 3 wait() calls in
net_box.lua aren't covered with a pcall() - it errors out. I reproduced
it with the following:

--- a/src/box/lua/net_box.lua
+++ b/src/box/lua/net_box.lua
@@ -414,7 +417,11 @@ local function create_transport(host, port, user, password, callback,
             -- waiting client is waked up prematurely.
             while timeout > 0 and not self:is_ready() do
                 local ts = fiber.clock()
-                self.cond:wait(timeout)
+                local ok, err = pcall(self.cond.wait, self.cond, timeout)
+                if (not ok) then
+                    print('net_box.lua:423 thrown, err: ' .. tostring(err))
+                    error(err)
+                end
                 timeout = timeout - (fiber.clock() - ts)
             end
             if not self:is_ready() then

Note, that if I don’t put the explicit error() there, the test from the
patch hangs. 
To me it looks like testcancel(), since it raises error the same way. 
Although, there could be more places in net_box where testcancel() can be 
called - still it is beyond this ticket.

> Talking of compatibility - I think it always was supposed to throw.
> luaT_fiber_cond_wait() calls luaL_testcancel(), but only when
> fiber_cond_wait_timeout() returns not 0, which was never the case for
> cancellation. So it was supposed to throw, but nobody covered it with
> a test.
> 
> See 2 comments below.
> 
>> Closes #4834
>> Closes #5013
>> 
>> Co-authored-by: Oleg Babin <olegrok@tarantool.org>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 18568df3b..29f74e94b 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout)
>> {
>> 	double deadline = ev_monotonic_now(loop()) + timeout;
>> 	while (is_box_configured == false || box_is_ro() != ro) {
>> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
>> -			return -1;
>> -		if (fiber_is_cancelled()) {
>> -			diag_set(FiberIsCancelled);
>> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
>> +			if (fiber_is_cancelled())
>> +				diag_set(FiberIsCancelled);
>> 			return -1;
>> 		}
>> 	}
>> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
>> index 904a350d9..cc59eaafb 100644
>> --- a/src/lib/core/fiber_cond.c
>> +++ b/src/lib/core/fiber_cond.c
>> @@ -108,6 +108,11 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>> 		diag_set(TimedOut);
>> 		return -1;
>> 	}
>> +	if (fiber_is_cancelled()) {
>> +		if (diag_is_empty(diag_get()))
>> +                        diag_set(FiberIsCancelled);
>> +		return -1;
> 
> 1. Wtf? Why don't you set an error, when this is an error? And why do
> you do exactly the same in box_wait_ro() above?
> 
Fixed as per first part of review.

>> +	}
>> 	return 0;
>> }> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
>> new file mode 100644
>> index 000000000..4ed04bb61
>> --- /dev/null
>> +++ b/test/box/net.box_fiber_cancel_gh-4834.result
> 
> 2. This does not conform to our test file name pattern. Read this
> carefully: https://github.com/tarantool/tarantool/wiki/Code-review-procedure#testing
Fixed. Although, we have a big legacy of similarly named tests under test/box/

Branch is force-pushed, updated patch is below.

===============
From 5e6426340f4f5af8429c4273f1b251f503c6dd9b Mon Sep 17 00:00:00 2001
From: Sergey Ostanevich <sergos@tarantool.org>
Date: Tue, 3 Nov 2020 12:52:26 +0300
Subject: [PATCH] fiber: handle fiber cancellation for fiber.cond

Before this patch fiber.cond():wait() just returns for cancelled
fiber. In contrast fiber.channel():get() threw "fiber is
canceled" error.
This patch unify behaviour of channels and condvars and also fixes
related net.box module problem - it was impossible to interrupt
net.box call with fiber.cancel because it used fiber.cond under
the hood. Test cases for both bugs are added.

Closes #4834
Closes #5013

Co-authored-by: Oleg Babin <olegrok@tarantool.org>

@TarantoolBot document
Title: fiber.cond():wait() throws if fiber is cancelled

Currently fiber.cond():wait() throws an error if waiting fiber is
cancelled.
---

Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
Issue: https://github.com/tarantool/tarantool/issues/5013

@Changelog
* fiber.cond().wait() now throws if fiber is cancelled

 src/box/box.cc                                |  7 +--
 src/box/relay.cc                              | 12 +++-
 src/lib/core/fiber_cond.c                     |  4 ++
 src/lib/core/fiber_cond.h                     |  2 +-
 test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
 test/box/gh-4834-netbox-fiber-cancel.result   | 62 +++++++++++++++++++
 test/box/gh-4834-netbox-fiber-cancel.test.lua | 28 +++++++++
 7 files changed, 132 insertions(+), 6 deletions(-)
 create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
 create mode 100644 test/box/gh-4834-netbox-fiber-cancel.result
 create mode 100644 test/box/gh-4834-netbox-fiber-cancel.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index 1f7dec362..aa23dcc13 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout)
 {
 	double deadline = ev_monotonic_now(loop()) + timeout;
 	while (is_box_configured == false || box_is_ro() != ro) {
-		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
-			return -1;
-		if (fiber_is_cancelled()) {
-			diag_set(FiberIsCancelled);
+		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
+			if (fiber_is_cancelled())
+				diag_set(FiberIsCancelled);
 			return -1;
 		}
 	}
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 1e77e0d9b..cce139c87 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -821,8 +821,18 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
 			      relay_subscribe_f, relay);
 	if (rc == 0)
 		rc = cord_cojoin(&relay->cord);
-	if (rc != 0)
+	if (rc != 0) {
+		/*
+		 * We should always raise a problem from relay itself, not all
+		 * other modules that are change diag in the current fiber.
+		 * TODO: investigate why and how we can leave the relay_subscribe_f
+		 * with diag unset in the relay.
+		 */
+		if (diag_last_error(&relay->diag)) {
+			diag_set_error(diag_get(), diag_last_error(&relay->diag));
+		}
 		diag_raise();
+	}
 }
 
 static void
diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
index 904a350d9..71bb2d04d 100644
--- a/src/lib/core/fiber_cond.c
+++ b/src/lib/core/fiber_cond.c
@@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
 		diag_set(TimedOut);
 		return -1;
 	}
+	if (fiber_is_cancelled()) {
+		diag_set(FiberIsCancelled);
+		return -1;
+	}
 	return 0;
 }
 
diff --git a/src/lib/core/fiber_cond.h b/src/lib/core/fiber_cond.h
index 87c6f2ca2..2662e0654 100644
--- a/src/lib/core/fiber_cond.h
+++ b/src/lib/core/fiber_cond.h
@@ -114,7 +114,7 @@ fiber_cond_broadcast(struct fiber_cond *cond);
  * @param cond condition
  * @param timeout timeout in seconds
  * @retval 0 on fiber_cond_signal() call or a spurious wake up
- * @retval -1 on timeout, diag is set to TimedOut
+ * @retval -1 on timeout or fiber cancellation, diag is set
  */
 int
 fiber_cond_wait_timeout(struct fiber_cond *cond, double timeout);
diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
new file mode 100755
index 000000000..ca4ca2c90
--- /dev/null
+++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
@@ -0,0 +1,23 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+local fiber = require('fiber')
+local test = tap.test("gh-5013-fiber-cancel")
+
+test:plan(2)
+
+local result = {}
+
+function test_f()
+    local cond = fiber.cond()
+    local res, err = pcall(cond.wait, cond)
+    result.res = res
+    result.err = err
+end
+
+local f = fiber.create(test_f)
+f:cancel()
+fiber.yield()
+
+test:ok(result.res == false, 'expected result is false')
+test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported')
diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result
new file mode 100644
index 000000000..4ed04bb61
--- /dev/null
+++ b/test/box/gh-4834-netbox-fiber-cancel.result
@@ -0,0 +1,62 @@
+-- test-run result file version 2
+remote = require 'net.box'
+ | ---
+ | ...
+fiber = require 'fiber'
+ | ---
+ | ...
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+ | ---
+ | ...
+box.schema.func.create('infinite_call')
+ | ---
+ | ...
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+ | ---
+ | ...
+
+error_msg = nil
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function netbox_runner()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+netbox_runner()
+ | ---
+ | ...
+error_msg
+ | ---
+ | - fiber is cancelled
+ | ...
+box.schema.func.drop('infinite_call')
+ | ---
+ | ...
+infinite_call = nil
+ | ---
+ | ...
+error_msg = nil
+ | ---
+ | ...
diff --git a/test/box/gh-4834-netbox-fiber-cancel.test.lua b/test/box/gh-4834-netbox-fiber-cancel.test.lua
new file mode 100644
index 000000000..bc0e5af6e
--- /dev/null
+++ b/test/box/gh-4834-netbox-fiber-cancel.test.lua
@@ -0,0 +1,28 @@
+remote = require 'net.box'
+fiber = require 'fiber'
+test_run = require('test_run').new()
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+box.schema.func.create('infinite_call')
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+
+error_msg = nil
+test_run:cmd("setopt delimiter ';'")
+function netbox_runner()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+test_run:cmd("setopt delimiter ''");
+netbox_runner()
+error_msg
+box.schema.func.drop('infinite_call')
+infinite_call = nil
+error_msg = nil
-- 
2.24.3 (Apple Git-128)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-18 22:05       ` Sergey Ostanevich
@ 2020-11-22 16:01         ` Vladislav Shpilevoy
  2020-11-23 21:47           ` Sergey Ostanevich
  2020-11-24  7:31           ` Sergey Ostanevich
  0 siblings, 2 replies; 23+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-22 16:01 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches, alexander.turenko

Hi! Thanks for the changes!

Technically almost good.

>> On 17 Nov 2020, at 01:12, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
>>
>> On 03.11.2020 11:20, Sergey Ostanevich wrote:
>>> Hi Oleg!
>>>
>>> I believe the point about 'consistency' is not valid here. I put a
>>> simple check that if diag is already set, then print it out. For the
>>> fiber_cond_wait_timeout() it happened multiple times with various
>>> reports, inlcuding this one:
>>>
>>> 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not
>>> set the DIAG to FiberIsCancelled, original diag: Missing .xlog file
>>> between LSN 5 {1: 5} and 6 {1: 6}
>>>
>>> that is used in the test system:
>>>
>>> test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status =
>>> 'loading'})
>>>
>>> So, my resolution will be: it is wrong to set a diag in an arbitrary
>>> place, without clear understanting of the reason. This is the case for
>>> the cond_wait machinery, since it doesn't know _why_ the fiber is
>>> cancelled.
>>
>> It is a wrong resolution, IMO. You just hacked cond wait not to change the
>> other places. It is not about tests. Tests only show what is provided by the
>> internal subsystems. And if they depend on fiber cond not setting diag in
>> case of a fail, then it looks wrong.
>>
> 
> Actually I didn’t make fiber_cond not setting a diag, rather preserve the
> original one if it is present.

Yes, by not setting a diag.

>>> Before this patch fiber.cond():wait() just returns for cancelled
>>> fiber. In contrast fiber.channel():get() threw "fiber is
>>> canceled" error.
>>> This patch unify behaviour of channels and condvars and also fixes
>>> related net.box module problem - it was impossible to interrupt
>>> net.box call with fiber.cancel because it used fiber.cond under
>>> the hood. Test cases for both bugs are added.
>>
>> Netbox hangs not because of using fiber.cond. But because it never
>> calls testcancel(). Normally all looped fibers should do that.
>>
> I believe the fix is an indirect result, since all 3 wait() calls in
> net_box.lua aren't covered with a pcall() - it errors out. I reproduced
> it with the following:

I know that your patch fixes it. But I was talking about the statement in
the commit message. You said:

	it was impossible to interrupt net.box call with fiber.cancel
	because it used fiber.cond under the hood

And it still uses the cond, but does not hang anymore. Therefore, the
statement obviously isn't true. The fact of cond usage is not a reason.

See 4 comments below.

> diff --git a/src/box/box.cc b/src/box/box.cc
> index 1f7dec362..aa23dcc13 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout)
>  {
>  	double deadline = ev_monotonic_now(loop()) + timeout;
>  	while (is_box_configured == false || box_is_ro() != ro) {
> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
> -			return -1;
> -		if (fiber_is_cancelled()) {
> -			diag_set(FiberIsCancelled);
> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
> +			if (fiber_is_cancelled())
> +				diag_set(FiberIsCancelled);

1. Why do you need this diag_set here? The cancellation diag is already
set in fiber_cond_wait_deadline(). Why do you set it again?

>  			return -1;
>  		}
>  	}
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index 1e77e0d9b..cce139c87 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -821,8 +821,18 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>  			      relay_subscribe_f, relay);
>  	if (rc == 0)
>  		rc = cord_cojoin(&relay->cord);
> -	if (rc != 0)
> +	if (rc != 0) {
> +		/*
> +		 * We should always raise a problem from relay itself, not all
> +		 * other modules that are change diag in the current fiber.
> +		 * TODO: investigate why and how we can leave the relay_subscribe_f
> +		 * with diag unset in the relay.
> +		 */
> +		if (diag_last_error(&relay->diag)) {

2. Please, use != NULL, according to our code style.
https://github.com/tarantool/tarantool/wiki/Code-review-procedure#code-style

> +			diag_set_error(diag_get(), diag_last_error(&relay->diag));
> +		}

3. What is the test where diag_last_error() is NULL? I added
an assertion, that it is not NULL, and the tests passed (except a
couple of tests which always fail on my machine due to too long UNIX
socket path).

Also in the end of relay_subscribe_f() there is an existing
assert(!diag_is_empty(&relay->diag)). So it can't be NULL. Or do you
have a test?

The only possible issue I see is that

	diag_set_error(diag_get(), diag_last_error(&relay->diag));

is called too early in relay_subscribe_f(). After that the diag
can be rewritten somehow. For instance, by fiber_join(reader). So
probably you can move this diag move to the end right before 'return -1;',
and remove this hunk entirely. In a separate commit in the same branch,
since it is not related to fiber_cond bug directly.

>  		diag_raise();
> +	}
>  }
> diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result
> new file mode 100644
> index 000000000..4ed04bb61
> --- /dev/null
> +++ b/test/box/gh-4834-netbox-fiber-cancel.result
> @@ -0,0 +1,62 @@
> +-- test-run result file version 2
> +remote = require 'net.box'
> + | ---
> + | ...
> +fiber = require 'fiber'

4. Please, use () for require. You even did it in the other test file,
but here you changed your mind somewhy. Or it is bad copy-paste.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-22 16:01         ` Vladislav Shpilevoy
@ 2020-11-23 21:47           ` Sergey Ostanevich
  2020-11-24  7:31           ` Sergey Ostanevich
  1 sibling, 0 replies; 23+ messages in thread
From: Sergey Ostanevich @ 2020-11-23 21:47 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, alexander.turenko

Hi!

Thanks for review! 

> On 22 Nov 2020, at 19:01, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
> Hi! Thanks for the changes!
> 
> Technically almost good.
> 
>>> On 17 Nov 2020, at 01:12, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
>>> 
>>> On 03.11.2020 11:20, Sergey Ostanevich wrote:
>>>> Hi Oleg!
>>>> 
>>>> I believe the point about 'consistency' is not valid here. I put a
>>>> simple check that if diag is already set, then print it out. For the
>>>> fiber_cond_wait_timeout() it happened multiple times with various
>>>> reports, inlcuding this one:
>>>> 
>>>> 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not
>>>> set the DIAG to FiberIsCancelled, original diag: Missing .xlog file
>>>> between LSN 5 {1: 5} and 6 {1: 6}
>>>> 
>>>> that is used in the test system:
>>>> 
>>>> test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status =
>>>> 'loading'})
>>>> 
>>>> So, my resolution will be: it is wrong to set a diag in an arbitrary
>>>> place, without clear understanting of the reason. This is the case for
>>>> the cond_wait machinery, since it doesn't know _why_ the fiber is
>>>> cancelled.
>>> 
>>> It is a wrong resolution, IMO. You just hacked cond wait not to change the
>>> other places. It is not about tests. Tests only show what is provided by the
>>> internal subsystems. And if they depend on fiber cond not setting diag in
>>> case of a fail, then it looks wrong.
>>> 
>> 
>> Actually I didn’t make fiber_cond not setting a diag, rather preserve the
>> original one if it is present.
> 
> Yes, by not setting a diag.
> 
>>>> Before this patch fiber.cond():wait() just returns for cancelled
>>>> fiber. In contrast fiber.channel():get() threw "fiber is
>>>> canceled" error.
>>>> This patch unify behaviour of channels and condvars and also fixes
>>>> related net.box module problem - it was impossible to interrupt
>>>> net.box call with fiber.cancel because it used fiber.cond under
>>>> the hood. Test cases for both bugs are added.
>>> 
>>> Netbox hangs not because of using fiber.cond. But because it never
>>> calls testcancel(). Normally all looped fibers should do that.
>>> 
>> I believe the fix is an indirect result, since all 3 wait() calls in
>> net_box.lua aren't covered with a pcall() - it errors out. I reproduced
>> it with the following:
> 
> I know that your patch fixes it. But I was talking about the statement in
> the commit message. You said:
> 
> 	sergos/gh-5013-fiber-cond 
> 	because it used fiber.cond under the hood
> 
> And it still uses the cond, but does not hang anymore. Therefore, the
> statement obviously isn't true. The fact of cond usage is not a reason.
> 

Well, it might mislead, I see. Let me rephrase it.

> See 4 comments below.
> 
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 1f7dec362..aa23dcc13 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout)
>> {
>> 	double deadline = ev_monotonic_now(loop()) + timeout;
>> 	while (is_box_configured == false || box_is_ro() != ro) {
>> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
>> -			return -1;
>> -		if (fiber_is_cancelled()) {
>> -			diag_set(FiberIsCancelled);
>> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
>> +			if (fiber_is_cancelled())
>> +				diag_set(FiberIsCancelled);
> 
> 1. Why do you need this diag_set here? The cancellation diag is already
> set in fiber_cond_wait_deadline(). Why do you set it again?

It's an artefact appeared during the solution. You’re right, it is not 
needed, removed. 

> 
>> 			return -1;
>> 		}
>> 	}
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index 1e77e0d9b..cce139c87 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -821,8 +821,18 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>> 			      relay_subscribe_f, relay);
>> 	if (rc == 0)
>> 		rc = cord_cojoin(&relay->cord);
>> -	if (rc != 0)
>> +	if (rc != 0) {
>> +		/*
>> +		 * We should always raise a problem from relay itself, not all
>> +		 * other modules that are change diag in the current fiber.
>> +		 * TODO: investigate why and how we can leave the relay_subscribe_f
>> +		 * with diag unset in the relay.
>> +		 */
>> +		if (diag_last_error(&relay->diag)) {
> 
> 2. Please, use != NULL, according to our code style.
> https://github.com/tarantool/tarantool/wiki/Code-review-procedure#code-style
> 

I prefer to fix this along with the comment #3

>> +			diag_set_error(diag_get(), diag_last_error(&relay->diag));
>> +		}
> 
> 3. What is the test where diag_last_error() is NULL? I added
> an assertion, that it is not NULL, and the tests passed (except a
> couple of tests which always fail on my machine due to too long UNIX
> socket path).
> 
> Also in the end of relay_subscribe_f() there is an existing
> assert(!diag_is_empty(&relay->diag)). So it can't be NULL. Or do you
> have a test?
> 
> The only possible issue I see is that
> 
> 	diag_set_error(diag_get(), diag_last_error(&relay->diag));
> 
> is called too early inrelay_subscribe_f(). After that the diag
> can be rewritten somehow. For instance, by fiber_join(reader). So
> probably you can move this diag move to the end right before 'return -1;',
> and remove this hunk entirely. In a separate commit in the same branch,
> since it is not related to fiber_cond bug directly.
> 

You’re right - the assert was there for long time, so should not happen 
at all. I move the diag_set_error, still I have a concern - the
relay_exit(relay) at the very end of the relay_subscribe_f() can raise
down in the call chain at the recovery_stop_local(). It can happen only
if fiber from recovery watcher has some error set, which is not clear
to me yet. 
 
> 
>> 		diag_raise();
>> +	}
>> }
>> diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result
>> new file mode 100644
>> index 000000000..4ed04bb61
>> --- /dev/null
>> +++ b/test/box/gh-4834-netbox-fiber-cancel.result
>> @@ -0,0 +1,62 @@
>> +-- test-run result file version 2
>> +remote = require 'net.box'
>> + | ---
>> + | ...
>> +fiber = require 'fiber'
> 
> 4. Please, use () for require. You even did it in the other test file,
> but here you changed your mind somewhy. Or it is bad copy-paste.
Done. 

================
From e8bd26bb26c76e0a722958db1fe4763c027dcb8b Mon Sep 17 00:00:00 2001
From: Sergey Ostanevich <sergos@tarantool.org>
Date: Tue, 3 Nov 2020 12:52:26 +0300
Subject: [PATCH] fiber: handle fiber cancellation for fiber.cond

Before this patch fiber.cond():wait() just returns for cancelled
fiber. In contrast fiber.channel():get() throws "fiber is canceled"
error. This patch unifies behaviour of channels and condvars.
It also fixes a related net.box module problem #4834 since fiber.cond
now performs test for fiber cancellation.

Closes #4834
Closes #5013

Co-authored-by: Oleg Babin <olegrok@tarantool.org>

@TarantoolBot document
Title: fiber.cond():wait() throws if fiber is cancelled

Currently fiber.cond():wait() throws an error if waiting fiber is
cancelled.
---

Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
Issue: https://github.com/tarantool/tarantool/issues/5013

 src/box/box.cc                                |  4 --
 src/box/relay.cc                              | 19 +++---
 src/lib/core/fiber_cond.c                     |  4 ++
 src/lib/core/fiber_cond.h                     |  2 +-
 test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
 test/box/gh-4834-netbox-fiber-cancel.result   | 62 +++++++++++++++++++
 test/box/gh-4834-netbox-fiber-cancel.test.lua | 28 +++++++++
 7 files changed, 128 insertions(+), 14 deletions(-)
 create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
 create mode 100644 test/box/gh-4834-netbox-fiber-cancel.result
 create mode 100644 test/box/gh-4834-netbox-fiber-cancel.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index 18568df3b..111432937 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -307,10 +307,6 @@ box_wait_ro(bool ro, double timeout)
 	while (is_box_configured == false || box_is_ro() != ro) {
 		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
 			return -1;
-		if (fiber_is_cancelled()) {
-			diag_set(FiberIsCancelled);
-			return -1;
-		}
 	}
 	return 0;
 }
diff --git a/src/box/relay.cc b/src/box/relay.cc
index b68b45e00..a7bc2c6f7 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -753,15 +753,6 @@ relay_subscribe_f(va_list ap)
 	if (!relay->replica->anon)
 		relay_send_is_raft_enabled(relay, &raft_enabler, false);
 
-	/*
-	 * Log the error that caused the relay to break the loop.
-	 * Don't clear the error for status reporting.
-	 */
-	assert(!diag_is_empty(&relay->diag));
-	diag_set_error(diag_get(), diag_last_error(&relay->diag));
-	diag_log();
-	say_crit("exiting the relay loop");
-
 	/*
 	 * Clear garbage collector trigger and WAL watcher.
 	 * trigger_clear() does nothing in case the triggers
@@ -780,6 +771,16 @@ relay_subscribe_f(va_list ap)
 	cbus_endpoint_destroy(&relay->endpoint, cbus_process);
 
 	relay_exit(relay);
+
+	/*
+	 * Log the error that caused the relay to break the loop.
+	 * Don't clear the error for status reporting.
+	 */
+	assert(!diag_is_empty(&relay->diag));
+	diag_set_error(diag_get(), diag_last_error(&relay->diag));
+	diag_log();
+	say_crit("exiting the relay loop");
+
 	return -1;
 }
 
diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
index 904a350d9..71bb2d04d 100644
--- a/src/lib/core/fiber_cond.c
+++ b/src/lib/core/fiber_cond.c
@@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
 		diag_set(TimedOut);
 		return -1;
 	}
+	if (fiber_is_cancelled()) {
+		diag_set(FiberIsCancelled);
+		return -1;
+	}
 	return 0;
 }
 
diff --git a/src/lib/core/fiber_cond.h b/src/lib/core/fiber_cond.h
index 87c6f2ca2..2662e0654 100644
--- a/src/lib/core/fiber_cond.h
+++ b/src/lib/core/fiber_cond.h
@@ -114,7 +114,7 @@ fiber_cond_broadcast(struct fiber_cond *cond);
  * @param cond condition
  * @param timeout timeout in seconds
  * @retval 0 on fiber_cond_signal() call or a spurious wake up
- * @retval -1 on timeout, diag is set to TimedOut
+ * @retval -1 on timeout or fiber cancellation, diag is set
  */
 int
 fiber_cond_wait_timeout(struct fiber_cond *cond, double timeout);
diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
new file mode 100755
index 000000000..ca4ca2c90
--- /dev/null
+++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
@@ -0,0 +1,23 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+local fiber = require('fiber')
+local test = tap.test("gh-5013-fiber-cancel")
+
+test:plan(2)
+
+local result = {}
+
+function test_f()
+    local cond = fiber.cond()
+    local res, err = pcall(cond.wait, cond)
+    result.res = res
+    result.err = err
+end
+
+local f = fiber.create(test_f)
+f:cancel()
+fiber.yield()
+
+test:ok(result.res == false, 'expected result is false')
+test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported')
diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result
new file mode 100644
index 000000000..4ed04bb61
--- /dev/null
+++ b/test/box/gh-4834-netbox-fiber-cancel.result
@@ -0,0 +1,62 @@
+-- test-run result file version 2
+remote = require 'net.box'
+ | ---
+ | ...
+fiber = require 'fiber'
+ | ---
+ | ...
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+ | ---
+ | ...
+box.schema.func.create('infinite_call')
+ | ---
+ | ...
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+ | ---
+ | ...
+
+error_msg = nil
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function netbox_runner()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+netbox_runner()
+ | ---
+ | ...
+error_msg
+ | ---
+ | - fiber is cancelled
+ | ...
+box.schema.func.drop('infinite_call')
+ | ---
+ | ...
+infinite_call = nil
+ | ---
+ | ...
+error_msg = nil
+ | ---
+ | ...
diff --git a/test/box/gh-4834-netbox-fiber-cancel.test.lua b/test/box/gh-4834-netbox-fiber-cancel.test.lua
new file mode 100644
index 000000000..dbeff083b
--- /dev/null
+++ b/test/box/gh-4834-netbox-fiber-cancel.test.lua
@@ -0,0 +1,28 @@
+remote = require('net.box')
+fiber = require('fiber')
+test_run = require('test_run').new()
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+box.schema.func.create('infinite_call')
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+
+error_msg = nil
+test_run:cmd("setopt delimiter ';'")
+function netbox_runner()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+test_run:cmd("setopt delimiter ''");
+netbox_runner()
+error_msg
+box.schema.func.drop('infinite_call')
+infinite_call = nil
+error_msg = nil
-- 
2.24.3 (Apple Git-128)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-22 16:01         ` Vladislav Shpilevoy
  2020-11-23 21:47           ` Sergey Ostanevich
@ 2020-11-24  7:31           ` Sergey Ostanevich
  1 sibling, 0 replies; 23+ messages in thread
From: Sergey Ostanevich @ 2020-11-24  7:31 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, Alexander Turenko

Yet another fix-up because of test update:

--- a/test/box/gh-4834-netbox-fiber-cancel.result
+++ b/test/box/gh-4834-netbox-fiber-cancel.result
@@ -1,8 +1,8 @@
 -- test-run result file version 2
-remote = require 'net.box'
+remote = require('net.box')
  | ---
  | ...
-fiber = require 'fiber'
+fiber = require('fiber')
  | ---
  | ...
 test_run = require('test_run').new()

Force-pushed. 

> On 22 Nov 2020, at 19:01, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
> Hi! Thanks for the changes!
> 
> Technically almost good.
> 
>>> On 17 Nov 2020, at 01:12, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
>>> 
>>> On 03.11.2020 11:20, Sergey Ostanevich wrote:
>>>> Hi Oleg!
>>>> 
>>>> I believe the point about 'consistency' is not valid here. I put a
>>>> simple check that if diag is already set, then print it out. For the
>>>> fiber_cond_wait_timeout() it happened multiple times with various
>>>> reports, inlcuding this one:
>>>> 
>>>> 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not
>>>> set the DIAG to FiberIsCancelled, original diag: Missing .xlog file
>>>> between LSN 5 {1: 5} and 6 {1: 6}
>>>> 
>>>> that is used in the test system:
>>>> 
>>>> test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status =
>>>> 'loading'})
>>>> 
>>>> So, my resolution will be: it is wrong to set a diag in an arbitrary
>>>> place, without clear understanting of the reason. This is the case for
>>>> the cond_wait machinery, since it doesn't know _why_ the fiber is
>>>> cancelled.
>>> 
>>> It is a wrong resolution, IMO. You just hacked cond wait not to change the
>>> other places. It is not about tests. Tests only show what is provided by the
>>> internal subsystems. And if they depend on fiber cond not setting diag in
>>> case of a fail, then it looks wrong.
>>> 
>> 
>> Actually I didn’t make fiber_cond not setting a diag, rather preserve the
>> original one if it is present.
> 
> Yes, by not setting a diag.
> 
>>>> Before this patch fiber.cond():wait() just returns for cancelled
>>>> fiber. In contrast fiber.channel():get() threw "fiber is
>>>> canceled" error.
>>>> This patch unify behaviour of channels and condvars and also fixes
>>>> related net.box module problem - it was impossible to interrupt
>>>> net.box call with fiber.cancel because it used fiber.cond under
>>>> the hood. Test cases for both bugs are added.
>>> 
>>> Netbox hangs not because of using fiber.cond. But because it never
>>> calls testcancel(). Normally all looped fibers should do that.
>>> 
>> I believe the fix is an indirect result, since all 3 wait() calls in
>> net_box.lua aren't covered with a pcall() - it errors out. I reproduced
>> it with the following:
> 
> I know that your patch fixes it. But I was talking about the statement in
> the commit message. You said:
> 
> 	it was impossible to interrupt net.box call with fiber.cancel
> 	because it used fiber.cond under the hood
> 
> And it still uses the cond, but does not hang anymore. Therefore, the
> statement obviously isn't true. The fact of cond usage is not a reason.
> 
> See 4 comments below.
> 
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 1f7dec362..aa23dcc13 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout)
>> {
>> 	double deadline = ev_monotonic_now(loop()) + timeout;
>> 	while (is_box_configured == false || box_is_ro() != ro) {
>> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
>> -			return -1;
>> -		if (fiber_is_cancelled()) {
>> -			diag_set(FiberIsCancelled);
>> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
>> +			if (fiber_is_cancelled())
>> +				diag_set(FiberIsCancelled);
> 
> 1. Why do you need this diag_set here? The cancellation diag is already
> set in fiber_cond_wait_deadline(). Why do you set it again?
> 
>> 			return -1;
>> 		}
>> 	}
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index 1e77e0d9b..cce139c87 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -821,8 +821,18 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>> 			      relay_subscribe_f, relay);
>> 	if (rc == 0)
>> 		rc = cord_cojoin(&relay->cord);
>> -	if (rc != 0)
>> +	if (rc != 0) {
>> +		/*
>> +		 * We should always raise a problem from relay itself, not all
>> +		 * other modules that are change diag in the current fiber.
>> +		 * TODO: investigate why and how we can leave the relay_subscribe_f
>> +		 * with diag unset in the relay.
>> +		 */
>> +		if (diag_last_error(&relay->diag)) {
> 
> 2. Please, use != NULL, according to our code style.
> https://github.com/tarantool/tarantool/wiki/Code-review-procedure#code-style
> 
>> +			diag_set_error(diag_get(), diag_last_error(&relay->diag));
>> +		}
> 
> 3. What is the test where diag_last_error() is NULL? I added
> an assertion, that it is not NULL, and the tests passed (except a
> couple of tests which always fail on my machine due to too long UNIX
> socket path).
> 
> Also in the end of relay_subscribe_f() there is an existing
> assert(!diag_is_empty(&relay->diag)). So it can't be NULL. Or do you
> have a test?
> 
> The only possible issue I see is that
> 
> 	diag_set_error(diag_get(), diag_last_error(&relay->diag));
> 
> is called too early in relay_subscribe_f(). After that the diag
> can be rewritten somehow. For instance, by fiber_join(reader). So
> probably you can move this diag move to the end right before 'return -1;',
> and remove this hunk entirely. In a separate commit in the same branch,
> since it is not related to fiber_cond bug directly.
> 
>> 		diag_raise();
>> +	}
>> }
>> diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result
>> new file mode 100644
>> index 000000000..4ed04bb61
>> --- /dev/null
>> +++ b/test/box/gh-4834-netbox-fiber-cancel.result
>> @@ -0,0 +1,62 @@
>> +-- test-run result file version 2
>> +remote = require 'net.box'
>> + | ---
>> + | ...
>> +fiber = require 'fiber'
> 
> 4. Please, use () for require. You even did it in the other test file,
> but here you changed your mind somewhy. Or it is bad copy-paste.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos
                   ` (2 preceding siblings ...)
  2020-11-16 22:12 ` Vladislav Shpilevoy
@ 2020-11-25 21:32 ` Vladislav Shpilevoy
  2020-11-29 21:41   ` Sergey Ostanevich
  2020-11-30 21:46   ` Alexander Turenko
  2020-11-30 21:01 ` Vladislav Shpilevoy
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 23+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-25 21:32 UTC (permalink / raw)
  To: sergos, tarantool-patches; +Cc: alexander.turenko

Hi! Thanks for the fixes!

Technically the patch is good! See 3 non-technical comments below.

> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index b68b45e00..a7bc2c6f7 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -780,6 +771,16 @@ relay_subscribe_f(va_list ap)
>  	cbus_endpoint_destroy(&relay->endpoint, cbus_process);
>  
>  	relay_exit(relay);
> +
> +	/*
> +	 * Log the error that caused the relay to break the loop.
> +	 * Don't clear the error for status reporting.
> +	 */
> +	assert(!diag_is_empty(&relay->diag));
> +	diag_set_error(diag_get(), diag_last_error(&relay->diag));
> +	diag_log();
> +	say_crit("exiting the relay loop");
> +

1. I suggest you to move this to a separate commit, before you
change fiber cond behaviour. Because it seems it is not related
to conds really. And even if it would be related, still the
issue looks like a separate bug.

>  	return -1;
>  }
> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
> new file mode 100755
> index 000000000..ca4ca2c90
> --- /dev/null
> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
> @@ -0,0 +1,23 @@
> +#!/usr/bin/env tarantool
> +
> +local tap = require('tap')
> +local fiber = require('fiber')
> +local test = tap.test("gh-5013-fiber-cancel")
> +
> +test:plan(2)
> +
> +local result = {}
> +
> +function test_f()
> +    local cond = fiber.cond()
> +    local res, err = pcall(cond.wait, cond)
> +    result.res = res
> +    result.err = err
> +end
> +
> +local f = fiber.create(test_f)
> +f:cancel()
> +fiber.yield()
> +
> +test:ok(result.res == false, 'expected result is false')
> +test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported')

2. I think you are also supposed to call os.exit with test:check()
like other tap tests do. Otherwise it probably always ends with 0
code, and won't work properly when we will make tap tests non-diff
based.

> diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result
> new file mode 100644
> index 000000000..cb631995c
> --- /dev/null
> +++ b/test/box/gh-4834-netbox-fiber-cancel.result
> @@ -0,0 +1,62 @@
> +-- test-run result file version 2
> +remote = require('net.box')
> + | ---
> + | ...
> +fiber = require('fiber')
> + | ---
> + | ...
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
> +function infinite_call() fiber.channel(1):get() end

3. I noticed you never finish the fiber. So it hangs there forever
or until a restart. Better not leave any test artifacts and finish
it completely. Just save the channel to a global variable and put
something into it in the end.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-25 21:32 ` Vladislav Shpilevoy
@ 2020-11-29 21:41   ` Sergey Ostanevich
  2020-11-30 21:46   ` Alexander Turenko
  1 sibling, 0 replies; 23+ messages in thread
From: Sergey Ostanevich @ 2020-11-29 21:41 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, alexander.turenko

Hi! Thank you for review!

Force-pushed the sergos/gh-5013-fiber-cond branch, new patches are inline.

Sergos

> On 26 Nov 2020, at 00:32, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
> Hi! Thanks for the fixes!
> 
> Technically the patch is good! See 3 non-technical comments below.
> 
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index b68b45e00..a7bc2c6f7 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -780,6 +771,16 @@ relay_subscribe_f(va_list ap)
>> 	cbus_endpoint_destroy(&relay->endpoint, cbus_process);
>> 
>> 	relay_exit(relay);
>> +
>> +	/*
>> +	 * Log the error that caused the relay to break the loop.
>> +	 * Don't clear the error for status reporting.
>> +	 */
>> +	assert(!diag_is_empty(&relay->diag));
>> +	diag_set_error(diag_get(), diag_last_error(&relay->diag));
>> +	diag_log();
>> +	say_crit("exiting the relay loop");
>> +
> 
> 1. I suggest you to move this to a separate commit, before you
> change fiber cond behaviour. Because it seems it is not related
> to conds really. And even if it would be related, still the
> issue looks like a separate bug.

I failed to make a test that is differs from the fiber_cond setting
a diag. So I put it is as a preparation step and mentioned it in 
commit message. 

——-
The fiber_cond_wait() will set an error in case fiber is cancelled.
As a result, the current diag in the fiber can be reset during
the wal_clear_watcher(). To prevent such overwrite the diag copy from
the relay into current fiber is moved to the exit of the
relay_subscribe_f().

Part of #5013
---
 src/box/relay.cc | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/src/box/relay.cc b/src/box/relay.cc
index 1e77e0d9b..f62e925af 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -753,15 +753,6 @@ relay_subscribe_f(va_list ap)
        if (!relay->replica->anon)
                relay_send_is_raft_enabled(relay, &raft_enabler, false);

-       /*
-        * Log the error that caused the relay to break the loop.
-        * Don't clear the error for status reporting.
-        */
-       assert(!diag_is_empty(&relay->diag));
-       diag_set_error(diag_get(), diag_last_error(&relay->diag));
-       diag_log();
-       say_crit("exiting the relay loop");
-
        /*
         * Clear garbage collector trigger and WAL watcher.
         * trigger_clear() does nothing in case the triggers
@@ -780,6 +771,16 @@ relay_subscribe_f(va_list ap)
        cbus_endpoint_destroy(&relay->endpoint, cbus_process);

        relay_exit(relay);
+
+       /*
+        * Log the error that caused the relay to break the loop.
+        * Don't clear the error for status reporting.
+        */
+       assert(!diag_is_empty(&relay->diag));
+       diag_set_error(diag_get(), diag_last_error(&relay->diag));
+       diag_log();
+       say_crit("exiting the relay loop");
+
        return -1;
 }

--

> 
>> 	return -1;
>> }
>> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
>> new file mode 100755
>> index 000000000..ca4ca2c90
>> --- /dev/null
>> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
>> @@ -0,0 +1,23 @@
>> +#!/usr/bin/env tarantool
>> +
>> +local tap = require('tap')
>> +local fiber = require('fiber')
>> +local test = tap.test("gh-5013-fiber-cancel")
>> +
>> +test:plan(2)
>> +
>> +local result = {}
>> +
>> +function test_f()
>> +    local cond = fiber.cond()
>> +    local res, err = pcall(cond.wait, cond)
>> +    result.res = res
>> +    result.err = err
>> +end
>> +
>> +local f = fiber.create(test_f)
>> +f:cancel()
>> +fiber.yield()
>> +
>> +test:ok(result.res == false, 'expected result is false')
>> +test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported')
> 
> 2. I think you are also supposed to call os.exit with test:check()
> like other tap tests do. Otherwise it probably always ends with 0
> code, and won't work properly when we will make tap tests non-diff
> based.

Done. 

> 
>> diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result
>> new file mode 100644
>> index 000000000..cb631995c
>> --- /dev/null
>> +++ b/test/box/gh-4834-netbox-fiber-cancel.result
>> @@ -0,0 +1,62 @@
>> +-- test-run result file version 2
>> +remote = require('net.box')
>> + | ---
>> + | ...
>> +fiber = require('fiber')
>> + | ---
>> + | ...
>> +test_run = require('test_run').new()
>> + | ---
>> + | ...
>> +
>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
>> +function infinite_call() fiber.channel(1):get() end
> 
> 3. I noticed you never finish the fiber. So it hangs there forever
> or until a restart. Better not leave any test artifacts and finish
> it completely. Just save the channel to a global variable and put
> something into it in the end.

The channel is created in the remote execution - I just kept it
until the end of the routine and closed it. 
===========

Before this patch fiber.cond():wait() just returns for cancelled
fiber. In contrast fiber.channel():get() throws "fiber is canceled"
error. This patch unifies behaviour of channels and condvars.
It also fixes a related net.box module problem #4834 since fiber.cond
now performs test for fiber cancellation.

Closes #4834
Closes #5013

Co-authored-by: Oleg Babin <olegrok@tarantool.org>

@TarantoolBot document
Title: fiber.cond():wait() throws if fiber is cancelled

Currently fiber.cond():wait() throws an error if waiting fiber is
cancelled.
---
 src/box/box.cc                                |  4 --
 src/lib/core/fiber_cond.c                     |  4 ++
 src/lib/core/fiber_cond.h                     |  2 +-
 test/app-tap/gh-5013-fiber-cancel.test.lua    | 26 ++++++++
 test/box/gh-4834-netbox-fiber-cancel.result   | 62 +++++++++++++++++++
 test/box/gh-4834-netbox-fiber-cancel.test.lua | 28 +++++++++
 6 files changed, 121 insertions(+), 5 deletions(-)
 create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
 create mode 100644 test/box/gh-4834-netbox-fiber-cancel.result
 create mode 100644 test/box/gh-4834-netbox-fiber-cancel.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index 1f7dec362..e02be2c6f 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -307,10 +307,6 @@ box_wait_ro(bool ro, double timeout)
 	while (is_box_configured == false || box_is_ro() != ro) {
 		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
 			return -1;
-		if (fiber_is_cancelled()) {
-			diag_set(FiberIsCancelled);
-			return -1;
-		}
 	}
 	return 0;
 }
diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
index 904a350d9..71bb2d04d 100644
--- a/src/lib/core/fiber_cond.c
+++ b/src/lib/core/fiber_cond.c
@@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
 		diag_set(TimedOut);
 		return -1;
 	}
+	if (fiber_is_cancelled()) {
+		diag_set(FiberIsCancelled);
+		return -1;
+	}
 	return 0;
 }
 
diff --git a/src/lib/core/fiber_cond.h b/src/lib/core/fiber_cond.h
index 87c6f2ca2..2662e0654 100644
--- a/src/lib/core/fiber_cond.h
+++ b/src/lib/core/fiber_cond.h
@@ -114,7 +114,7 @@ fiber_cond_broadcast(struct fiber_cond *cond);
  * @param cond condition
  * @param timeout timeout in seconds
  * @retval 0 on fiber_cond_signal() call or a spurious wake up
- * @retval -1 on timeout, diag is set to TimedOut
+ * @retval -1 on timeout or fiber cancellation, diag is set
  */
 int
 fiber_cond_wait_timeout(struct fiber_cond *cond, double timeout);
diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
new file mode 100755
index 000000000..91c6775dc
--- /dev/null
+++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
@@ -0,0 +1,26 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+local fiber = require('fiber')
+local test = tap.test("gh-5013-fiber-cancel")
+
+test:plan(2)
+
+local result = {}
+
+function test_f()
+    local cond = fiber.cond()
+    local res, err = pcall(cond.wait, cond)
+    result.res = res
+    result.err = err
+end
+
+local f = fiber.create(test_f)
+f:cancel()
+fiber.yield()
+
+print(require('yaml').encode(result))
+
+test:ok(result.res == false, 'expected result is false')
+test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported')
+os.exit(test:check() and 0 or 1)
diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result
new file mode 100644
index 000000000..5af404e82
--- /dev/null
+++ b/test/box/gh-4834-netbox-fiber-cancel.result
@@ -0,0 +1,62 @@
+-- test-run result file version 2
+remote = require('net.box')
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() local channel = fiber.channel(1) pcall(channel:get()) channel.close() end
+ | ---
+ | ...
+box.schema.func.create('infinite_call')
+ | ---
+ | ...
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+ | ---
+ | ...
+
+error_msg = nil
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function netbox_runner()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+netbox_runner()
+ | ---
+ | ...
+error_msg
+ | ---
+ | - fiber is cancelled
+ | ...
+box.schema.func.drop('infinite_call')
+ | ---
+ | ...
+infinite_call = nil
+ | ---
+ | ...
+error_msg = nil
+ | ---
+ | ...
diff --git a/test/box/gh-4834-netbox-fiber-cancel.test.lua b/test/box/gh-4834-netbox-fiber-cancel.test.lua
new file mode 100644
index 000000000..59963ba91
--- /dev/null
+++ b/test/box/gh-4834-netbox-fiber-cancel.test.lua
@@ -0,0 +1,28 @@
+remote = require('net.box')
+fiber = require('fiber')
+test_run = require('test_run').new()
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() local channel = fiber.channel(1) pcall(channel:get()) channel.close() end
+box.schema.func.create('infinite_call')
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+
+error_msg = nil
+test_run:cmd("setopt delimiter ';'")
+function netbox_runner()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+test_run:cmd("setopt delimiter ''");
+netbox_runner()
+error_msg
+box.schema.func.drop('infinite_call')
+infinite_call = nil
+error_msg = nil
-- 
2.24.3 (Apple Git-128)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos
                   ` (3 preceding siblings ...)
  2020-11-25 21:32 ` Vladislav Shpilevoy
@ 2020-11-30 21:01 ` Vladislav Shpilevoy
  2020-12-02 10:58 ` Alexander V. Tikhonov
  2020-12-02 22:18 ` Vladislav Shpilevoy
  6 siblings, 0 replies; 23+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-30 21:01 UTC (permalink / raw)
  To: sergos, tarantool-patches, Alexander V. Tikhonov; +Cc: alexander.turenko

Hi! Thanks for the patchset!

LGTM.

Sasha (Tikh.), please, validate the branch is good to push.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-25 21:32 ` Vladislav Shpilevoy
  2020-11-29 21:41   ` Sergey Ostanevich
@ 2020-11-30 21:46   ` Alexander Turenko
  1 sibling, 0 replies; 23+ messages in thread
From: Alexander Turenko @ 2020-11-30 21:46 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

> > diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
> > new file mode 100755
> > index 000000000..ca4ca2c90
> > --- /dev/null
> > +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
> > @@ -0,0 +1,23 @@
> > +#!/usr/bin/env tarantool
> > +
> > +local tap = require('tap')
> > +local fiber = require('fiber')
> > +local test = tap.test("gh-5013-fiber-cancel")
> > +
> > +test:plan(2)
> > +
> > +local result = {}
> > +
> > +function test_f()
> > +    local cond = fiber.cond()
> > +    local res, err = pcall(cond.wait, cond)
> > +    result.res = res
> > +    result.err = err
> > +end
> > +
> > +local f = fiber.create(test_f)
> > +f:cancel()
> > +fiber.yield()
> > +
> > +test:ok(result.res == false, 'expected result is false')
> > +test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported')
> 
> 2. I think you are also supposed to call os.exit with test:check()
> like other tap tests do. Otherwise it probably always ends with 0
> code, and won't work properly when we will make tap tests non-diff
> based.

Just side note. TAP13 tests are already not diff-based. test-run parses
and verifies TAP13 output if a result file is not present. test-run also
checks the exit code of a process. (That's all are about 'core = app'
test suites.)

But I anyway think that it is good property to return non-zero exit code
from a test if it is not passed. When all tests are written this way, we
have more freedom around ways to run a test. Who knows, maybe we'll want
to run some tests in a very restricted environment, where it will be
hard to get all this python stuff workable?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-11-13  8:26       ` Leonid Vasiliev
@ 2020-11-30 22:49         ` Alexander Turenko
  0 siblings, 0 replies; 23+ messages in thread
From: Alexander Turenko @ 2020-11-30 22:49 UTC (permalink / raw)
  To: Leonid Vasiliev; +Cc: tarantool-patches, Vladislav Shpilevoy

> > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
> > index 904a350d9..cc59eaafb 100644
> > --- a/src/lib/core/fiber_cond.c
> > +++ b/src/lib/core/fiber_cond.c
> > @@ -108,6 +108,11 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
> >   		diag_set(TimedOut);
> >   		return -1;
> >   	}
> > +	if (fiber_is_cancelled()) {
> > +		if (diag_is_empty(diag_get()))
> > +                        diag_set(FiberIsCancelled);
> 
> I read your argumentation. AFAIU sounds like:"We have a lot of tests
> that check an error in the diag in such case. Also, some user
> applications can check the diag in a similar way. And stack diag can be used
> only in the new versions of tarantool." So, it's true.
> What I propose to think about:
> - this is not simple logic. If a fiber has been cancelled, maybe an error
> will be added to the diag, maybe no - 50/50.
> - this behavior should be documented.
> - ask Mons or Turenko for advice: "What do they think about it?"
> All this up to you.

The fiber_channel implementation (C level) convinced me. It is good to
have the similar behaviour in fiber_cond and fiber_channel.

I would only ask to document this subtle difference in our web
documentation. Python document subtle changes (at least some of them)
and it is nice for users. Example:

Python 2:

 | Popen.send_signal(signal)
 |
 | Sends the signal `signal` to the child.

Python 3:

 | Popen.send_signal(signal)
 |
 | Sends the signal `signal` to the child.
 |
 | Do nothing if the process completed.

(It would be better to mark it with 'Since Python x.y.z', though.)

That is the place, where Python 2 and Python 3 behaviour actually
differs (when the popen implementation aware about a process
termination, i.e. after <popen object>.poll()).

It is important, when you writting a code that must work on both Python
2 and Python 3: you should catch OSError here.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos
                   ` (4 preceding siblings ...)
  2020-11-30 21:01 ` Vladislav Shpilevoy
@ 2020-12-02 10:58 ` Alexander V. Tikhonov
  2020-12-02 22:18 ` Vladislav Shpilevoy
  6 siblings, 0 replies; 23+ messages in thread
From: Alexander V. Tikhonov @ 2020-12-02 10:58 UTC (permalink / raw)
  To: sergos; +Cc: tarantool-patches

Hi, thanks for the patch, as I see no new degradation found in
gitlab-ci testing commit criteria pipeline [1], patch LGTM.

[1] - https://gitlab.com/tarantool/tarantool/-/pipelines/222853269

On Sat, Oct 31, 2020 at 07:29:11PM +0300, sergos@tarantool.org wrote:
> From: Sergey Ostanevich <sergos@tarantool.org>
> 
> Hi!
> 
> Thanks to Oleg Babin's comment I found there's no need to update any lua
> interfaces, since the reason was in C implementation. Also, there is one
> place the change is played, so after I fixed it I got complete testing 
> pass. 
> Force-pushed branch, v2 patch attached.
> 
> 
> 
> Before this patch fiber.cond():wait() just returns for cancelled
> fiber. In contrast fiber.channel():get() threw "fiber is
> canceled" error.
> This patch unify behaviour of channels and condvars and also fixes
> related net.box module problem - it was impossible to interrupt
> net.box call with fiber.cancel because it used fiber.cond under
> the hood. Test cases for both bugs are added.
> 
> Closes #4834
> Closes #5013
> 
> Co-authored-by: Oleg Babin <olegrok@tarantool.org>
> 
> @TarantoolBot document
> Title: fiber.cond():wait() throws if fiber is cancelled
> 
> Currently fiber.cond():wait() throws an error if waiting fiber is
> cancelled like in case with fiber.channel():get().
> ---
> 
> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond                                                                                                                          
> Issue: https://github.com/tarantool/tarantool/issues/5013                                                                                                                                                   
> 
>  src/box/box.cc                                |  6 +-
>  src/lib/core/fiber_cond.c                     |  1 +
>  test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
>  test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
>  .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
>  5 files changed, 120 insertions(+), 4 deletions(-)
>  create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
>  create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
>  create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
> 
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 18568df3b..bfa1051f9 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
>  {
>  	double deadline = ev_monotonic_now(loop()) + timeout;
>  	while (is_box_configured == false || box_is_ro() != ro) {
> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
> -			return -1;
> -		if (fiber_is_cancelled()) {
> -			diag_set(FiberIsCancelled);
> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
> +                        if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
>  			return -1;
>  		}
>  	}
> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
> index 904a350d9..b0645069e 100644
> --- a/src/lib/core/fiber_cond.c
> +++ b/src/lib/core/fiber_cond.c
> @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>  		diag_set(TimedOut);
>  		return -1;
>  	}
> +	if (fiber_is_cancelled()) return -1;
>  	return 0;
>  }
>  
> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
> new file mode 100755
> index 000000000..ae805c5bf
> --- /dev/null
> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
> @@ -0,0 +1,23 @@
> +#!/usr/bin/env tarantool
> +
> +local tap = require('tap')
> +local fiber = require('fiber')
> +local test = tap.test("gh-5013-fiber-cancel")
> +
> +test:plan(2)
> +
> +local result = {}
> +
> +function test_f()
> +    local cond = fiber.cond() 
> +    local res, err = pcall(cond.wait, cond) 
> +    result.res = res
> +    result.err = err
> +end
> +
> +local f = fiber.create(test_f)
> +f:cancel()
> +fiber.yield()
> +
> +test:ok(result.res == false, tostring(result.res))
> +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
> new file mode 100644
> index 000000000..eab0a5e4d
> --- /dev/null
> +++ b/test/box/net.box_fiber_cancel_gh-4834.result
> @@ -0,0 +1,65 @@
> +-- test-run result file version 2
> +remote = require 'net.box'
> + | ---
> + | ...
> +fiber = require 'fiber'
> + | ---
> + | ...
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
> +function infinite_call() fiber.channel(1):get() end
> + | ---
> + | ...
> +box.schema.func.create('infinite_call')
> + | ---
> + | ...
> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
> + | ---
> + | ...
> +
> +error_msg = nil
> + | ---
> + | ...
> +test_run:cmd("setopt delimiter ';'")
> + | ---
> + | - true
> + | ...
> +function gh4834()
> +    local cn = remote.connect(box.cfg.listen)
> +    local f = fiber.new(function()
> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
> +    end)
> +    f:set_joinable(true)
> +    fiber.yield()
> +    f:cancel()
> +    f:join()
> +    cn:close()
> +end;
> + | ---
> + | ...
> +test_run:cmd("setopt delimiter ''");
> + | ---
> + | - true
> + | ...
> +gh4834()
> + | ---
> + | ...
> +error_msg
> + | ---
> + | - fiber is cancelled
> + | ...
> +box.schema.func.drop('infinite_call')
> + | ---
> + | ...
> +infinite_call = nil
> + | ---
> + | ...
> +channel = nil
> + | ---
> + | ...
> +error_msg = nil
> + | ---
> + | ...
> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
> new file mode 100644
> index 000000000..06fb3ceac
> --- /dev/null
> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
> @@ -0,0 +1,29 @@
> +remote = require 'net.box'
> +fiber = require 'fiber'
> +test_run = require('test_run').new()
> +
> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
> +function infinite_call() fiber.channel(1):get() end
> +box.schema.func.create('infinite_call')
> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
> +
> +error_msg = nil
> +test_run:cmd("setopt delimiter ';'")
> +function gh4834()
> +    local cn = remote.connect(box.cfg.listen)
> +    local f = fiber.new(function()
> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
> +    end)
> +    f:set_joinable(true)
> +    fiber.yield()
> +    f:cancel()
> +    f:join()
> +    cn:close()
> +end;
> +test_run:cmd("setopt delimiter ''");
> +gh4834()
> +error_msg
> +box.schema.func.drop('infinite_call')
> +infinite_call = nil
> +channel = nil
> +error_msg = nil
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
  2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos
                   ` (5 preceding siblings ...)
  2020-12-02 10:58 ` Alexander V. Tikhonov
@ 2020-12-02 22:18 ` Vladislav Shpilevoy
  6 siblings, 0 replies; 23+ messages in thread
From: Vladislav Shpilevoy @ 2020-12-02 22:18 UTC (permalink / raw)
  To: sergos, tarantool-patches; +Cc: alexander.turenko

Pushed to master, 2.6, 2.5, and 1.10.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-12-02 22:18 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos
2020-11-01 10:13 ` Oleg Babin
2020-11-03 10:20   ` Sergey Ostanevich
2020-11-03 10:27     ` Oleg Babin
2020-11-04 10:00     ` Leonid Vasiliev
2020-11-16 22:12     ` Vladislav Shpilevoy
2020-11-18 22:05       ` Sergey Ostanevich
2020-11-22 16:01         ` Vladislav Shpilevoy
2020-11-23 21:47           ` Sergey Ostanevich
2020-11-24  7:31           ` Sergey Ostanevich
2020-11-04 10:00 ` Leonid Vasiliev
2020-11-05 20:42   ` Sergey Ostanevich
2020-11-10 21:16   ` Sergey Ostanevich
2020-11-12 20:15     ` Sergey Ostanevich
2020-11-13  8:26       ` Leonid Vasiliev
2020-11-30 22:49         ` Alexander Turenko
2020-11-16 22:12 ` Vladislav Shpilevoy
2020-11-25 21:32 ` Vladislav Shpilevoy
2020-11-29 21:41   ` Sergey Ostanevich
2020-11-30 21:46   ` Alexander Turenko
2020-11-30 21:01 ` Vladislav Shpilevoy
2020-12-02 10:58 ` Alexander V. Tikhonov
2020-12-02 22:18 ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox