[Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond

Sergey Ostanevich sergos at tarantool.org
Wed Nov 11 00:16:16 MSK 2020


Hi Leonid, thank you for the review!

I put two parts together to handle them in one patch.

> On 4 Nov 2020, at 13:00, Leonid Vasiliev <lvasiliev at tarantool.org> wrote:
> 
> Hi! Thank you for the patch.
> See some comments below:
> 
> 1) The patch changes undocumented behavior, AFAIU.
> So, I have a question:"Do you plan to backport the patch to
> tarantool 1.10?". If the answer is "Yes" - I'm comfortable with the
> changes. But if the answer is "No" - I will object, because in this case
> both behaviors must be supported in all modules.
> 

Yes, the plan is to push it down to 1.10

> 2) I think changing the behavior in C doesn't cause much of a problem,
> because before when you wait without timeout, you don't need to check
> the return value (it's always 0). But in Lua it will cause the problems,
> because now throws an error if cancelled and all wait calls should be
> wrapped to pcall.

Exactly it is the change to Lua and we intentionally make it, since original
behaviour is wrong. Good things are: fiber.cond has no big presence in the
/tarantool sourcebase and the cancellation of fiber itself means the fiber
has to quit.
 
> 
> On 31.10.2020 19:29, sergos at tarantool.org wrote:
>> From: Sergey Ostanevich <sergos at tarantool.org>
>> Hi!
>> Thanks to Oleg Babin's comment I found there's no need to update any lua
>> interfaces, since the reason was in C implementation. Also, there is one
>> place the change is played, so after I fixed it I got complete testing
>> pass.
>> Force-pushed branch, v2 patch attached.
>> Before this patch fiber.cond():wait() just returns for cancelled
>> fiber. In contrast fiber.channel():get() threw "fiber is
>> canceled" error.
>> This patch unify behaviour of channels and condvars and also fixes
> 
> 3) behaviour -> behavior.

Depends on spellchecker - mine sets it to this British version.

> 
>> related net.box module problem - it was impossible to interrupt
>> net.box call with fiber.cancel because it used fiber.cond under
>> the hood. Test cases for both bugs are added.
>> Closes #4834
>> Closes #5013
>> Co-authored-by: Oleg Babin <olegrok at tarantool.org>
>> @TarantoolBot document
>> Title: fiber.cond():wait() throws if fiber is cancelled
>> Currently fiber.cond():wait() throws an error if waiting fiber is
>> cancelled like in case with fiber.channel():get().
> 
> 4) I don't think it's a good decision adding a comparison with
> fiber.channel():get() to the documentation. Up to you.
> 

I’d agree to you: let’s just put the new behaviour here.

> 5) Document the changes in module.h.

Done (need to rebuild, obviously).

> 
>> ---
> 
> 6) Add @ChangeLog.
> 
>> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
>> Issue: https://github.com/tarantool/tarantool/issues/5013
>>  src/box/box.cc                                |  6 +-
>>  src/lib/core/fiber_cond.c                     |  1 +
>>  test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
>>  test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
>>  .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
>>  5 files changed, 120 insertions(+), 4 deletions(-)
>>  create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
>>  create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
>>  create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
> 
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 18568df3b..9e824453d 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
>>  {
>>  	double deadline = ev_monotonic_now(loop()) + timeout;
>>  	while (is_box_configured == false || box_is_ro() != ro) {
>> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
>> -			return -1;
>> -		if (fiber_is_cancelled()) {
>> -			diag_set(FiberIsCancelled);
>> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
>> +			if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
> 
> 7) According to https://www.tarantool.io/en/doc/latest/dev_guide/c_style_guide/
> it's seems like you are trying to hide something)
> Use:
> ```
> if (condition)
>    action();
> ```
> Instead of:
> ```
> if (condition) action();
> another_action();
> ```

Fixed.

>>  			return -1;
>>  		}
>>  	}
>> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
>> index 904a350d9..0c93c5842 100644
>> --- a/src/lib/core/fiber_cond.c
>> +++ b/src/lib/core/fiber_cond.c
>> @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>>  		diag_set(TimedOut);
>>  		return -1;
>>  	}
>> +	if (fiber_is_cancelled()) {
>> +		if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled);
> 
> 8) The same as previously:
> ```
> if (condition)
>    action();
> ```
> 

Fixed.

> 9) Checking diag on empty looks strange to me. I think we should add an
> error to diag without this check. If we want to save the previous error,
> I suggest to use the stack diag.

I made this because this interface is used inside the testing machinery, where 
exact cause of e.g. replica disconnection is reported. If we put the diag here
unconditionally it will break this use of diag. I believe the stack diag also 
is not supported there yet.


>> +		return -1;
>> +	}
>>  	return 0;
>>  }
>>  diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
>> new file mode 100755
>> index 000000000..ae805c5bf
>> --- /dev/null
>> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
>> @@ -0,0 +1,23 @@
>> +#!/usr/bin/env tarantool
>> +
>> +local tap = require('tap')
>> +local fiber = require('fiber')
>> +local test = tap.test("gh-5013-fiber-cancel")
>> +
>> +test:plan(2)
>> +
>> +local result = {}
>> +
>> +function test_f()
>> +    local cond = fiber.cond()
>> +    local res, err = pcall(cond.wait, cond)
>> +    result.res = res
>> +    result.err = err
>> +end
>> +
>> +local f = fiber.create(test_f)
>> +f:cancel()
>> +fiber.yield()
>> +
>> +test:ok(result.res == false, tostring(result.res))
>> +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
> 
> 10) Use a user-frendly check message (in both checks).
> 
Fixed

>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
>> new file mode 100644
>> index 000000000..eab0a5e4d
>> --- /dev/null
>> +++ b/test/box/net.box_fiber_cancel_gh-4834.result
>> @@ -0,0 +1,65 @@
>> +-- test-run result file version 2
>> +remote = require 'net.box'
>> + | ---
>> + | ...
>> +fiber = require 'fiber'
>> + | ---
>> + | ...
>> +test_run = require('test_run').new()
>> + | ---
>> + | ...
>> +
>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
>> +function infinite_call() fiber.channel(1):get() end
>> + | ---
>> + | ...
>> +box.schema.func.create('infinite_call')
>> + | ---
>> + | ...
>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
>> + | ---
>> + | ...
>> +
>> +error_msg = nil
>> + | ---
>> + | ...
>> +test_run:cmd("setopt delimiter ';'")
>> + | ---
>> + | - true
>> + | ...
>> +function gh4834( > +    local cn = remote.connect(box.cfg.listen)
>> +    local f = fiber.new(function()
>> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
>> +    end)
>> +    f:set_joinable(true)
>> +    fiber.yield()
>> +    f:cancel()
>> +    f:join()
>> +    cn:close()
>> +end;
>> + | ---
>> + | ...
>> +test_run:cmd("setopt delimiter ''");
>> + | ---
>> + | - true
>> + | ...
>> +gh4834()
>> + | ---
>> + | ...
>> +error_msg
>> + | ---
>> + | - fiber is cancelled
>> + | ...
>> +box.schema.func.drop('infinite_call')
>> + | ---
>> + | ...
>> +infinite_call = nil
>> + | ---
>> + | ...
>> +channel = nil
>> + | ---
>> + | ...
>> +error_msg = nil
>> + | ---
>> + | ...
>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
>> new file mode 100644
>> index 000000000..06fb3ceac
>> --- /dev/null
>> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
>> @@ -0,0 +1,29 @@
>> +remote = require 'net.box'
>> +fiber = require 'fiber'
>> +test_run = require('test_run').new()
>> +
>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
>> +function infinite_call() fiber.channel(1):get() end
>> +box.schema.func.create('infinite_call')
>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
>> +
>> +error_msg = nil
>> +test_run:cmd("setopt delimiter ';'")
>> +function gh4834()
> 
> 11) I think this is not a self-documenting name for the function.
> 
Renamed.

>> +    local cn = remote.connect(box.cfg.listen)
>> +    local f = fiber.new(function()
>> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
>> +    end)
>> +    f:set_joinable(true)
>> +    fiber.yield()
>> +    f:cancel()
>> +    f:join()
>> +    cn:close()
>> +end;
>> +test_run:cmd("setopt delimiter ''");
>> +gh4834()
>> +error_msg
>> +box.schema.func.drop('infinite_call')
>> +infinite_call = nil
>> +channel = nil
> 
> 12) You didn't introduce the variable "channel". In addition, I looked
> at a couple of tests and didn't see them cleaning up variables. But I
> don't mind.
> 
Removed this one. In general, clean up prevents flaky.

>> +error_msg = nil

=================

Before this patch fiber.cond():wait() just returns for cancelled
fiber. In contrast fiber.channel():get() threw "fiber is
canceled" error.
This patch unify behaviour of channels and condvars and also fixes
related net.box module problem - it was impossible to interrupt
net.box call with fiber.cancel because it used fiber.cond under
the hood. Test cases for both bugs are added.

Closes #4834
Closes #5013

Co-authored-by: Oleg Babin <olegrok at tarantool.org>

@TarantoolBot document
Title: fiber.cond():wait() throws if fiber is cancelled

Currently fiber.cond():wait() throws an error if waiting fiber is
cancelled.
---

Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
Issue: https://github.com/tarantool/tarantool/issues/5013

@Changelog
* fiber.cond().wait() now throws if fiber is cancelled

 src/box/box.cc                                |  6 +-
 src/lib/core/fiber_cond.c                     |  4 ++
 test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
 test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
 .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
 5 files changed, 123 insertions(+), 4 deletions(-)
 create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
 create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
 create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index 18568df3b..9e824453d 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
 {
        double deadline = ev_monotonic_now(loop()) + timeout;
        while (is_box_configured == false || box_is_ro() != ro) {
-               if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
-                       return -1;
-               if (fiber_is_cancelled()) {
-                       diag_set(FiberIsCancelled);
+               if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
+                       if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
                        return -1;
                }
        }
diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
index 904a350d9..0c93c5842 100644
--- a/src/lib/core/fiber_cond.c
+++ b/src/lib/core/fiber_cond.c
@@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
                diag_set(TimedOut);
                return -1;
        }
+       if (fiber_is_cancelled()) {
+               if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled);
+               return -1;
+       }
        return 0;
 }

diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
new file mode 100755
index 000000000..34711fb31
--- /dev/null
+++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
@@ -0,0 +1,23 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+local fiber = require('fiber')
+local test = tap.test("gh-5013-fiber-cancel")
+
+test:plan(2)
+
+local result = {}
+
+function test_f()
+    local cond = fiber.cond()
+    local res, err = pcall(cond.wait, cond)
+    result.res = res
+    result.err = err
+end
+
+local f = fiber.create(test_f)
+f:cancel()
+fiber.yield()
+
+test:ok(result.res == false, tostring(result.res))
+test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
new file mode 100644
index 000000000..eab0a5e4d
--- /dev/null
+++ b/test/box/net.box_fiber_cancel_gh-4834.result
@@ -0,0 +1,65 @@
+-- test-run result file version 2
+remote = require 'net.box'
+ | ---
+ | ...
+fiber = require 'fiber'
+ | ---
+ | ...
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+ | ---
+ | ...
+box.schema.func.create('infinite_call')
+ | ---
+ | ...
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+ | ---
+ | ...
+
+error_msg = nil
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function gh4834()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+gh4834()
+ | ---
+ | ...
+error_msg
+ | ---
+ | - fiber is cancelled
+ | ...
+box.schema.func.drop('infinite_call')
+ | ---
+ | ...
+infinite_call = nil
+ | ---
+ | ...
+channel = nil
+ | ---
+ | ...
+error_msg = nil
+ | ---
+ | ...
diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
new file mode 100644
index 000000000..06fb3ceac
--- /dev/null
+++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
@@ -0,0 +1,29 @@
+remote = require 'net.box'
+fiber = require 'fiber'
+test_run = require('test_run').new()
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+box.schema.func.create('infinite_call')
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+
+error_msg = nil
+test_run:cmd("setopt delimiter ';'")
+function gh4834()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+test_run:cmd("setopt delimiter ''");
+gh4834()
+error_msg
+box.schema.func.drop('infinite_call')
+infinite_call = nil
+channel = nil
+error_msg = nil
-- 
2.29.2




More information about the Tarantool-patches mailing list