From: Sergey Ostanevich
Date: Wed, 11 Nov 2020 00:16:16 +0300
Subject: Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
In-Reply-To: <7e2cf16f-f443-5a4b-d658-8d4e3ecd6f74@tarantool.org>
References: <20201031162911.61876-1-sergos@tarantool.org> <7e2cf16f-f443-5a4b-d658-8d4e3ecd6f74@tarantool.org>
List-Id: Tarantool development patches
To: Leonid Vasiliev
Cc: tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org, alexander.turenko@tarantool.org

Hi Leonid,

thank you for the review!
I put the two parts together to handle them in one patch.

> On 4 Nov 2020, at 13:00, Leonid Vasiliev wrote:
>
> Hi! Thank you for the patch.
> See some comments below:
>
> 1) The patch changes undocumented behavior, AFAIU.
> So, I have a question:"Do you plan to backport the patch to
> tarantool 1.10?". If the answer is "Yes" - I'm comfortable with the
> changes. But if the answer is "No" - I will object, because in this case
> both behaviors must be supported in all modules.
>

Yes, the plan is to backport it down to 1.10.

> 2) I think changing the behavior in C doesn't cause much of a problem,
> because before when you wait without timeout, you don't need to check
> the return value (it's always 0). But in Lua it will cause the problems,
> because now throws an error if cancelled and all wait calls should be
> wrapped to pcall.

Exactly: this is a change to the Lua behaviour, and we make it
intentionally, since the original behaviour is wrong.
The good news: fiber.cond has no big presence in the /tarantool source
base, and the cancellation of a fiber itself means the fiber has to quit.

>
> On 31.10.2020 19:29, sergos@tarantool.org wrote:
>> From: Sergey Ostanevich
>> Hi!
>> Thanks to Oleg Babin's comment I found there's no need to update any lua
>> interfaces, since the reason was in C implementation. Also, there is one
>> place the change is played, so after I fixed it I got complete testing
>> pass.
>> Force-pushed branch, v2 patch attached.
>> Before this patch fiber.cond():wait() just returns for cancelled
>> fiber. In contrast fiber.channel():get() threw "fiber is
>> canceled" error.
>> This patch unify behaviour of channels and condvars and also fixes
>
> 3) behaviour -> behavior.

That depends on the spellchecker: mine sets it to the British spelling.

>
>> related net.box module problem - it was impossible to interrupt
>> net.box call with fiber.cancel because it used fiber.cond under
>> the hood. Test cases for both bugs are added.
>> Closes #4834
>> Closes #5013
>> Co-authored-by: Oleg Babin
>> @TarantoolBot document
>> Title: fiber.cond():wait() throws if fiber is cancelled
>> Currently fiber.cond():wait() throws an error if waiting fiber is
>> cancelled like in case with fiber.channel():get().
>
> 4) I don't think it's a good decision adding a comparison with
> fiber.channel():get() to the documentation. Up to you.
>

I agree with you: let's just describe the new behaviour here.

> 5) Document the changes in module.h.

Done (need to rebuild, obviously).

>
>> ---
>
> 6) Add @ChangeLog.
>
>> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
>> Issue: https://github.com/tarantool/tarantool/issues/5013
>>  src/box/box.cc                                |  6 +-
>>  src/lib/core/fiber_cond.c                     |  1 +
>>  test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
>>  test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
>>  .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
>>  5 files changed, 120 insertions(+), 4 deletions(-)
>>  create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
>>  create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
>>  create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua
>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 18568df3b..9e824453d 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
>>  {
>>  	double deadline = ev_monotonic_now(loop()) + timeout;
>>  	while (is_box_configured == false || box_is_ro() != ro) {
>> -		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
>> -			return -1;
>> -		if (fiber_is_cancelled()) {
>> -			diag_set(FiberIsCancelled);
>> +		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
>> +			if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
>
> 7) According to https://www.tarantool.io/en/doc/latest/dev_guide/c_style_guide/
> it seems like you are trying to hide something)
> Use:
> ```
> if (condition)
> 	action();
> ```
> Instead of:
> ```
> if (condition) action();
> another_action();
> ```

Fixed.

>>  			return -1;
>>  		}
>>  	}
>> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
>> index 904a350d9..0c93c5842 100644
>> --- a/src/lib/core/fiber_cond.c
>> +++ b/src/lib/core/fiber_cond.c
>> @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
>>  		diag_set(TimedOut);
>>  		return -1;
>>  	}
>> +	if (fiber_is_cancelled()) {
>> +		if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled);
>
> 8) The same as previously:
> ```
> if (condition)
> 	action();
> ```
>

Fixed.

> 9) Checking diag on empty looks strange to me. I think we should add an
> error to diag without this check. If we want to save the previous error,
> I suggest to use the stack diag.

I did this because this interface is used inside the testing machinery,
where the exact cause of, e.g., a replica disconnection is reported. If we
set the diag here unconditionally, it will break that use of diag. I also
believe the stack diag is not supported there yet.

>> +		return -1;
>> +	}
>>  	return 0;
>>  }
>> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
>> new file mode 100755
>> index 000000000..ae805c5bf
>> --- /dev/null
>> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
>> @@ -0,0 +1,23 @@
>> +#!/usr/bin/env tarantool
>> +
>> +local tap = require('tap')
>> +local fiber = require('fiber')
>> +local test = tap.test("gh-5013-fiber-cancel")
>> +
>> +test:plan(2)
>> +
>> +local result = {}
>> +
>> +function test_f()
>> +    local cond = fiber.cond()
>> +    local res, err = pcall(cond.wait, cond)
>> +    result.res = res
>> +    result.err = err
>> +end
>> +
>> +local f = fiber.create(test_f)
>> +f:cancel()
>> +fiber.yield()
>> +
>> +test:ok(result.res == false, tostring(result.res))
>> +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
>
> 10) Use a user-friendly check message (in both checks).
>

Fixed

>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
>> new file mode 100644
>> index 000000000..eab0a5e4d
>> --- /dev/null
>> +++ b/test/box/net.box_fiber_cancel_gh-4834.result
>> @@ -0,0 +1,65 @@
>> +-- test-run result file version 2
>> +remote = require 'net.box'
>> + | ---
>> + | ...
>> +fiber = require 'fiber'
>> + | ---
>> + | ...
>> +test_run = require('test_run').new()
>> + | ---
>> + | ...
>> +
>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
>> +function infinite_call() fiber.channel(1):get() end
>> + | ---
>> + | ...
>> +box.schema.func.create('infinite_call')
>> + | ---
>> + | ...
>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
>> + | ---
>> + | ...
>> +
>> +error_msg = nil
>> + | ---
>> + | ...
>> +test_run:cmd("setopt delimiter ';'")
>> + | ---
>> + | - true
>> + | ...
>> +function gh4834()
>> +    local cn = remote.connect(box.cfg.listen)
>> +    local f = fiber.new(function()
>> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
>> +    end)
>> +    f:set_joinable(true)
>> +    fiber.yield()
>> +    f:cancel()
>> +    f:join()
>> +    cn:close()
>> +end;
>> + | ---
>> + | ...
>> +test_run:cmd("setopt delimiter ''");
>> + | ---
>> + | - true
>> + | ...
>> +gh4834()
>> + | ---
>> + | ...
>> +error_msg
>> + | ---
>> + | - fiber is cancelled
>> + | ...
>> +box.schema.func.drop('infinite_call')
>> + | ---
>> + | ...
>> +infinite_call = nil
>> + | ---
>> + | ...
>> +channel = nil
>> + | ---
>> + | ...
>> +error_msg = nil
>> + | ---
>> + | ...
>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
>> new file mode 100644
>> index 000000000..06fb3ceac
>> --- /dev/null
>> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
>> @@ -0,0 +1,29 @@
>> +remote = require 'net.box'
>> +fiber = require 'fiber'
>> +test_run = require('test_run').new()
>> +
>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations
>> +function infinite_call() fiber.channel(1):get() end
>> +box.schema.func.create('infinite_call')
>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
>> +
>> +error_msg = nil
>> +test_run:cmd("setopt delimiter ';'")
>> +function gh4834()
>
> 11) I think this is not a self-documenting name for the function.
>

Renamed.

>> +    local cn = remote.connect(box.cfg.listen)
>> +    local f = fiber.new(function()
>> +        _, error_msg = pcall(cn.call, cn, 'infinite_call')
>> +    end)
>> +    f:set_joinable(true)
>> +    fiber.yield()
>> +    f:cancel()
>> +    f:join()
>> +    cn:close()
>> +end;
>> +test_run:cmd("setopt delimiter ''");
>> +gh4834()
>> +error_msg
>> +box.schema.func.drop('infinite_call')
>> +infinite_call = nil
>> +channel = nil
>
> 12) You didn't introduce the variable "channel". In addition, I looked
> at a couple of tests and didn't see them cleaning up variables. But I
> don't mind.
>

Removed this one. In general, cleanup helps prevent flaky tests.

>> +error_msg = nil

=================

Before this patch fiber.cond():wait() just returned for a cancelled
fiber. In contrast, fiber.channel():get() threw a "fiber is
cancelled" error.
This patch unifies the behaviour of channels and condvars and also fixes
a related net.box module problem - it was impossible to interrupt a
net.box call with fiber.cancel because it used fiber.cond under
the hood. Test cases for both bugs are added.

Closes #4834
Closes #5013

Co-authored-by: Oleg Babin

@TarantoolBot document
Title: fiber.cond():wait() throws if fiber is cancelled
Now fiber.cond():wait() throws an error if the waiting fiber is
cancelled.
---
Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
Issue: https://github.com/tarantool/tarantool/issues/5013

@Changelog
 * fiber.cond():wait() now throws if fiber is cancelled

 src/box/box.cc                                |  6 +-
 src/lib/core/fiber_cond.c                     |  4 ++
 test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
 test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
 .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
 5 files changed, 123 insertions(+), 4 deletions(-)
 create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
 create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
 create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index 18568df3b..9e824453d 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
 {
 	double deadline = ev_monotonic_now(loop()) + timeout;
 	while (is_box_configured == false || box_is_ro() != ro) {
-		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
-			return -1;
-		if (fiber_is_cancelled()) {
-			diag_set(FiberIsCancelled);
+		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
+			if (fiber_is_cancelled()) diag_set(FiberIsCancelled);
 			return -1;
 		}
 	}
diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
index 904a350d9..0c93c5842 100644
--- a/src/lib/core/fiber_cond.c
+++ b/src/lib/core/fiber_cond.c
@@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
 		diag_set(TimedOut);
 		return -1;
 	}
+	if (fiber_is_cancelled()) {
+		if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled);
+		return -1;
+	}
 	return 0;
 }
diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua
new file mode 100755
index 000000000..34711fb31
--- /dev/null
+++ b/test/app-tap/gh-5013-fiber-cancel.test.lua
@@ -0,0 +1,23 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+local fiber = require('fiber')
+local test = tap.test("gh-5013-fiber-cancel")
+
+test:plan(2)
+
+local result = {}
+
+function test_f()
+    local cond = fiber.cond()
+    local res, err = pcall(cond.wait, cond)
+    result.res = res
+    result.err = err
+end
+
+local f = fiber.create(test_f)
+f:cancel()
+fiber.yield()
+
+test:ok(result.res == false, tostring(result.res))
+test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err))
diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result
new file mode 100644
index 000000000..eab0a5e4d
--- /dev/null
+++ b/test/box/net.box_fiber_cancel_gh-4834.result
@@ -0,0 +1,65 @@
+-- test-run result file version 2
+remote = require 'net.box'
+ | ---
+ | ...
+fiber = require 'fiber'
+ | ---
+ | ...
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+ | ---
+ | ...
+box.schema.func.create('infinite_call')
+ | ---
+ | ...
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+ | ---
+ | ...
+
+error_msg = nil
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+function gh4834()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+gh4834()
+ | ---
+ | ...
+error_msg
+ | ---
+ | - fiber is cancelled
+ | ...
+box.schema.func.drop('infinite_call')
+ | ---
+ | ...
+infinite_call = nil
+ | ---
+ | ...
+channel = nil
+ | ---
+ | ...
+error_msg = nil
+ | ---
+ | ...
diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua
new file mode 100644
index 000000000..06fb3ceac
--- /dev/null
+++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua
@@ -0,0 +1,29 @@
+remote = require 'net.box'
+fiber = require 'fiber'
+test_run = require('test_run').new()
+
+-- #4834: Cancelling fiber doesn't interrupt netbox operations
+function infinite_call() fiber.channel(1):get() end
+box.schema.func.create('infinite_call')
+box.schema.user.grant('guest', 'execute', 'function', 'infinite_call')
+
+error_msg = nil
+test_run:cmd("setopt delimiter ';'")
+function gh4834()
+    local cn = remote.connect(box.cfg.listen)
+    local f = fiber.new(function()
+        _, error_msg = pcall(cn.call, cn, 'infinite_call')
+    end)
+    f:set_joinable(true)
+    fiber.yield()
+    f:cancel()
+    f:join()
+    cn:close()
+end;
+test_run:cmd("setopt delimiter ''");
+gh4834()
+error_msg
+box.schema.func.drop('infinite_call')
+infinite_call = nil
+channel = nil
+error_msg = nil
-- 
2.29.2