From: Leonid Vasiliev <lvasiliev@tarantool.org> To: Sergey Ostanevich <sergos@tarantool.org>, Oleg Babin <olegrok@tarantool.org> Cc: tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org, alexander.turenko@tarantool.org Subject: Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond Date: Wed, 4 Nov 2020 13:00:36 +0300 [thread overview] Message-ID: <3761ea24-1738-886b-e4b9-a2d30ac9b174@tarantool.org> (raw) In-Reply-To: <20201103102018.GC517@tarantool.org> Continuation of the review. On 03.11.2020 13:20, Sergey Ostanevich wrote: > Hi Oleg! > > I believe the point about 'consistency' is not valid here. I put a > simple check that if diag is already set, then print it out. For the > fiber_cond_wait_timeout() it happened multiple times with various > reports, inlcuding this one: > > 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not > set the DIAG to FiberIsCancelled, original diag: Missing .xlog file > between LSN 5 {1: 5} and 6 {1: 6} > > that is used in the test system: > > test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = > 'loading'}) > > So, my resolution will be: it is wrong to set a diag in an arbitrary > place, without clear understanting of the reason. This is the case for > the cond_wait machinery, since it doesn't know _why_ the fiber is > cancelled. > > I propose to cover the case with condition on empty diag and call it a > day. > > regards, > Sergos > > ===== > > diff --git a/src/box/box.cc b/src/box/box.cc > index 18568df3b..9e824453d 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) > { > double deadline = ev_monotonic_now(loop()) + timeout; > while (is_box_configured == false || box_is_ro() != ro) { > - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) > - return -1; > - if (fiber_is_cancelled()) { > - diag_set(FiberIsCancelled); > + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { > + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); 7) According to https://www.tarantool.io/en/doc/latest/dev_guide/c_style_guide/ it's seems like you are trying to hide something) Use: ``` if (condition) action(); ``` Instead of: ``` if (condition) action(); another_action(); ``` > return -1; > } > } > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c > index 904a350d9..0c93c5842 100644 > --- a/src/lib/core/fiber_cond.c > +++ b/src/lib/core/fiber_cond.c > @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) > diag_set(TimedOut); > return -1; > } > + if (fiber_is_cancelled()) { > + if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled); 8) The same as previously: ``` if (condition) action(); ``` 9) Checking diag on empty looks strange to me. I think we should add an error to diag without this check. If we want to save the previous error, I suggest to use the stack diag. > + return -1; > + } > return 0; > } > > diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua > new file mode 100755 > index 000000000..ae805c5bf > --- /dev/null > +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua > @@ -0,0 +1,23 @@ > +#!/usr/bin/env tarantool > + > +local tap = require('tap') > +local fiber = require('fiber') > +local test = tap.test("gh-5013-fiber-cancel") > + > +test:plan(2) > + > +local result = {} > + > +function test_f() > + local cond = fiber.cond() > + local res, err = pcall(cond.wait, cond) > + result.res = res > + result.err = err > +end > + > +local f = fiber.create(test_f) > +f:cancel() > +fiber.yield() > + > +test:ok(result.res == false, tostring(result.res)) > +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) 10) Use a user-frendly check message (in both checks). > diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result > new file mode 100644 > index 000000000..eab0a5e4d > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.result > @@ -0,0 +1,65 @@ > +-- test-run result file version 2 > +remote = require 'net.box' > + | --- > + | ... > +fiber = require 'fiber' > + | --- > + | ... > +test_run = require('test_run').new() > + | --- > + | ... > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end > + | --- > + | ... > +box.schema.func.create('infinite_call') > + | --- > + | ... > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > + | --- > + | ... > + > +error_msg = nil > + | --- > + | ... > +test_run:cmd("setopt delimiter ';'") > + | --- > + | - true > + | ... > +function gh4834( > + local cn = remote.connect(box.cfg.listen) > + local f = fiber.new(function() > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > + end) > + f:set_joinable(true) > + fiber.yield() > + f:cancel() > + f:join() > + cn:close() > +end; > + | --- > + | ... > +test_run:cmd("setopt delimiter ''"); > + | --- > + | - true > + | ... > +gh4834() > + | --- > + | ... > +error_msg > + | --- > + | - fiber is cancelled > + | ... > +box.schema.func.drop('infinite_call') > + | --- > + | ... > +infinite_call = nil > + | --- > + | ... > +channel = nil > + | --- > + | ... > +error_msg = nil > + | --- > + | ... > diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua > new file mode 100644 > index 000000000..06fb3ceac > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua > @@ -0,0 +1,29 @@ > +remote = require 'net.box' > +fiber = require 'fiber' > +test_run = require('test_run').new() > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end > +box.schema.func.create('infinite_call') > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > + > +error_msg = nil > +test_run:cmd("setopt delimiter ';'") > +function gh4834() 11) I think this is not a self-documenting name for the function. > + local cn = remote.connect(box.cfg.listen) > + local f = fiber.new(function() > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > + end) > + f:set_joinable(true) > + fiber.yield() > + f:cancel() > + f:join() > + cn:close() > +end; > +test_run:cmd("setopt delimiter ''"); > +gh4834() > +error_msg > +box.schema.func.drop('infinite_call') > +infinite_call = nil > +channel = nil 12) You didn't introduce the variable "channel". In addition, I looked at a couple of tests and didn't see them cleaning up variables. But I don't mind. > +error_msg = nil > > > On 01 ноя 13:13, Oleg Babin wrote: >> Hi! Thanks for changes. See two comments below. >> >> On 31/10/2020 19:29, sergos@tarantool.org wrote: >>> From: Sergey Ostanevich <sergos@tarantool.org> >>> >>> Hi! >>> >>> Thanks to Oleg Babin's comment I found there's no need to update any lua >>> interfaces, since the reason was in C implementation. Also, there is one >>> place the change is played, so after I fixed it I got complete testing >>> pass. >>> Force-pushed branch, v2 patch attached. >>> >>> >>> >>> Before this patch fiber.cond():wait() just returns for cancelled >>> fiber. In contrast fiber.channel():get() threw "fiber is >>> canceled" error. >>> This patch unify behaviour of channels and condvars and also fixes >>> related net.box module problem - it was impossible to interrupt >>> net.box call with fiber.cancel because it used fiber.cond under >>> the hood. Test cases for both bugs are added. >>> >>> Closes #4834 >>> Closes #5013 >>> >>> Co-authored-by: Oleg Babin <olegrok@tarantool.org> >>> >>> @TarantoolBot document >>> Title: fiber.cond():wait() throws if fiber is cancelled >>> >>> Currently fiber.cond():wait() throws an error if waiting fiber is >>> cancelled like in case with fiber.channel():get(). >>> --- >>> >>> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond >>> Issue: https://github.com/tarantool/tarantool/issues/5013 >>> >>> src/box/box.cc | 6 +- >>> src/lib/core/fiber_cond.c | 1 + >>> test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ >>> test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ >>> .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ >>> 5 files changed, 120 insertions(+), 4 deletions(-) >>> create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua >>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result >>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua >>> >>> diff --git a/src/box/box.cc b/src/box/box.cc >>> index 18568df3b..bfa1051f9 100644 >>> --- a/src/box/box.cc >>> +++ b/src/box/box.cc >>> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) >>> { >>> double deadline = ev_monotonic_now(loop()) + timeout; >>> while (is_box_configured == false || box_is_ro() != ro) { >>> - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) >>> - return -1; >>> - if (fiber_is_cancelled()) { >>> - diag_set(FiberIsCancelled); >>> + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { >>> + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); >> >> Here you use spaces instead of tabs. >> > > Fixed. > >>> return -1; >>> } >>> } >>> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c >>> index 904a350d9..b0645069e 100644 >>> --- a/src/lib/core/fiber_cond.c >>> +++ b/src/lib/core/fiber_cond.c >>> @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) >>> diag_set(TimedOut); >>> return -1; >>> } >>> + if (fiber_is_cancelled()) return -1; >> >> It's qute strange to return -1 here but don't set a reason to diagnostic >> area. Look how it is done for channels >> >> (https://github.com/tarantool/tarantool/blob/42c64d06d5d1a3ec937b3c596af083a672a68ad8/src/lib/core/fiber_channel.c#L180). >> >> There is some inconsistency without it. >> >> I've looked a bit deeper at the failure I reported before. Seems the problem >> is in "cbus_unpair" function. >> >> The problem appears only if FiberIsCancelled is setted to diag area in >> "fiber_cond_wait" function. >> >> This is where my expertise ends, as I'm not familiar with "cbus". However I >> have some minds how it could be eliminated. >> >> Let's declare cbus_unpair fiber as is not cancellable and stop report >> is_cancellable flag for non-cancellable fibers. See some PoC below: >> >> >> diff --git a/src/lib/core/cbus.c b/src/lib/core/cbus.c >> index 5d91fb948..4167c756a 100644 >> --- a/src/lib/core/cbus.c >> +++ b/src/lib/core/cbus.c >> @@ -630,6 +630,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe >> *src_pipe, >> msg.unpair_arg = unpair_arg; >> msg.src_pipe = src_pipe; >> msg.complete = false; >> + fiber_set_cancellable(false); >> fiber_cond_create(&msg.cond); >> >> cpipe_push(dest_pipe, &msg.cmsg); >> @@ -643,6 +644,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe >> *src_pipe, >> fiber_cond_wait(&msg.cond); >> } >> >> + fiber_set_cancellable(true); >> cpipe_destroy(dest_pipe); >> } >> >> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c >> index 483ae3ce1..8100c9da6 100644 >> --- a/src/lib/core/fiber.c >> +++ b/src/lib/core/fiber.c >> @@ -553,6 +553,9 @@ fiber_set_cancellable(bool yesno) >> bool >> fiber_is_cancelled(void) >> { >> + if ((fiber()->flags & FIBER_IS_CANCELLABLE) == 0) { >> + return false; >> + } >> return fiber()->flags & FIBER_IS_CANCELLED; >> } >> >> >> To be honest I've not checked such change carefully and also have segfault >> at replication/gc.test.lua for "memtx" engine. >> >> Finally, feel free to ignore this comment I hope Vlad or Sasha can give you >> more accurate and correct advices. >>
next prev parent reply other threads:[~2020-11-04 10:01 UTC|newest] Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-10-31 16:29 sergos 2020-11-01 10:13 ` Oleg Babin 2020-11-03 10:20 ` Sergey Ostanevich 2020-11-03 10:27 ` Oleg Babin 2020-11-04 10:00 ` Leonid Vasiliev [this message] 2020-11-16 22:12 ` Vladislav Shpilevoy 2020-11-18 22:05 ` Sergey Ostanevich 2020-11-22 16:01 ` Vladislav Shpilevoy 2020-11-23 21:47 ` Sergey Ostanevich 2020-11-24 7:31 ` Sergey Ostanevich 2020-11-04 10:00 ` Leonid Vasiliev 2020-11-05 20:42 ` Sergey Ostanevich 2020-11-10 21:16 ` Sergey Ostanevich 2020-11-12 20:15 ` Sergey Ostanevich 2020-11-13 8:26 ` Leonid Vasiliev 2020-11-30 22:49 ` Alexander Turenko 2020-11-16 22:12 ` Vladislav Shpilevoy 2020-11-25 21:32 ` Vladislav Shpilevoy 2020-11-29 21:41 ` Sergey Ostanevich 2020-11-30 21:46 ` Alexander Turenko 2020-11-30 21:01 ` Vladislav Shpilevoy 2020-12-02 10:58 ` Alexander V. Tikhonov 2020-12-02 22:18 ` Vladislav Shpilevoy
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=3761ea24-1738-886b-e4b9-a2d30ac9b174@tarantool.org \ --to=lvasiliev@tarantool.org \ --cc=alexander.turenko@tarantool.org \ --cc=olegrok@tarantool.org \ --cc=sergos@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox