From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp42.i.mail.ru (smtp42.i.mail.ru [94.100.177.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 67EA8446439 for ; Wed, 4 Nov 2020 13:01:10 +0300 (MSK) References: <20201031162911.61876-1-sergos@tarantool.org> <20201103102018.GC517@tarantool.org> From: Leonid Vasiliev Message-ID: <3761ea24-1738-886b-e4b9-a2d30ac9b174@tarantool.org> Date: Wed, 4 Nov 2020 13:00:36 +0300 MIME-Version: 1.0 In-Reply-To: <20201103102018.GC517@tarantool.org> Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Sergey Ostanevich , Oleg Babin Cc: tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org, alexander.turenko@tarantool.org Continuation of the review. On 03.11.2020 13:20, Sergey Ostanevich wrote: > Hi Oleg! > > I believe the point about 'consistency' is not valid here. I put a > simple check that if diag is already set, then print it out. For the > fiber_cond_wait_timeout() it happened multiple times with various > reports, inlcuding this one: > > 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not > set the DIAG to FiberIsCancelled, original diag: Missing .xlog file > between LSN 5 {1: 5} and 6 {1: 6} > > that is used in the test system: > > test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = > 'loading'}) > > So, my resolution will be: it is wrong to set a diag in an arbitrary > place, without clear understanting of the reason. This is the case for > the cond_wait machinery, since it doesn't know _why_ the fiber is > cancelled. > > I propose to cover the case with condition on empty diag and call it a > day. > > regards, > Sergos > > ===== > > diff --git a/src/box/box.cc b/src/box/box.cc > index 18568df3b..9e824453d 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) > { > double deadline = ev_monotonic_now(loop()) + timeout; > while (is_box_configured == false || box_is_ro() != ro) { > - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) > - return -1; > - if (fiber_is_cancelled()) { > - diag_set(FiberIsCancelled); > + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { > + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); 7) According to https://www.tarantool.io/en/doc/latest/dev_guide/c_style_guide/ it's seems like you are trying to hide something) Use: ``` if (condition) action(); ``` Instead of: ``` if (condition) action(); another_action(); ``` > return -1; > } > } > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c > index 904a350d9..0c93c5842 100644 > --- a/src/lib/core/fiber_cond.c > +++ b/src/lib/core/fiber_cond.c > @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) > diag_set(TimedOut); > return -1; > } > + if (fiber_is_cancelled()) { > + if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled); 8) The same as previously: ``` if (condition) action(); ``` 9) Checking diag on empty looks strange to me. I think we should add an error to diag without this check. If we want to save the previous error, I suggest to use the stack diag. > + return -1; > + } > return 0; > } > > diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua > new file mode 100755 > index 000000000..ae805c5bf > --- /dev/null > +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua > @@ -0,0 +1,23 @@ > +#!/usr/bin/env tarantool > + > +local tap = require('tap') > +local fiber = require('fiber') > +local test = tap.test("gh-5013-fiber-cancel") > + > +test:plan(2) > + > +local result = {} > + > +function test_f() > + local cond = fiber.cond() > + local res, err = pcall(cond.wait, cond) > + result.res = res > + result.err = err > +end > + > +local f = fiber.create(test_f) > +f:cancel() > +fiber.yield() > + > +test:ok(result.res == false, tostring(result.res)) > +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) 10) Use a user-frendly check message (in both checks). > diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result > new file mode 100644 > index 000000000..eab0a5e4d > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.result > @@ -0,0 +1,65 @@ > +-- test-run result file version 2 > +remote = require 'net.box' > + | --- > + | ... > +fiber = require 'fiber' > + | --- > + | ... > +test_run = require('test_run').new() > + | --- > + | ... > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end > + | --- > + | ... > +box.schema.func.create('infinite_call') > + | --- > + | ... > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > + | --- > + | ... > + > +error_msg = nil > + | --- > + | ... > +test_run:cmd("setopt delimiter ';'") > + | --- > + | - true > + | ... > +function gh4834( > + local cn = remote.connect(box.cfg.listen) > + local f = fiber.new(function() > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > + end) > + f:set_joinable(true) > + fiber.yield() > + f:cancel() > + f:join() > + cn:close() > +end; > + | --- > + | ... > +test_run:cmd("setopt delimiter ''"); > + | --- > + | - true > + | ... > +gh4834() > + | --- > + | ... > +error_msg > + | --- > + | - fiber is cancelled > + | ... > +box.schema.func.drop('infinite_call') > + | --- > + | ... > +infinite_call = nil > + | --- > + | ... > +channel = nil > + | --- > + | ... > +error_msg = nil > + | --- > + | ... > diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua > new file mode 100644 > index 000000000..06fb3ceac > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua > @@ -0,0 +1,29 @@ > +remote = require 'net.box' > +fiber = require 'fiber' > +test_run = require('test_run').new() > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end > +box.schema.func.create('infinite_call') > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > + > +error_msg = nil > +test_run:cmd("setopt delimiter ';'") > +function gh4834() 11) I think this is not a self-documenting name for the function. > + local cn = remote.connect(box.cfg.listen) > + local f = fiber.new(function() > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > + end) > + f:set_joinable(true) > + fiber.yield() > + f:cancel() > + f:join() > + cn:close() > +end; > +test_run:cmd("setopt delimiter ''"); > +gh4834() > +error_msg > +box.schema.func.drop('infinite_call') > +infinite_call = nil > +channel = nil 12) You didn't introduce the variable "channel". In addition, I looked at a couple of tests and didn't see them cleaning up variables. But I don't mind. > +error_msg = nil > > > On 01 ноя 13:13, Oleg Babin wrote: >> Hi! Thanks for changes. See two comments below. >> >> On 31/10/2020 19:29, sergos@tarantool.org wrote: >>> From: Sergey Ostanevich >>> >>> Hi! >>> >>> Thanks to Oleg Babin's comment I found there's no need to update any lua >>> interfaces, since the reason was in C implementation. Also, there is one >>> place the change is played, so after I fixed it I got complete testing >>> pass. >>> Force-pushed branch, v2 patch attached. >>> >>> >>> >>> Before this patch fiber.cond():wait() just returns for cancelled >>> fiber. In contrast fiber.channel():get() threw "fiber is >>> canceled" error. >>> This patch unify behaviour of channels and condvars and also fixes >>> related net.box module problem - it was impossible to interrupt >>> net.box call with fiber.cancel because it used fiber.cond under >>> the hood. Test cases for both bugs are added. >>> >>> Closes #4834 >>> Closes #5013 >>> >>> Co-authored-by: Oleg Babin >>> >>> @TarantoolBot document >>> Title: fiber.cond():wait() throws if fiber is cancelled >>> >>> Currently fiber.cond():wait() throws an error if waiting fiber is >>> cancelled like in case with fiber.channel():get(). >>> --- >>> >>> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond >>> Issue: https://github.com/tarantool/tarantool/issues/5013 >>> >>> src/box/box.cc | 6 +- >>> src/lib/core/fiber_cond.c | 1 + >>> test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ >>> test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ >>> .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ >>> 5 files changed, 120 insertions(+), 4 deletions(-) >>> create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua >>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result >>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua >>> >>> diff --git a/src/box/box.cc b/src/box/box.cc >>> index 18568df3b..bfa1051f9 100644 >>> --- a/src/box/box.cc >>> +++ b/src/box/box.cc >>> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) >>> { >>> double deadline = ev_monotonic_now(loop()) + timeout; >>> while (is_box_configured == false || box_is_ro() != ro) { >>> - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) >>> - return -1; >>> - if (fiber_is_cancelled()) { >>> - diag_set(FiberIsCancelled); >>> + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { >>> + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); >> >> Here you use spaces instead of tabs. >> > > Fixed. > >>> return -1; >>> } >>> } >>> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c >>> index 904a350d9..b0645069e 100644 >>> --- a/src/lib/core/fiber_cond.c >>> +++ b/src/lib/core/fiber_cond.c >>> @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) >>> diag_set(TimedOut); >>> return -1; >>> } >>> + if (fiber_is_cancelled()) return -1; >> >> It's qute strange to return -1 here but don't set a reason to diagnostic >> area. Look how it is done for channels >> >> (https://github.com/tarantool/tarantool/blob/42c64d06d5d1a3ec937b3c596af083a672a68ad8/src/lib/core/fiber_channel.c#L180). >> >> There is some inconsistency without it. >> >> I've looked a bit deeper at the failure I reported before. Seems the problem >> is in "cbus_unpair" function. >> >> The problem appears only if FiberIsCancelled is setted to diag area in >> "fiber_cond_wait" function. >> >> This is where my expertise ends, as I'm not familiar with "cbus". However I >> have some minds how it could be eliminated. >> >> Let's declare cbus_unpair fiber as is not cancellable and stop report >> is_cancellable flag for non-cancellable fibers. See some PoC below: >> >> >> diff --git a/src/lib/core/cbus.c b/src/lib/core/cbus.c >> index 5d91fb948..4167c756a 100644 >> --- a/src/lib/core/cbus.c >> +++ b/src/lib/core/cbus.c >> @@ -630,6 +630,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe >> *src_pipe, >>      msg.unpair_arg = unpair_arg; >>      msg.src_pipe = src_pipe; >>      msg.complete = false; >> +    fiber_set_cancellable(false); >>      fiber_cond_create(&msg.cond); >> >>      cpipe_push(dest_pipe, &msg.cmsg); >> @@ -643,6 +644,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe >> *src_pipe, >>          fiber_cond_wait(&msg.cond); >>      } >> >> +    fiber_set_cancellable(true); >>      cpipe_destroy(dest_pipe); >>  } >> >> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c >> index 483ae3ce1..8100c9da6 100644 >> --- a/src/lib/core/fiber.c >> +++ b/src/lib/core/fiber.c >> @@ -553,6 +553,9 @@ fiber_set_cancellable(bool yesno) >>  bool >>  fiber_is_cancelled(void) >>  { >> +    if ((fiber()->flags & FIBER_IS_CANCELLABLE) == 0) { >> +        return false; >> +    } >>      return fiber()->flags & FIBER_IS_CANCELLED; >>  } >> >> >> To be honest I've not checked such change carefully and also have segfault >> at replication/gc.test.lua for "memtx" engine. >> >> Finally, feel free to ignore this comment I hope Vlad or Sasha can give you >> more accurate and correct advices. >>