From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp36.i.mail.ru (smtp36.i.mail.ru [94.100.177.96]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id A0F01469719 for ; Tue, 3 Nov 2020 13:20:21 +0300 (MSK) Date: Tue, 3 Nov 2020 13:20:18 +0300 From: Sergey Ostanevich Message-ID: <20201103102018.GC517@tarantool.org> References: <20201031162911.61876-1-sergos@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Subject: Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Oleg Babin Cc: tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org, alexander.turenko@tarantool.org Hi Oleg! I believe the point about 'consistency' is not valid here. I put a simple check that if diag is already set, then print it out. For the fiber_cond_wait_timeout() it happened multiple times with various reports, inlcuding this one: 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not set the DIAG to FiberIsCancelled, original diag: Missing .xlog file between LSN 5 {1: 5} and 6 {1: 6} that is used in the test system: test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'}) So, my resolution will be: it is wrong to set a diag in an arbitrary place, without clear understanting of the reason. This is the case for the cond_wait machinery, since it doesn't know _why_ the fiber is cancelled. I propose to cover the case with condition on empty diag and call it a day. regards, Sergos ===== diff --git a/src/box/box.cc b/src/box/box.cc index 18568df3b..9e824453d 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) { double deadline = ev_monotonic_now(loop()) + timeout; while (is_box_configured == false || box_is_ro() != ro) { - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) - return -1; - if (fiber_is_cancelled()) { - diag_set(FiberIsCancelled); + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); return -1; } } diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c index 904a350d9..0c93c5842 100644 --- a/src/lib/core/fiber_cond.c +++ b/src/lib/core/fiber_cond.c @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) diag_set(TimedOut); return -1; } + if (fiber_is_cancelled()) { + if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled); + return -1; + } return 0; } diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua new file mode 100755 index 000000000..ae805c5bf --- /dev/null +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua @@ -0,0 +1,23 @@ +#!/usr/bin/env tarantool + +local tap = require('tap') +local fiber = require('fiber') +local test = tap.test("gh-5013-fiber-cancel") + +test:plan(2) + +local result = {} + +function test_f() + local cond = fiber.cond() + local res, err = pcall(cond.wait, cond) + result.res = res + result.err = err +end + +local f = fiber.create(test_f) +f:cancel() +fiber.yield() + +test:ok(result.res == false, tostring(result.res)) +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result new file mode 100644 index 000000000..eab0a5e4d --- /dev/null +++ b/test/box/net.box_fiber_cancel_gh-4834.result @@ -0,0 +1,65 @@ +-- test-run result file version 2 +remote = require 'net.box' + | --- + | ... +fiber = require 'fiber' + | --- + | ... +test_run = require('test_run').new() + | --- + | ... + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() fiber.channel(1):get() end + | --- + | ... +box.schema.func.create('infinite_call') + | --- + | ... +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + | --- + | ... + +error_msg = nil + | --- + | ... +test_run:cmd("setopt delimiter ';'") + | --- + | - true + | ... +function gh4834() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; + | --- + | ... +test_run:cmd("setopt delimiter ''"); + | --- + | - true + | ... +gh4834() + | --- + | ... +error_msg + | --- + | - fiber is cancelled + | ... +box.schema.func.drop('infinite_call') + | --- + | ... +infinite_call = nil + | --- + | ... +channel = nil + | --- + | ... +error_msg = nil + | --- + | ... diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua new file mode 100644 index 000000000..06fb3ceac --- /dev/null +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua @@ -0,0 +1,29 @@ +remote = require 'net.box' +fiber = require 'fiber' +test_run = require('test_run').new() + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() fiber.channel(1):get() end +box.schema.func.create('infinite_call') +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + +error_msg = nil +test_run:cmd("setopt delimiter ';'") +function gh4834() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; +test_run:cmd("setopt delimiter ''"); +gh4834() +error_msg +box.schema.func.drop('infinite_call') +infinite_call = nil +channel = nil +error_msg = nil On 01 ноя 13:13, Oleg Babin wrote: > Hi! Thanks for changes. See two comments below. > > On 31/10/2020 19:29, sergos@tarantool.org wrote: > > From: Sergey Ostanevich > > > > Hi! > > > > Thanks to Oleg Babin's comment I found there's no need to update any lua > > interfaces, since the reason was in C implementation. Also, there is one > > place the change is played, so after I fixed it I got complete testing > > pass. > > Force-pushed branch, v2 patch attached. > > > > > > > > Before this patch fiber.cond():wait() just returns for cancelled > > fiber. In contrast fiber.channel():get() threw "fiber is > > canceled" error. > > This patch unify behaviour of channels and condvars and also fixes > > related net.box module problem - it was impossible to interrupt > > net.box call with fiber.cancel because it used fiber.cond under > > the hood. Test cases for both bugs are added. > > > > Closes #4834 > > Closes #5013 > > > > Co-authored-by: Oleg Babin > > > > @TarantoolBot document > > Title: fiber.cond():wait() throws if fiber is cancelled > > > > Currently fiber.cond():wait() throws an error if waiting fiber is > > cancelled like in case with fiber.channel():get(). > > --- > > > > Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond > > Issue: https://github.com/tarantool/tarantool/issues/5013 > > > > src/box/box.cc | 6 +- > > src/lib/core/fiber_cond.c | 1 + > > test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ > > test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ > > .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ > > 5 files changed, 120 insertions(+), 4 deletions(-) > > create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua > > create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result > > create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua > > > > diff --git a/src/box/box.cc b/src/box/box.cc > > index 18568df3b..bfa1051f9 100644 > > --- a/src/box/box.cc > > +++ b/src/box/box.cc > > @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) > > { > > double deadline = ev_monotonic_now(loop()) + timeout; > > while (is_box_configured == false || box_is_ro() != ro) { > > - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) > > - return -1; > > - if (fiber_is_cancelled()) { > > - diag_set(FiberIsCancelled); > > + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { > > + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); > > Here you use spaces instead of tabs. > Fixed. > > return -1; > > } > > } > > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c > > index 904a350d9..b0645069e 100644 > > --- a/src/lib/core/fiber_cond.c > > +++ b/src/lib/core/fiber_cond.c > > @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) > > diag_set(TimedOut); > > return -1; > > } > > + if (fiber_is_cancelled()) return -1; > > It's qute strange to return -1 here but don't set a reason to diagnostic > area. Look how it is done for channels > > (https://github.com/tarantool/tarantool/blob/42c64d06d5d1a3ec937b3c596af083a672a68ad8/src/lib/core/fiber_channel.c#L180). > > There is some inconsistency without it. > > I've looked a bit deeper at the failure I reported before. Seems the problem > is in "cbus_unpair" function. > > The problem appears only if FiberIsCancelled is setted to diag area in > "fiber_cond_wait" function. > > This is where my expertise ends, as I'm not familiar with "cbus". However I > have some minds how it could be eliminated. > > Let's declare cbus_unpair fiber as is not cancellable and stop report > is_cancellable flag for non-cancellable fibers. See some PoC below: > > > diff --git a/src/lib/core/cbus.c b/src/lib/core/cbus.c > index 5d91fb948..4167c756a 100644 > --- a/src/lib/core/cbus.c > +++ b/src/lib/core/cbus.c > @@ -630,6 +630,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe > *src_pipe, >      msg.unpair_arg = unpair_arg; >      msg.src_pipe = src_pipe; >      msg.complete = false; > +    fiber_set_cancellable(false); >      fiber_cond_create(&msg.cond); > >      cpipe_push(dest_pipe, &msg.cmsg); > @@ -643,6 +644,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe > *src_pipe, >          fiber_cond_wait(&msg.cond); >      } > > +    fiber_set_cancellable(true); >      cpipe_destroy(dest_pipe); >  } > > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c > index 483ae3ce1..8100c9da6 100644 > --- a/src/lib/core/fiber.c > +++ b/src/lib/core/fiber.c > @@ -553,6 +553,9 @@ fiber_set_cancellable(bool yesno) >  bool >  fiber_is_cancelled(void) >  { > +    if ((fiber()->flags & FIBER_IS_CANCELLABLE) == 0) { > +        return false; > +    } >      return fiber()->flags & FIBER_IS_CANCELLED; >  } > > > To be honest I've not checked such change carefully and also have segfault > at replication/gc.test.lua for "memtx" engine. > > Finally, feel free to ignore this comment I hope Vlad or Sasha can give you > more accurate and correct advices. >