* [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond @ 2020-10-31 16:29 sergos 2020-11-01 10:13 ` Oleg Babin ` (6 more replies) 0 siblings, 7 replies; 23+ messages in thread From: sergos @ 2020-10-31 16:29 UTC (permalink / raw) To: tarantool-patches; +Cc: v.shpilevoy, alexander.turenko From: Sergey Ostanevich <sergos@tarantool.org> Hi! Thanks to Oleg Babin's comment I found there's no need to update any lua interfaces, since the reason was in C implementation. Also, there is one place the change is played, so after I fixed it I got complete testing pass. Force-pushed branch, v2 patch attached. Before this patch fiber.cond():wait() just returns for cancelled fiber. In contrast fiber.channel():get() threw "fiber is canceled" error. This patch unify behaviour of channels and condvars and also fixes related net.box module problem - it was impossible to interrupt net.box call with fiber.cancel because it used fiber.cond under the hood. Test cases for both bugs are added. Closes #4834 Closes #5013 Co-authored-by: Oleg Babin <olegrok@tarantool.org> @TarantoolBot document Title: fiber.cond():wait() throws if fiber is cancelled Currently fiber.cond():wait() throws an error if waiting fiber is cancelled like in case with fiber.channel():get(). --- Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond Issue: https://github.com/tarantool/tarantool/issues/5013 src/box/box.cc | 6 +- src/lib/core/fiber_cond.c | 1 + test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ 5 files changed, 120 insertions(+), 4 deletions(-) create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua diff --git a/src/box/box.cc b/src/box/box.cc index 18568df3b..bfa1051f9 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) { double deadline = ev_monotonic_now(loop()) + timeout; while (is_box_configured == false || box_is_ro() != ro) { - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) - return -1; - if (fiber_is_cancelled()) { - diag_set(FiberIsCancelled); + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); return -1; } } diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c index 904a350d9..b0645069e 100644 --- a/src/lib/core/fiber_cond.c +++ b/src/lib/core/fiber_cond.c @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) diag_set(TimedOut); return -1; } + if (fiber_is_cancelled()) return -1; return 0; } diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua new file mode 100755 index 000000000..ae805c5bf --- /dev/null +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua @@ -0,0 +1,23 @@ +#!/usr/bin/env tarantool + +local tap = require('tap') +local fiber = require('fiber') +local test = tap.test("gh-5013-fiber-cancel") + +test:plan(2) + +local result = {} + +function test_f() + local cond = fiber.cond() + local res, err = pcall(cond.wait, cond) + result.res = res + result.err = err +end + +local f = fiber.create(test_f) +f:cancel() +fiber.yield() + +test:ok(result.res == false, tostring(result.res)) +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result new file mode 100644 index 000000000..eab0a5e4d --- /dev/null +++ b/test/box/net.box_fiber_cancel_gh-4834.result @@ -0,0 +1,65 @@ +-- test-run result file version 2 +remote = require 'net.box' + | --- + | ... +fiber = require 'fiber' + | --- + | ... +test_run = require('test_run').new() + | --- + | ... + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() fiber.channel(1):get() end + | --- + | ... +box.schema.func.create('infinite_call') + | --- + | ... +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + | --- + | ... + +error_msg = nil + | --- + | ... +test_run:cmd("setopt delimiter ';'") + | --- + | - true + | ... +function gh4834() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; + | --- + | ... +test_run:cmd("setopt delimiter ''"); + | --- + | - true + | ... +gh4834() + | --- + | ... +error_msg + | --- + | - fiber is cancelled + | ... +box.schema.func.drop('infinite_call') + | --- + | ... +infinite_call = nil + | --- + | ... +channel = nil + | --- + | ... +error_msg = nil + | --- + | ... diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua new file mode 100644 index 000000000..06fb3ceac --- /dev/null +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua @@ -0,0 +1,29 @@ +remote = require 'net.box' +fiber = require 'fiber' +test_run = require('test_run').new() + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() fiber.channel(1):get() end +box.schema.func.create('infinite_call') +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + +error_msg = nil +test_run:cmd("setopt delimiter ';'") +function gh4834() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; +test_run:cmd("setopt delimiter ''"); +gh4834() +error_msg +box.schema.func.drop('infinite_call') +infinite_call = nil +channel = nil +error_msg = nil -- 2.17.1 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos @ 2020-11-01 10:13 ` Oleg Babin 2020-11-03 10:20 ` Sergey Ostanevich 2020-11-04 10:00 ` Leonid Vasiliev ` (5 subsequent siblings) 6 siblings, 1 reply; 23+ messages in thread From: Oleg Babin @ 2020-11-01 10:13 UTC (permalink / raw) To: sergos, tarantool-patches; +Cc: v.shpilevoy, alexander.turenko [-- Attachment #1: Type: text/plain, Size: 4943 bytes --] Hi! Thanks for changes. See two comments below. On 31/10/2020 19:29, sergos@tarantool.org wrote: > From: Sergey Ostanevich <sergos@tarantool.org> > > Hi! > > Thanks to Oleg Babin's comment I found there's no need to update any lua > interfaces, since the reason was in C implementation. Also, there is one > place the change is played, so after I fixed it I got complete testing > pass. > Force-pushed branch, v2 patch attached. > > > > Before this patch fiber.cond():wait() just returns for cancelled > fiber. In contrast fiber.channel():get() threw "fiber is > canceled" error. > This patch unify behaviour of channels and condvars and also fixes > related net.box module problem - it was impossible to interrupt > net.box call with fiber.cancel because it used fiber.cond under > the hood. Test cases for both bugs are added. > > Closes #4834 > Closes #5013 > > Co-authored-by: Oleg Babin <olegrok@tarantool.org> > > @TarantoolBot document > Title: fiber.cond():wait() throws if fiber is cancelled > > Currently fiber.cond():wait() throws an error if waiting fiber is > cancelled like in case with fiber.channel():get(). > --- > > Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond > Issue: https://github.com/tarantool/tarantool/issues/5013 > > src/box/box.cc | 6 +- > src/lib/core/fiber_cond.c | 1 + > test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ > test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ > .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ > 5 files changed, 120 insertions(+), 4 deletions(-) > create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua > create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result > create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua > > diff --git a/src/box/box.cc b/src/box/box.cc > index 18568df3b..bfa1051f9 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) > { > double deadline = ev_monotonic_now(loop()) + timeout; > while (is_box_configured == false || box_is_ro() != ro) { > - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) > - return -1; > - if (fiber_is_cancelled()) { > - diag_set(FiberIsCancelled); > + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { > + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); Here you use spaces instead of tabs. > return -1; > } > } > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c > index 904a350d9..b0645069e 100644 > --- a/src/lib/core/fiber_cond.c > +++ b/src/lib/core/fiber_cond.c > @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) > diag_set(TimedOut); > return -1; > } > + if (fiber_is_cancelled()) return -1; It's qute strange to return -1 here but don't set a reason to diagnostic area. Look how it is done for channels (https://github.com/tarantool/tarantool/blob/42c64d06d5d1a3ec937b3c596af083a672a68ad8/src/lib/core/fiber_channel.c#L180). There is some inconsistency without it. I've looked a bit deeper at the failure I reported before. Seems the problem is in "cbus_unpair" function. The problem appears only if FiberIsCancelled is setted to diag area in "fiber_cond_wait" function. This is where my expertise ends, as I'm not familiar with "cbus". However I have some minds how it could be eliminated. Let's declare cbus_unpair fiber as is not cancellable and stop report is_cancellable flag for non-cancellable fibers. See some PoC below: diff --git a/src/lib/core/cbus.c b/src/lib/core/cbus.c index 5d91fb948..4167c756a 100644 --- a/src/lib/core/cbus.c +++ b/src/lib/core/cbus.c @@ -630,6 +630,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe *src_pipe, msg.unpair_arg = unpair_arg; msg.src_pipe = src_pipe; msg.complete = false; + fiber_set_cancellable(false); fiber_cond_create(&msg.cond); cpipe_push(dest_pipe, &msg.cmsg); @@ -643,6 +644,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe *src_pipe, fiber_cond_wait(&msg.cond); } + fiber_set_cancellable(true); cpipe_destroy(dest_pipe); } diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index 483ae3ce1..8100c9da6 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -553,6 +553,9 @@ fiber_set_cancellable(bool yesno) bool fiber_is_cancelled(void) { + if ((fiber()->flags & FIBER_IS_CANCELLABLE) == 0) { + return false; + } return fiber()->flags & FIBER_IS_CANCELLED; } To be honest I've not checked such change carefully and also have segfault at replication/gc.test.lua for "memtx" engine. Finally, feel free to ignore this comment I hope Vlad or Sasha can give you more accurate and correct advices. [-- Attachment #2: Type: text/html, Size: 7437 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-01 10:13 ` Oleg Babin @ 2020-11-03 10:20 ` Sergey Ostanevich 2020-11-03 10:27 ` Oleg Babin ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Sergey Ostanevich @ 2020-11-03 10:20 UTC (permalink / raw) To: Oleg Babin; +Cc: tarantool-patches, v.shpilevoy, alexander.turenko Hi Oleg! I believe the point about 'consistency' is not valid here. I put a simple check that if diag is already set, then print it out. For the fiber_cond_wait_timeout() it happened multiple times with various reports, inlcuding this one: 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not set the DIAG to FiberIsCancelled, original diag: Missing .xlog file between LSN 5 {1: 5} and 6 {1: 6} that is used in the test system: test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'}) So, my resolution will be: it is wrong to set a diag in an arbitrary place, without clear understanting of the reason. This is the case for the cond_wait machinery, since it doesn't know _why_ the fiber is cancelled. I propose to cover the case with condition on empty diag and call it a day. regards, Sergos ===== diff --git a/src/box/box.cc b/src/box/box.cc index 18568df3b..9e824453d 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) { double deadline = ev_monotonic_now(loop()) + timeout; while (is_box_configured == false || box_is_ro() != ro) { - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) - return -1; - if (fiber_is_cancelled()) { - diag_set(FiberIsCancelled); + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); return -1; } } diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c index 904a350d9..0c93c5842 100644 --- a/src/lib/core/fiber_cond.c +++ b/src/lib/core/fiber_cond.c @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) diag_set(TimedOut); return -1; } + if (fiber_is_cancelled()) { + if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled); + return -1; + } return 0; } diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua new file mode 100755 index 000000000..ae805c5bf --- /dev/null +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua @@ -0,0 +1,23 @@ +#!/usr/bin/env tarantool + +local tap = require('tap') +local fiber = require('fiber') +local test = tap.test("gh-5013-fiber-cancel") + +test:plan(2) + +local result = {} + +function test_f() + local cond = fiber.cond() + local res, err = pcall(cond.wait, cond) + result.res = res + result.err = err +end + +local f = fiber.create(test_f) +f:cancel() +fiber.yield() + +test:ok(result.res == false, tostring(result.res)) +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result new file mode 100644 index 000000000..eab0a5e4d --- /dev/null +++ b/test/box/net.box_fiber_cancel_gh-4834.result @@ -0,0 +1,65 @@ +-- test-run result file version 2 +remote = require 'net.box' + | --- + | ... +fiber = require 'fiber' + | --- + | ... +test_run = require('test_run').new() + | --- + | ... + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() fiber.channel(1):get() end + | --- + | ... +box.schema.func.create('infinite_call') + | --- + | ... +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + | --- + | ... + +error_msg = nil + | --- + | ... +test_run:cmd("setopt delimiter ';'") + | --- + | - true + | ... +function gh4834() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; + | --- + | ... +test_run:cmd("setopt delimiter ''"); + | --- + | - true + | ... +gh4834() + | --- + | ... +error_msg + | --- + | - fiber is cancelled + | ... +box.schema.func.drop('infinite_call') + | --- + | ... +infinite_call = nil + | --- + | ... +channel = nil + | --- + | ... +error_msg = nil + | --- + | ... diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua new file mode 100644 index 000000000..06fb3ceac --- /dev/null +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua @@ -0,0 +1,29 @@ +remote = require 'net.box' +fiber = require 'fiber' +test_run = require('test_run').new() + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() fiber.channel(1):get() end +box.schema.func.create('infinite_call') +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + +error_msg = nil +test_run:cmd("setopt delimiter ';'") +function gh4834() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; +test_run:cmd("setopt delimiter ''"); +gh4834() +error_msg +box.schema.func.drop('infinite_call') +infinite_call = nil +channel = nil +error_msg = nil On 01 ноя 13:13, Oleg Babin wrote: > Hi! Thanks for changes. See two comments below. > > On 31/10/2020 19:29, sergos@tarantool.org wrote: > > From: Sergey Ostanevich <sergos@tarantool.org> > > > > Hi! > > > > Thanks to Oleg Babin's comment I found there's no need to update any lua > > interfaces, since the reason was in C implementation. Also, there is one > > place the change is played, so after I fixed it I got complete testing > > pass. > > Force-pushed branch, v2 patch attached. > > > > > > > > Before this patch fiber.cond():wait() just returns for cancelled > > fiber. In contrast fiber.channel():get() threw "fiber is > > canceled" error. > > This patch unify behaviour of channels and condvars and also fixes > > related net.box module problem - it was impossible to interrupt > > net.box call with fiber.cancel because it used fiber.cond under > > the hood. Test cases for both bugs are added. > > > > Closes #4834 > > Closes #5013 > > > > Co-authored-by: Oleg Babin <olegrok@tarantool.org> > > > > @TarantoolBot document > > Title: fiber.cond():wait() throws if fiber is cancelled > > > > Currently fiber.cond():wait() throws an error if waiting fiber is > > cancelled like in case with fiber.channel():get(). > > --- > > > > Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond > > Issue: https://github.com/tarantool/tarantool/issues/5013 > > > > src/box/box.cc | 6 +- > > src/lib/core/fiber_cond.c | 1 + > > test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ > > test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ > > .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ > > 5 files changed, 120 insertions(+), 4 deletions(-) > > create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua > > create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result > > create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua > > > > diff --git a/src/box/box.cc b/src/box/box.cc > > index 18568df3b..bfa1051f9 100644 > > --- a/src/box/box.cc > > +++ b/src/box/box.cc > > @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) > > { > > double deadline = ev_monotonic_now(loop()) + timeout; > > while (is_box_configured == false || box_is_ro() != ro) { > > - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) > > - return -1; > > - if (fiber_is_cancelled()) { > > - diag_set(FiberIsCancelled); > > + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { > > + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); > > Here you use spaces instead of tabs. > Fixed. > > return -1; > > } > > } > > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c > > index 904a350d9..b0645069e 100644 > > --- a/src/lib/core/fiber_cond.c > > +++ b/src/lib/core/fiber_cond.c > > @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) > > diag_set(TimedOut); > > return -1; > > } > > + if (fiber_is_cancelled()) return -1; > > It's qute strange to return -1 here but don't set a reason to diagnostic > area. Look how it is done for channels > > (https://github.com/tarantool/tarantool/blob/42c64d06d5d1a3ec937b3c596af083a672a68ad8/src/lib/core/fiber_channel.c#L180). > > There is some inconsistency without it. > > I've looked a bit deeper at the failure I reported before. Seems the problem > is in "cbus_unpair" function. > > The problem appears only if FiberIsCancelled is setted to diag area in > "fiber_cond_wait" function. > > This is where my expertise ends, as I'm not familiar with "cbus". However I > have some minds how it could be eliminated. > > Let's declare cbus_unpair fiber as is not cancellable and stop report > is_cancellable flag for non-cancellable fibers. See some PoC below: > > > diff --git a/src/lib/core/cbus.c b/src/lib/core/cbus.c > index 5d91fb948..4167c756a 100644 > --- a/src/lib/core/cbus.c > +++ b/src/lib/core/cbus.c > @@ -630,6 +630,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe > *src_pipe, > msg.unpair_arg = unpair_arg; > msg.src_pipe = src_pipe; > msg.complete = false; > + fiber_set_cancellable(false); > fiber_cond_create(&msg.cond); > > cpipe_push(dest_pipe, &msg.cmsg); > @@ -643,6 +644,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe > *src_pipe, > fiber_cond_wait(&msg.cond); > } > > + fiber_set_cancellable(true); > cpipe_destroy(dest_pipe); > } > > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c > index 483ae3ce1..8100c9da6 100644 > --- a/src/lib/core/fiber.c > +++ b/src/lib/core/fiber.c > @@ -553,6 +553,9 @@ fiber_set_cancellable(bool yesno) > bool > fiber_is_cancelled(void) > { > + if ((fiber()->flags & FIBER_IS_CANCELLABLE) == 0) { > + return false; > + } > return fiber()->flags & FIBER_IS_CANCELLED; > } > > > To be honest I've not checked such change carefully and also have segfault > at replication/gc.test.lua for "memtx" engine. > > Finally, feel free to ignore this comment I hope Vlad or Sasha can give you > more accurate and correct advices. > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-03 10:20 ` Sergey Ostanevich @ 2020-11-03 10:27 ` Oleg Babin 2020-11-04 10:00 ` Leonid Vasiliev 2020-11-16 22:12 ` Vladislav Shpilevoy 2 siblings, 0 replies; 23+ messages in thread From: Oleg Babin @ 2020-11-03 10:27 UTC (permalink / raw) To: Sergey Ostanevich; +Cc: tarantool-patches, v.shpilevoy, alexander.turenko Hi! Thanks for your comment and diff. On 03/11/2020 13:20, Sergey Ostanevich wrote: > So, my resolution will be: it is wrong to set a diag in an arbitrary > place, without clear understanting of the reason. This is the case for > the cond_wait machinery, since it doesn't know_why_ the fiber is > cancelled. > > I propose to cover the case with condition on empty diag and call it a > day. Sounds reasonable. It seems to be ok. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-03 10:20 ` Sergey Ostanevich 2020-11-03 10:27 ` Oleg Babin @ 2020-11-04 10:00 ` Leonid Vasiliev 2020-11-16 22:12 ` Vladislav Shpilevoy 2 siblings, 0 replies; 23+ messages in thread From: Leonid Vasiliev @ 2020-11-04 10:00 UTC (permalink / raw) To: Sergey Ostanevich, Oleg Babin Cc: tarantool-patches, v.shpilevoy, alexander.turenko Continuation of the review. On 03.11.2020 13:20, Sergey Ostanevich wrote: > Hi Oleg! > > I believe the point about 'consistency' is not valid here. I put a > simple check that if diag is already set, then print it out. For the > fiber_cond_wait_timeout() it happened multiple times with various > reports, inlcuding this one: > > 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not > set the DIAG to FiberIsCancelled, original diag: Missing .xlog file > between LSN 5 {1: 5} and 6 {1: 6} > > that is used in the test system: > > test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = > 'loading'}) > > So, my resolution will be: it is wrong to set a diag in an arbitrary > place, without clear understanting of the reason. This is the case for > the cond_wait machinery, since it doesn't know _why_ the fiber is > cancelled. > > I propose to cover the case with condition on empty diag and call it a > day. > > regards, > Sergos > > ===== > > diff --git a/src/box/box.cc b/src/box/box.cc > index 18568df3b..9e824453d 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) > { > double deadline = ev_monotonic_now(loop()) + timeout; > while (is_box_configured == false || box_is_ro() != ro) { > - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) > - return -1; > - if (fiber_is_cancelled()) { > - diag_set(FiberIsCancelled); > + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { > + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); 7) According to https://www.tarantool.io/en/doc/latest/dev_guide/c_style_guide/ it's seems like you are trying to hide something) Use: ``` if (condition) action(); ``` Instead of: ``` if (condition) action(); another_action(); ``` > return -1; > } > } > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c > index 904a350d9..0c93c5842 100644 > --- a/src/lib/core/fiber_cond.c > +++ b/src/lib/core/fiber_cond.c > @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) > diag_set(TimedOut); > return -1; > } > + if (fiber_is_cancelled()) { > + if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled); 8) The same as previously: ``` if (condition) action(); ``` 9) Checking diag on empty looks strange to me. I think we should add an error to diag without this check. If we want to save the previous error, I suggest to use the stack diag. > + return -1; > + } > return 0; > } > > diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua > new file mode 100755 > index 000000000..ae805c5bf > --- /dev/null > +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua > @@ -0,0 +1,23 @@ > +#!/usr/bin/env tarantool > + > +local tap = require('tap') > +local fiber = require('fiber') > +local test = tap.test("gh-5013-fiber-cancel") > + > +test:plan(2) > + > +local result = {} > + > +function test_f() > + local cond = fiber.cond() > + local res, err = pcall(cond.wait, cond) > + result.res = res > + result.err = err > +end > + > +local f = fiber.create(test_f) > +f:cancel() > +fiber.yield() > + > +test:ok(result.res == false, tostring(result.res)) > +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) 10) Use a user-frendly check message (in both checks). > diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result > new file mode 100644 > index 000000000..eab0a5e4d > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.result > @@ -0,0 +1,65 @@ > +-- test-run result file version 2 > +remote = require 'net.box' > + | --- > + | ... > +fiber = require 'fiber' > + | --- > + | ... > +test_run = require('test_run').new() > + | --- > + | ... > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end > + | --- > + | ... > +box.schema.func.create('infinite_call') > + | --- > + | ... > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > + | --- > + | ... > + > +error_msg = nil > + | --- > + | ... > +test_run:cmd("setopt delimiter ';'") > + | --- > + | - true > + | ... > +function gh4834( > + local cn = remote.connect(box.cfg.listen) > + local f = fiber.new(function() > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > + end) > + f:set_joinable(true) > + fiber.yield() > + f:cancel() > + f:join() > + cn:close() > +end; > + | --- > + | ... > +test_run:cmd("setopt delimiter ''"); > + | --- > + | - true > + | ... > +gh4834() > + | --- > + | ... > +error_msg > + | --- > + | - fiber is cancelled > + | ... > +box.schema.func.drop('infinite_call') > + | --- > + | ... > +infinite_call = nil > + | --- > + | ... > +channel = nil > + | --- > + | ... > +error_msg = nil > + | --- > + | ... > diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua > new file mode 100644 > index 000000000..06fb3ceac > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua > @@ -0,0 +1,29 @@ > +remote = require 'net.box' > +fiber = require 'fiber' > +test_run = require('test_run').new() > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end > +box.schema.func.create('infinite_call') > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > + > +error_msg = nil > +test_run:cmd("setopt delimiter ';'") > +function gh4834() 11) I think this is not a self-documenting name for the function. > + local cn = remote.connect(box.cfg.listen) > + local f = fiber.new(function() > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > + end) > + f:set_joinable(true) > + fiber.yield() > + f:cancel() > + f:join() > + cn:close() > +end; > +test_run:cmd("setopt delimiter ''"); > +gh4834() > +error_msg > +box.schema.func.drop('infinite_call') > +infinite_call = nil > +channel = nil 12) You didn't introduce the variable "channel". In addition, I looked at a couple of tests and didn't see them cleaning up variables. But I don't mind. > +error_msg = nil > > > On 01 ноя 13:13, Oleg Babin wrote: >> Hi! Thanks for changes. See two comments below. >> >> On 31/10/2020 19:29, sergos@tarantool.org wrote: >>> From: Sergey Ostanevich <sergos@tarantool.org> >>> >>> Hi! >>> >>> Thanks to Oleg Babin's comment I found there's no need to update any lua >>> interfaces, since the reason was in C implementation. Also, there is one >>> place the change is played, so after I fixed it I got complete testing >>> pass. >>> Force-pushed branch, v2 patch attached. >>> >>> >>> >>> Before this patch fiber.cond():wait() just returns for cancelled >>> fiber. In contrast fiber.channel():get() threw "fiber is >>> canceled" error. >>> This patch unify behaviour of channels and condvars and also fixes >>> related net.box module problem - it was impossible to interrupt >>> net.box call with fiber.cancel because it used fiber.cond under >>> the hood. Test cases for both bugs are added. >>> >>> Closes #4834 >>> Closes #5013 >>> >>> Co-authored-by: Oleg Babin <olegrok@tarantool.org> >>> >>> @TarantoolBot document >>> Title: fiber.cond():wait() throws if fiber is cancelled >>> >>> Currently fiber.cond():wait() throws an error if waiting fiber is >>> cancelled like in case with fiber.channel():get(). >>> --- >>> >>> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond >>> Issue: https://github.com/tarantool/tarantool/issues/5013 >>> >>> src/box/box.cc | 6 +- >>> src/lib/core/fiber_cond.c | 1 + >>> test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ >>> test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ >>> .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ >>> 5 files changed, 120 insertions(+), 4 deletions(-) >>> create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua >>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result >>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua >>> >>> diff --git a/src/box/box.cc b/src/box/box.cc >>> index 18568df3b..bfa1051f9 100644 >>> --- a/src/box/box.cc >>> +++ b/src/box/box.cc >>> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) >>> { >>> double deadline = ev_monotonic_now(loop()) + timeout; >>> while (is_box_configured == false || box_is_ro() != ro) { >>> - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) >>> - return -1; >>> - if (fiber_is_cancelled()) { >>> - diag_set(FiberIsCancelled); >>> + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { >>> + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); >> >> Here you use spaces instead of tabs. >> > > Fixed. > >>> return -1; >>> } >>> } >>> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c >>> index 904a350d9..b0645069e 100644 >>> --- a/src/lib/core/fiber_cond.c >>> +++ b/src/lib/core/fiber_cond.c >>> @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) >>> diag_set(TimedOut); >>> return -1; >>> } >>> + if (fiber_is_cancelled()) return -1; >> >> It's qute strange to return -1 here but don't set a reason to diagnostic >> area. Look how it is done for channels >> >> (https://github.com/tarantool/tarantool/blob/42c64d06d5d1a3ec937b3c596af083a672a68ad8/src/lib/core/fiber_channel.c#L180). >> >> There is some inconsistency without it. >> >> I've looked a bit deeper at the failure I reported before. Seems the problem >> is in "cbus_unpair" function. >> >> The problem appears only if FiberIsCancelled is setted to diag area in >> "fiber_cond_wait" function. >> >> This is where my expertise ends, as I'm not familiar with "cbus". However I >> have some minds how it could be eliminated. >> >> Let's declare cbus_unpair fiber as is not cancellable and stop report >> is_cancellable flag for non-cancellable fibers. See some PoC below: >> >> >> diff --git a/src/lib/core/cbus.c b/src/lib/core/cbus.c >> index 5d91fb948..4167c756a 100644 >> --- a/src/lib/core/cbus.c >> +++ b/src/lib/core/cbus.c >> @@ -630,6 +630,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe >> *src_pipe, >> msg.unpair_arg = unpair_arg; >> msg.src_pipe = src_pipe; >> msg.complete = false; >> + fiber_set_cancellable(false); >> fiber_cond_create(&msg.cond); >> >> cpipe_push(dest_pipe, &msg.cmsg); >> @@ -643,6 +644,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe >> *src_pipe, >> fiber_cond_wait(&msg.cond); >> } >> >> + fiber_set_cancellable(true); >> cpipe_destroy(dest_pipe); >> } >> >> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c >> index 483ae3ce1..8100c9da6 100644 >> --- a/src/lib/core/fiber.c >> +++ b/src/lib/core/fiber.c >> @@ -553,6 +553,9 @@ fiber_set_cancellable(bool yesno) >> bool >> fiber_is_cancelled(void) >> { >> + if ((fiber()->flags & FIBER_IS_CANCELLABLE) == 0) { >> + return false; >> + } >> return fiber()->flags & FIBER_IS_CANCELLED; >> } >> >> >> To be honest I've not checked such change carefully and also have segfault >> at replication/gc.test.lua for "memtx" engine. >> >> Finally, feel free to ignore this comment I hope Vlad or Sasha can give you >> more accurate and correct advices. >> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-03 10:20 ` Sergey Ostanevich 2020-11-03 10:27 ` Oleg Babin 2020-11-04 10:00 ` Leonid Vasiliev @ 2020-11-16 22:12 ` Vladislav Shpilevoy 2020-11-18 22:05 ` Sergey Ostanevich 2 siblings, 1 reply; 23+ messages in thread From: Vladislav Shpilevoy @ 2020-11-16 22:12 UTC (permalink / raw) To: Sergey Ostanevich, Oleg Babin; +Cc: tarantool-patches, alexander.turenko On 03.11.2020 11:20, Sergey Ostanevich wrote: > Hi Oleg! > > I believe the point about 'consistency' is not valid here. I put a > simple check that if diag is already set, then print it out. For the > fiber_cond_wait_timeout() it happened multiple times with various > reports, inlcuding this one: > > 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not > set the DIAG to FiberIsCancelled, original diag: Missing .xlog file > between LSN 5 {1: 5} and 6 {1: 6} > > that is used in the test system: > > test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = > 'loading'}) > > So, my resolution will be: it is wrong to set a diag in an arbitrary > place, without clear understanting of the reason. This is the case for > the cond_wait machinery, since it doesn't know _why_ the fiber is > cancelled. It is a wrong resolution, IMO. You just hacked cond wait not to change the other places. It is not about tests. Tests only show what is provided by the internal subsystems. And if they depend on fiber cond not setting diag in case of a fail, then it looks wrong. I suggest you to fix the usage places, where the caller code thinks that cond_wait never sets a diag on cancellation. If a function fails, we set a diag. It is not a thing we do optionally. Otherwise you make it a bit simpler in this patch, but make it harder to work with the cond in future. Talking of your statement: I believe the stack diag also is not supported there yet. It is supported on the level of lib/core, i.e. everywhere. But is not present on 1.10. However it is not the point. The point is that it is not needed here. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-16 22:12 ` Vladislav Shpilevoy @ 2020-11-18 22:05 ` Sergey Ostanevich 2020-11-22 16:01 ` Vladislav Shpilevoy 0 siblings, 1 reply; 23+ messages in thread From: Sergey Ostanevich @ 2020-11-18 22:05 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches, alexander.turenko Hi Vlad! I put your comments and my patch from the second mail here to keep one thread - see below. Thanks, Sergos > On 17 Nov 2020, at 01:12, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote: > > On 03.11.2020 11:20, Sergey Ostanevich wrote: >> Hi Oleg! >> >> I believe the point about 'consistency' is not valid here. I put a >> simple check that if diag is already set, then print it out. For the >> fiber_cond_wait_timeout() it happened multiple times with various >> reports, inlcuding this one: >> >> 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not >> set the DIAG to FiberIsCancelled, original diag: Missing .xlog file >> between LSN 5 {1: 5} and 6 {1: 6} >> >> that is used in the test system: >> >> test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = >> 'loading'}) >> >> So, my resolution will be: it is wrong to set a diag in an arbitrary >> place, without clear understanting of the reason. This is the case for >> the cond_wait machinery, since it doesn't know _why_ the fiber is >> cancelled. > > It is a wrong resolution, IMO. You just hacked cond wait not to change the > other places. It is not about tests. Tests only show what is provided by the > internal subsystems. And if they depend on fiber cond not setting diag in > case of a fail, then it looks wrong. > Actually I didn’t make fiber_cond not setting a diag, rather preserve the original one if it is present. As for correct or not - you’re way more savvy in Tarantool sources than I, so it’s hard for me to oppose. Still I’d try. For the case of the test I see the relay_process_wal_event() catches from the recover_remaining_wals() and sets the XlogGapError for the relay. Note, the relay_set_error() uses exactly the same semantics - sets the error in case it was not set before. Then it happily cancels the fiber and returns and eventually appears in relay_subscribe() where it throws (via diag_raise) the latest diag of the fiber. It appears, that along the way there are multiple places the fiber diag can be reset, so the best I can propose is to use the relay’s diag. Also, to enforce this strategy we need a follow-up ticket to ensure the relay’s diag is always set by the relay_subscribe() exit. > I suggest you to fix the usage places, where the caller code thinks that > cond_wait never sets a diag on cancellation. > > If a function fails, we set a diag. It is not a thing we do optionally. > Otherwise you make it a bit simpler in this patch, but make it harder to > work with the cond in future. > > Talking of your statement: > > I believe the stack diag also is not supported there yet. > > It is supported on the level of lib/core, i.e. everywhere. But is not > present on 1.10. However it is not the point. The point is that it is not > needed here. > On 17 Nov 2020, at 01:12, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote: > > Hi! Thanks for the patch! > > Please, change subsystem name to 'fiber:'. 'core:' is too general. > We have tons of stuff in libcore in addition to fibers. > Done. >> Before this patch fiber.cond():wait() just returns for cancelled >> fiber. In contrast fiber.channel():get() threw "fiber is >> canceled" error. >> This patch unify behaviour of channels and condvars and also fixes >> related net.box module problem - it was impossible to interrupt >> net.box call with fiber.cancel because it used fiber.cond under >> the hood. Test cases for both bugs are added. > > Netbox hangs not because of using fiber.cond. But because it never > calls testcancel(). Normally all looped fibers should do that. > I believe the fix is an indirect result, since all 3 wait() calls in net_box.lua aren't covered with a pcall() - it errors out. I reproduced it with the following: --- a/src/box/lua/net_box.lua +++ b/src/box/lua/net_box.lua @@ -414,7 +417,11 @@ local function create_transport(host, port, user, password, callback, -- waiting client is waked up prematurely. while timeout > 0 and not self:is_ready() do local ts = fiber.clock() - self.cond:wait(timeout) + local ok, err = pcall(self.cond.wait, self.cond, timeout) + if (not ok) then + print('net_box.lua:423 thrown, err: ' .. tostring(err)) + error(err) + end timeout = timeout - (fiber.clock() - ts) end if not self:is_ready() then Note, that if I don’t put the explicit error() there, the test from the patch hangs. To me it looks like testcancel(), since it raises error the same way. Although, there could be more places in net_box where testcancel() can be called - still it is beyond this ticket. > Talking of compatibility - I think it always was supposed to throw. > luaT_fiber_cond_wait() calls luaL_testcancel(), but only when > fiber_cond_wait_timeout() returns not 0, which was never the case for > cancellation. So it was supposed to throw, but nobody covered it with > a test. > > See 2 comments below. > >> Closes #4834 >> Closes #5013 >> >> Co-authored-by: Oleg Babin <olegrok@tarantool.org> >> diff --git a/src/box/box.cc b/src/box/box.cc >> index 18568df3b..29f74e94b 100644 >> --- a/src/box/box.cc >> +++ b/src/box/box.cc >> @@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout) >> { >> double deadline = ev_monotonic_now(loop()) + timeout; >> while (is_box_configured == false || box_is_ro() != ro) { >> - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) >> - return -1; >> - if (fiber_is_cancelled()) { >> - diag_set(FiberIsCancelled); >> + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { >> + if (fiber_is_cancelled()) >> + diag_set(FiberIsCancelled); >> return -1; >> } >> } >> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c >> index 904a350d9..cc59eaafb 100644 >> --- a/src/lib/core/fiber_cond.c >> +++ b/src/lib/core/fiber_cond.c >> @@ -108,6 +108,11 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) >> diag_set(TimedOut); >> return -1; >> } >> + if (fiber_is_cancelled()) { >> + if (diag_is_empty(diag_get())) >> + diag_set(FiberIsCancelled); >> + return -1; > > 1. Wtf? Why don't you set an error, when this is an error? And why do > you do exactly the same in box_wait_ro() above? > Fixed as per first part of review. >> + } >> return 0; >> }> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result >> new file mode 100644 >> index 000000000..4ed04bb61 >> --- /dev/null >> +++ b/test/box/net.box_fiber_cancel_gh-4834.result > > 2. This does not conform to our test file name pattern. Read this > carefully: https://github.com/tarantool/tarantool/wiki/Code-review-procedure#testing Fixed. Although, we have a big legacy of similarly named tests under test/box/ Branch is force-pushed, updated patch is below. =============== From 5e6426340f4f5af8429c4273f1b251f503c6dd9b Mon Sep 17 00:00:00 2001 From: Sergey Ostanevich <sergos@tarantool.org> Date: Tue, 3 Nov 2020 12:52:26 +0300 Subject: [PATCH] fiber: handle fiber cancellation for fiber.cond Before this patch fiber.cond():wait() just returns for cancelled fiber. In contrast fiber.channel():get() threw "fiber is canceled" error. This patch unify behaviour of channels and condvars and also fixes related net.box module problem - it was impossible to interrupt net.box call with fiber.cancel because it used fiber.cond under the hood. Test cases for both bugs are added. Closes #4834 Closes #5013 Co-authored-by: Oleg Babin <olegrok@tarantool.org> @TarantoolBot document Title: fiber.cond():wait() throws if fiber is cancelled Currently fiber.cond():wait() throws an error if waiting fiber is cancelled. --- Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond Issue: https://github.com/tarantool/tarantool/issues/5013 @Changelog * fiber.cond().wait() now throws if fiber is cancelled src/box/box.cc | 7 +-- src/box/relay.cc | 12 +++- src/lib/core/fiber_cond.c | 4 ++ src/lib/core/fiber_cond.h | 2 +- test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ test/box/gh-4834-netbox-fiber-cancel.result | 62 +++++++++++++++++++ test/box/gh-4834-netbox-fiber-cancel.test.lua | 28 +++++++++ 7 files changed, 132 insertions(+), 6 deletions(-) create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua create mode 100644 test/box/gh-4834-netbox-fiber-cancel.result create mode 100644 test/box/gh-4834-netbox-fiber-cancel.test.lua diff --git a/src/box/box.cc b/src/box/box.cc index 1f7dec362..aa23dcc13 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout) { double deadline = ev_monotonic_now(loop()) + timeout; while (is_box_configured == false || box_is_ro() != ro) { - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) - return -1; - if (fiber_is_cancelled()) { - diag_set(FiberIsCancelled); + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { + if (fiber_is_cancelled()) + diag_set(FiberIsCancelled); return -1; } } diff --git a/src/box/relay.cc b/src/box/relay.cc index 1e77e0d9b..cce139c87 100644 --- a/src/box/relay.cc +++ b/src/box/relay.cc @@ -821,8 +821,18 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync, relay_subscribe_f, relay); if (rc == 0) rc = cord_cojoin(&relay->cord); - if (rc != 0) + if (rc != 0) { + /* + * We should always raise a problem from relay itself, not all + * other modules that are change diag in the current fiber. + * TODO: investigate why and how we can leave the relay_subscribe_f + * with diag unset in the relay. + */ + if (diag_last_error(&relay->diag)) { + diag_set_error(diag_get(), diag_last_error(&relay->diag)); + } diag_raise(); + } } static void diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c index 904a350d9..71bb2d04d 100644 --- a/src/lib/core/fiber_cond.c +++ b/src/lib/core/fiber_cond.c @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) diag_set(TimedOut); return -1; } + if (fiber_is_cancelled()) { + diag_set(FiberIsCancelled); + return -1; + } return 0; } diff --git a/src/lib/core/fiber_cond.h b/src/lib/core/fiber_cond.h index 87c6f2ca2..2662e0654 100644 --- a/src/lib/core/fiber_cond.h +++ b/src/lib/core/fiber_cond.h @@ -114,7 +114,7 @@ fiber_cond_broadcast(struct fiber_cond *cond); * @param cond condition * @param timeout timeout in seconds * @retval 0 on fiber_cond_signal() call or a spurious wake up - * @retval -1 on timeout, diag is set to TimedOut + * @retval -1 on timeout or fiber cancellation, diag is set */ int fiber_cond_wait_timeout(struct fiber_cond *cond, double timeout); diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua new file mode 100755 index 000000000..ca4ca2c90 --- /dev/null +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua @@ -0,0 +1,23 @@ +#!/usr/bin/env tarantool + +local tap = require('tap') +local fiber = require('fiber') +local test = tap.test("gh-5013-fiber-cancel") + +test:plan(2) + +local result = {} + +function test_f() + local cond = fiber.cond() + local res, err = pcall(cond.wait, cond) + result.res = res + result.err = err +end + +local f = fiber.create(test_f) +f:cancel() +fiber.yield() + +test:ok(result.res == false, 'expected result is false') +test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported') diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result new file mode 100644 index 000000000..4ed04bb61 --- /dev/null +++ b/test/box/gh-4834-netbox-fiber-cancel.result @@ -0,0 +1,62 @@ +-- test-run result file version 2 +remote = require 'net.box' + | --- + | ... +fiber = require 'fiber' + | --- + | ... +test_run = require('test_run').new() + | --- + | ... + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() fiber.channel(1):get() end + | --- + | ... +box.schema.func.create('infinite_call') + | --- + | ... +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + | --- + | ... + +error_msg = nil + | --- + | ... +test_run:cmd("setopt delimiter ';'") + | --- + | - true + | ... +function netbox_runner() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; + | --- + | ... +test_run:cmd("setopt delimiter ''"); + | --- + | - true + | ... +netbox_runner() + | --- + | ... +error_msg + | --- + | - fiber is cancelled + | ... +box.schema.func.drop('infinite_call') + | --- + | ... +infinite_call = nil + | --- + | ... +error_msg = nil + | --- + | ... diff --git a/test/box/gh-4834-netbox-fiber-cancel.test.lua b/test/box/gh-4834-netbox-fiber-cancel.test.lua new file mode 100644 index 000000000..bc0e5af6e --- /dev/null +++ b/test/box/gh-4834-netbox-fiber-cancel.test.lua @@ -0,0 +1,28 @@ +remote = require 'net.box' +fiber = require 'fiber' +test_run = require('test_run').new() + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() fiber.channel(1):get() end +box.schema.func.create('infinite_call') +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + +error_msg = nil +test_run:cmd("setopt delimiter ';'") +function netbox_runner() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; +test_run:cmd("setopt delimiter ''"); +netbox_runner() +error_msg +box.schema.func.drop('infinite_call') +infinite_call = nil +error_msg = nil -- 2.24.3 (Apple Git-128) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-18 22:05 ` Sergey Ostanevich @ 2020-11-22 16:01 ` Vladislav Shpilevoy 2020-11-23 21:47 ` Sergey Ostanevich 2020-11-24 7:31 ` Sergey Ostanevich 0 siblings, 2 replies; 23+ messages in thread From: Vladislav Shpilevoy @ 2020-11-22 16:01 UTC (permalink / raw) To: Sergey Ostanevich; +Cc: tarantool-patches, alexander.turenko Hi! Thanks for the changes! Technically almost good. >> On 17 Nov 2020, at 01:12, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote: >> >> On 03.11.2020 11:20, Sergey Ostanevich wrote: >>> Hi Oleg! >>> >>> I believe the point about 'consistency' is not valid here. I put a >>> simple check that if diag is already set, then print it out. For the >>> fiber_cond_wait_timeout() it happened multiple times with various >>> reports, inlcuding this one: >>> >>> 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not >>> set the DIAG to FiberIsCancelled, original diag: Missing .xlog file >>> between LSN 5 {1: 5} and 6 {1: 6} >>> >>> that is used in the test system: >>> >>> test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = >>> 'loading'}) >>> >>> So, my resolution will be: it is wrong to set a diag in an arbitrary >>> place, without clear understanting of the reason. This is the case for >>> the cond_wait machinery, since it doesn't know _why_ the fiber is >>> cancelled. >> >> It is a wrong resolution, IMO. You just hacked cond wait not to change the >> other places. It is not about tests. Tests only show what is provided by the >> internal subsystems. And if they depend on fiber cond not setting diag in >> case of a fail, then it looks wrong. >> > > Actually I didn’t make fiber_cond not setting a diag, rather preserve the > original one if it is present. Yes, by not setting a diag. >>> Before this patch fiber.cond():wait() just returns for cancelled >>> fiber. In contrast fiber.channel():get() threw "fiber is >>> canceled" error. >>> This patch unify behaviour of channels and condvars and also fixes >>> related net.box module problem - it was impossible to interrupt >>> net.box call with fiber.cancel because it used fiber.cond under >>> the hood. Test cases for both bugs are added. >> >> Netbox hangs not because of using fiber.cond. But because it never >> calls testcancel(). Normally all looped fibers should do that. >> > I believe the fix is an indirect result, since all 3 wait() calls in > net_box.lua aren't covered with a pcall() - it errors out. I reproduced > it with the following: I know that your patch fixes it. But I was talking about the statement in the commit message. You said: it was impossible to interrupt net.box call with fiber.cancel because it used fiber.cond under the hood And it still uses the cond, but does not hang anymore. Therefore, the statement obviously isn't true. The fact of cond usage is not a reason. See 4 comments below. > diff --git a/src/box/box.cc b/src/box/box.cc > index 1f7dec362..aa23dcc13 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout) > { > double deadline = ev_monotonic_now(loop()) + timeout; > while (is_box_configured == false || box_is_ro() != ro) { > - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) > - return -1; > - if (fiber_is_cancelled()) { > - diag_set(FiberIsCancelled); > + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { > + if (fiber_is_cancelled()) > + diag_set(FiberIsCancelled); 1. Why do you need this diag_set here? The cancellation diag is already set in fiber_cond_wait_deadline(). Why do you set it again? > return -1; > } > } > diff --git a/src/box/relay.cc b/src/box/relay.cc > index 1e77e0d9b..cce139c87 100644 > --- a/src/box/relay.cc > +++ b/src/box/relay.cc > @@ -821,8 +821,18 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync, > relay_subscribe_f, relay); > if (rc == 0) > rc = cord_cojoin(&relay->cord); > - if (rc != 0) > + if (rc != 0) { > + /* > + * We should always raise a problem from relay itself, not all > + * other modules that are change diag in the current fiber. > + * TODO: investigate why and how we can leave the relay_subscribe_f > + * with diag unset in the relay. > + */ > + if (diag_last_error(&relay->diag)) { 2. Please, use != NULL, according to our code style. https://github.com/tarantool/tarantool/wiki/Code-review-procedure#code-style > + diag_set_error(diag_get(), diag_last_error(&relay->diag)); > + } 3. What is the test where diag_last_error() is NULL? I added an assertion, that it is not NULL, and the tests passed (except a couple of tests which always fail on my machine due to too long UNIX socket path). Also in the end of relay_subscribe_f() there is an existing assert(!diag_is_empty(&relay->diag)). So it can't be NULL. Or do you have a test? The only possible issue I see is that diag_set_error(diag_get(), diag_last_error(&relay->diag)); is called too early in relay_subscribe_f(). After that the diag can be rewritten somehow. For instance, by fiber_join(reader). So probably you can move this diag move to the end right before 'return -1;', and remove this hunk entirely. In a separate commit in the same branch, since it is not related to fiber_cond bug directly. > diag_raise(); > + } > } > diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result > new file mode 100644 > index 000000000..4ed04bb61 > --- /dev/null > +++ b/test/box/gh-4834-netbox-fiber-cancel.result > @@ -0,0 +1,62 @@ > +-- test-run result file version 2 > +remote = require 'net.box' > + | --- > + | ... > +fiber = require 'fiber' 4. Please, use () for require. You even did it in the other test file, but here you changed your mind somewhy. Or it is bad copy-paste. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-22 16:01 ` Vladislav Shpilevoy @ 2020-11-23 21:47 ` Sergey Ostanevich 2020-11-24 7:31 ` Sergey Ostanevich 1 sibling, 0 replies; 23+ messages in thread From: Sergey Ostanevich @ 2020-11-23 21:47 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches, alexander.turenko Hi! Thanks for review! > On 22 Nov 2020, at 19:01, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote: > > Hi! Thanks for the changes! > > Technically almost good. > >>> On 17 Nov 2020, at 01:12, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote: >>> >>> On 03.11.2020 11:20, Sergey Ostanevich wrote: >>>> Hi Oleg! >>>> >>>> I believe the point about 'consistency' is not valid here. I put a >>>> simple check that if diag is already set, then print it out. For the >>>> fiber_cond_wait_timeout() it happened multiple times with various >>>> reports, inlcuding this one: >>>> >>>> 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not >>>> set the DIAG to FiberIsCancelled, original diag: Missing .xlog file >>>> between LSN 5 {1: 5} and 6 {1: 6} >>>> >>>> that is used in the test system: >>>> >>>> test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = >>>> 'loading'}) >>>> >>>> So, my resolution will be: it is wrong to set a diag in an arbitrary >>>> place, without clear understanting of the reason. This is the case for >>>> the cond_wait machinery, since it doesn't know _why_ the fiber is >>>> cancelled. >>> >>> It is a wrong resolution, IMO. You just hacked cond wait not to change the >>> other places. It is not about tests. Tests only show what is provided by the >>> internal subsystems. And if they depend on fiber cond not setting diag in >>> case of a fail, then it looks wrong. >>> >> >> Actually I didn’t make fiber_cond not setting a diag, rather preserve the >> original one if it is present. > > Yes, by not setting a diag. > >>>> Before this patch fiber.cond():wait() just returns for cancelled >>>> fiber. In contrast fiber.channel():get() threw "fiber is >>>> canceled" error. >>>> This patch unify behaviour of channels and condvars and also fixes >>>> related net.box module problem - it was impossible to interrupt >>>> net.box call with fiber.cancel because it used fiber.cond under >>>> the hood. Test cases for both bugs are added. >>> >>> Netbox hangs not because of using fiber.cond. But because it never >>> calls testcancel(). Normally all looped fibers should do that. >>> >> I believe the fix is an indirect result, since all 3 wait() calls in >> net_box.lua aren't covered with a pcall() - it errors out. I reproduced >> it with the following: > > I know that your patch fixes it. But I was talking about the statement in > the commit message. You said: > > sergos/gh-5013-fiber-cond > because it used fiber.cond under the hood > > And it still uses the cond, but does not hang anymore. Therefore, the > statement obviously isn't true. The fact of cond usage is not a reason. > Well, it might mislead, I see. Let me rephrase it. > See 4 comments below. > >> diff --git a/src/box/box.cc b/src/box/box.cc >> index 1f7dec362..aa23dcc13 100644 >> --- a/src/box/box.cc >> +++ b/src/box/box.cc >> @@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout) >> { >> double deadline = ev_monotonic_now(loop()) + timeout; >> while (is_box_configured == false || box_is_ro() != ro) { >> - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) >> - return -1; >> - if (fiber_is_cancelled()) { >> - diag_set(FiberIsCancelled); >> + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { >> + if (fiber_is_cancelled()) >> + diag_set(FiberIsCancelled); > > 1. Why do you need this diag_set here? The cancellation diag is already > set in fiber_cond_wait_deadline(). Why do you set it again? It's an artefact appeared during the solution. You’re right, it is not needed, removed. > >> return -1; >> } >> } >> diff --git a/src/box/relay.cc b/src/box/relay.cc >> index 1e77e0d9b..cce139c87 100644 >> --- a/src/box/relay.cc >> +++ b/src/box/relay.cc >> @@ -821,8 +821,18 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync, >> relay_subscribe_f, relay); >> if (rc == 0) >> rc = cord_cojoin(&relay->cord); >> - if (rc != 0) >> + if (rc != 0) { >> + /* >> + * We should always raise a problem from relay itself, not all >> + * other modules that are change diag in the current fiber. >> + * TODO: investigate why and how we can leave the relay_subscribe_f >> + * with diag unset in the relay. >> + */ >> + if (diag_last_error(&relay->diag)) { > > 2. Please, use != NULL, according to our code style. > https://github.com/tarantool/tarantool/wiki/Code-review-procedure#code-style > I prefer to fix this along with the comment #3 >> + diag_set_error(diag_get(), diag_last_error(&relay->diag)); >> + } > > 3. What is the test where diag_last_error() is NULL? I added > an assertion, that it is not NULL, and the tests passed (except a > couple of tests which always fail on my machine due to too long UNIX > socket path). > > Also in the end of relay_subscribe_f() there is an existing > assert(!diag_is_empty(&relay->diag)). So it can't be NULL. Or do you > have a test? > > The only possible issue I see is that > > diag_set_error(diag_get(), diag_last_error(&relay->diag)); > > is called too early inrelay_subscribe_f(). After that the diag > can be rewritten somehow. For instance, by fiber_join(reader). So > probably you can move this diag move to the end right before 'return -1;', > and remove this hunk entirely. In a separate commit in the same branch, > since it is not related to fiber_cond bug directly. > You’re right - the assert was there for long time, so should not happen at all. I move the diag_set_error, still I have a concern - the relay_exit(relay) at the very end of the relay_subscribe_f() can raise down in the call chain at the recovery_stop_local(). It can happen only if fiber from recovery watcher has some error set, which is not clear to me yet. > >> diag_raise(); >> + } >> } >> diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result >> new file mode 100644 >> index 000000000..4ed04bb61 >> --- /dev/null >> +++ b/test/box/gh-4834-netbox-fiber-cancel.result >> @@ -0,0 +1,62 @@ >> +-- test-run result file version 2 >> +remote = require 'net.box' >> + | --- >> + | ... >> +fiber = require 'fiber' > > 4. Please, use () for require. You even did it in the other test file, > but here you changed your mind somewhy. Or it is bad copy-paste. Done. ================ From e8bd26bb26c76e0a722958db1fe4763c027dcb8b Mon Sep 17 00:00:00 2001 From: Sergey Ostanevich <sergos@tarantool.org> Date: Tue, 3 Nov 2020 12:52:26 +0300 Subject: [PATCH] fiber: handle fiber cancellation for fiber.cond Before this patch fiber.cond():wait() just returns for cancelled fiber. In contrast fiber.channel():get() throws "fiber is canceled" error. This patch unifies behaviour of channels and condvars. It also fixes a related net.box module problem #4834 since fiber.cond now performs test for fiber cancellation. Closes #4834 Closes #5013 Co-authored-by: Oleg Babin <olegrok@tarantool.org> @TarantoolBot document Title: fiber.cond():wait() throws if fiber is cancelled Currently fiber.cond():wait() throws an error if waiting fiber is cancelled. --- Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond Issue: https://github.com/tarantool/tarantool/issues/5013 src/box/box.cc | 4 -- src/box/relay.cc | 19 +++--- src/lib/core/fiber_cond.c | 4 ++ src/lib/core/fiber_cond.h | 2 +- test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ test/box/gh-4834-netbox-fiber-cancel.result | 62 +++++++++++++++++++ test/box/gh-4834-netbox-fiber-cancel.test.lua | 28 +++++++++ 7 files changed, 128 insertions(+), 14 deletions(-) create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua create mode 100644 test/box/gh-4834-netbox-fiber-cancel.result create mode 100644 test/box/gh-4834-netbox-fiber-cancel.test.lua diff --git a/src/box/box.cc b/src/box/box.cc index 18568df3b..111432937 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -307,10 +307,6 @@ box_wait_ro(bool ro, double timeout) while (is_box_configured == false || box_is_ro() != ro) { if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) return -1; - if (fiber_is_cancelled()) { - diag_set(FiberIsCancelled); - return -1; - } } return 0; } diff --git a/src/box/relay.cc b/src/box/relay.cc index b68b45e00..a7bc2c6f7 100644 --- a/src/box/relay.cc +++ b/src/box/relay.cc @@ -753,15 +753,6 @@ relay_subscribe_f(va_list ap) if (!relay->replica->anon) relay_send_is_raft_enabled(relay, &raft_enabler, false); - /* - * Log the error that caused the relay to break the loop. - * Don't clear the error for status reporting. - */ - assert(!diag_is_empty(&relay->diag)); - diag_set_error(diag_get(), diag_last_error(&relay->diag)); - diag_log(); - say_crit("exiting the relay loop"); - /* * Clear garbage collector trigger and WAL watcher. * trigger_clear() does nothing in case the triggers @@ -780,6 +771,16 @@ relay_subscribe_f(va_list ap) cbus_endpoint_destroy(&relay->endpoint, cbus_process); relay_exit(relay); + + /* + * Log the error that caused the relay to break the loop. + * Don't clear the error for status reporting. + */ + assert(!diag_is_empty(&relay->diag)); + diag_set_error(diag_get(), diag_last_error(&relay->diag)); + diag_log(); + say_crit("exiting the relay loop"); + return -1; } diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c index 904a350d9..71bb2d04d 100644 --- a/src/lib/core/fiber_cond.c +++ b/src/lib/core/fiber_cond.c @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) diag_set(TimedOut); return -1; } + if (fiber_is_cancelled()) { + diag_set(FiberIsCancelled); + return -1; + } return 0; } diff --git a/src/lib/core/fiber_cond.h b/src/lib/core/fiber_cond.h index 87c6f2ca2..2662e0654 100644 --- a/src/lib/core/fiber_cond.h +++ b/src/lib/core/fiber_cond.h @@ -114,7 +114,7 @@ fiber_cond_broadcast(struct fiber_cond *cond); * @param cond condition * @param timeout timeout in seconds * @retval 0 on fiber_cond_signal() call or a spurious wake up - * @retval -1 on timeout, diag is set to TimedOut + * @retval -1 on timeout or fiber cancellation, diag is set */ int fiber_cond_wait_timeout(struct fiber_cond *cond, double timeout); diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua new file mode 100755 index 000000000..ca4ca2c90 --- /dev/null +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua @@ -0,0 +1,23 @@ +#!/usr/bin/env tarantool + +local tap = require('tap') +local fiber = require('fiber') +local test = tap.test("gh-5013-fiber-cancel") + +test:plan(2) + +local result = {} + +function test_f() + local cond = fiber.cond() + local res, err = pcall(cond.wait, cond) + result.res = res + result.err = err +end + +local f = fiber.create(test_f) +f:cancel() +fiber.yield() + +test:ok(result.res == false, 'expected result is false') +test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported') diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result new file mode 100644 index 000000000..4ed04bb61 --- /dev/null +++ b/test/box/gh-4834-netbox-fiber-cancel.result @@ -0,0 +1,62 @@ +-- test-run result file version 2 +remote = require 'net.box' + | --- + | ... +fiber = require 'fiber' + | --- + | ... +test_run = require('test_run').new() + | --- + | ... + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() fiber.channel(1):get() end + | --- + | ... +box.schema.func.create('infinite_call') + | --- + | ... +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + | --- + | ... + +error_msg = nil + | --- + | ... +test_run:cmd("setopt delimiter ';'") + | --- + | - true + | ... +function netbox_runner() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; + | --- + | ... +test_run:cmd("setopt delimiter ''"); + | --- + | - true + | ... +netbox_runner() + | --- + | ... +error_msg + | --- + | - fiber is cancelled + | ... +box.schema.func.drop('infinite_call') + | --- + | ... +infinite_call = nil + | --- + | ... +error_msg = nil + | --- + | ... diff --git a/test/box/gh-4834-netbox-fiber-cancel.test.lua b/test/box/gh-4834-netbox-fiber-cancel.test.lua new file mode 100644 index 000000000..dbeff083b --- /dev/null +++ b/test/box/gh-4834-netbox-fiber-cancel.test.lua @@ -0,0 +1,28 @@ +remote = require('net.box') +fiber = require('fiber') +test_run = require('test_run').new() + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() fiber.channel(1):get() end +box.schema.func.create('infinite_call') +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + +error_msg = nil +test_run:cmd("setopt delimiter ';'") +function netbox_runner() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; +test_run:cmd("setopt delimiter ''"); +netbox_runner() +error_msg +box.schema.func.drop('infinite_call') +infinite_call = nil +error_msg = nil -- 2.24.3 (Apple Git-128) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-22 16:01 ` Vladislav Shpilevoy 2020-11-23 21:47 ` Sergey Ostanevich @ 2020-11-24 7:31 ` Sergey Ostanevich 1 sibling, 0 replies; 23+ messages in thread From: Sergey Ostanevich @ 2020-11-24 7:31 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches, Alexander Turenko Yet another fix-up because of test update: --- a/test/box/gh-4834-netbox-fiber-cancel.result +++ b/test/box/gh-4834-netbox-fiber-cancel.result @@ -1,8 +1,8 @@ -- test-run result file version 2 -remote = require 'net.box' +remote = require('net.box') | --- | ... -fiber = require 'fiber' +fiber = require('fiber') | --- | ... test_run = require('test_run').new() Force-pushed. > On 22 Nov 2020, at 19:01, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote: > > Hi! Thanks for the changes! > > Technically almost good. > >>> On 17 Nov 2020, at 01:12, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote: >>> >>> On 03.11.2020 11:20, Sergey Ostanevich wrote: >>>> Hi Oleg! >>>> >>>> I believe the point about 'consistency' is not valid here. I put a >>>> simple check that if diag is already set, then print it out. For the >>>> fiber_cond_wait_timeout() it happened multiple times with various >>>> reports, inlcuding this one: >>>> >>>> 2020-11-03 10:28:01.630 [72411] relay/unix/:(socket)/101/main C> Did not >>>> set the DIAG to FiberIsCancelled, original diag: Missing .xlog file >>>> between LSN 5 {1: 5} and 6 {1: 6} >>>> >>>> that is used in the test system: >>>> >>>> test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = >>>> 'loading'}) >>>> >>>> So, my resolution will be: it is wrong to set a diag in an arbitrary >>>> place, without clear understanting of the reason. This is the case for >>>> the cond_wait machinery, since it doesn't know _why_ the fiber is >>>> cancelled. >>> >>> It is a wrong resolution, IMO. You just hacked cond wait not to change the >>> other places. It is not about tests. Tests only show what is provided by the >>> internal subsystems. And if they depend on fiber cond not setting diag in >>> case of a fail, then it looks wrong. >>> >> >> Actually I didn’t make fiber_cond not setting a diag, rather preserve the >> original one if it is present. > > Yes, by not setting a diag. > >>>> Before this patch fiber.cond():wait() just returns for cancelled >>>> fiber. In contrast fiber.channel():get() threw "fiber is >>>> canceled" error. >>>> This patch unify behaviour of channels and condvars and also fixes >>>> related net.box module problem - it was impossible to interrupt >>>> net.box call with fiber.cancel because it used fiber.cond under >>>> the hood. Test cases for both bugs are added. >>> >>> Netbox hangs not because of using fiber.cond. But because it never >>> calls testcancel(). Normally all looped fibers should do that. >>> >> I believe the fix is an indirect result, since all 3 wait() calls in >> net_box.lua aren't covered with a pcall() - it errors out. I reproduced >> it with the following: > > I know that your patch fixes it. But I was talking about the statement in > the commit message. You said: > > it was impossible to interrupt net.box call with fiber.cancel > because it used fiber.cond under the hood > > And it still uses the cond, but does not hang anymore. Therefore, the > statement obviously isn't true. The fact of cond usage is not a reason. > > See 4 comments below. > >> diff --git a/src/box/box.cc b/src/box/box.cc >> index 1f7dec362..aa23dcc13 100644 >> --- a/src/box/box.cc >> +++ b/src/box/box.cc >> @@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout) >> { >> double deadline = ev_monotonic_now(loop()) + timeout; >> while (is_box_configured == false || box_is_ro() != ro) { >> - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) >> - return -1; >> - if (fiber_is_cancelled()) { >> - diag_set(FiberIsCancelled); >> + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { >> + if (fiber_is_cancelled()) >> + diag_set(FiberIsCancelled); > > 1. Why do you need this diag_set here? The cancellation diag is already > set in fiber_cond_wait_deadline(). Why do you set it again? > >> return -1; >> } >> } >> diff --git a/src/box/relay.cc b/src/box/relay.cc >> index 1e77e0d9b..cce139c87 100644 >> --- a/src/box/relay.cc >> +++ b/src/box/relay.cc >> @@ -821,8 +821,18 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync, >> relay_subscribe_f, relay); >> if (rc == 0) >> rc = cord_cojoin(&relay->cord); >> - if (rc != 0) >> + if (rc != 0) { >> + /* >> + * We should always raise a problem from relay itself, not all >> + * other modules that are change diag in the current fiber. >> + * TODO: investigate why and how we can leave the relay_subscribe_f >> + * with diag unset in the relay. >> + */ >> + if (diag_last_error(&relay->diag)) { > > 2. Please, use != NULL, according to our code style. > https://github.com/tarantool/tarantool/wiki/Code-review-procedure#code-style > >> + diag_set_error(diag_get(), diag_last_error(&relay->diag)); >> + } > > 3. What is the test where diag_last_error() is NULL? I added > an assertion, that it is not NULL, and the tests passed (except a > couple of tests which always fail on my machine due to too long UNIX > socket path). > > Also in the end of relay_subscribe_f() there is an existing > assert(!diag_is_empty(&relay->diag)). So it can't be NULL. Or do you > have a test? > > The only possible issue I see is that > > diag_set_error(diag_get(), diag_last_error(&relay->diag)); > > is called too early in relay_subscribe_f(). After that the diag > can be rewritten somehow. For instance, by fiber_join(reader). So > probably you can move this diag move to the end right before 'return -1;', > and remove this hunk entirely. In a separate commit in the same branch, > since it is not related to fiber_cond bug directly. > >> diag_raise(); >> + } >> } >> diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result >> new file mode 100644 >> index 000000000..4ed04bb61 >> --- /dev/null >> +++ b/test/box/gh-4834-netbox-fiber-cancel.result >> @@ -0,0 +1,62 @@ >> +-- test-run result file version 2 >> +remote = require 'net.box' >> + | --- >> + | ... >> +fiber = require 'fiber' > > 4. Please, use () for require. You even did it in the other test file, > but here you changed your mind somewhy. Or it is bad copy-paste. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos 2020-11-01 10:13 ` Oleg Babin @ 2020-11-04 10:00 ` Leonid Vasiliev 2020-11-05 20:42 ` Sergey Ostanevich 2020-11-10 21:16 ` Sergey Ostanevich 2020-11-16 22:12 ` Vladislav Shpilevoy ` (4 subsequent siblings) 6 siblings, 2 replies; 23+ messages in thread From: Leonid Vasiliev @ 2020-11-04 10:00 UTC (permalink / raw) To: sergos, tarantool-patches; +Cc: v.shpilevoy, alexander.turenko Hi! Thank you for the patch. See some comments below: 1) The patch changes undocumented behavior, AFAIU. So, I have a question:"Do you plan to backport the patch to tarantool 1.10?". If the answer is "Yes" - I'm comfortable with the changes. But if the answer is "No" - I will object, because in this case both behaviors must be supported in all modules. 2) I think changing the behavior in C doesn't cause much of a problem, because before when you wait without timeout, you don't need to check the return value (it's always 0). But in Lua it will cause the problems, because now throws an error if cancelled and all wait calls should be wrapped to pcall. On 31.10.2020 19:29, sergos@tarantool.org wrote: > From: Sergey Ostanevich <sergos@tarantool.org> > > Hi! > > Thanks to Oleg Babin's comment I found there's no need to update any lua > interfaces, since the reason was in C implementation. Also, there is one > place the change is played, so after I fixed it I got complete testing > pass. > Force-pushed branch, v2 patch attached. > > > > Before this patch fiber.cond():wait() just returns for cancelled > fiber. In contrast fiber.channel():get() threw "fiber is > canceled" error. > This patch unify behaviour of channels and condvars and also fixes 3) behaviour -> behavior. > related net.box module problem - it was impossible to interrupt > net.box call with fiber.cancel because it used fiber.cond under > the hood. Test cases for both bugs are added. > > Closes #4834 > Closes #5013 > > Co-authored-by: Oleg Babin <olegrok@tarantool.org> > > @TarantoolBot document > Title: fiber.cond():wait() throws if fiber is cancelled > > Currently fiber.cond():wait() throws an error if waiting fiber is > cancelled like in case with fiber.channel():get(). 4) I don't think it's a good decision adding a comparison with fiber.channel():get() to the documentation. Up to you. 5) Document the changes in module.h. > --- 6) Add @ChangeLog. > > Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond > Issue: https://github.com/tarantool/tarantool/issues/5013 > > src/box/box.cc | 6 +- > src/lib/core/fiber_cond.c | 1 + > test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ > test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ > .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ > 5 files changed, 120 insertions(+), 4 deletions(-) > create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua > create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result > create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua > > diff --git a/src/box/box.cc b/src/box/box.cc > index 18568df3b..bfa1051f9 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) > { > double deadline = ev_monotonic_now(loop()) + timeout; > while (is_box_configured == false || box_is_ro() != ro) { > - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) > - return -1; > - if (fiber_is_cancelled()) { > - diag_set(FiberIsCancelled); > + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { > + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); > return -1; > } > } > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c > index 904a350d9..b0645069e 100644 > --- a/src/lib/core/fiber_cond.c > +++ b/src/lib/core/fiber_cond.c > @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) > diag_set(TimedOut); > return -1; > } > + if (fiber_is_cancelled()) return -1; > return 0; > } > > diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua > new file mode 100755 > index 000000000..ae805c5bf > --- /dev/null > +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua > @@ -0,0 +1,23 @@ > +#!/usr/bin/env tarantool > + > +local tap = require('tap') > +local fiber = require('fiber') > +local test = tap.test("gh-5013-fiber-cancel") > + > +test:plan(2) > + > +local result = {} > + > +function test_f() > + local cond = fiber.cond() > + local res, err = pcall(cond.wait, cond) > + result.res = res > + result.err = err > +end > + > +local f = fiber.create(test_f) > +f:cancel() > +fiber.yield() > + > +test:ok(result.res == false, tostring(result.res)) > +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) > diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result > new file mode 100644 > index 000000000..eab0a5e4d > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.result > @@ -0,0 +1,65 @@ > +-- test-run result file version 2 > +remote = require 'net.box' > + | --- > + | ... > +fiber = require 'fiber' > + | --- > + | ... > +test_run = require('test_run').new() > + | --- > + | ... > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end > + | --- > + | ... > +box.schema.func.create('infinite_call') > + | --- > + | ... > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > + | --- > + | ... > + > +error_msg = nil > + | --- > + | ... > +test_run:cmd("setopt delimiter ';'") > + | --- > + | - true > + | ... > +function gh4834() > + local cn = remote.connect(box.cfg.listen) > + local f = fiber.new(function() > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > + end) > + f:set_joinable(true) > + fiber.yield() > + f:cancel() > + f:join() > + cn:close() > +end; > + | --- > + | ... > +test_run:cmd("setopt delimiter ''"); > + | --- > + | - true > + | ... > +gh4834() > + | --- > + | ... > +error_msg > + | --- > + | - fiber is cancelled > + | ... > +box.schema.func.drop('infinite_call') > + | --- > + | ... > +infinite_call = nil > + | --- > + | ... > +channel = nil > + | --- > + | ... > +error_msg = nil > + | --- > + | ... > diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua > new file mode 100644 > index 000000000..06fb3ceac > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua > @@ -0,0 +1,29 @@ > +remote = require 'net.box' > +fiber = require 'fiber' > +test_run = require('test_run').new() > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end > +box.schema.func.create('infinite_call') > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > + > +error_msg = nil > +test_run:cmd("setopt delimiter ';'") > +function gh4834() > + local cn = remote.connect(box.cfg.listen) > + local f = fiber.new(function() > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > + end) > + f:set_joinable(true) > + fiber.yield() > + f:cancel() > + f:join() > + cn:close() > +end; > +test_run:cmd("setopt delimiter ''"); > +gh4834() > +error_msg > +box.schema.func.drop('infinite_call') > +infinite_call = nil > +channel = nil > +error_msg = nil > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-04 10:00 ` Leonid Vasiliev @ 2020-11-05 20:42 ` Sergey Ostanevich 2020-11-10 21:16 ` Sergey Ostanevich 1 sibling, 0 replies; 23+ messages in thread From: Sergey Ostanevich @ 2020-11-05 20:42 UTC (permalink / raw) To: Leonid Vasiliev; +Cc: tarantool-patches, v.shpilevoy, alexander.turenko Hi! On 04 ноя 13:00, Leonid Vasiliev wrote: > Hi! Thank you for the patch. > See some comments below: > > 1) The patch changes undocumented behavior, AFAIU. > So, I have a question:"Do you plan to backport the patch to > tarantool 1.10?". If the answer is "Yes" - I'm comfortable with the > changes. But if the answer is "No" - I will object, because in this case > both behaviors must be supported in all modules. That's a very good point, Leonid! I didn't met any use of the '0' or '1' result in C or in Lua for the case of fiber cancellation. Although it might be not true about many projects in Tarantool ecosystem and - what's more scary - in user's code in the multitude of installations. Also, I have to admit that introduction of two types of behavior for different versions of Tarantool will get things even worse. I think we have to re-iterate the discussion of the change in the ticket first. Sergos > > 2) I think changing the behavior in C doesn't cause much of a problem, > because before when you wait without timeout, you don't need to check > the return value (it's always 0). But in Lua it will cause the problems, > because now throws an error if cancelled and all wait calls should be > wrapped to pcall. > > On 31.10.2020 19:29, sergos@tarantool.org wrote: > > From: Sergey Ostanevich <sergos@tarantool.org> > > > > Hi! > > > > Thanks to Oleg Babin's comment I found there's no need to update any lua > > interfaces, since the reason was in C implementation. Also, there is one > > place the change is played, so after I fixed it I got complete testing > > pass. > > Force-pushed branch, v2 patch attached. > > > > > > > > Before this patch fiber.cond():wait() just returns for cancelled > > fiber. In contrast fiber.channel():get() threw "fiber is > > canceled" error. > > This patch unify behaviour of channels and condvars and also fixes > > 3) behaviour -> behavior. > > > related net.box module problem - it was impossible to interrupt > > net.box call with fiber.cancel because it used fiber.cond under > > the hood. Test cases for both bugs are added. > > > > Closes #4834 > > Closes #5013 > > > > Co-authored-by: Oleg Babin <olegrok@tarantool.org> > > > > @TarantoolBot document > > Title: fiber.cond():wait() throws if fiber is cancelled > > > > Currently fiber.cond():wait() throws an error if waiting fiber is > > cancelled like in case with fiber.channel():get(). > > 4) I don't think it's a good decision adding a comparison with > fiber.channel():get() to the documentation. Up to you. > > 5) Document the changes in module.h. > > > --- > > 6) Add @ChangeLog. > > > > > Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond > > Issue: https://github.com/tarantool/tarantool/issues/5013 > > > > src/box/box.cc | 6 +- > > src/lib/core/fiber_cond.c | 1 + > > test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ > > test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ > > .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ > > 5 files changed, 120 insertions(+), 4 deletions(-) > > create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua > > create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result > > create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua > > > > diff --git a/src/box/box.cc b/src/box/box.cc > > index 18568df3b..bfa1051f9 100644 > > --- a/src/box/box.cc > > +++ b/src/box/box.cc > > @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) > > { > > double deadline = ev_monotonic_now(loop()) + timeout; > > while (is_box_configured == false || box_is_ro() != ro) { > > - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) > > - return -1; > > - if (fiber_is_cancelled()) { > > - diag_set(FiberIsCancelled); > > + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { > > + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); > > return -1; > > } > > } > > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c > > index 904a350d9..b0645069e 100644 > > --- a/src/lib/core/fiber_cond.c > > +++ b/src/lib/core/fiber_cond.c > > @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) > > diag_set(TimedOut); > > return -1; > > } > > + if (fiber_is_cancelled()) return -1; > > return 0; > > } > > diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua > > new file mode 100755 > > index 000000000..ae805c5bf > > --- /dev/null > > +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua > > @@ -0,0 +1,23 @@ > > +#!/usr/bin/env tarantool > > + > > +local tap = require('tap') > > +local fiber = require('fiber') > > +local test = tap.test("gh-5013-fiber-cancel") > > + > > +test:plan(2) > > + > > +local result = {} > > + > > +function test_f() > > + local cond = fiber.cond() > > + local res, err = pcall(cond.wait, cond) > > + result.res = res > > + result.err = err > > +end > > + > > +local f = fiber.create(test_f) > > +f:cancel() > > +fiber.yield() > > + > > +test:ok(result.res == false, tostring(result.res)) > > +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) > > diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result > > new file mode 100644 > > index 000000000..eab0a5e4d > > --- /dev/null > > +++ b/test/box/net.box_fiber_cancel_gh-4834.result > > @@ -0,0 +1,65 @@ > > +-- test-run result file version 2 > > +remote = require 'net.box' > > + | --- > > + | ... > > +fiber = require 'fiber' > > + | --- > > + | ... > > +test_run = require('test_run').new() > > + | --- > > + | ... > > + > > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > > +function infinite_call() fiber.channel(1):get() end > > + | --- > > + | ... > > +box.schema.func.create('infinite_call') > > + | --- > > + | ... > > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > > + | --- > > + | ... > > + > > +error_msg = nil > > + | --- > > + | ... > > +test_run:cmd("setopt delimiter ';'") > > + | --- > > + | - true > > + | ... > > +function gh4834() > > + local cn = remote.connect(box.cfg.listen) > > + local f = fiber.new(function() > > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > > + end) > > + f:set_joinable(true) > > + fiber.yield() > > + f:cancel() > > + f:join() > > + cn:close() > > +end; > > + | --- > > + | ... > > +test_run:cmd("setopt delimiter ''"); > > + | --- > > + | - true > > + | ... > > +gh4834() > > + | --- > > + | ... > > +error_msg > > + | --- > > + | - fiber is cancelled > > + | ... > > +box.schema.func.drop('infinite_call') > > + | --- > > + | ... > > +infinite_call = nil > > + | --- > > + | ... > > +channel = nil > > + | --- > > + | ... > > +error_msg = nil > > + | --- > > + | ... > > diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua > > new file mode 100644 > > index 000000000..06fb3ceac > > --- /dev/null > > +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua > > @@ -0,0 +1,29 @@ > > +remote = require 'net.box' > > +fiber = require 'fiber' > > +test_run = require('test_run').new() > > + > > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > > +function infinite_call() fiber.channel(1):get() end > > +box.schema.func.create('infinite_call') > > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > > + > > +error_msg = nil > > +test_run:cmd("setopt delimiter ';'") > > +function gh4834() > > + local cn = remote.connect(box.cfg.listen) > > + local f = fiber.new(function() > > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > > + end) > > + f:set_joinable(true) > > + fiber.yield() > > + f:cancel() > > + f:join() > > + cn:close() > > +end; > > +test_run:cmd("setopt delimiter ''"); > > +gh4834() > > +error_msg > > +box.schema.func.drop('infinite_call') > > +infinite_call = nil > > +channel = nil > > +error_msg = nil > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-04 10:00 ` Leonid Vasiliev 2020-11-05 20:42 ` Sergey Ostanevich @ 2020-11-10 21:16 ` Sergey Ostanevich 2020-11-12 20:15 ` Sergey Ostanevich 1 sibling, 1 reply; 23+ messages in thread From: Sergey Ostanevich @ 2020-11-10 21:16 UTC (permalink / raw) To: Leonid Vasiliev; +Cc: tarantool-patches, v.shpilevoy, alexander.turenko Hi Leonid, thank you for the review! I put two parts together to handle them in one patch. > On 4 Nov 2020, at 13:00, Leonid Vasiliev <lvasiliev@tarantool.org> wrote: > > Hi! Thank you for the patch. > See some comments below: > > 1) The patch changes undocumented behavior, AFAIU. > So, I have a question:"Do you plan to backport the patch to > tarantool 1.10?". If the answer is "Yes" - I'm comfortable with the > changes. But if the answer is "No" - I will object, because in this case > both behaviors must be supported in all modules. > Yes, the plan is to push it down to 1.10 > 2) I think changing the behavior in C doesn't cause much of a problem, > because before when you wait without timeout, you don't need to check > the return value (it's always 0). But in Lua it will cause the problems, > because now throws an error if cancelled and all wait calls should be > wrapped to pcall. Exactly it is the change to Lua and we intentionally make it, since original behaviour is wrong. Good things are: fiber.cond has no big presence in the /tarantool sourcebase and the cancellation of fiber itself means the fiber has to quit. > > On 31.10.2020 19:29, sergos@tarantool.org wrote: >> From: Sergey Ostanevich <sergos@tarantool.org> >> Hi! >> Thanks to Oleg Babin's comment I found there's no need to update any lua >> interfaces, since the reason was in C implementation. Also, there is one >> place the change is played, so after I fixed it I got complete testing >> pass. >> Force-pushed branch, v2 patch attached. >> Before this patch fiber.cond():wait() just returns for cancelled >> fiber. In contrast fiber.channel():get() threw "fiber is >> canceled" error. >> This patch unify behaviour of channels and condvars and also fixes > > 3) behaviour -> behavior. Depends on spellchecker - mine sets it to this British version. > >> related net.box module problem - it was impossible to interrupt >> net.box call with fiber.cancel because it used fiber.cond under >> the hood. Test cases for both bugs are added. >> Closes #4834 >> Closes #5013 >> Co-authored-by: Oleg Babin <olegrok@tarantool.org> >> @TarantoolBot document >> Title: fiber.cond():wait() throws if fiber is cancelled >> Currently fiber.cond():wait() throws an error if waiting fiber is >> cancelled like in case with fiber.channel():get(). > > 4) I don't think it's a good decision adding a comparison with > fiber.channel():get() to the documentation. Up to you. > I’d agree to you: let’s just put the new behaviour here. > 5) Document the changes in module.h. Done (need to rebuild, obviously). > >> --- > > 6) Add @ChangeLog. > >> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond >> Issue: https://github.com/tarantool/tarantool/issues/5013 >> src/box/box.cc | 6 +- >> src/lib/core/fiber_cond.c | 1 + >> test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ >> test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ >> .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ >> 5 files changed, 120 insertions(+), 4 deletions(-) >> create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua >> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result >> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua > >> diff --git a/src/box/box.cc b/src/box/box.cc >> index 18568df3b..9e824453d 100644 >> --- a/src/box/box.cc >> +++ b/src/box/box.cc >> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) >> { >> double deadline = ev_monotonic_now(loop()) + timeout; >> while (is_box_configured == false || box_is_ro() != ro) { >> - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) >> - return -1; >> - if (fiber_is_cancelled()) { >> - diag_set(FiberIsCancelled); >> + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { >> + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); > > 7) According to https://www.tarantool.io/en/doc/latest/dev_guide/c_style_guide/ > it's seems like you are trying to hide something) > Use: > ``` > if (condition) > action(); > ``` > Instead of: > ``` > if (condition) action(); > another_action(); > ``` Fixed. >> return -1; >> } >> } >> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c >> index 904a350d9..0c93c5842 100644 >> --- a/src/lib/core/fiber_cond.c >> +++ b/src/lib/core/fiber_cond.c >> @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) >> diag_set(TimedOut); >> return -1; >> } >> + if (fiber_is_cancelled()) { >> + if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled); > > 8) The same as previously: > ``` > if (condition) > action(); > ``` > Fixed. > 9) Checking diag on empty looks strange to me. I think we should add an > error to diag without this check. If we want to save the previous error, > I suggest to use the stack diag. I made this because this interface is used inside the testing machinery, where exact cause of e.g. replica disconnection is reported. If we put the diag here unconditionally it will break this use of diag. I believe the stack diag also is not supported there yet. >> + return -1; >> + } >> return 0; >> } >> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua >> new file mode 100755 >> index 000000000..ae805c5bf >> --- /dev/null >> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua >> @@ -0,0 +1,23 @@ >> +#!/usr/bin/env tarantool >> + >> +local tap = require('tap') >> +local fiber = require('fiber') >> +local test = tap.test("gh-5013-fiber-cancel") >> + >> +test:plan(2) >> + >> +local result = {} >> + >> +function test_f() >> + local cond = fiber.cond() >> + local res, err = pcall(cond.wait, cond) >> + result.res = res >> + result.err = err >> +end >> + >> +local f = fiber.create(test_f) >> +f:cancel() >> +fiber.yield() >> + >> +test:ok(result.res == false, tostring(result.res)) >> +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) > > 10) Use a user-frendly check message (in both checks). > Fixed >> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result >> new file mode 100644 >> index 000000000..eab0a5e4d >> --- /dev/null >> +++ b/test/box/net.box_fiber_cancel_gh-4834.result >> @@ -0,0 +1,65 @@ >> +-- test-run result file version 2 >> +remote = require 'net.box' >> + | --- >> + | ... >> +fiber = require 'fiber' >> + | --- >> + | ... >> +test_run = require('test_run').new() >> + | --- >> + | ... >> + >> +-- #4834: Cancelling fiber doesn't interrupt netbox operations >> +function infinite_call() fiber.channel(1):get() end >> + | --- >> + | ... >> +box.schema.func.create('infinite_call') >> + | --- >> + | ... >> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') >> + | --- >> + | ... >> + >> +error_msg = nil >> + | --- >> + | ... >> +test_run:cmd("setopt delimiter ';'") >> + | --- >> + | - true >> + | ... >> +function gh4834( > + local cn = remote.connect(box.cfg.listen) >> + local f = fiber.new(function() >> + _, error_msg = pcall(cn.call, cn, 'infinite_call') >> + end) >> + f:set_joinable(true) >> + fiber.yield() >> + f:cancel() >> + f:join() >> + cn:close() >> +end; >> + | --- >> + | ... >> +test_run:cmd("setopt delimiter ''"); >> + | --- >> + | - true >> + | ... >> +gh4834() >> + | --- >> + | ... >> +error_msg >> + | --- >> + | - fiber is cancelled >> + | ... >> +box.schema.func.drop('infinite_call') >> + | --- >> + | ... >> +infinite_call = nil >> + | --- >> + | ... >> +channel = nil >> + | --- >> + | ... >> +error_msg = nil >> + | --- >> + | ... >> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua >> new file mode 100644 >> index 000000000..06fb3ceac >> --- /dev/null >> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua >> @@ -0,0 +1,29 @@ >> +remote = require 'net.box' >> +fiber = require 'fiber' >> +test_run = require('test_run').new() >> + >> +-- #4834: Cancelling fiber doesn't interrupt netbox operations >> +function infinite_call() fiber.channel(1):get() end >> +box.schema.func.create('infinite_call') >> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') >> + >> +error_msg = nil >> +test_run:cmd("setopt delimiter ';'") >> +function gh4834() > > 11) I think this is not a self-documenting name for the function. > Renamed. >> + local cn = remote.connect(box.cfg.listen) >> + local f = fiber.new(function() >> + _, error_msg = pcall(cn.call, cn, 'infinite_call') >> + end) >> + f:set_joinable(true) >> + fiber.yield() >> + f:cancel() >> + f:join() >> + cn:close() >> +end; >> +test_run:cmd("setopt delimiter ''"); >> +gh4834() >> +error_msg >> +box.schema.func.drop('infinite_call') >> +infinite_call = nil >> +channel = nil > > 12) You didn't introduce the variable "channel". In addition, I looked > at a couple of tests and didn't see them cleaning up variables. But I > don't mind. > Removed this one. In general, clean up prevents flaky. >> +error_msg = nil ================= Before this patch fiber.cond():wait() just returns for cancelled fiber. In contrast fiber.channel():get() threw "fiber is canceled" error. This patch unify behaviour of channels and condvars and also fixes related net.box module problem - it was impossible to interrupt net.box call with fiber.cancel because it used fiber.cond under the hood. Test cases for both bugs are added. Closes #4834 Closes #5013 Co-authored-by: Oleg Babin <olegrok@tarantool.org> @TarantoolBot document Title: fiber.cond():wait() throws if fiber is cancelled Currently fiber.cond():wait() throws an error if waiting fiber is cancelled. --- Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond Issue: https://github.com/tarantool/tarantool/issues/5013 @Changelog * fiber.cond().wait() now throws if fiber is cancelled src/box/box.cc | 6 +- src/lib/core/fiber_cond.c | 4 ++ test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ 5 files changed, 123 insertions(+), 4 deletions(-) create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua diff --git a/src/box/box.cc b/src/box/box.cc index 18568df3b..9e824453d 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) { double deadline = ev_monotonic_now(loop()) + timeout; while (is_box_configured == false || box_is_ro() != ro) { - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) - return -1; - if (fiber_is_cancelled()) { - diag_set(FiberIsCancelled); + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); return -1; } } diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c index 904a350d9..0c93c5842 100644 --- a/src/lib/core/fiber_cond.c +++ b/src/lib/core/fiber_cond.c @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) diag_set(TimedOut); return -1; } + if (fiber_is_cancelled()) { + if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled); + return -1; + } return 0; } diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua new file mode 100755 index 000000000..34711fb31 --- /dev/null +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua @@ -0,0 +1,23 @@ +#!/usr/bin/env tarantool + +local tap = require('tap') +local fiber = require('fiber') +local test = tap.test("gh-5013-fiber-cancel") + +test:plan(2) + +local result = {} + +function test_f() + local cond = fiber.cond() + local res, err = pcall(cond.wait, cond) + result.res = res + result.err = err +end + +local f = fiber.create(test_f) +f:cancel() +fiber.yield() + +test:ok(result.res == false, tostring(result.res)) +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result new file mode 100644 index 000000000..eab0a5e4d --- /dev/null +++ b/test/box/net.box_fiber_cancel_gh-4834.result @@ -0,0 +1,65 @@ +-- test-run result file version 2 +remote = require 'net.box' + | --- + | ... +fiber = require 'fiber' + | --- + | ... +test_run = require('test_run').new() + | --- + | ... + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() fiber.channel(1):get() end + | --- + | ... +box.schema.func.create('infinite_call') + | --- + | ... +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + | --- + | ... + +error_msg = nil + | --- + | ... +test_run:cmd("setopt delimiter ';'") + | --- + | - true + | ... +function gh4834() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; + | --- + | ... +test_run:cmd("setopt delimiter ''"); + | --- + | - true + | ... +gh4834() + | --- + | ... +error_msg + | --- + | - fiber is cancelled + | ... +box.schema.func.drop('infinite_call') + | --- + | ... +infinite_call = nil + | --- + | ... +channel = nil + | --- + | ... +error_msg = nil + | --- + | ... diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua new file mode 100644 index 000000000..06fb3ceac --- /dev/null +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua @@ -0,0 +1,29 @@ +remote = require 'net.box' +fiber = require 'fiber' +test_run = require('test_run').new() + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() fiber.channel(1):get() end +box.schema.func.create('infinite_call') +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + +error_msg = nil +test_run:cmd("setopt delimiter ';'") +function gh4834() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; +test_run:cmd("setopt delimiter ''"); +gh4834() +error_msg +box.schema.func.drop('infinite_call') +infinite_call = nil +channel = nil +error_msg = nil -- 2.29.2 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-10 21:16 ` Sergey Ostanevich @ 2020-11-12 20:15 ` Sergey Ostanevich 2020-11-13 8:26 ` Leonid Vasiliev 0 siblings, 1 reply; 23+ messages in thread From: Sergey Ostanevich @ 2020-11-12 20:15 UTC (permalink / raw) To: Leonid Vasiliev; +Cc: tarantool-patches, Vladislav Shpilevoy, alexander.turenko Sorry, the patch was not updated. Here’s a new one and the branch is updated commit 42ea06f7cf69ee6b492cefcbb201bc9bba15c686 Author: Sergey Ostanevich <sergos@tarantool.org> Date: Tue Nov 3 12:52:26 2020 +0300 core: handle fiber cancellation for fiber.cond Before this patch fiber.cond():wait() just returns for cancelled fiber. In contrast fiber.channel():get() threw "fiber is canceled" error. This patch unify behaviour of channels and condvars and also fixes related net.box module problem - it was impossible to interrupt net.box call with fiber.cancel because it used fiber.cond under the hood. Test cases for both bugs are added. Closes #4834 Closes #5013 Co-authored-by: Oleg Babin <olegrok@tarantool.org> @TarantoolBot document Title: fiber.cond():wait() throws if fiber is cancelled Currently fiber.cond():wait() throws an error if waiting fiber is cancelled. --- Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond Issue: https://github.com/tarantool/tarantool/issues/5013 @Changelog * fiber.cond().wait() now throws if fiber is cancelled diff --git a/src/box/box.cc b/src/box/box.cc index 18568df3b..29f74e94b 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout) { double deadline = ev_monotonic_now(loop()) + timeout; while (is_box_configured == false || box_is_ro() != ro) { - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) - return -1; - if (fiber_is_cancelled()) { - diag_set(FiberIsCancelled); + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { + if (fiber_is_cancelled()) + diag_set(FiberIsCancelled); return -1; } } diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c index 904a350d9..cc59eaafb 100644 --- a/src/lib/core/fiber_cond.c +++ b/src/lib/core/fiber_cond.c @@ -108,6 +108,11 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) diag_set(TimedOut); return -1; } + if (fiber_is_cancelled()) { + if (diag_is_empty(diag_get())) + diag_set(FiberIsCancelled); + return -1; + } return 0; } diff --git a/src/lib/core/fiber_cond.h b/src/lib/core/fiber_cond.h index 87c6f2ca2..2662e0654 100644 --- a/src/lib/core/fiber_cond.h +++ b/src/lib/core/fiber_cond.h @@ -114,7 +114,7 @@ fiber_cond_broadcast(struct fiber_cond *cond); * @param cond condition * @param timeout timeout in seconds * @retval 0 on fiber_cond_signal() call or a spurious wake up - * @retval -1 on timeout, diag is set to TimedOut + * @retval -1 on timeout or fiber cancellation, diag is set */ int fiber_cond_wait_timeout(struct fiber_cond *cond, double timeout); diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua new file mode 100755 index 000000000..ca4ca2c90 --- /dev/null +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua @@ -0,0 +1,23 @@ +#!/usr/bin/env tarantool + +local tap = require('tap') +local fiber = require('fiber') +local test = tap.test("gh-5013-fiber-cancel") + +test:plan(2) + +local result = {} + +function test_f() + local cond = fiber.cond() + local res, err = pcall(cond.wait, cond) + result.res = res + result.err = err +end + +local f = fiber.create(test_f) +f:cancel() +fiber.yield() + +test:ok(result.res == false, 'expected result is false') +test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported') diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result new file mode 100644 index 000000000..4ed04bb61 --- /dev/null +++ b/test/box/net.box_fiber_cancel_gh-4834.result @@ -0,0 +1,62 @@ +-- test-run result file version 2 +remote = require 'net.box' + | --- + | ... +fiber = require 'fiber' + | --- + | ... +test_run = require('test_run').new() + | --- + | ... + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() fiber.channel(1):get() end + | --- + | ... +box.schema.func.create('infinite_call') + | --- + | ... +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + | --- + | ... + +error_msg = nil + | --- + | ... +test_run:cmd("setopt delimiter ';'") + | --- + | - true + | ... +function netbox_runner() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; + | --- + | ... +test_run:cmd("setopt delimiter ''"); + | --- + | - true + | ... +netbox_runner() + | --- + | ... +error_msg + | --- + | - fiber is cancelled + | ... +box.schema.func.drop('infinite_call') + | --- + | ... +infinite_call = nil + | --- + | ... +error_msg = nil + | --- + | ... diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua new file mode 100644 index 000000000..bc0e5af6e --- /dev/null +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua @@ -0,0 +1,28 @@ +remote = require 'net.box' +fiber = require 'fiber' +test_run = require('test_run').new() + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() fiber.channel(1):get() end +box.schema.func.create('infinite_call') +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + +error_msg = nil +test_run:cmd("setopt delimiter ';'") +function netbox_runner() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; +test_run:cmd("setopt delimiter ''"); +netbox_runner() +error_msg +box.schema.func.drop('infinite_call') +infinite_call = nil +error_msg = nil ====================== > On 11 Nov 2020, at 00:16, Sergey Ostanevich <sergos@tarantool.org> wrote: > > Hi Leonid, thank you for the review! > > I put two parts together to handle them in one patch. > >> On 4 Nov 2020, at 13:00, Leonid Vasiliev <lvasiliev@tarantool.org> wrote: >> >> Hi! Thank you for the patch. >> See some comments below: >> >> 1) The patch changes undocumented behavior, AFAIU. >> So, I have a question:"Do you plan to backport the patch to >> tarantool 1.10?". If the answer is "Yes" - I'm comfortable with the >> changes. But if the answer is "No" - I will object, because in this case >> both behaviors must be supported in all modules. >> > > Yes, the plan is to push it down to 1.10 > >> 2) I think changing the behavior in C doesn't cause much of a problem, >> because before when you wait without timeout, you don't need to check >> the return value (it's always 0). But in Lua it will cause the problems, >> because now throws an error if cancelled and all wait calls should be >> wrapped to pcall. > > Exactly it is the change to Lua and we intentionally make it, since original > behaviour is wrong. Good things are: fiber.cond has no big presence in the > /tarantool sourcebase and the cancellation of fiber itself means the fiber > has to quit. > >> >> On 31.10.2020 19:29, sergos@tarantool.org wrote: >>> From: Sergey Ostanevich <sergos@tarantool.org> >>> Hi! >>> Thanks to Oleg Babin's comment I found there's no need to update any lua >>> interfaces, since the reason was in C implementation. Also, there is one >>> place the change is played, so after I fixed it I got complete testing >>> pass. >>> Force-pushed branch, v2 patch attached. >>> Before this patch fiber.cond():wait() just returns for cancelled >>> fiber. In contrast fiber.channel():get() threw "fiber is >>> canceled" error. >>> This patch unify behaviour of channels and condvars and also fixes >> >> 3) behaviour -> behavior. > > Depends on spellchecker - mine sets it to this British version. > >> >>> related net.box module problem - it was impossible to interrupt >>> net.box call with fiber.cancel because it used fiber.cond under >>> the hood. Test cases for both bugs are added. >>> Closes #4834 >>> Closes #5013 >>> Co-authored-by: Oleg Babin <olegrok@tarantool.org> >>> @TarantoolBot document >>> Title: fiber.cond():wait() throws if fiber is cancelled >>> Currently fiber.cond():wait() throws an error if waiting fiber is >>> cancelled like in case with fiber.channel():get(). >> >> 4) I don't think it's a good decision adding a comparison with >> fiber.channel():get() to the documentation. Up to you. >> > > I’d agree to you: let’s just put the new behaviour here. > >> 5) Document the changes in module.h. > > Done (need to rebuild, obviously). > >> >>> --- >> >> 6) Add @ChangeLog. >> >>> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond >>> Issue: https://github.com/tarantool/tarantool/issues/5013 >>> src/box/box.cc | 6 +- >>> src/lib/core/fiber_cond.c | 1 + >>> test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ >>> test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ >>> .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ >>> 5 files changed, 120 insertions(+), 4 deletions(-) >>> create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua >>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result >>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua >> >>> diff --git a/src/box/box.cc b/src/box/box.cc >>> index 18568df3b..9e824453d 100644 >>> --- a/src/box/box.cc >>> +++ b/src/box/box.cc >>> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) >>> { >>> double deadline = ev_monotonic_now(loop()) + timeout; >>> while (is_box_configured == false || box_is_ro() != ro) { >>> - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) >>> - return -1; >>> - if (fiber_is_cancelled()) { >>> - diag_set(FiberIsCancelled); >>> + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { >>> + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); >> >> 7) According to https://www.tarantool.io/en/doc/latest/dev_guide/c_style_guide/ >> it's seems like you are trying to hide something) >> Use: >> ``` >> if (condition) >> action(); >> ``` >> Instead of: >> ``` >> if (condition) action(); >> another_action(); >> ``` > > Fixed. > >>> return -1; >>> } >>> } >>> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c >>> index 904a350d9..0c93c5842 100644 >>> --- a/src/lib/core/fiber_cond.c >>> +++ b/src/lib/core/fiber_cond.c >>> @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) >>> diag_set(TimedOut); >>> return -1; >>> } >>> + if (fiber_is_cancelled()) { >>> + if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled); >> >> 8) The same as previously: >> ``` >> if (condition) >> action(); >> ``` >> > > Fixed. > >> 9) Checking diag on empty looks strange to me. I think we should add an >> error to diag without this check. If we want to save the previous error, >> I suggest to use the stack diag. > > I made this because this interface is used inside the testing machinery, where > exact cause of e.g. replica disconnection is reported. If we put the diag here > unconditionally it will break this use of diag. I believe the stack diag also > is not supported there yet. > > >>> + return -1; >>> + } >>> return 0; >>> } >>> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua >>> new file mode 100755 >>> index 000000000..ae805c5bf >>> --- /dev/null >>> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua >>> @@ -0,0 +1,23 @@ >>> +#!/usr/bin/env tarantool >>> + >>> +local tap = require('tap') >>> +local fiber = require('fiber') >>> +local test = tap.test("gh-5013-fiber-cancel") >>> + >>> +test:plan(2) >>> + >>> +local result = {} >>> + >>> +function test_f() >>> + local cond = fiber.cond() >>> + local res, err = pcall(cond.wait, cond) >>> + result.res = res >>> + result.err = err >>> +end >>> + >>> +local f = fiber.create(test_f) >>> +f:cancel() >>> +fiber.yield() >>> + >>> +test:ok(result.res == false, tostring(result.res)) >>> +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) >> >> 10) Use a user-frendly check message (in both checks). >> > Fixed > >>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result >>> new file mode 100644 >>> index 000000000..eab0a5e4d >>> --- /dev/null >>> +++ b/test/box/net.box_fiber_cancel_gh-4834.result >>> @@ -0,0 +1,65 @@ >>> +-- test-run result file version 2 >>> +remote = require 'net.box' >>> + | --- >>> + | ... >>> +fiber = require 'fiber' >>> + | --- >>> + | ... >>> +test_run = require('test_run').new() >>> + | --- >>> + | ... >>> + >>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations >>> +function infinite_call() fiber.channel(1):get() end >>> + | --- >>> + | ... >>> +box.schema.func.create('infinite_call') >>> + | --- >>> + | ... >>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') >>> + | --- >>> + | ... >>> + >>> +error_msg = nil >>> + | --- >>> + | ... >>> +test_run:cmd("setopt delimiter ';'") >>> + | --- >>> + | - true >>> + | ... >>> +function gh4834( > + local cn = remote.connect(box.cfg.listen) >>> + local f = fiber.new(function() >>> + _, error_msg = pcall(cn.call, cn, 'infinite_call') >>> + end) >>> + f:set_joinable(true) >>> + fiber.yield() >>> + f:cancel() >>> + f:join() >>> + cn:close() >>> +end; >>> + | --- >>> + | ... >>> +test_run:cmd("setopt delimiter ''"); >>> + | --- >>> + | - true >>> + | ... >>> +gh4834() >>> + | --- >>> + | ... >>> +error_msg >>> + | --- >>> + | - fiber is cancelled >>> + | ... >>> +box.schema.func.drop('infinite_call') >>> + | --- >>> + | ... >>> +infinite_call = nil >>> + | --- >>> + | ... >>> +channel = nil >>> + | --- >>> + | ... >>> +error_msg = nil >>> + | --- >>> + | ... >>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua >>> new file mode 100644 >>> index 000000000..06fb3ceac >>> --- /dev/null >>> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua >>> @@ -0,0 +1,29 @@ >>> +remote = require 'net.box' >>> +fiber = require 'fiber' >>> +test_run = require('test_run').new() >>> + >>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations >>> +function infinite_call() fiber.channel(1):get() end >>> +box.schema.func.create('infinite_call') >>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') >>> + >>> +error_msg = nil >>> +test_run:cmd("setopt delimiter ';'") >>> +function gh4834() >> >> 11) I think this is not a self-documenting name for the function. >> > Renamed. > >>> + local cn = remote.connect(box.cfg.listen) >>> + local f = fiber.new(function() >>> + _, error_msg = pcall(cn.call, cn, 'infinite_call') >>> + end) >>> + f:set_joinable(true) >>> + fiber.yield() >>> + f:cancel() >>> + f:join() >>> + cn:close() >>> +end; >>> +test_run:cmd("setopt delimiter ''"); >>> +gh4834() >>> +error_msg >>> +box.schema.func.drop('infinite_call') >>> +infinite_call = nil >>> +channel = nil >> >> 12) You didn't introduce the variable "channel". In addition, I looked >> at a couple of tests and didn't see them cleaning up variables. But I >> don't mind. >> > Removed this one. In general, clean up prevents flaky. > >>> +error_msg = nil > > ================= > > Before this patch fiber.cond():wait() just returns for cancelled > fiber. In contrast fiber.channel():get() threw "fiber is > canceled" error. > This patch unify behaviour of channels and condvars and also fixes > related net.box module problem - it was impossible to interrupt > net.box call with fiber.cancel because it used fiber.cond under > the hood. Test cases for both bugs are added. > > Closes #4834 > Closes #5013 > > Co-authored-by: Oleg Babin <olegrok@tarantool.org> > > @TarantoolBot document > Title: fiber.cond():wait() throws if fiber is cancelled > > Currently fiber.cond():wait() throws an error if waiting fiber is > cancelled. > --- > > Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond > Issue: https://github.com/tarantool/tarantool/issues/5013 > > @Changelog > * fiber.cond().wait() now throws if fiber is cancelled > > src/box/box.cc | 6 +- > src/lib/core/fiber_cond.c | 4 ++ > test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ > test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ > .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ > 5 files changed, 123 insertions(+), 4 deletions(-) > create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua > create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result > create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua > > diff --git a/src/box/box.cc b/src/box/box.cc > index 18568df3b..9e824453d 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) > { > double deadline = ev_monotonic_now(loop()) + timeout; > while (is_box_configured == false || box_is_ro() != ro) { > - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) > - return -1; > - if (fiber_is_cancelled()) { > - diag_set(FiberIsCancelled); > + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { > + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); > return -1; > } > } > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c > index 904a350d9..0c93c5842 100644 > --- a/src/lib/core/fiber_cond.c > +++ b/src/lib/core/fiber_cond.c > @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) > diag_set(TimedOut); > return -1; > } > + if (fiber_is_cancelled()) { > + if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled); > + return -1; > + } > return 0; > } > > diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua > new file mode 100755 > index 000000000..34711fb31 > --- /dev/null > +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua > @@ -0,0 +1,23 @@ > +#!/usr/bin/env tarantool > + > +local tap = require('tap') > +local fiber = require('fiber') > +local test = tap.test("gh-5013-fiber-cancel") > + > +test:plan(2) > + > +local result = {} > + > +function test_f() > + local cond = fiber.cond() > + local res, err = pcall(cond.wait, cond) > + result.res = res > + result.err = err > +end > + > +local f = fiber.create(test_f) > +f:cancel() > +fiber.yield() > + > +test:ok(result.res == false, tostring(result.res)) > +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) > diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result > new file mode 100644 > index 000000000..eab0a5e4d > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.result > @@ -0,0 +1,65 @@ > +-- test-run result file version 2 > +remote = require 'net.box' > + | --- > + | ... > +fiber = require 'fiber' > + | --- > + | ... > +test_run = require('test_run').new() > + | --- > + | ... > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end > + | --- > + | ... > +box.schema.func.create('infinite_call') > + | --- > + | ... > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > + | --- > + | ... > + > +error_msg = nil > + | --- > + | ... > +test_run:cmd("setopt delimiter ';'") > + | --- > + | - true > + | ... > +function gh4834() > + local cn = remote.connect(box.cfg.listen) > + local f = fiber.new(function() > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > + end) > + f:set_joinable(true) > + fiber.yield() > + f:cancel() > + f:join() > + cn:close() > +end; > + | --- > + | ... > +test_run:cmd("setopt delimiter ''"); > + | --- > + | - true > + | ... > +gh4834() > + | --- > + | ... > +error_msg > + | --- > + | - fiber is cancelled > + | ... > +box.schema.func.drop('infinite_call') > + | --- > + | ... > +infinite_call = nil > + | --- > + | ... > +channel = nil > + | --- > + | ... > +error_msg = nil > + | --- > + | ... > diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua > new file mode 100644 > index 000000000..06fb3ceac > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua > @@ -0,0 +1,29 @@ > +remote = require 'net.box' > +fiber = require 'fiber' > +test_run = require('test_run').new() > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end > +box.schema.func.create('infinite_call') > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > + > +error_msg = nil > +test_run:cmd("setopt delimiter ';'") > +function gh4834() > + local cn = remote.connect(box.cfg.listen) > + local f = fiber.new(function() > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > + end) > + f:set_joinable(true) > + fiber.yield() > + f:cancel() > + f:join() > + cn:close() > +end; > +test_run:cmd("setopt delimiter ''"); > +gh4834() > +error_msg > +box.schema.func.drop('infinite_call') > +infinite_call = nil > +channel = nil > +error_msg = nil > -- > 2.29.2 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-12 20:15 ` Sergey Ostanevich @ 2020-11-13 8:26 ` Leonid Vasiliev 2020-11-30 22:49 ` Alexander Turenko 0 siblings, 1 reply; 23+ messages in thread From: Leonid Vasiliev @ 2020-11-13 8:26 UTC (permalink / raw) To: Sergey Ostanevich Cc: tarantool-patches, Vladislav Shpilevoy, alexander.turenko Hi! Generally - LGTM. Yet another iteration of the review with me is not needed. But please see a comment on diag_set under the condition. On 12.11.2020 23:15, Sergey Ostanevich wrote: > Sorry, the patch was not updated. Here’s a new one and the branch is updated > > commit 42ea06f7cf69ee6b492cefcbb201bc9bba15c686 > Author: Sergey Ostanevich <sergos@tarantool.org> > Date: Tue Nov 3 12:52:26 2020 +0300 > > core: handle fiber cancellation for fiber.cond > > Before this patch fiber.cond():wait() just returns for cancelled > fiber. In contrast fiber.channel():get() threw "fiber is > canceled" error. > This patch unify behaviour of channels and condvars and also fixes > related net.box module problem - it was impossible to interrupt > net.box call with fiber.cancel because it used fiber.cond under > the hood. Test cases for both bugs are added. > > Closes #4834 > Closes #5013 > > Co-authored-by: Oleg Babin <olegrok@tarantool.org> > > @TarantoolBot document > Title: fiber.cond():wait() throws if fiber is cancelled > > Currently fiber.cond():wait() throws an error if waiting fiber is > cancelled. > > --- > > Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond > Issue: https://github.com/tarantool/tarantool/issues/5013 > > @Changelog > * fiber.cond().wait() now throws if fiber is cancelled fiber.cond():wait() > > diff --git a/src/box/box.cc b/src/box/box.cc > index 18568df3b..29f74e94b 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout) > { > double deadline = ev_monotonic_now(loop()) + timeout; > while (is_box_configured == false || box_is_ro() != ro) { > - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) > - return -1; > - if (fiber_is_cancelled()) { > - diag_set(FiberIsCancelled); > + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { > + if (fiber_is_cancelled()) > + diag_set(FiberIsCancelled); > return -1; > } > } > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c > index 904a350d9..cc59eaafb 100644 > --- a/src/lib/core/fiber_cond.c > +++ b/src/lib/core/fiber_cond.c > @@ -108,6 +108,11 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) > diag_set(TimedOut); > return -1; > } > + if (fiber_is_cancelled()) { > + if (diag_is_empty(diag_get())) > + diag_set(FiberIsCancelled); I read your argumentation. AFAIU sounds like:"We have a lot of tests that check an error in the diag in such case. Also, some user applications can check the diag in a similar way. And stack diag can be used only in the new versions of tarantool." So, it's true. What I propose to think about: - this is not simple logic. If a fiber has been cancelled, maybe an error will be added to the diag, maybe no - 50/50. - this behavior should be documented. - ask Mons or Turenko for advice: "What do they think about it?" All this up to you. > + return -1; > + } > return 0; > } > > diff --git a/src/lib/core/fiber_cond.h b/src/lib/core/fiber_cond.h > index 87c6f2ca2..2662e0654 100644 > --- a/src/lib/core/fiber_cond.h > +++ b/src/lib/core/fiber_cond.h > @@ -114,7 +114,7 @@ fiber_cond_broadcast(struct fiber_cond *cond); > * @param cond condition > * @param timeout timeout in seconds > * @retval 0 on fiber_cond_signal() call or a spurious wake up > - * @retval -1 on timeout, diag is set to TimedOut > + * @retval -1 on timeout or fiber cancellation, diag is set > */ > int > fiber_cond_wait_timeout(struct fiber_cond *cond, double timeout); > diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua > new file mode 100755 > index 000000000..ca4ca2c90 > --- /dev/null > +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua > @@ -0,0 +1,23 @@ > +#!/usr/bin/env tarantool > + > +local tap = require('tap') > +local fiber = require('fiber') > +local test = tap.test("gh-5013-fiber-cancel") > + > +test:plan(2) > + > +local result = {} > + > +function test_f() > + local cond = fiber.cond() > + local res, err = pcall(cond.wait, cond) > + result.res = res > + result.err = err > +end > + > +local f = fiber.create(test_f) > +f:cancel() > +fiber.yield() > + > +test:ok(result.res == false, 'expected result is false') > +test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported') > diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result > new file mode 100644 > index 000000000..4ed04bb61 > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.result > @@ -0,0 +1,62 @@ > +-- test-run result file version 2 > +remote = require 'net.box' > + | --- > + | ... > +fiber = require 'fiber' > + | --- > + | ... > +test_run = require('test_run').new() > + | --- > + | ... > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end > + | --- > + | ... > +box.schema.func.create('infinite_call') > + | --- > + | ... > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > + | --- > + | ... > + > +error_msg = nil > + | --- > + | ... > +test_run:cmd("setopt delimiter ';'") > + | --- > + | - true > + | ... > +function netbox_runner() > + local cn = remote.connect(box.cfg.listen) > + local f = fiber.new(function() > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > + end) > + f:set_joinable(true) > + fiber.yield() > + f:cancel() > + f:join() > + cn:close() > +end; > + | --- > + | ... > +test_run:cmd("setopt delimiter ''"); > + | --- > + | - true > + | ... > +netbox_runner() > + | --- > + | ... > +error_msg > + | --- > + | - fiber is cancelled > + | ... > +box.schema.func.drop('infinite_call') > + | --- > + | ... > +infinite_call = nil > + | --- > + | ... > +error_msg = nil > + | --- > + | ... > diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua > new file mode 100644 > index 000000000..bc0e5af6e > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua > @@ -0,0 +1,28 @@ > +remote = require 'net.box' > +fiber = require 'fiber' > +test_run = require('test_run').new() > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end > +box.schema.func.create('infinite_call') > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > + > +error_msg = nil > +test_run:cmd("setopt delimiter ';'") > +function netbox_runner() > + local cn = remote.connect(box.cfg.listen) > + local f = fiber.new(function() > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > + end) > + f:set_joinable(true) > + fiber.yield() > + f:cancel() > + f:join() > + cn:close() > +end; > +test_run:cmd("setopt delimiter ''"); > +netbox_runner() > +error_msg > +box.schema.func.drop('infinite_call') > +infinite_call = nil > +error_msg = nil > > ====================== > >> On 11 Nov 2020, at 00:16, Sergey Ostanevich <sergos@tarantool.org> wrote: >> >> Hi Leonid, thank you for the review! >> >> I put two parts together to handle them in one patch. >> >>> On 4 Nov 2020, at 13:00, Leonid Vasiliev <lvasiliev@tarantool.org> wrote: >>> >>> Hi! Thank you for the patch. >>> See some comments below: >>> >>> 1) The patch changes undocumented behavior, AFAIU. >>> So, I have a question:"Do you plan to backport the patch to >>> tarantool 1.10?". If the answer is "Yes" - I'm comfortable with the >>> changes. But if the answer is "No" - I will object, because in this case >>> both behaviors must be supported in all modules. >>> >> >> Yes, the plan is to push it down to 1.10 >> >>> 2) I think changing the behavior in C doesn't cause much of a problem, >>> because before when you wait without timeout, you don't need to check >>> the return value (it's always 0). But in Lua it will cause the problems, >>> because now throws an error if cancelled and all wait calls should be >>> wrapped to pcall. >> >> Exactly it is the change to Lua and we intentionally make it, since original >> behaviour is wrong. Good things are: fiber.cond has no big presence in the >> /tarantool sourcebase and the cancellation of fiber itself means the fiber >> has to quit. >> >>> >>> On 31.10.2020 19:29, sergos@tarantool.org wrote: >>>> From: Sergey Ostanevich <sergos@tarantool.org> >>>> Hi! >>>> Thanks to Oleg Babin's comment I found there's no need to update any lua >>>> interfaces, since the reason was in C implementation. Also, there is one >>>> place the change is played, so after I fixed it I got complete testing >>>> pass. >>>> Force-pushed branch, v2 patch attached. >>>> Before this patch fiber.cond():wait() just returns for cancelled >>>> fiber. In contrast fiber.channel():get() threw "fiber is >>>> canceled" error. >>>> This patch unify behaviour of channels and condvars and also fixes >>> >>> 3) behaviour -> behavior. >> >> Depends on spellchecker - mine sets it to this British version. >> >>> >>>> related net.box module problem - it was impossible to interrupt >>>> net.box call with fiber.cancel because it used fiber.cond under >>>> the hood. Test cases for both bugs are added. >>>> Closes #4834 >>>> Closes #5013 >>>> Co-authored-by: Oleg Babin <olegrok@tarantool.org> >>>> @TarantoolBot document >>>> Title: fiber.cond():wait() throws if fiber is cancelled >>>> Currently fiber.cond():wait() throws an error if waiting fiber is >>>> cancelled like in case with fiber.channel():get(). >>> >>> 4) I don't think it's a good decision adding a comparison with >>> fiber.channel():get() to the documentation. Up to you. >>> >> >> I’d agree to you: let’s just put the new behaviour here. >> >>> 5) Document the changes in module.h. >> >> Done (need to rebuild, obviously). >> >>> >>>> --- >>> >>> 6) Add @ChangeLog. >>> >>>> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond >>>> Issue: https://github.com/tarantool/tarantool/issues/5013 >>>> src/box/box.cc | 6 +- >>>> src/lib/core/fiber_cond.c | 1 + >>>> test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ >>>> test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ >>>> .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ >>>> 5 files changed, 120 insertions(+), 4 deletions(-) >>>> create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua >>>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result >>>> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua >>> >>>> diff --git a/src/box/box.cc b/src/box/box.cc >>>> index 18568df3b..9e824453d 100644 >>>> --- a/src/box/box.cc >>>> +++ b/src/box/box.cc >>>> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) >>>> { >>>> double deadline = ev_monotonic_now(loop()) + timeout; >>>> while (is_box_configured == false || box_is_ro() != ro) { >>>> - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) >>>> - return -1; >>>> - if (fiber_is_cancelled()) { >>>> - diag_set(FiberIsCancelled); >>>> + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { >>>> + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); >>> >>> 7) According to https://www.tarantool.io/en/doc/latest/dev_guide/c_style_guide/ >>> it's seems like you are trying to hide something) >>> Use: >>> ``` >>> if (condition) >>> action(); >>> ``` >>> Instead of: >>> ``` >>> if (condition) action(); >>> another_action(); >>> ``` >> >> Fixed. >> >>>> return -1; >>>> } >>>> } >>>> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c >>>> index 904a350d9..0c93c5842 100644 >>>> --- a/src/lib/core/fiber_cond.c >>>> +++ b/src/lib/core/fiber_cond.c >>>> @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) >>>> diag_set(TimedOut); >>>> return -1; >>>> } >>>> + if (fiber_is_cancelled()) { >>>> + if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled); >>> >>> 8) The same as previously: >>> ``` >>> if (condition) >>> action(); >>> ``` >>> >> >> Fixed. >> >>> 9) Checking diag on empty looks strange to me. I think we should add an >>> error to diag without this check. If we want to save the previous error, >>> I suggest to use the stack diag. >> >> I made this because this interface is used inside the testing machinery, where >> exact cause of e.g. replica disconnection is reported. If we put the diag here >> unconditionally it will break this use of diag. I believe the stack diag also >> is not supported there yet. >> >> >>>> + return -1; >>>> + } >>>> return 0; >>>> } >>>> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua >>>> new file mode 100755 >>>> index 000000000..ae805c5bf >>>> --- /dev/null >>>> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua >>>> @@ -0,0 +1,23 @@ >>>> +#!/usr/bin/env tarantool >>>> + >>>> +local tap = require('tap') >>>> +local fiber = require('fiber') >>>> +local test = tap.test("gh-5013-fiber-cancel") >>>> + >>>> +test:plan(2) >>>> + >>>> +local result = {} >>>> + >>>> +function test_f() >>>> + local cond = fiber.cond() >>>> + local res, err = pcall(cond.wait, cond) >>>> + result.res = res >>>> + result.err = err >>>> +end >>>> + >>>> +local f = fiber.create(test_f) >>>> +f:cancel() >>>> +fiber.yield() >>>> + >>>> +test:ok(result.res == false, tostring(result.res)) >>>> +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) >>> >>> 10) Use a user-frendly check message (in both checks). >>> >> Fixed >> >>>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result >>>> new file mode 100644 >>>> index 000000000..eab0a5e4d >>>> --- /dev/null >>>> +++ b/test/box/net.box_fiber_cancel_gh-4834.result >>>> @@ -0,0 +1,65 @@ >>>> +-- test-run result file version 2 >>>> +remote = require 'net.box' >>>> + | --- >>>> + | ... >>>> +fiber = require 'fiber' >>>> + | --- >>>> + | ... >>>> +test_run = require('test_run').new() >>>> + | --- >>>> + | ... >>>> + >>>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations >>>> +function infinite_call() fiber.channel(1):get() end >>>> + | --- >>>> + | ... >>>> +box.schema.func.create('infinite_call') >>>> + | --- >>>> + | ... >>>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') >>>> + | --- >>>> + | ... >>>> + >>>> +error_msg = nil >>>> + | --- >>>> + | ... >>>> +test_run:cmd("setopt delimiter ';'") >>>> + | --- >>>> + | - true >>>> + | ... >>>> +function gh4834( > + local cn = remote.connect(box.cfg.listen) >>>> + local f = fiber.new(function() >>>> + _, error_msg = pcall(cn.call, cn, 'infinite_call') >>>> + end) >>>> + f:set_joinable(true) >>>> + fiber.yield() >>>> + f:cancel() >>>> + f:join() >>>> + cn:close() >>>> +end; >>>> + | --- >>>> + | ... >>>> +test_run:cmd("setopt delimiter ''"); >>>> + | --- >>>> + | - true >>>> + | ... >>>> +gh4834() >>>> + | --- >>>> + | ... >>>> +error_msg >>>> + | --- >>>> + | - fiber is cancelled >>>> + | ... >>>> +box.schema.func.drop('infinite_call') >>>> + | --- >>>> + | ... >>>> +infinite_call = nil >>>> + | --- >>>> + | ... >>>> +channel = nil >>>> + | --- >>>> + | ... >>>> +error_msg = nil >>>> + | --- >>>> + | ... >>>> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua >>>> new file mode 100644 >>>> index 000000000..06fb3ceac >>>> --- /dev/null >>>> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua >>>> @@ -0,0 +1,29 @@ >>>> +remote = require 'net.box' >>>> +fiber = require 'fiber' >>>> +test_run = require('test_run').new() >>>> + >>>> +-- #4834: Cancelling fiber doesn't interrupt netbox operations >>>> +function infinite_call() fiber.channel(1):get() end >>>> +box.schema.func.create('infinite_call') >>>> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') >>>> + >>>> +error_msg = nil >>>> +test_run:cmd("setopt delimiter ';'") >>>> +function gh4834() >>> >>> 11) I think this is not a self-documenting name for the function. >>> >> Renamed. >> >>>> + local cn = remote.connect(box.cfg.listen) >>>> + local f = fiber.new(function() >>>> + _, error_msg = pcall(cn.call, cn, 'infinite_call') >>>> + end) >>>> + f:set_joinable(true) >>>> + fiber.yield() >>>> + f:cancel() >>>> + f:join() >>>> + cn:close() >>>> +end; >>>> +test_run:cmd("setopt delimiter ''"); >>>> +gh4834() >>>> +error_msg >>>> +box.schema.func.drop('infinite_call') >>>> +infinite_call = nil >>>> +channel = nil >>> >>> 12) You didn't introduce the variable "channel". In addition, I looked >>> at a couple of tests and didn't see them cleaning up variables. But I >>> don't mind. >>> >> Removed this one. In general, clean up prevents flaky. >> >>>> +error_msg = nil >> >> ================= >> >> Before this patch fiber.cond():wait() just returns for cancelled >> fiber. In contrast fiber.channel():get() threw "fiber is >> canceled" error. >> This patch unify behaviour of channels and condvars and also fixes >> related net.box module problem - it was impossible to interrupt >> net.box call with fiber.cancel because it used fiber.cond under >> the hood. Test cases for both bugs are added. >> >> Closes #4834 >> Closes #5013 >> >> Co-authored-by: Oleg Babin <olegrok@tarantool.org> >> >> @TarantoolBot document >> Title: fiber.cond():wait() throws if fiber is cancelled >> >> Currently fiber.cond():wait() throws an error if waiting fiber is >> cancelled. >> --- >> >> Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond >> Issue: https://github.com/tarantool/tarantool/issues/5013 >> >> @Changelog >> * fiber.cond().wait() now throws if fiber is cancelled >> >> src/box/box.cc | 6 +- >> src/lib/core/fiber_cond.c | 4 ++ >> test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ >> test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ >> .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ >> 5 files changed, 123 insertions(+), 4 deletions(-) >> create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua >> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result >> create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua >> >> diff --git a/src/box/box.cc b/src/box/box.cc >> index 18568df3b..9e824453d 100644 >> --- a/src/box/box.cc >> +++ b/src/box/box.cc >> @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) >> { >> double deadline = ev_monotonic_now(loop()) + timeout; >> while (is_box_configured == false || box_is_ro() != ro) { >> - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) >> - return -1; >> - if (fiber_is_cancelled()) { >> - diag_set(FiberIsCancelled); >> + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { >> + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); >> return -1; >> } >> } >> diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c >> index 904a350d9..0c93c5842 100644 >> --- a/src/lib/core/fiber_cond.c >> +++ b/src/lib/core/fiber_cond.c >> @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) >> diag_set(TimedOut); >> return -1; >> } >> + if (fiber_is_cancelled()) { >> + if (diag_is_empty(diag_get())) diag_set(FiberIsCancelled); >> + return -1; >> + } >> return 0; >> } >> >> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua >> new file mode 100755 >> index 000000000..34711fb31 >> --- /dev/null >> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua >> @@ -0,0 +1,23 @@ >> +#!/usr/bin/env tarantool >> + >> +local tap = require('tap') >> +local fiber = require('fiber') >> +local test = tap.test("gh-5013-fiber-cancel") >> + >> +test:plan(2) >> + >> +local result = {} >> + >> +function test_f() >> + local cond = fiber.cond() >> + local res, err = pcall(cond.wait, cond) >> + result.res = res >> + result.err = err >> +end >> + >> +local f = fiber.create(test_f) >> +f:cancel() >> +fiber.yield() >> + >> +test:ok(result.res == false, tostring(result.res)) >> +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) >> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result >> new file mode 100644 >> index 000000000..eab0a5e4d >> --- /dev/null >> +++ b/test/box/net.box_fiber_cancel_gh-4834.result >> @@ -0,0 +1,65 @@ >> +-- test-run result file version 2 >> +remote = require 'net.box' >> + | --- >> + | ... >> +fiber = require 'fiber' >> + | --- >> + | ... >> +test_run = require('test_run').new() >> + | --- >> + | ... >> + >> +-- #4834: Cancelling fiber doesn't interrupt netbox operations >> +function infinite_call() fiber.channel(1):get() end >> + | --- >> + | ... >> +box.schema.func.create('infinite_call') >> + | --- >> + | ... >> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') >> + | --- >> + | ... >> + >> +error_msg = nil >> + | --- >> + | ... >> +test_run:cmd("setopt delimiter ';'") >> + | --- >> + | - true >> + | ... >> +function gh4834() >> + local cn = remote.connect(box.cfg.listen) >> + local f = fiber.new(function() >> + _, error_msg = pcall(cn.call, cn, 'infinite_call') >> + end) >> + f:set_joinable(true) >> + fiber.yield() >> + f:cancel() >> + f:join() >> + cn:close() >> +end; >> + | --- >> + | ... >> +test_run:cmd("setopt delimiter ''"); >> + | --- >> + | - true >> + | ... >> +gh4834() >> + | --- >> + | ... >> +error_msg >> + | --- >> + | - fiber is cancelled >> + | ... >> +box.schema.func.drop('infinite_call') >> + | --- >> + | ... >> +infinite_call = nil >> + | --- >> + | ... >> +channel = nil >> + | --- >> + | ... >> +error_msg = nil >> + | --- >> + | ... >> diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua >> new file mode 100644 >> index 000000000..06fb3ceac >> --- /dev/null >> +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua >> @@ -0,0 +1,29 @@ >> +remote = require 'net.box' >> +fiber = require 'fiber' >> +test_run = require('test_run').new() >> + >> +-- #4834: Cancelling fiber doesn't interrupt netbox operations >> +function infinite_call() fiber.channel(1):get() end >> +box.schema.func.create('infinite_call') >> +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') >> + >> +error_msg = nil >> +test_run:cmd("setopt delimiter ';'") >> +function gh4834() >> + local cn = remote.connect(box.cfg.listen) >> + local f = fiber.new(function() >> + _, error_msg = pcall(cn.call, cn, 'infinite_call') >> + end) >> + f:set_joinable(true) >> + fiber.yield() >> + f:cancel() >> + f:join() >> + cn:close() >> +end; >> +test_run:cmd("setopt delimiter ''"); >> +gh4834() >> +error_msg >> +box.schema.func.drop('infinite_call') >> +infinite_call = nil >> +channel = nil >> +error_msg = nil >> -- >> 2.29.2 > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-13 8:26 ` Leonid Vasiliev @ 2020-11-30 22:49 ` Alexander Turenko 0 siblings, 0 replies; 23+ messages in thread From: Alexander Turenko @ 2020-11-30 22:49 UTC (permalink / raw) To: Leonid Vasiliev; +Cc: tarantool-patches, Vladislav Shpilevoy > > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c > > index 904a350d9..cc59eaafb 100644 > > --- a/src/lib/core/fiber_cond.c > > +++ b/src/lib/core/fiber_cond.c > > @@ -108,6 +108,11 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) > > diag_set(TimedOut); > > return -1; > > } > > + if (fiber_is_cancelled()) { > > + if (diag_is_empty(diag_get())) > > + diag_set(FiberIsCancelled); > > I read your argumentation. AFAIU sounds like:"We have a lot of tests > that check an error in the diag in such case. Also, some user > applications can check the diag in a similar way. And stack diag can be used > only in the new versions of tarantool." So, it's true. > What I propose to think about: > - this is not simple logic. If a fiber has been cancelled, maybe an error > will be added to the diag, maybe no - 50/50. > - this behavior should be documented. > - ask Mons or Turenko for advice: "What do they think about it?" > All this up to you. The fiber_channel implementation (C level) convinced me. It is good to have the similar behaviour in fiber_cond and fiber_channel. I would only ask to document this subtle difference in our web documentation. Python document subtle changes (at least some of them) and it is nice for users. Example: Python 2: | Popen.send_signal(signal) | | Sends the signal `signal` to the child. Python 3: | Popen.send_signal(signal) | | Sends the signal `signal` to the child. | | Do nothing if the process completed. (It would be better to mark it with 'Since Python x.y.z', though.) That is the place, where Python 2 and Python 3 behaviour actually differs (when the popen implementation aware about a process termination, i.e. after <popen object>.poll()). It is important, when you writting a code that must work on both Python 2 and Python 3: you should catch OSError here. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos 2020-11-01 10:13 ` Oleg Babin 2020-11-04 10:00 ` Leonid Vasiliev @ 2020-11-16 22:12 ` Vladislav Shpilevoy 2020-11-25 21:32 ` Vladislav Shpilevoy ` (3 subsequent siblings) 6 siblings, 0 replies; 23+ messages in thread From: Vladislav Shpilevoy @ 2020-11-16 22:12 UTC (permalink / raw) To: sergos, tarantool-patches; +Cc: alexander.turenko Hi! Thanks for the patch! Please, change subsystem name to 'fiber:'. 'core:' is too general. We have tons of stuff in libcore in addition to fibers. > Before this patch fiber.cond():wait() just returns for cancelled > fiber. In contrast fiber.channel():get() threw "fiber is > canceled" error. > This patch unify behaviour of channels and condvars and also fixes > related net.box module problem - it was impossible to interrupt > net.box call with fiber.cancel because it used fiber.cond under > the hood. Test cases for both bugs are added. Netbox hangs not because of using fiber.cond. But because it never calls testcancel(). Normally all looped fibers should do that. Talking of compatibility - I think it always was supposed to throw. luaT_fiber_cond_wait() calls luaL_testcancel(), but only when fiber_cond_wait_timeout() returns not 0, which was never the case for cancellation. So it was supposed to throw, but nobody covered it with a test. See 2 comments below. > Closes #4834 > Closes #5013 > > Co-authored-by: Oleg Babin <olegrok@tarantool.org> > diff --git a/src/box/box.cc b/src/box/box.cc > index 18568df3b..29f74e94b 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -305,10 +305,9 @@ box_wait_ro(bool ro, double timeout) > { > double deadline = ev_monotonic_now(loop()) + timeout; > while (is_box_configured == false || box_is_ro() != ro) { > - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) > - return -1; > - if (fiber_is_cancelled()) { > - diag_set(FiberIsCancelled); > + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { > + if (fiber_is_cancelled()) > + diag_set(FiberIsCancelled); > return -1; > } > } > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c > index 904a350d9..cc59eaafb 100644 > --- a/src/lib/core/fiber_cond.c > +++ b/src/lib/core/fiber_cond.c > @@ -108,6 +108,11 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) > diag_set(TimedOut); > return -1; > } > + if (fiber_is_cancelled()) { > + if (diag_is_empty(diag_get())) > + diag_set(FiberIsCancelled); > + return -1; 1. Wtf? Why don't you set an error, when this is an error? And why do you do exactly the same in box_wait_ro() above? > + } > return 0; > }> diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result > new file mode 100644 > index 000000000..4ed04bb61 > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.result 2. This does not conform to our test file name pattern. Read this carefully: https://github.com/tarantool/tarantool/wiki/Code-review-procedure#testing ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos ` (2 preceding siblings ...) 2020-11-16 22:12 ` Vladislav Shpilevoy @ 2020-11-25 21:32 ` Vladislav Shpilevoy 2020-11-29 21:41 ` Sergey Ostanevich 2020-11-30 21:46 ` Alexander Turenko 2020-11-30 21:01 ` Vladislav Shpilevoy ` (2 subsequent siblings) 6 siblings, 2 replies; 23+ messages in thread From: Vladislav Shpilevoy @ 2020-11-25 21:32 UTC (permalink / raw) To: sergos, tarantool-patches; +Cc: alexander.turenko Hi! Thanks for the fixes! Technically the patch is good! See 3 non-technical comments below. > diff --git a/src/box/relay.cc b/src/box/relay.cc > index b68b45e00..a7bc2c6f7 100644 > --- a/src/box/relay.cc > +++ b/src/box/relay.cc > @@ -780,6 +771,16 @@ relay_subscribe_f(va_list ap) > cbus_endpoint_destroy(&relay->endpoint, cbus_process); > > relay_exit(relay); > + > + /* > + * Log the error that caused the relay to break the loop. > + * Don't clear the error for status reporting. > + */ > + assert(!diag_is_empty(&relay->diag)); > + diag_set_error(diag_get(), diag_last_error(&relay->diag)); > + diag_log(); > + say_crit("exiting the relay loop"); > + 1. I suggest you to move this to a separate commit, before you change fiber cond behaviour. Because it seems it is not related to conds really. And even if it would be related, still the issue looks like a separate bug. > return -1; > } > diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua > new file mode 100755 > index 000000000..ca4ca2c90 > --- /dev/null > +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua > @@ -0,0 +1,23 @@ > +#!/usr/bin/env tarantool > + > +local tap = require('tap') > +local fiber = require('fiber') > +local test = tap.test("gh-5013-fiber-cancel") > + > +test:plan(2) > + > +local result = {} > + > +function test_f() > + local cond = fiber.cond() > + local res, err = pcall(cond.wait, cond) > + result.res = res > + result.err = err > +end > + > +local f = fiber.create(test_f) > +f:cancel() > +fiber.yield() > + > +test:ok(result.res == false, 'expected result is false') > +test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported') 2. I think you are also supposed to call os.exit with test:check() like other tap tests do. Otherwise it probably always ends with 0 code, and won't work properly when we will make tap tests non-diff based. > diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result > new file mode 100644 > index 000000000..cb631995c > --- /dev/null > +++ b/test/box/gh-4834-netbox-fiber-cancel.result > @@ -0,0 +1,62 @@ > +-- test-run result file version 2 > +remote = require('net.box') > + | --- > + | ... > +fiber = require('fiber') > + | --- > + | ... > +test_run = require('test_run').new() > + | --- > + | ... > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end 3. I noticed you never finish the fiber. So it hangs there forever or until a restart. Better not leave any test artifacts and finish it completely. Just save the channel to a global variable and put something into it in the end. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-25 21:32 ` Vladislav Shpilevoy @ 2020-11-29 21:41 ` Sergey Ostanevich 2020-11-30 21:46 ` Alexander Turenko 1 sibling, 0 replies; 23+ messages in thread From: Sergey Ostanevich @ 2020-11-29 21:41 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches, alexander.turenko Hi! Thank you for review! Force-pushed the sergos/gh-5013-fiber-cond branch, new patches are inline. Sergos > On 26 Nov 2020, at 00:32, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote: > > Hi! Thanks for the fixes! > > Technically the patch is good! See 3 non-technical comments below. > >> diff --git a/src/box/relay.cc b/src/box/relay.cc >> index b68b45e00..a7bc2c6f7 100644 >> --- a/src/box/relay.cc >> +++ b/src/box/relay.cc >> @@ -780,6 +771,16 @@ relay_subscribe_f(va_list ap) >> cbus_endpoint_destroy(&relay->endpoint, cbus_process); >> >> relay_exit(relay); >> + >> + /* >> + * Log the error that caused the relay to break the loop. >> + * Don't clear the error for status reporting. >> + */ >> + assert(!diag_is_empty(&relay->diag)); >> + diag_set_error(diag_get(), diag_last_error(&relay->diag)); >> + diag_log(); >> + say_crit("exiting the relay loop"); >> + > > 1. I suggest you to move this to a separate commit, before you > change fiber cond behaviour. Because it seems it is not related > to conds really. And even if it would be related, still the > issue looks like a separate bug. I failed to make a test that is differs from the fiber_cond setting a diag. So I put it is as a preparation step and mentioned it in commit message. ——- The fiber_cond_wait() will set an error in case fiber is cancelled. As a result, the current diag in the fiber can be reset during the wal_clear_watcher(). To prevent such overwrite the diag copy from the relay into current fiber is moved to the exit of the relay_subscribe_f(). Part of #5013 --- src/box/relay.cc | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/src/box/relay.cc b/src/box/relay.cc index 1e77e0d9b..f62e925af 100644 --- a/src/box/relay.cc +++ b/src/box/relay.cc @@ -753,15 +753,6 @@ relay_subscribe_f(va_list ap) if (!relay->replica->anon) relay_send_is_raft_enabled(relay, &raft_enabler, false); - /* - * Log the error that caused the relay to break the loop. - * Don't clear the error for status reporting. - */ - assert(!diag_is_empty(&relay->diag)); - diag_set_error(diag_get(), diag_last_error(&relay->diag)); - diag_log(); - say_crit("exiting the relay loop"); - /* * Clear garbage collector trigger and WAL watcher. * trigger_clear() does nothing in case the triggers @@ -780,6 +771,16 @@ relay_subscribe_f(va_list ap) cbus_endpoint_destroy(&relay->endpoint, cbus_process); relay_exit(relay); + + /* + * Log the error that caused the relay to break the loop. + * Don't clear the error for status reporting. + */ + assert(!diag_is_empty(&relay->diag)); + diag_set_error(diag_get(), diag_last_error(&relay->diag)); + diag_log(); + say_crit("exiting the relay loop"); + return -1; } -- > >> return -1; >> } >> diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua >> new file mode 100755 >> index 000000000..ca4ca2c90 >> --- /dev/null >> +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua >> @@ -0,0 +1,23 @@ >> +#!/usr/bin/env tarantool >> + >> +local tap = require('tap') >> +local fiber = require('fiber') >> +local test = tap.test("gh-5013-fiber-cancel") >> + >> +test:plan(2) >> + >> +local result = {} >> + >> +function test_f() >> + local cond = fiber.cond() >> + local res, err = pcall(cond.wait, cond) >> + result.res = res >> + result.err = err >> +end >> + >> +local f = fiber.create(test_f) >> +f:cancel() >> +fiber.yield() >> + >> +test:ok(result.res == false, 'expected result is false') >> +test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported') > > 2. I think you are also supposed to call os.exit with test:check() > like other tap tests do. Otherwise it probably always ends with 0 > code, and won't work properly when we will make tap tests non-diff > based. Done. > >> diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result >> new file mode 100644 >> index 000000000..cb631995c >> --- /dev/null >> +++ b/test/box/gh-4834-netbox-fiber-cancel.result >> @@ -0,0 +1,62 @@ >> +-- test-run result file version 2 >> +remote = require('net.box') >> + | --- >> + | ... >> +fiber = require('fiber') >> + | --- >> + | ... >> +test_run = require('test_run').new() >> + | --- >> + | ... >> + >> +-- #4834: Cancelling fiber doesn't interrupt netbox operations >> +function infinite_call() fiber.channel(1):get() end > > 3. I noticed you never finish the fiber. So it hangs there forever > or until a restart. Better not leave any test artifacts and finish > it completely. Just save the channel to a global variable and put > something into it in the end. The channel is created in the remote execution - I just kept it until the end of the routine and closed it. =========== Before this patch fiber.cond():wait() just returns for cancelled fiber. In contrast fiber.channel():get() throws "fiber is canceled" error. This patch unifies behaviour of channels and condvars. It also fixes a related net.box module problem #4834 since fiber.cond now performs test for fiber cancellation. Closes #4834 Closes #5013 Co-authored-by: Oleg Babin <olegrok@tarantool.org> @TarantoolBot document Title: fiber.cond():wait() throws if fiber is cancelled Currently fiber.cond():wait() throws an error if waiting fiber is cancelled. --- src/box/box.cc | 4 -- src/lib/core/fiber_cond.c | 4 ++ src/lib/core/fiber_cond.h | 2 +- test/app-tap/gh-5013-fiber-cancel.test.lua | 26 ++++++++ test/box/gh-4834-netbox-fiber-cancel.result | 62 +++++++++++++++++++ test/box/gh-4834-netbox-fiber-cancel.test.lua | 28 +++++++++ 6 files changed, 121 insertions(+), 5 deletions(-) create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua create mode 100644 test/box/gh-4834-netbox-fiber-cancel.result create mode 100644 test/box/gh-4834-netbox-fiber-cancel.test.lua diff --git a/src/box/box.cc b/src/box/box.cc index 1f7dec362..e02be2c6f 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -307,10 +307,6 @@ box_wait_ro(bool ro, double timeout) while (is_box_configured == false || box_is_ro() != ro) { if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) return -1; - if (fiber_is_cancelled()) { - diag_set(FiberIsCancelled); - return -1; - } } return 0; } diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c index 904a350d9..71bb2d04d 100644 --- a/src/lib/core/fiber_cond.c +++ b/src/lib/core/fiber_cond.c @@ -108,6 +108,10 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) diag_set(TimedOut); return -1; } + if (fiber_is_cancelled()) { + diag_set(FiberIsCancelled); + return -1; + } return 0; } diff --git a/src/lib/core/fiber_cond.h b/src/lib/core/fiber_cond.h index 87c6f2ca2..2662e0654 100644 --- a/src/lib/core/fiber_cond.h +++ b/src/lib/core/fiber_cond.h @@ -114,7 +114,7 @@ fiber_cond_broadcast(struct fiber_cond *cond); * @param cond condition * @param timeout timeout in seconds * @retval 0 on fiber_cond_signal() call or a spurious wake up - * @retval -1 on timeout, diag is set to TimedOut + * @retval -1 on timeout or fiber cancellation, diag is set */ int fiber_cond_wait_timeout(struct fiber_cond *cond, double timeout); diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua new file mode 100755 index 000000000..91c6775dc --- /dev/null +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua @@ -0,0 +1,26 @@ +#!/usr/bin/env tarantool + +local tap = require('tap') +local fiber = require('fiber') +local test = tap.test("gh-5013-fiber-cancel") + +test:plan(2) + +local result = {} + +function test_f() + local cond = fiber.cond() + local res, err = pcall(cond.wait, cond) + result.res = res + result.err = err +end + +local f = fiber.create(test_f) +f:cancel() +fiber.yield() + +print(require('yaml').encode(result)) + +test:ok(result.res == false, 'expected result is false') +test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported') +os.exit(test:check() and 0 or 1) diff --git a/test/box/gh-4834-netbox-fiber-cancel.result b/test/box/gh-4834-netbox-fiber-cancel.result new file mode 100644 index 000000000..5af404e82 --- /dev/null +++ b/test/box/gh-4834-netbox-fiber-cancel.result @@ -0,0 +1,62 @@ +-- test-run result file version 2 +remote = require('net.box') + | --- + | ... +fiber = require('fiber') + | --- + | ... +test_run = require('test_run').new() + | --- + | ... + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() local channel = fiber.channel(1) pcall(channel:get()) channel.close() end + | --- + | ... +box.schema.func.create('infinite_call') + | --- + | ... +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + | --- + | ... + +error_msg = nil + | --- + | ... +test_run:cmd("setopt delimiter ';'") + | --- + | - true + | ... +function netbox_runner() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; + | --- + | ... +test_run:cmd("setopt delimiter ''"); + | --- + | - true + | ... +netbox_runner() + | --- + | ... +error_msg + | --- + | - fiber is cancelled + | ... +box.schema.func.drop('infinite_call') + | --- + | ... +infinite_call = nil + | --- + | ... +error_msg = nil + | --- + | ... diff --git a/test/box/gh-4834-netbox-fiber-cancel.test.lua b/test/box/gh-4834-netbox-fiber-cancel.test.lua new file mode 100644 index 000000000..59963ba91 --- /dev/null +++ b/test/box/gh-4834-netbox-fiber-cancel.test.lua @@ -0,0 +1,28 @@ +remote = require('net.box') +fiber = require('fiber') +test_run = require('test_run').new() + +-- #4834: Cancelling fiber doesn't interrupt netbox operations +function infinite_call() local channel = fiber.channel(1) pcall(channel:get()) channel.close() end +box.schema.func.create('infinite_call') +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') + +error_msg = nil +test_run:cmd("setopt delimiter ';'") +function netbox_runner() + local cn = remote.connect(box.cfg.listen) + local f = fiber.new(function() + _, error_msg = pcall(cn.call, cn, 'infinite_call') + end) + f:set_joinable(true) + fiber.yield() + f:cancel() + f:join() + cn:close() +end; +test_run:cmd("setopt delimiter ''"); +netbox_runner() +error_msg +box.schema.func.drop('infinite_call') +infinite_call = nil +error_msg = nil -- 2.24.3 (Apple Git-128) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-11-25 21:32 ` Vladislav Shpilevoy 2020-11-29 21:41 ` Sergey Ostanevich @ 2020-11-30 21:46 ` Alexander Turenko 1 sibling, 0 replies; 23+ messages in thread From: Alexander Turenko @ 2020-11-30 21:46 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tarantool-patches > > diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua > > new file mode 100755 > > index 000000000..ca4ca2c90 > > --- /dev/null > > +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua > > @@ -0,0 +1,23 @@ > > +#!/usr/bin/env tarantool > > + > > +local tap = require('tap') > > +local fiber = require('fiber') > > +local test = tap.test("gh-5013-fiber-cancel") > > + > > +test:plan(2) > > + > > +local result = {} > > + > > +function test_f() > > + local cond = fiber.cond() > > + local res, err = pcall(cond.wait, cond) > > + result.res = res > > + result.err = err > > +end > > + > > +local f = fiber.create(test_f) > > +f:cancel() > > +fiber.yield() > > + > > +test:ok(result.res == false, 'expected result is false') > > +test:ok(tostring(result.err) == 'fiber is cancelled', 'fiber cancellation should be reported') > > 2. I think you are also supposed to call os.exit with test:check() > like other tap tests do. Otherwise it probably always ends with 0 > code, and won't work properly when we will make tap tests non-diff > based. Just side note. TAP13 tests are already not diff-based. test-run parses and verifies TAP13 output if a result file is not present. test-run also checks the exit code of a process. (That's all are about 'core = app' test suites.) But I anyway think that it is good property to return non-zero exit code from a test if it is not passed. When all tests are written this way, we have more freedom around ways to run a test. Who knows, maybe we'll want to run some tests in a very restricted environment, where it will be hard to get all this python stuff workable? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos ` (3 preceding siblings ...) 2020-11-25 21:32 ` Vladislav Shpilevoy @ 2020-11-30 21:01 ` Vladislav Shpilevoy 2020-12-02 10:58 ` Alexander V. Tikhonov 2020-12-02 22:18 ` Vladislav Shpilevoy 6 siblings, 0 replies; 23+ messages in thread From: Vladislav Shpilevoy @ 2020-11-30 21:01 UTC (permalink / raw) To: sergos, tarantool-patches, Alexander V. Tikhonov; +Cc: alexander.turenko Hi! Thanks for the patchset! LGTM. Sasha (Tikh.), please, validate the branch is good to push. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos ` (4 preceding siblings ...) 2020-11-30 21:01 ` Vladislav Shpilevoy @ 2020-12-02 10:58 ` Alexander V. Tikhonov 2020-12-02 22:18 ` Vladislav Shpilevoy 6 siblings, 0 replies; 23+ messages in thread From: Alexander V. Tikhonov @ 2020-12-02 10:58 UTC (permalink / raw) To: sergos; +Cc: tarantool-patches Hi, thanks for the patch, as I see no new degradation found in gitlab-ci testing commit criteria pipeline [1], patch LGTM. [1] - https://gitlab.com/tarantool/tarantool/-/pipelines/222853269 On Sat, Oct 31, 2020 at 07:29:11PM +0300, sergos@tarantool.org wrote: > From: Sergey Ostanevich <sergos@tarantool.org> > > Hi! > > Thanks to Oleg Babin's comment I found there's no need to update any lua > interfaces, since the reason was in C implementation. Also, there is one > place the change is played, so after I fixed it I got complete testing > pass. > Force-pushed branch, v2 patch attached. > > > > Before this patch fiber.cond():wait() just returns for cancelled > fiber. In contrast fiber.channel():get() threw "fiber is > canceled" error. > This patch unify behaviour of channels and condvars and also fixes > related net.box module problem - it was impossible to interrupt > net.box call with fiber.cancel because it used fiber.cond under > the hood. Test cases for both bugs are added. > > Closes #4834 > Closes #5013 > > Co-authored-by: Oleg Babin <olegrok@tarantool.org> > > @TarantoolBot document > Title: fiber.cond():wait() throws if fiber is cancelled > > Currently fiber.cond():wait() throws an error if waiting fiber is > cancelled like in case with fiber.channel():get(). > --- > > Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond > Issue: https://github.com/tarantool/tarantool/issues/5013 > > src/box/box.cc | 6 +- > src/lib/core/fiber_cond.c | 1 + > test/app-tap/gh-5013-fiber-cancel.test.lua | 23 +++++++ > test/box/net.box_fiber_cancel_gh-4834.result | 65 +++++++++++++++++++ > .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++ > 5 files changed, 120 insertions(+), 4 deletions(-) > create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua > create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result > create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua > > diff --git a/src/box/box.cc b/src/box/box.cc > index 18568df3b..bfa1051f9 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout) > { > double deadline = ev_monotonic_now(loop()) + timeout; > while (is_box_configured == false || box_is_ro() != ro) { > - if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) > - return -1; > - if (fiber_is_cancelled()) { > - diag_set(FiberIsCancelled); > + if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) { > + if (fiber_is_cancelled()) diag_set(FiberIsCancelled); > return -1; > } > } > diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c > index 904a350d9..b0645069e 100644 > --- a/src/lib/core/fiber_cond.c > +++ b/src/lib/core/fiber_cond.c > @@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout) > diag_set(TimedOut); > return -1; > } > + if (fiber_is_cancelled()) return -1; > return 0; > } > > diff --git a/test/app-tap/gh-5013-fiber-cancel.test.lua b/test/app-tap/gh-5013-fiber-cancel.test.lua > new file mode 100755 > index 000000000..ae805c5bf > --- /dev/null > +++ b/test/app-tap/gh-5013-fiber-cancel.test.lua > @@ -0,0 +1,23 @@ > +#!/usr/bin/env tarantool > + > +local tap = require('tap') > +local fiber = require('fiber') > +local test = tap.test("gh-5013-fiber-cancel") > + > +test:plan(2) > + > +local result = {} > + > +function test_f() > + local cond = fiber.cond() > + local res, err = pcall(cond.wait, cond) > + result.res = res > + result.err = err > +end > + > +local f = fiber.create(test_f) > +f:cancel() > +fiber.yield() > + > +test:ok(result.res == false, tostring(result.res)) > +test:ok(tostring(result.err) == 'fiber is cancelled', tostring(result.err)) > diff --git a/test/box/net.box_fiber_cancel_gh-4834.result b/test/box/net.box_fiber_cancel_gh-4834.result > new file mode 100644 > index 000000000..eab0a5e4d > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.result > @@ -0,0 +1,65 @@ > +-- test-run result file version 2 > +remote = require 'net.box' > + | --- > + | ... > +fiber = require 'fiber' > + | --- > + | ... > +test_run = require('test_run').new() > + | --- > + | ... > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end > + | --- > + | ... > +box.schema.func.create('infinite_call') > + | --- > + | ... > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > + | --- > + | ... > + > +error_msg = nil > + | --- > + | ... > +test_run:cmd("setopt delimiter ';'") > + | --- > + | - true > + | ... > +function gh4834() > + local cn = remote.connect(box.cfg.listen) > + local f = fiber.new(function() > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > + end) > + f:set_joinable(true) > + fiber.yield() > + f:cancel() > + f:join() > + cn:close() > +end; > + | --- > + | ... > +test_run:cmd("setopt delimiter ''"); > + | --- > + | - true > + | ... > +gh4834() > + | --- > + | ... > +error_msg > + | --- > + | - fiber is cancelled > + | ... > +box.schema.func.drop('infinite_call') > + | --- > + | ... > +infinite_call = nil > + | --- > + | ... > +channel = nil > + | --- > + | ... > +error_msg = nil > + | --- > + | ... > diff --git a/test/box/net.box_fiber_cancel_gh-4834.test.lua b/test/box/net.box_fiber_cancel_gh-4834.test.lua > new file mode 100644 > index 000000000..06fb3ceac > --- /dev/null > +++ b/test/box/net.box_fiber_cancel_gh-4834.test.lua > @@ -0,0 +1,29 @@ > +remote = require 'net.box' > +fiber = require 'fiber' > +test_run = require('test_run').new() > + > +-- #4834: Cancelling fiber doesn't interrupt netbox operations > +function infinite_call() fiber.channel(1):get() end > +box.schema.func.create('infinite_call') > +box.schema.user.grant('guest', 'execute', 'function', 'infinite_call') > + > +error_msg = nil > +test_run:cmd("setopt delimiter ';'") > +function gh4834() > + local cn = remote.connect(box.cfg.listen) > + local f = fiber.new(function() > + _, error_msg = pcall(cn.call, cn, 'infinite_call') > + end) > + f:set_joinable(true) > + fiber.yield() > + f:cancel() > + f:join() > + cn:close() > +end; > +test_run:cmd("setopt delimiter ''"); > +gh4834() > +error_msg > +box.schema.func.drop('infinite_call') > +infinite_call = nil > +channel = nil > +error_msg = nil > -- > 2.17.1 > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond 2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos ` (5 preceding siblings ...) 2020-12-02 10:58 ` Alexander V. Tikhonov @ 2020-12-02 22:18 ` Vladislav Shpilevoy 6 siblings, 0 replies; 23+ messages in thread From: Vladislav Shpilevoy @ 2020-12-02 22:18 UTC (permalink / raw) To: sergos, tarantool-patches; +Cc: alexander.turenko Pushed to master, 2.6, 2.5, and 1.10. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2020-12-02 22:18 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-10-31 16:29 [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond sergos 2020-11-01 10:13 ` Oleg Babin 2020-11-03 10:20 ` Sergey Ostanevich 2020-11-03 10:27 ` Oleg Babin 2020-11-04 10:00 ` Leonid Vasiliev 2020-11-16 22:12 ` Vladislav Shpilevoy 2020-11-18 22:05 ` Sergey Ostanevich 2020-11-22 16:01 ` Vladislav Shpilevoy 2020-11-23 21:47 ` Sergey Ostanevich 2020-11-24 7:31 ` Sergey Ostanevich 2020-11-04 10:00 ` Leonid Vasiliev 2020-11-05 20:42 ` Sergey Ostanevich 2020-11-10 21:16 ` Sergey Ostanevich 2020-11-12 20:15 ` Sergey Ostanevich 2020-11-13 8:26 ` Leonid Vasiliev 2020-11-30 22:49 ` Alexander Turenko 2020-11-16 22:12 ` Vladislav Shpilevoy 2020-11-25 21:32 ` Vladislav Shpilevoy 2020-11-29 21:41 ` Sergey Ostanevich 2020-11-30 21:46 ` Alexander Turenko 2020-11-30 21:01 ` Vladislav Shpilevoy 2020-12-02 10:58 ` Alexander V. Tikhonov 2020-12-02 22:18 ` Vladislav Shpilevoy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox