From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp54.i.mail.ru (smtp54.i.mail.ru [217.69.128.34])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 3F73C469719
	for ; Sun, 1 Nov 2020 13:13:11 +0300 (MSK)
References: <20201031162911.61876-1-sergos@tarantool.org>
From: Oleg Babin
Date: Sun, 1 Nov 2020 13:13:09 +0300
MIME-Version: 1.0
In-Reply-To: <20201031162911.61876-1-sergos@tarantool.org>
Content-Type: multipart/alternative; boundary="------------BDBA15BE4D1E18503F067B6E"
Content-Language: en-GB
Subject: Re: [Tarantool-patches] [PATCH v2] core: handle fiber cancellation for fiber.cond
List-Id: Tarantool development patches
To: sergos@tarantool.org, tarantool-patches@dev.tarantool.org
Cc: v.shpilevoy@tarantool.org, alexander.turenko@tarantool.org

This is a multi-part message in MIME format.
--------------BDBA15BE4D1E18503F067B6E
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: 8bit

Hi! Thanks for the changes. See two comments below.

On 31/10/2020 19:29, sergos@tarantool.org wrote:
From: Sergey Ostanevich <sergos@tarantool.org>

Hi!

Thanks to Oleg Babin's comment I found there is no need to update any Lua
interfaces, since the cause was in the C implementation. Also, there was one
place affected by the change; after I fixed it, the full test suite
passed.
Force-pushed the branch, v2 patch attached.



Before this patch fiber.cond():wait() just returned for a cancelled
fiber. In contrast, fiber.channel():get() threw a "fiber is
canceled" error.
This patch unifies the behaviour of channels and condvars and also fixes
a related net.box module problem: it was impossible to interrupt
a net.box call with fiber.cancel because it used fiber.cond under
the hood. Test cases for both bugs are added.

Closes #4834
Closes #5013

Co-authored-by: Oleg Babin <olegrok@tarantool.org>

@TarantoolBot document
Title: fiber.cond():wait() throws if fiber is cancelled

fiber.cond():wait() now throws an error if the waiting fiber is
cancelled, as fiber.channel():get() does.
---

Github: https://gitlab.com/tarantool/tarantool/-/commits/sergos/gh-5013-fiber-cond
Issue: https://github.com/tarantool/tarantool/issues/5013

 src/box/box.cc                                |  6 +-
 src/lib/core/fiber_cond.c                     |  1 +
 test/app-tap/gh-5013-fiber-cancel.test.lua    | 23 +++++++
 test/box/net.box_fiber_cancel_gh-4834.result  | 65 +++++++++++++++++++
 .../box/net.box_fiber_cancel_gh-4834.test.lua | 29 +++++++++
 5 files changed, 120 insertions(+), 4 deletions(-)
 create mode 100755 test/app-tap/gh-5013-fiber-cancel.test.lua
 create mode 100644 test/box/net.box_fiber_cancel_gh-4834.result
 create mode 100644 test/box/net.box_fiber_cancel_gh-4834.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index 18568df3b..bfa1051f9 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -305,10 +305,8 @@ box_wait_ro(bool ro, double timeout)
 {
 	double deadline = ev_monotonic_now(loop()) + timeout;
 	while (is_box_configured == false || box_is_ro() != ro) {
-		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0)
-			return -1;
-		if (fiber_is_cancelled()) {
-			diag_set(FiberIsCancelled);
+		if (fiber_cond_wait_deadline(&ro_cond, deadline) != 0) {
+                        if (fiber_is_cancelled()) diag_set(FiberIsCancelled);

Here you use spaces instead of tabs for indentation.

 			return -1;
 		}
 	}
diff --git a/src/lib/core/fiber_cond.c b/src/lib/core/fiber_cond.c
index 904a350d9..b0645069e 100644
--- a/src/lib/core/fiber_cond.c
+++ b/src/lib/core/fiber_cond.c
@@ -108,6 +108,7 @@ fiber_cond_wait_timeout(struct fiber_cond *c, double timeout)
 		diag_set(TimedOut);
 		return -1;
 	}
+	if (fiber_is_cancelled()) return -1;

It's quite strange to return -1 here without setting a reason in the diagnostics area. Look at how it is done for channels:

https://github.com/tarantool/tarantool/blob/42c64d06d5d1a3ec937b3c596af083a672a68ad8/src/lib/core/fiber_channel.c#L180

Without it, condvars and channels remain inconsistent.
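
To make the point concrete, here is a minimal standalone sketch in C of the pattern I mean: every path that returns -1 records its reason first. All `model_` names, the `enum model_error` type and the single global error slot are hypothetical stand-ins for illustration, not Tarantool's real diag_set() machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the diagnostics area: one slot for the last error. */
enum model_error { MODEL_OK = 0, MODEL_TIMED_OUT, MODEL_FIBER_IS_CANCELLED };

static enum model_error model_diag = MODEL_OK;

/* Hypothetical stand-in for the state of the fiber that has just woken up. */
struct model_fiber {
	bool timed_out;
	bool cancelled;
};

/*
 * Models the tail of a cond wait: each failure path sets the reason
 * before returning -1, so the caller never sees an unexplained failure.
 */
static int
model_cond_wait_finish(const struct model_fiber *f)
{
	assert(f != NULL);
	if (f->timed_out) {
		model_diag = MODEL_TIMED_OUT;
		return -1;
	}
	if (f->cancelled) {
		model_diag = MODEL_FIBER_IS_CANCELLED;
		return -1;
	}
	return 0;
}
```

With this shape a caller that gets -1 can always inspect the recorded reason to tell a timeout from a cancellation, which is what the channel code already allows.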

I've looked a bit deeper at the failure I reported before. It seems the problem is in the "cbus_unpair" function.

The problem appears only if FiberIsCancelled is set in the diag area in the "fiber_cond_wait" function.

This is where my expertise ends, as I'm not familiar with "cbus". However, I have some thoughts on how it could be eliminated.

Let's mark the fiber as non-cancellable for the duration of cbus_unpair and make fiber_is_cancelled() return false for non-cancellable fibers. See a PoC below:


diff --git a/src/lib/core/cbus.c b/src/lib/core/cbus.c
index 5d91fb948..4167c756a 100644
--- a/src/lib/core/cbus.c
+++ b/src/lib/core/cbus.c
@@ -630,6 +630,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe *src_pipe,
     msg.unpair_arg = unpair_arg;
     msg.src_pipe = src_pipe;
     msg.complete = false;
+    fiber_set_cancellable(false);
     fiber_cond_create(&msg.cond);
 
     cpipe_push(dest_pipe, &msg.cmsg);
@@ -643,6 +644,7 @@ cbus_unpair(struct cpipe *dest_pipe, struct cpipe *src_pipe,
         fiber_cond_wait(&msg.cond);
     }
 
+    fiber_set_cancellable(true);
     cpipe_destroy(dest_pipe);
 }
 
diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3ce1..8100c9da6 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -553,6 +553,9 @@ fiber_set_cancellable(bool yesno)
 bool
 fiber_is_cancelled(void)
 {
+    if ((fiber()->flags & FIBER_IS_CANCELLABLE) == 0) {
+        return false;
+    }
     return fiber()->flags & FIBER_IS_CANCELLED;
 }
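
The intended semantics of the PoC can be modelled in isolation. The sketch below is a standalone C model, not Tarantool code: the `MODEL_` flag values, `struct model_flag_fiber` and the helper names are hypothetical, and it assumes a cancel request only sets the "cancelled" bit regardless of whether the fiber is currently cancellable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the real ones are defined in Tarantool's fiber.h. */
enum {
	MODEL_FIBER_IS_CANCELLABLE = 1 << 0,
	MODEL_FIBER_IS_CANCELLED = 1 << 1,
};

struct model_flag_fiber {
	unsigned flags;
};

static void
model_set_cancellable(struct model_flag_fiber *f, bool yesno)
{
	assert(f != NULL);
	if (yesno)
		f->flags |= MODEL_FIBER_IS_CANCELLABLE;
	else
		f->flags &= ~(unsigned)MODEL_FIBER_IS_CANCELLABLE;
}

/* Assumption: a cancel request only records the flag, cancellable or not. */
static void
model_cancel(struct model_flag_fiber *f)
{
	assert(f != NULL);
	f->flags |= MODEL_FIBER_IS_CANCELLED;
}

/* Mirrors the PoC check: a non-cancellable fiber never reports cancellation. */
static bool
model_is_cancelled(const struct model_flag_fiber *f)
{
	assert(f != NULL);
	if ((f->flags & MODEL_FIBER_IS_CANCELLABLE) == 0)
		return false;
	return (f->flags & MODEL_FIBER_IS_CANCELLED) != 0;
}
```

In this model a cancel that arrives while the fiber is non-cancellable is hidden rather than lost: model_is_cancelled() returns false inside the protected window and true again once model_set_cancellable(f, true) is called. Whether the real fiber code behaves the same way is exactly what would need checking.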


To be honest, I haven't checked this change carefully, and I also get a segfault in replication/gc.test.lua for the "memtx" engine.

Finally, feel free to ignore this comment. I hope Vlad or Sasha can give you more accurate advice.

--------------BDBA15BE4D1E18503F067B6E--