* [Tarantool-patches] [PATCH 1/2] fiber: make wakeup in Lua and C nop on self
2021-04-23 23:15 [Tarantool-patches] [PATCH 0/2] fiber_wakeup() nop on self Vladislav Shpilevoy via Tarantool-patches
@ 2021-04-23 23:15 ` Vladislav Shpilevoy via Tarantool-patches
2021-04-25 9:21 ` Serge Petrenko via Tarantool-patches
2021-04-23 23:15 ` [Tarantool-patches] [PATCH 2/2] fiber: use wakeup safely on self everywhere Vladislav Shpilevoy via Tarantool-patches
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-04-23 23:15 UTC (permalink / raw)
To: tarantool-patches, gorcunov, sergepetrenko
fiber.wakeup() in Lua and fiber_wakeup() in C could lead to a
crash or undefined behaviour when called on the currently running
fiber.
In particular, if after wakeup was started a new fiber in a
blocking way (fiber.create() and fiber_start()) it would crash in
debug build, and lead to unknown results in release.
If after wakeup was made a sleep with non-zero timeout or an
infinite yield (fiber_yield()), the fiber woke up in the same
event loop iteration regardless of any timeout or other wakeups.
It was a spurious wakeup, which is not expected in most of the
places internally.
The patch makes the wakeup nop on the current fiber making it safe
to use anywhere.
Closes #5292
Closes #6043
@TarantoolBot document
Title: fiber.wakeup() in Lua and fiber_wakeup() in C are nop on self
In Lua `fiber.wakeup()` being called on the current fiber does not
do anything, safe to use. The same for `fiber_wakeup()` in C.
---
.../unreleased/gh-6043-fiber-wakeup-self.md | 33 +++++++
src/lib/core/fiber.c | 85 +++++++++++--------
src/lib/core/fiber.h | 6 +-
.../gh-6043-fiber-wakeup-self.test.lua | 50 +++++++++++
test/unit/fiber.cc | 33 ++++++-
test/unit/fiber.result | 2 +
test/unit/fiber_stress.cc | 6 +-
7 files changed, 171 insertions(+), 44 deletions(-)
create mode 100644 changelogs/unreleased/gh-6043-fiber-wakeup-self.md
create mode 100755 test/app-tap/gh-6043-fiber-wakeup-self.test.lua
diff --git a/changelogs/unreleased/gh-6043-fiber-wakeup-self.md b/changelogs/unreleased/gh-6043-fiber-wakeup-self.md
new file mode 100644
index 000000000..5869e2696
--- /dev/null
+++ b/changelogs/unreleased/gh-6043-fiber-wakeup-self.md
@@ -0,0 +1,33 @@
+## bugfix/core
+
+* **[Breaking change]** `fiber.wakeup()` in Lua and `fiber_wakeup()` in C became
+ NOP on the currently running fiber. Previously they allowed to "ignore" the
+ next yield or sleep leading to unexpected spurious wakeups. Could lead to a
+ crash (in debug build) or undefined behaviour (in release build) if called
+ right before `fiber.create()` in Lua or `fiber_start()` in C (gh-6043).
+
+ There was a single usecase for that - reschedule in the same event loop
+ iteration which is not the same as `fiber.sleep(0)` in Lua and
+ `fiber_sleep(0)` in C. Could be done in C like that:
+ ```C
+ fiber_wakeup(fiber_self());
+ fiber_yield();
+ ```
+ and in Lua like that:
+ ```Lua
+ fiber.self():wakeup()
+ fiber.yield()
+ ```
+ Now to get the same effect in C use `fiber_reschedule()`. In Lua it is now
+ simply impossible to reschedule the current fiber in the same event loop
+ iteration directly. But still can reschedule self through a second fiber like
+ this (**never use it, please**):
+ ```Lua
+ local self = fiber.self()
+ fiber.new(function() self:wakeup() end)
+ fiber.sleep(0)
+ ```
+
+----
+Breaking change: `fiber.wakeup()` in Lua and `fiber_wakeup()` in C became NOP on
+the currently running fiber.
diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index f8b85d99d..cb6295171 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -396,10 +396,13 @@ fiber_call_impl(struct fiber *callee)
assert(rlist_empty(&callee->state));
assert(caller);
assert(caller != callee);
+ assert((caller->flags & FIBER_IS_RUNNING) != 0);
+ assert((callee->flags & FIBER_IS_RUNNING) == 0);
+ caller->flags &= ~FIBER_IS_RUNNING;
cord->fiber = callee;
+ callee->flags = (callee->flags & ~FIBER_IS_READY) | FIBER_IS_RUNNING;
- callee->flags &= ~FIBER_IS_READY;
ASAN_START_SWITCH_FIBER(asan_state, 1,
callee->stack,
callee->stack_size);
@@ -443,19 +446,9 @@ fiber_checkstack(void)
return false;
}
-/**
- * Interrupt a synchronous wait of a fiber inside the event loop.
- * We do so by keeping an "async" event in every fiber, solely
- * for this purpose, and raising this event here.
- *
- * @note: if this is sent to self, followed by a fiber_yield()
- * call, it simply reschedules the fiber after other ready
- * fibers in the same event loop iteration.
- */
-void
-fiber_wakeup(struct fiber *f)
+static void
+fiber_make_ready(struct fiber *f)
{
- assert(! (f->flags & FIBER_IS_DEAD));
/**
* Do nothing if the fiber is already in cord->ready
* list *or* is in the call chain created by
@@ -464,21 +457,12 @@ fiber_wakeup(struct fiber *f)
* but the same game is deadly when the fiber is in
* the callee list created by fiber_schedule_list().
*
- * To put it another way, fiber_wakeup() is a 'request' to
+ * To put it another way, fiber_make_ready() is a 'request' to
* schedule the fiber for execution, and once it is executing
- * a wakeup request is considered complete and it must be
+ * the 'make ready' request is considered complete and it must be
* removed.
- *
- * A dead fiber can be lingering in the cord fiber list
- * if it is joinable. This makes it technically possible
- * to schedule it. We would never make such a mistake
- * in our own code, hence the assert above. But as long
- * as fiber.wakeup() is a part of public Lua API, an
- * external rock can mess things up. Ignore such attempts
- * as well.
*/
- if (f->flags & (FIBER_IS_READY | FIBER_IS_DEAD))
- return;
+ assert((f->flags & (FIBER_IS_DEAD | FIBER_IS_READY)) == 0);
struct cord *cord = cord();
if (rlist_empty(&cord->ready)) {
/*
@@ -503,6 +487,23 @@ fiber_wakeup(struct fiber *f)
f->flags |= FIBER_IS_READY;
}
+void
+fiber_wakeup(struct fiber *f)
+{
+ /*
+ * DEAD is checked both in the assertion and in the release build
+ * because it should not ever happen, at least internally. But in some
+ * user modules it might happen, and better ignore such fibers.
+ * Especially since this was allowed for quite some time in the public
+ * API and need to keep it if it costs nothing, for backward
+ * compatibility.
+ */
+ assert((f->flags & FIBER_IS_DEAD) == 0);
+ const int no_flags = FIBER_IS_READY | FIBER_IS_DEAD | FIBER_IS_RUNNING;
+ if ((f->flags & no_flags) == 0)
+ fiber_make_ready(f);
+}
+
/** Cancel the subject fiber.
*
* Note: cancelation is asynchronous. Use fiber_join() to wait for the
@@ -521,7 +522,6 @@ void
fiber_cancel(struct fiber *f)
{
assert(f->fid != 0);
- struct fiber *self = fiber();
/**
* Do nothing if the fiber is dead, since cancelling
* the fiber would clear the diagnostics area and
@@ -535,10 +535,8 @@ fiber_cancel(struct fiber *f)
/**
* Don't wake self and zombies.
*/
- if (f != self) {
- if (f->flags & FIBER_IS_CANCELLABLE)
- fiber_wakeup(f);
- }
+ if ((f->flags & FIBER_IS_CANCELLABLE) != 0)
+ fiber_wakeup(f);
}
/**
@@ -602,7 +600,14 @@ fiber_clock64(void)
void
fiber_reschedule(void)
{
- fiber_wakeup(fiber());
+ struct fiber *f = fiber();
+ /*
+ * The current fiber can't be dead, the flag is set when the fiber
+ * function returns. Can't be ready, because such status is assigned
+ * only to the queued fibers in the ready-list.
+ */
+ assert((f->flags & (FIBER_IS_READY | FIBER_IS_DEAD)) == 0);
+ fiber_make_ready(f);
fiber_yield();
}
@@ -686,8 +691,13 @@ fiber_yield(void)
assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
assert(! (callee->flags & FIBER_IS_DEAD));
+ assert((caller->flags & FIBER_IS_RUNNING) != 0);
+ assert((callee->flags & FIBER_IS_RUNNING) == 0);
+
+ caller->flags &= ~FIBER_IS_RUNNING;
cord->fiber = callee;
- callee->flags &= ~FIBER_IS_READY;
+ callee->flags = (callee->flags & ~FIBER_IS_READY) | FIBER_IS_RUNNING;
+
ASAN_START_SWITCH_FIBER(asan_state,
(caller->flags & FIBER_IS_DEAD) == 0,
callee->stack,
@@ -748,10 +758,6 @@ fiber_sleep(double delay)
if (delay == 0) {
ev_idle_start(loop(), &cord()->idle_event);
}
- /*
- * We don't use fiber_wakeup() here to ensure there is
- * no infinite wakeup loop in case of fiber_sleep(0).
- */
fiber_yield_timeout(delay);
if (delay == 0) {
@@ -864,7 +870,11 @@ fiber_reset(struct fiber *fiber)
{
rlist_create(&fiber->on_yield);
rlist_create(&fiber->on_stop);
- fiber->flags = FIBER_DEFAULT_FLAGS;
+ /*
+ * Preserve the running flag if set. Reset might be called on the
+ * current fiber when it is recycled.
+ */
+ fiber->flags = FIBER_DEFAULT_FLAGS | (fiber->flags & FIBER_IS_RUNNING);
#if ENABLE_FIBER_TOP
clock_stat_reset(&fiber->clock_stat);
#endif /* ENABLE_FIBER_TOP */
@@ -1449,6 +1459,7 @@ cord_create(struct cord *cord, const char *name)
cord->sched.name = NULL;
fiber_set_name(&cord->sched, "sched");
cord->fiber = &cord->sched;
+ cord->sched.flags |= FIBER_IS_RUNNING;
cord->max_fid = FIBER_ID_MAX_RESERVED;
/*
diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
index b9eeef697..586bc9433 100644
--- a/src/lib/core/fiber.h
+++ b/src/lib/core/fiber.h
@@ -162,6 +162,10 @@ enum {
* This flag is set when fiber uses custom stack size.
*/
FIBER_CUSTOM_STACK = 1 << 5,
+ /**
+ * The flag is set for the fiber currently being executed by the cord.
+ */
+ FIBER_IS_RUNNING = 1 << 6,
FIBER_DEFAULT_FLAGS = FIBER_IS_CANCELLABLE
};
@@ -279,7 +283,7 @@ API_EXPORT void
fiber_start(struct fiber *callee, ...);
/**
- * Interrupt a synchronous wait of a fiber
+ * Interrupt a synchronous wait of a fiber. Nop for the currently running fiber.
*
* \param f fiber to be woken up
*/
diff --git a/test/app-tap/gh-6043-fiber-wakeup-self.test.lua b/test/app-tap/gh-6043-fiber-wakeup-self.test.lua
new file mode 100755
index 000000000..082c1228d
--- /dev/null
+++ b/test/app-tap/gh-6043-fiber-wakeup-self.test.lua
@@ -0,0 +1,50 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+local fiber = require('fiber')
+
+--
+-- gh-6043: fiber.wakeup() on self could lead to a crash if followed by
+-- fiber.create(). Also it could lead to spurious wakeup if followed by a sleep
+-- with non-zero timeout, or by, for example, an infinite yield such as WAL
+-- write.
+--
+
+local function test_wakeup_self_and_call(test)
+ test:plan(3)
+
+ local self = fiber.self()
+ fiber.wakeup(self)
+ local count = 0
+ local f = fiber.create(function()
+ count = count + 1
+ fiber.yield()
+ count = count + 1
+ end)
+ test:is(count, 1, 'created a fiber')
+ fiber.yield()
+ test:is(count, 2, 'it yielded')
+ test:is(f:status(), 'dead', 'and ended')
+end
+
+local function test_wakeup_self_and_wal_write(test)
+ test:plan(1)
+
+ local s = box.schema.create_space('test')
+ s:create_index('pk')
+
+ fiber.wakeup(fiber.self())
+ local lsn = box.info.lsn
+ s:replace{1}
+ test:is(box.info.lsn, lsn + 1, 'written to WAL')
+ s:drop()
+end
+
+box.cfg{}
+
+local test = tap.test('gh-6043-fiber-wakeup-self')
+test:plan(2)
+test:test('wakeup self + call', test_wakeup_self_and_call)
+test:test('wakeup self + wal write', test_wakeup_self_and_wal_write)
+
+os.exit(test:check() and 0 or 1)
diff --git a/test/unit/fiber.cc b/test/unit/fiber.cc
index 9c1a23bdd..a1e3ded87 100644
--- a/test/unit/fiber.cc
+++ b/test/unit/fiber.cc
@@ -138,8 +138,7 @@ fiber_join_test()
fiber_set_joinable(fiber, true);
fiber_wakeup(fiber);
/** Let the fiber schedule */
- fiber_wakeup(fiber());
- fiber_yield();
+ fiber_reschedule();
note("by this time the fiber should be dead already");
fiber_cancel(fiber);
fiber_join(fiber);
@@ -201,12 +200,42 @@ fiber_name_test()
footer();
}
+static void
+fiber_wakeup_self_test()
+{
+ header();
+
+ struct fiber *f = fiber();
+
+ fiber_wakeup(f);
+ double duration = 0.001;
+ uint64_t t1 = fiber_clock64();
+ fiber_sleep(duration);
+ uint64_t t2 = fiber_clock64();
+ /*
+ * It was a real sleep, not 0 duration. Wakeup is nop on the running
+ * fiber.
+ */
+ assert(t2 - t1 >= duration);
+
+ /*
+ * Wakeup + start of a new fiber. This is different from yield but
+ * works without crashes too.
+ */
+ struct fiber *newf = fiber_new_xc("nop", noop_f);
+ fiber_wakeup(f);
+ fiber_start(newf);
+
+ footer();
+}
+
static int
main_f(va_list ap)
{
fiber_name_test();
fiber_join_test();
fiber_stack_test();
+ fiber_wakeup_self_test();
ev_break(loop(), EVBREAK_ALL);
return 0;
}
diff --git a/test/unit/fiber.result b/test/unit/fiber.result
index a61e0a2b8..36bf2a599 100644
--- a/test/unit/fiber.result
+++ b/test/unit/fiber.result
@@ -17,3 +17,5 @@ SystemError Failed to allocate 42 bytes in allocator for exception: Cannot alloc
# normal-stack fiber not crashed
# big-stack fiber not crashed
*** fiber_stack_test: done ***
+ *** fiber_wakeup_self_test ***
+ *** fiber_wakeup_self_test: done ***
diff --git a/test/unit/fiber_stress.cc b/test/unit/fiber_stress.cc
index 44ab12487..aa892d0a0 100644
--- a/test/unit/fiber_stress.cc
+++ b/test/unit/fiber_stress.cc
@@ -9,10 +9,8 @@ enum {
static int
yield_f(va_list ap)
{
- for (int i = 0; i < ITERATIONS; i++) {
- fiber_wakeup(fiber());
- fiber_yield();
- }
+ for (int i = 0; i < ITERATIONS; i++)
+ fiber_reschedule();
return 0;
}
--
2.24.3 (Apple Git-128)
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Tarantool-patches] [PATCH 2/2] fiber: use wakeup safely on self everywhere
2021-04-23 23:15 [Tarantool-patches] [PATCH 0/2] fiber_wakeup() nop on self Vladislav Shpilevoy via Tarantool-patches
2021-04-23 23:15 ` [Tarantool-patches] [PATCH 1/2] fiber: make wakeup in Lua and C " Vladislav Shpilevoy via Tarantool-patches
@ 2021-04-23 23:15 ` Vladislav Shpilevoy via Tarantool-patches
2021-04-25 9:23 ` Serge Petrenko via Tarantool-patches
2021-04-26 21:56 ` [Tarantool-patches] [PATCH 0/2] fiber_wakeup() nop on self Cyrill Gorcunov via Tarantool-patches
2021-04-27 12:03 ` Kirill Yukhin via Tarantool-patches
3 siblings, 1 reply; 9+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-04-23 23:15 UTC (permalink / raw)
To: tarantool-patches, gorcunov, sergepetrenko
The previous commit made fiber_wakeup() safe to use on the current
fiber. Leverage the new behaviour everywhere in the source code to
remove all checks f != fiber() before fiber_wakeup(f) calls.
Follow-up #5292
---
src/box/applier.cc | 4 +---
src/box/journal.c | 6 ++++++
src/box/journal.h | 7 +++++++
src/box/raft.c | 14 +++-----------
src/box/txn.c | 6 ++----
src/box/txn_limbo.c | 14 +-------------
src/lib/swim/swim.c | 1 -
7 files changed, 20 insertions(+), 32 deletions(-)
diff --git a/src/box/applier.cc b/src/box/applier.cc
index dc05c91d3..33181fdbf 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -792,9 +792,7 @@ apply_synchro_row_cb(struct journal_entry *entry)
txn_limbo_process(&txn_limbo, synchro_entry->req);
trigger_run(&replicaset.applier.on_wal_write, NULL);
}
- /* The fiber is the same on final join. */
- if (synchro_entry->owner != fiber())
- fiber_wakeup(synchro_entry->owner);
+ fiber_wakeup(synchro_entry->owner);
}
/** Process a synchro request. */
diff --git a/src/box/journal.c b/src/box/journal.c
index 886a15139..df491610a 100644
--- a/src/box/journal.c
+++ b/src/box/journal.c
@@ -63,6 +63,12 @@ journal_entry_new(size_t n_rows, struct region *region,
return entry;
}
+void
+journal_entry_fiber_wakeup_cb(struct journal_entry *entry)
+{
+ fiber_wakeup(entry->complete_data);
+}
+
void
journal_queue_wakeup(void)
{
diff --git a/src/box/journal.h b/src/box/journal.h
index 8f3d56a61..4ab7e8afb 100644
--- a/src/box/journal.h
+++ b/src/box/journal.h
@@ -112,6 +112,13 @@ journal_entry_new(size_t n_rows, struct region *region,
journal_write_async_f write_async_cb,
void *complete_data);
+/**
+ * Treat complete_data like a fiber pointer and wake it up when journal write is
+ * done.
+ */
+void
+journal_entry_fiber_wakeup_cb(struct journal_entry *entry);
+
struct journal_queue {
/** Maximal size of entries enqueued in journal (in bytes). */
int64_t max_size;
diff --git a/src/box/raft.c b/src/box/raft.c
index 47c869712..6b52c9876 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -155,8 +155,7 @@ box_raft_schedule_async(struct raft *raft)
* adapted to this. Also don't wakeup the current fiber - it leads to
* undefined behaviour.
*/
- if ((box_raft_worker->flags & FIBER_IS_CANCELLABLE) != 0 &&
- fiber() != box_raft_worker)
+ if ((box_raft_worker->flags & FIBER_IS_CANCELLABLE) != 0)
fiber_wakeup(box_raft_worker);
box_raft_has_work = true;
}
@@ -279,13 +278,6 @@ box_raft_broadcast(struct raft *raft, const struct raft_msg *msg)
relay_push_raft(replica->relay, &req);
}
-/** Wakeup Raft state writer fiber waiting for WAL write end. */
-static void
-box_raft_write_cb(struct journal_entry *entry)
-{
- fiber_wakeup(entry->complete_data);
-}
-
static void
box_raft_write(struct raft *raft, const struct raft_msg *msg)
{
@@ -307,8 +299,8 @@ box_raft_write(struct raft *raft, const struct raft_msg *msg)
if (xrow_encode_raft(&row, region, &req) != 0)
goto fail;
- journal_entry_create(entry, 1, xrow_approx_len(&row), box_raft_write_cb,
- fiber());
+ journal_entry_create(entry, 1, xrow_approx_len(&row),
+ journal_entry_fiber_wakeup_cb, fiber());
/*
* A non-cancelable fiber is considered non-wake-able, generally. Raft
diff --git a/src/box/txn.c b/src/box/txn.c
index 03b39e0de..07ab131a2 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -504,9 +504,7 @@ txn_free_or_wakeup(struct txn *txn)
txn_free(txn);
else {
txn_set_flags(txn, TXN_IS_DONE);
- if (txn->fiber != fiber())
- /* Wake a waiting fiber up. */
- fiber_wakeup(txn->fiber);
+ fiber_wakeup(txn->fiber);
}
}
@@ -578,7 +576,7 @@ txn_on_journal_write(struct journal_entry *entry)
txn_run_wal_write_triggers(txn);
if (!txn_has_flag(txn, TXN_WAIT_SYNC))
txn_complete_success(txn);
- else if (txn->fiber != NULL && txn->fiber != fiber())
+ else if (txn->fiber != NULL)
fiber_wakeup(txn->fiber);
finish:
fiber_set_txn(fiber(), NULL);
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index c22bd6665..5b23d1bb1 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -306,18 +306,6 @@ complete:
return 0;
}
-/**
- * A callback for synchronous write: txn_limbo_write_synchro fiber
- * waiting to proceed once a record is written to WAL.
- */
-static void
-txn_limbo_write_cb(struct journal_entry *entry)
-{
- assert(entry->complete_data != NULL);
- if (fiber() != entry->complete_data)
- fiber_wakeup(entry->complete_data);
-}
-
static void
txn_limbo_write_synchro(struct txn_limbo *limbo, uint16_t type, int64_t lsn,
uint64_t term)
@@ -346,7 +334,7 @@ txn_limbo_write_synchro(struct txn_limbo *limbo, uint16_t type, int64_t lsn,
xrow_encode_synchro(&row, body, &req);
journal_entry_create(entry, 1, xrow_approx_len(&row),
- txn_limbo_write_cb, fiber());
+ journal_entry_fiber_wakeup_cb, fiber());
if (journal_write(entry) != 0 || entry->res < 0) {
diag_set(ClientError, ER_WAL_IO);
diff --git a/src/lib/swim/swim.c b/src/lib/swim/swim.c
index 1ecc90414..3955c3130 100644
--- a/src/lib/swim/swim.c
+++ b/src/lib/swim/swim.c
@@ -2219,7 +2219,6 @@ swim_kill_event_handler(struct swim *swim)
* reused.
*/
swim->event_handler = NULL;
- fiber_wakeup(f);
fiber_cancel(f);
fiber_join(f);
}
--
2.24.3 (Apple Git-128)
^ permalink raw reply [flat|nested] 9+ messages in thread