[Tarantool-patches] [PATCH 1/1] limbo: don't wake self fiber on CONFIRM write

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Sep 11 00:53:57 MSK 2020


During recovery WAL writes end immediately, without yields.
Therefore WAL write completion callback is executed in the
currently active fiber.

Txn limbo on CONFIRM WAL write wakes up the waiting fiber, which
appears to be the same as the active fiber during recovery.

That breaks the fiber scheduler, because apparently it is not safe
to wake the currently active fiber unless it is going to call
fiber_yield() immediately after. See a comment in fiber_wakeup()
implementation about that way of usage.

The patch simply stops waking the waiting fiber, if it is the
currently active one.

Closes #5288
Closes #5232
---
Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5288-limbo-wakeup
Issue: https://github.com/tarantool/tarantool/issues/5288
Issue: https://github.com/tarantool/tarantool/issues/5232

 src/box/txn_limbo.c                           |  3 ++-
 .../replication/gh-5288-qsync-recovery.result | 27 +++++++++++++++++++
 .../gh-5288-qsync-recovery.test.lua           | 11 ++++++++
 3 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 test/replication/gh-5288-qsync-recovery.result
 create mode 100644 test/replication/gh-5288-qsync-recovery.test.lua

diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 2ff3bdd05..3655338d8 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -281,7 +281,8 @@ static void
 txn_limbo_write_cb(struct journal_entry *entry)
 {
 	assert(entry->complete_data != NULL);
-	fiber_wakeup(entry->complete_data);
+	if (fiber() != entry->complete_data)
+		fiber_wakeup(entry->complete_data);
 }
 
 static void
diff --git a/test/replication/gh-5288-qsync-recovery.result b/test/replication/gh-5288-qsync-recovery.result
new file mode 100644
index 000000000..dc0babef6
--- /dev/null
+++ b/test/replication/gh-5288-qsync-recovery.result
@@ -0,0 +1,27 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+--
+-- gh-5288: transaction limbo could crash during recovery, because in WAL write
+-- completion callback it woken up the currently active fiber.
+--
+s = box.schema.space.create('sync', {is_sync = true})
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+s:insert{1}
+ | ---
+ | - [1]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+test_run:cmd('restart server default')
+ | 
+box.space.sync:drop()
+ | ---
+ | ...
diff --git a/test/replication/gh-5288-qsync-recovery.test.lua b/test/replication/gh-5288-qsync-recovery.test.lua
new file mode 100644
index 000000000..00bff7b87
--- /dev/null
+++ b/test/replication/gh-5288-qsync-recovery.test.lua
@@ -0,0 +1,11 @@
+test_run = require('test_run').new()
+--
+-- gh-5288: transaction limbo could crash during recovery, because in WAL write
+-- completion callback it woken up the currently active fiber.
+--
+s = box.schema.space.create('sync', {is_sync = true})
+_ = s:create_index('pk')
+s:insert{1}
+box.snapshot()
+test_run:cmd('restart server default')
+box.space.sync:drop()
-- 
2.21.1 (Apple Git-122.3)



More information about the Tarantool-patches mailing list