[Tarantool-patches] [PATCH 1/1] raft: fix crash in worker fiber

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Sat Nov 7 02:43:53 MSK 2020


The Raft worker fiber does all the heavy and yielding jobs. There
are 2 of them: disk write and network broadcast. Disk write
yields. Network broadcast is slow, so it happens at most once per
event loop iteration.

On each iteration the worker checks if any of these 2 jobs is
active, and if not, it goes to sleep until an explicit wakeup.

But there was a bug. Before going to sleep the worker did a yield
and then asserted that there was nothing to do. However, new tasks
could appear during the yield, so the assertion failed, leading to
a crash.

The patch reorganizes this part of the code so that the worker no
longer yields between checking for new tasks and going to sleep.

No test, because the bug is extremely hard to reproduce, and I
don't want to clog this part of the code with error injections.
---
Branch: http://github.com/tarantool/tarantool/tree/gerold103/raft-crash

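For illustration only, here is a minimal single-threaded sketch of
the ordering problem described in the commit message. It is not
the raft.c code: struct sim and all sim_* functions are
hypothetical stand-ins. sim_yield() plays the role of
fiber_sleep(0), and sim_sleep() plays the role of the
assert(raft_is_fully_on_disk()) + fiber_yield() pair, except that
it records a violation in a flag instead of aborting.

```c
#include <stdbool.h>

/* Hypothetical model of the worker state, not the real raft.c fields. */
struct sim {
	/* A disk write or a broadcast is pending. */
	bool has_work;
	/* Another fiber will schedule work during the next yield. */
	bool inject_on_yield;
	/* Set when the worker went to sleep with work still pending. */
	bool slept_with_work;
};

/* Stand-in for fiber_sleep(0): other fibers run and may add a task. */
static void
sim_yield(struct sim *s)
{
	if (s->inject_on_yield) {
		s->has_work = true;
		s->inject_on_yield = false;
	}
}

/*
 * Stand-in for the assert + fiber_yield() pair. The real assert
 * would abort here; the sketch only records the violation.
 */
static void
sim_sleep(struct sim *s)
{
	if (s->has_work)
		s->slept_with_work = true;
}

/* Handle pending work if any. Returns true when the worker was idle. */
static bool
sim_handle_work(struct sim *s)
{
	if (!s->has_work)
		return true;
	s->has_work = false;
	return false;
}

/* Old order: check, yield, then sleep using the stale is_idle. */
static bool
sim_old_iteration(struct sim *s)
{
	bool is_idle = sim_handle_work(s);
	sim_yield(s);
	if (is_idle)
		sim_sleep(s);
	return s->slept_with_work;
}

/* New order: check and sleep with no yield in between. */
static bool
sim_new_iteration(struct sim *s)
{
	bool is_idle = sim_handle_work(s);
	if (is_idle)
		sim_sleep(s);
	sim_yield(s);
	return s->slept_with_work;
}
```

With has_work = false and inject_on_yield = true, the old order
reports a violation, while the new order does not: the injected
task stays pending and is handled on the next iteration.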
 src/box/raft.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/box/raft.c b/src/box/raft.c
index e1e60ce94..914b0d68f 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -682,11 +682,11 @@ raft_worker_f(va_list args)
 			raft_worker_handle_broadcast();
 			is_idle = false;
 		}
+		if (is_idle) {
+			assert(raft_is_fully_on_disk());
+			fiber_yield();
+		}
 		fiber_sleep(0);
-		if (!is_idle)
-			continue;
-		assert(raft_is_fully_on_disk());
-		fiber_yield();
 	}
 	return 0;
 }
-- 
2.21.1 (Apple Git-122.3)


