From: Vladislav Shpilevoy
Date: Sat, 17 Oct 2020 19:17:54 +0200
Subject: [Tarantool-patches] [PATCH 0/3] Raft on leader election recovery restart
To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org

There were 2 issues with the relay restarting the recovery cursor when
the node is elected as a leader. Both are fixed in the last 2 commits.
The first was about the local LSN not being set, the second about GC
not being propagated.

The first patch is not directly related to the bugs above. It was just
found while working on this. In theory, without the first patch the
tests changed in this commit can become flaky, but only if a
replication connection breaks without a reason.

Additionally, the new test - gh-5433-election-restart-recovery - hangs
on my machine when I start tens of instances of it. All workers, after
executing it several times, hang. But they do not hang in anything
related to Raft: they hang in the first box.snapshot(), where the
election is not even enabled yet. From some debug prints I see it hangs
somewhere in engine_begin_checkpoint() and consumes ~80% of the CPU.
But it may be just a consequence of corrupted memory on Mac, due to
libeio being broken. I don't know what to do with that now.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5433-raft-leader-recovery-restart
Issue: https://github.com/tarantool/tarantool/issues/5433

Vladislav Shpilevoy (3):
  raft: send state to new subscribers if Raft worked
  raft: use local LSN in relay recovery restart
  raft: don't drop GC when restart relay recovery

 src/box/box.cc                                |  14 +-
 src/box/raft.h                                |  10 +
 src/box/relay.cc                              |  22 ++-
 .../gh-5426-election-on-off.result            |  59 ++++--
 .../gh-5426-election-on-off.test.lua          |  26 ++-
 .../gh-5433-election-restart-recovery.result  | 174 ++++++++++++++++++
 ...gh-5433-election-restart-recovery.test.lua |  87 +++++++++
 test/replication/suite.cfg                    |   1 +
 8 files changed, 367 insertions(+), 26 deletions(-)
 create mode 100644 test/replication/gh-5433-election-restart-recovery.result
 create mode 100644 test/replication/gh-5433-election-restart-recovery.test.lua

--
2.21.1 (Apple Git-122.3)