Date: Thu, 22 Oct 2020 11:55:29 +0300
From: "Alexander V. Tikhonov"
Message-ID: <20201022085529.GB630047@hpalx>
Subject: Re: [Tarantool-patches] [PATCH 0/3] Raft on leader election recovery restart
List-Id: Tarantool development patches
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org

Hi Vlad,

thanks for the patchset. After you removed one of the patches from it,
the tests began to pass [1]. No new degradations found. Patchset LGTM.

[1] - https://gitlab.com/tarantool/tarantool/-/pipelines/206144990

On Sat, Oct 17, 2020 at 07:17:54PM +0200, Vladislav Shpilevoy wrote:
> There were 2 issues with the relay restarting recovery cursor when the node is
> elected as a leader. Fixed in the last 2 commits. First was about local LSN not
> being set, second about GC not being propagated.
>
> The first patch is not related to the bugs above directly. Just was found while
> working on this. In theory without the first patch we can get flakiness into
> the tests changed in this commit, but only if a replication connection will
> break without a reason.
>
> Additionally, the new test - gh-5433-election-restart-recovery - hangs on my
> machine when I start tens of it. All workers, after executing it several times,
> hang. But!!! not in something related to the raft - they hang in the first
> box.snapshot(), where the election is not even enabled yet. From some debug
> prints I see it hangs somewhere in engine_being_checkpoint(), and consumes
> ~80% of the CPU. But it may be just a consequence of the corrupted memory on
> Mac, due to libeio being broken. Don't know what to do with that now.
>
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5433-raft-leader-recovery-restart
> Issue: https://github.com/tarantool/tarantool/issues/5433
>
> Vladislav Shpilevoy (3):
>   raft: send state to new subscribers if Raft worked
>   raft: use local LSN in relay recovery restart
>   raft: don't drop GC when restart relay recovery
>
>  src/box/box.cc                                |  14 +-
>  src/box/raft.h                                |  10 +
>  src/box/relay.cc                              |  22 ++-
>  .../gh-5426-election-on-off.result            |  59 ++++--
>  .../gh-5426-election-on-off.test.lua          |  26 ++-
>  .../gh-5433-election-restart-recovery.result  | 174 ++++++++++++++++++
>  ...gh-5433-election-restart-recovery.test.lua |  87 +++++++++
>  test/replication/suite.cfg                    |   1 +
>  8 files changed, 367 insertions(+), 26 deletions(-)
>  create mode 100644 test/replication/gh-5433-election-restart-recovery.result
>  create mode 100644 test/replication/gh-5433-election-restart-recovery.test.lua
>
> --
> 2.21.1 (Apple Git-122.3)
>