Date: Mon, 7 Dec 2020 12:26:14 +0000
From: Nikita Pettik
To: "Alexander V. Tikhonov"
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v1] test: fix hang of vinyl/select_consistency.* test
Message-ID: <20201207122614.GB21104@tarantool.org>
In-Reply-To: <20201202185743.GA232692@hpalx>
References: <5b204cb5b32bb96c6a2cec2b06f399b360137250.1605465603.git.avtikhon@tarantool.org>
 <20201116210654.GA13996@tarantool.org>
 <20201202185743.GA232692@hpalx>
List-Id: Tarantool development patches

On 02 Dec 21:57, Alexander V. Tikhonov wrote:
> Hi Nikita, thanks for the review. I've made all of your suggestions,
> please review below.
>
> On Mon, Nov 16, 2020 at 09:06:54PM +0000, Nikita Pettik wrote:
> > On 15 Nov 21:43, Alexander V. Tikhonov wrote:
> > >
> > > It happened because on heavily loaded hosts a situation may occur
> > > where the previous snapshot is still in progress when a new
> > > snapshot starts with the same file name *.snap.inprogress. It
> > > happens before the current snapshot has completed and printed
> > > "dump completed" in the log file. Also, this *.snap.inprogress
> > > file was seen left over during manual debugging, when the test
> > > hung before this patch. To resolve the test issue, the fiber sleep
> > > delay after it can be increased, but we want to keep the issue
> > > reproducible. The current patch corrects the test to avoid a hang on
> >
> > I guess increasing the sleep and filing an exact repro with the 'bug'
> > label is enough to deal with this test.
> >
>
> Ok, I've added instructions on how to reproduce the problem to the issue
> and increased the sleep from 0.1 to 0.5.
>
> > > box snapshot, to be able to continue testing after it failed. Fiber
> > > sleep was even decreased after adding a fiber for box.snapshot to be
> > > able to reproduce the issue.
> > >
> > > Needed for #4385
> > > ---
> > > @@ -75,8 +80,13 @@ end;
> > >  ...
> > >  function snap_loop()
> > >      while not stop do
> > > -        box.snapshot()
> > > -        fiber.sleep(0.1)
> > > +        local ok, err = fiber.create(function() local ok, err = pcall(box.snapshot) return ok, err end)
> >
> > Why not simply wrap box.snapshot() in pcall? Why do you need another
> > separate fiber for it?
> >

Could you please file a new issue, since it is no longer a 'flaky test' but
rather a vinyl bug? I mean, this situation with simultaneous instance dumps
seems to be a core issue. Also, it would be nice to have a reproducer
outside of the test-run environment, so the owner of the bug can run
Tarantool, copy-paste the reproducer into the console and get the
mentioned errors.