From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp60.i.mail.ru (smtp60.i.mail.ru [217.69.128.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 9EDE6469719 for ; Tue, 17 Nov 2020 00:06:56 +0300 (MSK) Date: Mon, 16 Nov 2020 21:06:54 +0000 From: Nikita Pettik Message-ID: <20201116210654.GA13996@tarantool.org> References: <5b204cb5b32bb96c6a2cec2b06f399b360137250.1605465603.git.avtikhon@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <5b204cb5b32bb96c6a2cec2b06f399b360137250.1605465603.git.avtikhon@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH v1] test: fix hang of vinyl/select_consistency.* test List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Alexander V. Tikhonov" Cc: tarantool-patches@dev.tarantool.org On 15 Nov 21:43, Alexander V. Tikhonov wrote: > > It happened because on heavy loaded hosts may occure the situation > when the previous snapshot was inprogress when the new snapshot came > with the same file name *.snap.inprogress. It happens before the > current snapshot completed and printed "dump completed" in log file. > Also this file *.snap.inprogress was seen left on manual debug, when > the test hanged before this patch. To resolve the test issue fiber > sleep delay after it can be increased, but we want to save the issue > reproducable. The current patch corrects the test to avoid of hang on I guess increasing sleep and fileing exact repro with 'bug' label is enough to deal with this test. > box snapshot, to be able to continue testing after it failed. Fiber > sleep was even decreased after adding fiber for box.snapshot to be > able to reproduce the issue. > > Needed for #4385 > --- > @@ -75,8 +80,13 @@ end; > ... > function snap_loop() > while not stop do > - box.snapshot() > - fiber.sleep(0.1) > + local ok, err = fiber.create(function() local ok, err = pcall(box.snapshot) return ok, err end) Why not simply wrap box.snapshot() in pcall? Why do you need another one separate fiber for it? > @@ -97,10 +107,7 @@ _ = fiber.create(dml_loop); > _ = fiber.create(snap_loop); > --- > ... > -failed = {}; > ---- > -... > -for i = 1, 10000 do > +function run_iter() I'd call it select_loop() (as dml/snap loops above). > local val = math.random(MAX_VAL) > box.begin() > local res1 = s.index.i1:select({val}) > @@ -117,25 +124,37 @@ for i = 1, 10000 do > end > end > if not found then > + log.error("error: equal not found for #res1 = " .. #res1 .. ", #res2 = " .. #res2) What's the point of printing number of elements in res1/res2? Why not print the whole result set? > +end; > +--- > +... > +for i = 1, 10000 do > + if failed or not run_iter(i) then > + log.error("error: failed on iteration " .. i) > + failed = true > + break > + end > end; > +function check_get() > + for i = 1, ch:size() do > + if not test_run:wait_cond(function() return ch:get() ~= nil end) then As I see from docs, ch:get() has optional timeout param. Why not use it? I mean not only this particular test, but in general. > + log.error("error: hanged on ch:get() on iteration " .. i) > + return false > + end > + end > + return true > end; > --- > ... > @@ -143,10 +162,14 @@ test_run:cmd("setopt delimiter ''"); > --- > - true > ... > -#failed == 0 or failed > +test_run:wait_cond(function() return check_get() end) Why do you need another one wait_cond over check_get?