From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vladislav Shpilevoy
Date: Wed, 20 Nov 2019 23:18:25 +0100
In-Reply-To: <1574213348.532614234@f512.i.mail.ru>
Subject: Re: [Tarantool-patches] [PATCH 0/2] fix replica iteration issue & stabilize quorum test
List-Id: Tarantool development patches
To: Ilya Kosarev
Cc: tarantool-patches@dev.tarantool.org

On 20/11/2019 02:29, Ilya Kosarev wrote:
> Hi!
>
> Thanks for your review.
>
> Did you run the tests on exactly this patchset, or on the branch
> https://github.com/tarantool/tarantool/tree/i.kosarev/gh-4586-fix-quorum-test,
> which also contains "relay: fix join vclock obtainment in
> relay_initial_join"?
> That commit is not yet in master (and might not get there, since
> Georgy has an alternative fix as part of sync replication), but it is
> vital for the stability of the test.
>
> On my machine (Ubuntu 18.04.3 LTS) the quorum test works perfectly on
> the mentioned branch. I use the following bash one-liner to run it
> under load:
>
>     l=0 ; while ./test-run.py -j20 `for r in {1..64} ; do echo quorum ; done` 2>/dev/null ; do l=$(($l+1)) ; echo ======== $l ============= ; done

I ran it on your branch, and it fails. Even without these complex bash
scripts. I ran it 5-10 times, and it failed/crashed.

> Anyway, I guess the reported problems are not connected with the
> join_vclock patch but are macOS-specific, since I can't reproduce them
> locally.
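(For readability, the one-liner above can be unpacked into a small script. This is just a sketch of the same loop: it assumes the same test-run.py and quorum test as above, and simply counts passing rounds until the first failing one.)

```shell
#!/bin/sh
# Run 64 copies of the quorum test through test-run.py with 20 parallel
# jobs, repeating until a round fails; count the successful rounds.
l=0
while ./test-run.py -j20 $(for r in $(seq 1 64); do echo quorum; done) 2>/dev/null
do
    l=$((l + 1))
    echo "======== $l ============="
done
echo "Failed after $l successful rounds"
```

Each round passes the test name 64 times so test-run.py schedules that many instances across its 20 workers, which is what creates the "high load" condition from the ticket.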
> Guess we have some macOS machines; I will ask for access.

Maybe they are not related. But at least the wrong test results mean
that the ticket can't be closed, because the ticket is called:

    test: flaky segfault on replication/quorum test under high load

And it is still flaky. For the crash you can open a new ticket.

> It seems to me that the wrong-results problem is quite easy to handle.
> I have no idea for now how to handle the segfault you reported;
> however, I am sure it has nothing to do with the segfault mentioned in
> the issue, since that one was caused by unsafe iteration over
> anonymous replicas. The other wrong-results problems mentioned in the
> ticket are also handled in the patchset.

The problems mentioned in the ticket are just comments. Helpers. The
ticket is about flakiness, not about several concrete fails. If the
test is still flaky, what is the point of closing this issue and
opening exactly the same one again?

> Therefore I propose to close this ticket with the provided patchset,
> although there are some other problems. Then I will open a new issue
> with the error info you provided and start working on it as soon as I
> get remote access to a macOS machine.