[Tarantool-patches] [PATCH 0/2] fix replica iteration issue & stabilize quorum test

Thu Nov 21 01:18:25 MSK 2019

On 20/11/2019 02:29, Ilya Kosarev wrote:
> Hi!
> 
> Thanks for your review.
> 
> Did you run the tests on exactly this patchset or on the branch
> https://github.com/tarantool/tarantool/tree/i.kosarev/gh-4586-fix-quorum-test
> which also contains "relay: fix join vclock obtainment in
> relay_initial_join"
> <https://github.com/tarantool/tarantool/commit/83fa385f2e82911dee1a9189f90b1a94c8455023>?
> That commit is not in master yet (and might never get there, since
> Georgy has an alternative fix as part of sync replication), but it
> is vital for the stability of the test.
> 
> On my machine (Ubuntu 18.04.3 LTS) the quorum test works perfectly on
> the mentioned branch. I use the following bash command to run it under 'load':
> l=0 ; while ./test-run.py -j20 `for r in {1..64} ; do echo quorum ; done` 2>/dev/null ; do l=$(($l+1)) ; echo ======== $l ============= ; done

I ran it on your branch, and it fails, even without these
complex bash scripts. I ran it 5-10 times, and it either fails or crashes.
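
A plain repeat-and-count loop is enough to see this kind of flakiness. Below is a minimal sketch, assuming the test can be started as `./test-run.py quorum` from the test directory (inferred from the command quoted above). The helper name `run_repeatedly` is hypothetical and not part of test-run.

```shell
# Run a command a given number of times and report how many runs failed.
# Usage: run_repeatedly <runs> <command> [args...]
# Hypothetical helper, written in portable sh; not part of test-run.
run_repeatedly() {
    runs=$1; shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Output is discarded; a non-zero exit status counts as a failure.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs runs failed"
    # Succeed only if every run passed.
    [ "$failures" -eq 0 ]
}

# Assumed invocation, from the test directory:
#   run_repeatedly 10 ./test-run.py quorum
```

A single failed run out of ten already shows the test is still flaky, so the loop does not need parallel jobs to be useful.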

> 
> Anyway, I guess the problems you reported are not connected with the
> join_vclock patch but are macOS-specific, since I can't reproduce them
> locally. I guess we have some macOS machines; I will ask for access.

Maybe they are not related. But the wrong test results alone mean
that the ticket can't be closed, because the ticket is called:

    test: flaky segfault on replication/quorum test under high load

And it is still flaky.

For the crash you can open a new ticket.

> 
> It seems to me that the wrong results problem is quite easy to handle.
> For now I have no idea how to handle the segfault you reported. However,
> I am sure it has nothing to do with the segfault mentioned in the issue,
> since that one was caused by unsafe iteration of anon replicas. The other
> wrong results problems mentioned in the ticket are also handled in the
> patchset.

The problems mentioned in the ticket are just comments, hints to help.
The ticket is about flakiness, not about several concrete failures. If the
test is still flaky, what is the point of closing this issue and opening
exactly the same one again?

> 
> Therefore I propose to close this ticket with the provided patchset,
> although there are some other problems. Then I will open a new issue with
> the error info you provided and start working on it as soon as I get
> remote access to a macOS machine.