From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Mon, 29 Oct 2018 14:29:42 +0300 From: Alexander Turenko Subject: Re: [tarantool-patches] Re: [PATCH] test: fix unix socket conflict in socket.test.lua Message-ID: <20181029112942.tmbmf6dcxvqkx7fc@tkn_work_nb> References: <671f12793def4a2c57a9672e851986877616a81f.1540433737.git.alexander.turenko@tarantool.org> <20181025094339.jb3652vrqnlvwpkm@esperanza> <20181029082528.ou2zooi33vcblk6b@tkn_work_nb> <20181029093229.ljq7gcvssleckfvo@esperanza> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20181029093229.ljq7gcvssleckfvo@esperanza> To: Vladimir Davydov Cc: Sergei Voronezhskii , tarantool-patches@freelists.org List-ID: On Mon, Oct 29, 2018 at 12:32:29PM +0300, Vladimir Davydov wrote: > On Mon, Oct 29, 2018 at 11:25:29AM +0300, Alexander Turenko wrote: > > On Thu, Oct 25, 2018 at 12:43:39PM +0300, Vladimir Davydov wrote: > > > On Thu, Oct 25, 2018 at 05:21:45AM +0300, Alexander Turenko wrote: > > > > It is needed to run the test in parallel on several test-run workers to > > > > investigate flaky failures of the test. > > > > > > I don't think I understand. Do you mean that this test doesn't fix the > > > test flakiness and is only needed for further investigation? > > > > > > > Yes (but now I have added some fixes that can eliminate some flaky > > failures). It is often convenient to run something like... > > > > ``` > > TEST_RUN_TESTS=$(for i in $(seq 1 100); do echo -n "app/ "; done) make test > > ``` > > > > ...to prove the suite is ready to run in parallel mode. > > > > > > diff --git a/test/app/socket.result b/test/app/socket.result > > > > index 2f002a37e..1a570b9fa 100644 > > > > --- a/test/app/socket.result > > > > +++ b/test/app/socket.result > > > > @@ -42,6 +42,29 @@ test_run:cmd("push filter '(error: .builtin/.*[.]lua):[0-9]+' to '\\1'") > > > > --- > > > > - true > > > > ... > > > > +test_run:cmd("push filter '(/tmp/tarantool-test-socket)-[0-9]+' to '\\1'") > > > > +--- > > > > +- true > > > > +... > > > > +-- /tmp/tarantool-test-socket-${TEST_RUN_WORKER_ID} > > > > +test_run:cmd("setopt delimiter ';'") > > > > +--- > > > > +- true > > > > +... > > > > +function get_temp_socket_path() > > > > + local base_path = '/tmp/tarantool-test-socket' > > > > + local worker_id = os.getenv('TEST_RUN_WORKER_ID') > > > > + if not worker_id then > > > > + return base_path > > > > + end > > > > + return ('%s-%s'):format(base_path, worker_id) > > > > +end; > > > > > > Why don't you simply create the file in the worker directory? > > > > Sure, I can. Rewritten the test. But socket.skipcond still uses > > /tmp/tarantool-test-socket-${WORKER_ID}, because its working directory > > is `test`. > > > > The whole new patch is below. > > > > commit 224fcdf2a5e8f84971253f1d585b94213d06a384 > > Author: Alexander Turenko > > Date: Thu Oct 25 04:06:46 2018 +0300 > > > > test: fix unix socket conflict in socket.test.lua > > > > Increased socket.readable / socket.wait timeouts. > > > > Rewritten port choosing: repeat bind+listen until succeeds, exclude > > incorrect port 65536 from the range. > > > > All these changes are needed to run the test in parallel on several > > test-run workers to investigate flaky failures of the test / of the > > suite. Some of these changes can also eliminate possible flaky failures. > > ./test-run -j -1 app/socket.test.lua doesn't work with this patch: > > } app/socket.test.lua > } Test.run() received the following error: > } Traceback (most recent call last): > } File "/home/vlad/src/tarantool/test-run/lib/test.py", line 174, in run > } execfile(self.skip_cond, dict(locals(), **server.__dict__)) > } File "app/socket.skipcond", line 8, in > } worker_id = os.environ['TEST_RUN_WORKER_ID'] > } File "/usr/lib/python2.7/UserDict.py", line 40, in __getitem__ > } raise KeyError(key) > } KeyError: 'TEST_RUN_WORKER_ID' > > I think you shouldn't use TEST_RUN_WORKER_ID in skipcond after all. > Instead you should make a temporary file. Thanks for catching it up. Fixed. See the patch at end of the email. > Also, I've seen the following failure once. Not sure if it's related: > > } [001] app/socket.test.lua [ fail ] > } [001] > } [001] Test failed! Result content mismatch: > } [001] --- app/socket.result Mon Oct 29 12:06:39 2018 > } [001] +++ socket.reject Mon Oct 29 12:13:34 2018 > } [001] @@ -2810,7 +2810,7 @@ > } [001] ... > } [001] echo_fiber ~= nil > } [001] --- > } [001] -- true > } [001] +- false > } [001] ... > } [001] client:write('hello') > } [001] --- > } [001] I've observed some fails too, but when run the test, say, 500 times on 16 workers. I think it is now stable enough to unblock the testing scenario like `./test-run.py app/ app/ app/ app/ app/` at least on Linux. (I've a way more frequent fails on Mac OS.) Changes: diff --git a/test/app/socket.skipcond b/test/app/socket.skipcond index c7d03a681..89a293f81 100644 --- a/test/app/socket.skipcond +++ b/test/app/socket.skipcond @@ -4,12 +4,10 @@ import re import os.path import socket import os +import tempfile -worker_id = os.environ['TEST_RUN_WORKER_ID'] -test_path = '/tmp/tarantool-test-socket-' + worker_id - -if os.path.exists(test_path): - os.remove(test_path) +test_dir = tempfile.mkdtemp(prefix='tarantool-test-socket') +test_path = os.path.join(test_dir, 'socket') s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) try: @@ -21,3 +19,4 @@ s.close() if os.path.exists(test_path): os.remove(test_path) + os.rmdir(test_dir)