[Tarantool-patches] [PATCH v1] test: replication-py/init_storage.test.py

Alexander V. Tikhonov avtikhon at tarantool.org
Wed Dec 2 23:31:42 MSK 2020


Hi Sergey, thanks for the review, please check my comments below.

On Tue, Nov 10, 2020 at 12:06:34PM +0300, Serge Petrenko wrote:
> 
> 08.11.2020 09:24, Alexander V. Tikhonov пишет:
> > Found that test failed in 2 common places when it tried to start the
> > replica and wait it within 'JOIN' either 'SUBSCRIBE' test parts.
> > It used to wait for replica start check the 'wait_until_started()'
> > function 'TarantoolServer' class from test-run repository. But it
> > didn't try resolve connection issues on replica creation, like:
> > 
> >    [30534] main/103/replica I> connecting to 1 replicas
> >    [30534] main/112/applier/localhost:49168 I> can't connect to master
> >    [30534] main/112/applier/localhost:49168 sio.c:208 !> SystemError connect to 127.0.0.1:49168, called on fd 27, aka 127.0.0.1:47954: Connection refused
> >    [30534] main/112/applier/localhost:49168 I> will retry every 0.10 second
> >    [30534] main/112/applier/localhost:49168 I> remote master c5d480c3-219c-11eb-ac14-080027727614 at 127.0.0.1:49168 running Tarantool 2.7.0
> >    [30534] main/103/replica I> connected to 1 replicas
> >    [30534] main/103/replica I> bootstrapping replica from c5d480c3-219c-11eb-ac14-080027727614 at 127.0.0.1:49168
> >    [30534] main/112/applier/localhost:49168 I> can't read row
> >    [30534] main/112/applier/localhost:49168 box.cc:183 E> ER_READONLY: Can't modify data because this instance is in read-only mode.
> >    [30534] main/103/replica box.cc:183 E> ER_READONLY: Can't modify data because this instance is in read-only mode.
> >    [30534] main/103/replica F> can't initialize storage: Can't modify data because this instance is in read-only mode.
> >    [30534] main/103/replica F> can't initialize storage: Can't modify data because this instance is in read-only mode.
> 
> 
> Hi! Thanks for the patch!
> 
> Please find my comments below.
> 
> 
> > 
> > To resolve it the test was changed to be able to catch exception
> > 'TarantoolStartError' from test-run. Also the test should have the
> > ability to be restarted by test-run using fragile list and in this way
> > 'crash_expected' flag was disabled to let the test fail with exception.
> 
> You're saying `crash_expected` is disabled, but you enable it in your patch.
> 
>

Right, corrected.

> > 
> > Needed by #4949
> > ---
> > 
> > Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-4949
> > Issue: https://github.com/tarantool/tarantool/issues/4949
> > 
> >   test/replication-py/init_storage.test.py | 27 ++++++++++++++++++++----
> >   test/replication-py/suite.ini            |  5 +++--
> >   2 files changed, 26 insertions(+), 6 deletions(-)
> > 
> > diff --git a/test/replication-py/init_storage.test.py b/test/replication-py/init_storage.test.py
> > index 4be531f8d..a830821cd 100644
> > --- a/test/replication-py/init_storage.test.py
> > +++ b/test/replication-py/init_storage.test.py
> > @@ -1,5 +1,6 @@
> >   import os
> >   import glob
> > +from lib.tarantool_server import TarantoolStartError
> >   from lib.tarantool_server import TarantoolServer
> >   # master server
> > @@ -64,8 +65,17 @@ replica.deploy(wait=False)
> >   print 'waiting reconnect on JOIN...'
> >   server.start()
> > -replica.wait_until_started()
> > -print 'ok'
> > +try:
> > +    # Replica may fail to start due connection issues may occur, check
> > +    # gh-4949. Also the test should have the ability to be restarted by
> > +    # test-run using fragile list and in this way 'crash_expected' flag
> > +    # should be disabled to let the test fail with exception.
> > +    replica.crash_expected = True
> > +    replica.wait_until_started()
> > +except TarantoolStartError:
> > +    print 'not ok - server failed to start'
> > +else:
> > +    print 'ok'
> 
> You're saying that crash_expected flag should be disabled,
> but you're enabling it.
>

Right, corrected.

> What happens if you don't catch the exception?
> Is this added to make the test produce the same output on
> multiple failures?
> 

This test uses test-run internal functions and using it, it should
analyze its results in the same way as test-run does. So this exception
is needed to be catched to avoid of test crash. It helps the test to
finish it's testing with wrong result and have the ability to be rerun
using checksum checker on found issue.

> >   replica.stop()
> >   server.stop()
> > @@ -73,8 +83,17 @@ server.stop()
> >   print 'waiting reconnect on SUBSCRIBE...'
> >   replica.start(wait=False)
> >   server.start()
> > -replica.wait_until_started()
> > -print 'ok'
> > +try:
> > +    # Replica may fail to start due connection issues may occur, check
> > +    # gh-4949. Also the test should have the ability to be restarted by
> > +    # test-run using fragile list and in this way 'crash_expected' flag
> > +    # should be disabled to let the test fail with exception.
> > +    replica.crash_expected = True
> > +    replica.wait_until_started()
> > +except TarantoolStartError:
> > +    print 'not ok - server failed to start'
> > +else:
> > +    print 'ok'
> >   replica.stop()
> >   replica.cleanup()
> > diff --git a/test/replication-py/suite.ini b/test/replication-py/suite.ini
> > index b563b9fca..db4403457 100644
> > --- a/test/replication-py/suite.ini
> > +++ b/test/replication-py/suite.ini
> > @@ -4,10 +4,11 @@ script =  master.lua
> >   description = tarantool/box, replication
> >   is_parallel = True
> >   fragile = {
> > -    "retries": 10,
> > +    "retries": 50,
> 
> Why 50 retries?  Don't we have a rule that only 10 retries  are allowed?
> I understand that 10 retries aren't sufficient sometimes, but still. 50 is
> too much.
> 

Right, I've reverted it back to 10.

> I guess we should teach test_run to retry starting a server instead of
> restarting the
> whole test, when the problem is only with replica start.
> 

This patch is not fixing the issue, but is needed to resolve the issue
with crashing test. The issue gh-4949 will be left opened and as the
next fix we can implement your suggestion to completely fix the test.

> >       "tests": {
> >           "init_storage.test.py": {
> > -            "issues": [ "gh-4949" ]
> > +            "issues": [ "gh-4949" ],
> > +            "checksums": [ "9b4235bb6bb9d76aa6a1f7dc8f088075", "4c5fc871955a3166d67fbfa9f254f68a", "bc2781acdb5745d01da2f533a0d519f9" ]
> >           },
> >           "conflict.test.py": {
> >               "issues": [ "gh-4980" ]
> 
> -- 
> Serge Petrenko
> 


More information about the Tarantool-patches mailing list