[Tarantool-patches] [PATCH v1] test: replication-py/init_storage.test.py

Tue Nov 10 12:06:34 MSK 2020

08.11.2020 09:24, Alexander V. Tikhonov пишет:
> Found that test failed in 2 common places when it tried to start the
> replica and wait it within 'JOIN' either 'SUBSCRIBE' test parts.
> It used to wait for replica start check the 'wait_until_started()'
> function 'TarantoolServer' class from test-run repository. But it
> didn't try resolve connection issues on replica creation, like:
>
>    [30534] main/103/replica I> connecting to 1 replicas
>    [30534] main/112/applier/localhost:49168 I> can't connect to master
>    [30534] main/112/applier/localhost:49168 sio.c:208 !> SystemError connect to 127.0.0.1:49168, called on fd 27, aka 127.0.0.1:47954: Connection refused
>    [30534] main/112/applier/localhost:49168 I> will retry every 0.10 second
>    [30534] main/112/applier/localhost:49168 I> remote master c5d480c3-219c-11eb-ac14-080027727614 at 127.0.0.1:49168 running Tarantool 2.7.0
>    [30534] main/103/replica I> connected to 1 replicas
>    [30534] main/103/replica I> bootstrapping replica from c5d480c3-219c-11eb-ac14-080027727614 at 127.0.0.1:49168
>    [30534] main/112/applier/localhost:49168 I> can't read row
>    [30534] main/112/applier/localhost:49168 box.cc:183 E> ER_READONLY: Can't modify data because this instance is in read-only mode.
>    [30534] main/103/replica box.cc:183 E> ER_READONLY: Can't modify data because this instance is in read-only mode.
>    [30534] main/103/replica F> can't initialize storage: Can't modify data because this instance is in read-only mode.
>    [30534] main/103/replica F> can't initialize storage: Can't modify data because this instance is in read-only mode.

Hi! Thanks for the patch!

Please find my comments below.

>
> To resolve it the test was changed to be able to catch exception
> 'TarantoolStartError' from test-run. Also the test should have the
> ability to be restarted by test-run using fragile list and in this way
> 'crash_expected' flag was disabled to let the test fail with exception.

You're saying `crash_expected` is disabled, but you enable it in your patch.

>
> Needed by #4949
> ---
>
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-4949
> Issue: https://github.com/tarantool/tarantool/issues/4949
>
>   test/replication-py/init_storage.test.py | 27 ++++++++++++++++++++----
>   test/replication-py/suite.ini            |  5 +++--
>   2 files changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/test/replication-py/init_storage.test.py b/test/replication-py/init_storage.test.py
> index 4be531f8d..a830821cd 100644
> --- a/test/replication-py/init_storage.test.py
> +++ b/test/replication-py/init_storage.test.py
> @@ -1,5 +1,6 @@
>   import os
>   import glob
> +from lib.tarantool_server import TarantoolStartError
>   from lib.tarantool_server import TarantoolServer
>   
>   # master server
> @@ -64,8 +65,17 @@ replica.deploy(wait=False)
>   
>   print 'waiting reconnect on JOIN...'
>   server.start()
> -replica.wait_until_started()
> -print 'ok'
> +try:
> +    # Replica may fail to start due connection issues may occur, check
> +    # gh-4949. Also the test should have the ability to be restarted by
> +    # test-run using fragile list and in this way 'crash_expected' flag
> +    # should be disabled to let the test fail with exception.
> +    replica.crash_expected = True
> +    replica.wait_until_started()
> +except TarantoolStartError:
> +    print 'not ok - server failed to start'
> +else:
> +    print 'ok'

You're saying that crash_expected flag should be disabled,
but you're enabling it.

What happens if you don't catch the exception?
Is this added to make the test produce the same output on
multiple failures?

>   
>   replica.stop()
>   server.stop()
> @@ -73,8 +83,17 @@ server.stop()
>   print 'waiting reconnect on SUBSCRIBE...'
>   replica.start(wait=False)
>   server.start()
> -replica.wait_until_started()
> -print 'ok'
> +try:
> +    # Replica may fail to start due connection issues may occur, check
> +    # gh-4949. Also the test should have the ability to be restarted by
> +    # test-run using fragile list and in this way 'crash_expected' flag
> +    # should be disabled to let the test fail with exception.
> +    replica.crash_expected = True
> +    replica.wait_until_started()
> +except TarantoolStartError:
> +    print 'not ok - server failed to start'
> +else:
> +    print 'ok'
>   
>   replica.stop()
>   replica.cleanup()
> diff --git a/test/replication-py/suite.ini b/test/replication-py/suite.ini
> index b563b9fca..db4403457 100644
> --- a/test/replication-py/suite.ini
> +++ b/test/replication-py/suite.ini
> @@ -4,10 +4,11 @@ script =  master.lua
>   description = tarantool/box, replication
>   is_parallel = True
>   fragile = {
> -    "retries": 10,
> +    "retries": 50,

Why 50 retries?  Don't we have a rule that only 10 retries  are allowed?
I understand that 10 retries aren't sufficient sometimes, but still. 50 
is too much.

I guess we should teach test_run to retry starting a server instead of 
restarting the
whole test, when the problem is only with replica start.

>       "tests": {
>           "init_storage.test.py": {
> -            "issues": [ "gh-4949" ]
> +            "issues": [ "gh-4949" ],
> +            "checksums": [ "9b4235bb6bb9d76aa6a1f7dc8f088075", "4c5fc871955a3166d67fbfa9f254f68a", "bc2781acdb5745d01da2f533a0d519f9" ]
>           },
>           "conflict.test.py": {
>               "issues": [ "gh-4980" ]

-- 
Serge Petrenko