Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v1] test: replication-py/init_storage.test.py
@ 2020-11-08  6:24 Alexander V. Tikhonov
  2020-11-10  9:06 ` Serge Petrenko
  0 siblings, 1 reply; 3+ messages in thread
From: Alexander V. Tikhonov @ 2020-11-08  6:24 UTC (permalink / raw)
  To: Kirill Yukhin, Serge Petrenko; +Cc: tarantool-patches

Found that test failed in 2 common places when it tried to start the
replica and wait it within 'JOIN' either 'SUBSCRIBE' test parts.
It used to wait for replica start check the 'wait_until_started()'
function 'TarantoolServer' class from test-run repository. But it
didn't try resolve connection issues on replica creation, like:

  [30534] main/103/replica I> connecting to 1 replicas
  [30534] main/112/applier/localhost:49168 I> can't connect to master
  [30534] main/112/applier/localhost:49168 sio.c:208 !> SystemError connect to 127.0.0.1:49168, called on fd 27, aka 127.0.0.1:47954: Connection refused
  [30534] main/112/applier/localhost:49168 I> will retry every 0.10 second
  [30534] main/112/applier/localhost:49168 I> remote master c5d480c3-219c-11eb-ac14-080027727614 at 127.0.0.1:49168 running Tarantool 2.7.0
  [30534] main/103/replica I> connected to 1 replicas
  [30534] main/103/replica I> bootstrapping replica from c5d480c3-219c-11eb-ac14-080027727614 at 127.0.0.1:49168
  [30534] main/112/applier/localhost:49168 I> can't read row
  [30534] main/112/applier/localhost:49168 box.cc:183 E> ER_READONLY: Can't modify data because this instance is in read-only mode.
  [30534] main/103/replica box.cc:183 E> ER_READONLY: Can't modify data because this instance is in read-only mode.
  [30534] main/103/replica F> can't initialize storage: Can't modify data because this instance is in read-only mode.
  [30534] main/103/replica F> can't initialize storage: Can't modify data because this instance is in read-only mode.

To resolve it the test was changed to be able to catch exception
'TarantoolStartError' from test-run. Also the test should have the
ability to be restarted by test-run using fragile list and in this way
'crash_expected' flag was disabled to let the test fail with exception.

Needed by #4949
---

Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-4949
Issue: https://github.com/tarantool/tarantool/issues/4949

 test/replication-py/init_storage.test.py | 27 ++++++++++++++++++++----
 test/replication-py/suite.ini            |  5 +++--
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/test/replication-py/init_storage.test.py b/test/replication-py/init_storage.test.py
index 4be531f8d..a830821cd 100644
--- a/test/replication-py/init_storage.test.py
+++ b/test/replication-py/init_storage.test.py
@@ -1,5 +1,6 @@
 import os
 import glob
+from lib.tarantool_server import TarantoolStartError
 from lib.tarantool_server import TarantoolServer
 
 # master server
@@ -64,8 +65,17 @@ replica.deploy(wait=False)
 
 print 'waiting reconnect on JOIN...'
 server.start()
-replica.wait_until_started()
-print 'ok'
+try:
+    # Replica may fail to start due connection issues may occur, check
+    # gh-4949. Also the test should have the ability to be restarted by
+    # test-run using fragile list and in this way 'crash_expected' flag
+    # should be disabled to let the test fail with exception.
+    replica.crash_expected = True
+    replica.wait_until_started()
+except TarantoolStartError:
+    print 'not ok - server failed to start'
+else:
+    print 'ok'
 
 replica.stop()
 server.stop()
@@ -73,8 +83,17 @@ server.stop()
 print 'waiting reconnect on SUBSCRIBE...'
 replica.start(wait=False)
 server.start()
-replica.wait_until_started()
-print 'ok'
+try:
+    # Replica may fail to start due connection issues may occur, check
+    # gh-4949. Also the test should have the ability to be restarted by
+    # test-run using fragile list and in this way 'crash_expected' flag
+    # should be disabled to let the test fail with exception.
+    replica.crash_expected = True
+    replica.wait_until_started()
+except TarantoolStartError:
+    print 'not ok - server failed to start'
+else:
+    print 'ok'
 
 replica.stop()
 replica.cleanup()
diff --git a/test/replication-py/suite.ini b/test/replication-py/suite.ini
index b563b9fca..db4403457 100644
--- a/test/replication-py/suite.ini
+++ b/test/replication-py/suite.ini
@@ -4,10 +4,11 @@ script =  master.lua
 description = tarantool/box, replication
 is_parallel = True
 fragile = {
-    "retries": 10,
+    "retries": 50,
     "tests": {
         "init_storage.test.py": {
-            "issues": [ "gh-4949" ]
+            "issues": [ "gh-4949" ],
+            "checksums": [ "9b4235bb6bb9d76aa6a1f7dc8f088075", "4c5fc871955a3166d67fbfa9f254f68a", "bc2781acdb5745d01da2f533a0d519f9" ]
         },
         "conflict.test.py": {
             "issues": [ "gh-4980" ]
-- 
2.25.1

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Tarantool-patches] [PATCH v1] test: replication-py/init_storage.test.py
  2020-11-08  6:24 [Tarantool-patches] [PATCH v1] test: replication-py/init_storage.test.py Alexander V. Tikhonov
@ 2020-11-10  9:06 ` Serge Petrenko
  2020-12-02 20:31   ` Alexander V. Tikhonov
  0 siblings, 1 reply; 3+ messages in thread
From: Serge Petrenko @ 2020-11-10  9:06 UTC (permalink / raw)
  To: Alexander V. Tikhonov, Kirill Yukhin; +Cc: tarantool-patches


08.11.2020 09:24, Alexander V. Tikhonov пишет:
> Found that test failed in 2 common places when it tried to start the
> replica and wait it within 'JOIN' either 'SUBSCRIBE' test parts.
> It used to wait for replica start check the 'wait_until_started()'
> function 'TarantoolServer' class from test-run repository. But it
> didn't try resolve connection issues on replica creation, like:
>
>    [30534] main/103/replica I> connecting to 1 replicas
>    [30534] main/112/applier/localhost:49168 I> can't connect to master
>    [30534] main/112/applier/localhost:49168 sio.c:208 !> SystemError connect to 127.0.0.1:49168, called on fd 27, aka 127.0.0.1:47954: Connection refused
>    [30534] main/112/applier/localhost:49168 I> will retry every 0.10 second
>    [30534] main/112/applier/localhost:49168 I> remote master c5d480c3-219c-11eb-ac14-080027727614 at 127.0.0.1:49168 running Tarantool 2.7.0
>    [30534] main/103/replica I> connected to 1 replicas
>    [30534] main/103/replica I> bootstrapping replica from c5d480c3-219c-11eb-ac14-080027727614 at 127.0.0.1:49168
>    [30534] main/112/applier/localhost:49168 I> can't read row
>    [30534] main/112/applier/localhost:49168 box.cc:183 E> ER_READONLY: Can't modify data because this instance is in read-only mode.
>    [30534] main/103/replica box.cc:183 E> ER_READONLY: Can't modify data because this instance is in read-only mode.
>    [30534] main/103/replica F> can't initialize storage: Can't modify data because this instance is in read-only mode.
>    [30534] main/103/replica F> can't initialize storage: Can't modify data because this instance is in read-only mode.


Hi! Thanks for the patch!

Please find my comments below.


>
> To resolve it the test was changed to be able to catch exception
> 'TarantoolStartError' from test-run. Also the test should have the
> ability to be restarted by test-run using fragile list and in this way
> 'crash_expected' flag was disabled to let the test fail with exception.

You're saying `crash_expected` is disabled, but you enable it in your patch.


>
> Needed by #4949
> ---
>
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-4949
> Issue: https://github.com/tarantool/tarantool/issues/4949
>
>   test/replication-py/init_storage.test.py | 27 ++++++++++++++++++++----
>   test/replication-py/suite.ini            |  5 +++--
>   2 files changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/test/replication-py/init_storage.test.py b/test/replication-py/init_storage.test.py
> index 4be531f8d..a830821cd 100644
> --- a/test/replication-py/init_storage.test.py
> +++ b/test/replication-py/init_storage.test.py
> @@ -1,5 +1,6 @@
>   import os
>   import glob
> +from lib.tarantool_server import TarantoolStartError
>   from lib.tarantool_server import TarantoolServer
>   
>   # master server
> @@ -64,8 +65,17 @@ replica.deploy(wait=False)
>   
>   print 'waiting reconnect on JOIN...'
>   server.start()
> -replica.wait_until_started()
> -print 'ok'
> +try:
> +    # Replica may fail to start due connection issues may occur, check
> +    # gh-4949. Also the test should have the ability to be restarted by
> +    # test-run using fragile list and in this way 'crash_expected' flag
> +    # should be disabled to let the test fail with exception.
> +    replica.crash_expected = True
> +    replica.wait_until_started()
> +except TarantoolStartError:
> +    print 'not ok - server failed to start'
> +else:
> +    print 'ok'

You're saying that crash_expected flag should be disabled,
but you're enabling it.

What happens if you don't catch the exception?
Is this added to make the test produce the same output on
multiple failures?

>   
>   replica.stop()
>   server.stop()
> @@ -73,8 +83,17 @@ server.stop()
>   print 'waiting reconnect on SUBSCRIBE...'
>   replica.start(wait=False)
>   server.start()
> -replica.wait_until_started()
> -print 'ok'
> +try:
> +    # Replica may fail to start due connection issues may occur, check
> +    # gh-4949. Also the test should have the ability to be restarted by
> +    # test-run using fragile list and in this way 'crash_expected' flag
> +    # should be disabled to let the test fail with exception.
> +    replica.crash_expected = True
> +    replica.wait_until_started()
> +except TarantoolStartError:
> +    print 'not ok - server failed to start'
> +else:
> +    print 'ok'
>   
>   replica.stop()
>   replica.cleanup()
> diff --git a/test/replication-py/suite.ini b/test/replication-py/suite.ini
> index b563b9fca..db4403457 100644
> --- a/test/replication-py/suite.ini
> +++ b/test/replication-py/suite.ini
> @@ -4,10 +4,11 @@ script =  master.lua
>   description = tarantool/box, replication
>   is_parallel = True
>   fragile = {
> -    "retries": 10,
> +    "retries": 50,

Why 50 retries?  Don't we have a rule that only 10 retries  are allowed?
I understand that 10 retries aren't sufficient sometimes, but still. 50 
is too much.

I guess we should teach test_run to retry starting a server instead of 
restarting the
whole test, when the problem is only with replica start.

>       "tests": {
>           "init_storage.test.py": {
> -            "issues": [ "gh-4949" ]
> +            "issues": [ "gh-4949" ],
> +            "checksums": [ "9b4235bb6bb9d76aa6a1f7dc8f088075", "4c5fc871955a3166d67fbfa9f254f68a", "bc2781acdb5745d01da2f533a0d519f9" ]
>           },
>           "conflict.test.py": {
>               "issues": [ "gh-4980" ]

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Tarantool-patches] [PATCH v1] test: replication-py/init_storage.test.py
  2020-11-10  9:06 ` Serge Petrenko
@ 2020-12-02 20:31   ` Alexander V. Tikhonov
  0 siblings, 0 replies; 3+ messages in thread
From: Alexander V. Tikhonov @ 2020-12-02 20:31 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches

Hi Sergey, thanks for the review, please check my comments below.

On Tue, Nov 10, 2020 at 12:06:34PM +0300, Serge Petrenko wrote:
> 
> 08.11.2020 09:24, Alexander V. Tikhonov пишет:
> > Found that test failed in 2 common places when it tried to start the
> > replica and wait it within 'JOIN' either 'SUBSCRIBE' test parts.
> > It used to wait for replica start check the 'wait_until_started()'
> > function 'TarantoolServer' class from test-run repository. But it
> > didn't try resolve connection issues on replica creation, like:
> > 
> >    [30534] main/103/replica I> connecting to 1 replicas
> >    [30534] main/112/applier/localhost:49168 I> can't connect to master
> >    [30534] main/112/applier/localhost:49168 sio.c:208 !> SystemError connect to 127.0.0.1:49168, called on fd 27, aka 127.0.0.1:47954: Connection refused
> >    [30534] main/112/applier/localhost:49168 I> will retry every 0.10 second
> >    [30534] main/112/applier/localhost:49168 I> remote master c5d480c3-219c-11eb-ac14-080027727614 at 127.0.0.1:49168 running Tarantool 2.7.0
> >    [30534] main/103/replica I> connected to 1 replicas
> >    [30534] main/103/replica I> bootstrapping replica from c5d480c3-219c-11eb-ac14-080027727614 at 127.0.0.1:49168
> >    [30534] main/112/applier/localhost:49168 I> can't read row
> >    [30534] main/112/applier/localhost:49168 box.cc:183 E> ER_READONLY: Can't modify data because this instance is in read-only mode.
> >    [30534] main/103/replica box.cc:183 E> ER_READONLY: Can't modify data because this instance is in read-only mode.
> >    [30534] main/103/replica F> can't initialize storage: Can't modify data because this instance is in read-only mode.
> >    [30534] main/103/replica F> can't initialize storage: Can't modify data because this instance is in read-only mode.
> 
> 
> Hi! Thanks for the patch!
> 
> Please find my comments below.
> 
> 
> > 
> > To resolve it the test was changed to be able to catch exception
> > 'TarantoolStartError' from test-run. Also the test should have the
> > ability to be restarted by test-run using fragile list and in this way
> > 'crash_expected' flag was disabled to let the test fail with exception.
> 
> You're saying `crash_expected` is disabled, but you enable it in your patch.
> 
>

Right, corrected.

> > 
> > Needed by #4949
> > ---
> > 
> > Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-4949
> > Issue: https://github.com/tarantool/tarantool/issues/4949
> > 
> >   test/replication-py/init_storage.test.py | 27 ++++++++++++++++++++----
> >   test/replication-py/suite.ini            |  5 +++--
> >   2 files changed, 26 insertions(+), 6 deletions(-)
> > 
> > diff --git a/test/replication-py/init_storage.test.py b/test/replication-py/init_storage.test.py
> > index 4be531f8d..a830821cd 100644
> > --- a/test/replication-py/init_storage.test.py
> > +++ b/test/replication-py/init_storage.test.py
> > @@ -1,5 +1,6 @@
> >   import os
> >   import glob
> > +from lib.tarantool_server import TarantoolStartError
> >   from lib.tarantool_server import TarantoolServer
> >   # master server
> > @@ -64,8 +65,17 @@ replica.deploy(wait=False)
> >   print 'waiting reconnect on JOIN...'
> >   server.start()
> > -replica.wait_until_started()
> > -print 'ok'
> > +try:
> > +    # Replica may fail to start due connection issues may occur, check
> > +    # gh-4949. Also the test should have the ability to be restarted by
> > +    # test-run using fragile list and in this way 'crash_expected' flag
> > +    # should be disabled to let the test fail with exception.
> > +    replica.crash_expected = True
> > +    replica.wait_until_started()
> > +except TarantoolStartError:
> > +    print 'not ok - server failed to start'
> > +else:
> > +    print 'ok'
> 
> You're saying that crash_expected flag should be disabled,
> but you're enabling it.
>

Right, corrected.

> What happens if you don't catch the exception?
> Is this added to make the test produce the same output on
> multiple failures?
> 

This test uses test-run internal functions and using it, it should
analyze its results in the same way as test-run does. So this exception
is needed to be catched to avoid of test crash. It helps the test to
finish it's testing with wrong result and have the ability to be rerun
using checksum checker on found issue.

> >   replica.stop()
> >   server.stop()
> > @@ -73,8 +83,17 @@ server.stop()
> >   print 'waiting reconnect on SUBSCRIBE...'
> >   replica.start(wait=False)
> >   server.start()
> > -replica.wait_until_started()
> > -print 'ok'
> > +try:
> > +    # Replica may fail to start due connection issues may occur, check
> > +    # gh-4949. Also the test should have the ability to be restarted by
> > +    # test-run using fragile list and in this way 'crash_expected' flag
> > +    # should be disabled to let the test fail with exception.
> > +    replica.crash_expected = True
> > +    replica.wait_until_started()
> > +except TarantoolStartError:
> > +    print 'not ok - server failed to start'
> > +else:
> > +    print 'ok'
> >   replica.stop()
> >   replica.cleanup()
> > diff --git a/test/replication-py/suite.ini b/test/replication-py/suite.ini
> > index b563b9fca..db4403457 100644
> > --- a/test/replication-py/suite.ini
> > +++ b/test/replication-py/suite.ini
> > @@ -4,10 +4,11 @@ script =  master.lua
> >   description = tarantool/box, replication
> >   is_parallel = True
> >   fragile = {
> > -    "retries": 10,
> > +    "retries": 50,
> 
> Why 50 retries?  Don't we have a rule that only 10 retries  are allowed?
> I understand that 10 retries aren't sufficient sometimes, but still. 50 is
> too much.
> 

Right, I've reverted it back to 10.

> I guess we should teach test_run to retry starting a server instead of
> restarting the
> whole test, when the problem is only with replica start.
> 

This patch is not fixing the issue, but is needed to resolve the issue
with crashing test. The issue gh-4949 will be left opened and as the
next fix we can implement your suggestion to completely fix the test.

> >       "tests": {
> >           "init_storage.test.py": {
> > -            "issues": [ "gh-4949" ]
> > +            "issues": [ "gh-4949" ],
> > +            "checksums": [ "9b4235bb6bb9d76aa6a1f7dc8f088075", "4c5fc871955a3166d67fbfa9f254f68a", "bc2781acdb5745d01da2f533a0d519f9" ]
> >           },
> >           "conflict.test.py": {
> >               "issues": [ "gh-4980" ]
> 
> -- 
> Serge Petrenko
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-12-02 20:31 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-08  6:24 [Tarantool-patches] [PATCH v1] test: replication-py/init_storage.test.py Alexander V. Tikhonov
2020-11-10  9:06 ` Serge Petrenko
2020-12-02 20:31   ` Alexander V. Tikhonov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox