Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v1] test: flaky replication/gh-5195-qsync-*
@ 2020-09-07 12:13 Alexander V. Tikhonov
  2020-09-09 15:45 ` Serge Petrenko
  2020-09-09 21:12 ` Vladislav Shpilevoy
  0 siblings, 2 replies; 5+ messages in thread
From: Alexander V. Tikhonov @ 2020-09-07 12:13 UTC (permalink / raw)
  To: Kirill Yukhin, Serge Petrenko, Vladislav Shpilevoy; +Cc: tarantool-patches

On heavy loaded hosts found the following issue:

   box.cfg{replication_synchro_quorum = 2}
    | ---
  + | - error: '[string "test_run:wait_cond(function()                ..."]:1: attempt to
  + |     index field ''vclock'' (a nil value)'
    | ...
   test_run:wait_cond(function() return f:status() == 'dead' end)
    | ---
  - | - true
  + | - false
    | ...
   ok, err
    | ---
  - | - true
  - | - [2]
  + | - null
  + | - null
    | ...
   box.space.sync:select{}
    | ---

It happened because replication vclock field was not exist at the moment
of its check. To fix the issue, vclock field had to be waited to be
available using test_run:wait_cond() routine.

Closes #5230
---

Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5230-fix-qsync-write-5195
Issue: https://github.com/tarantool/tarantool/issues/5230

 test/replication/gh-5195-qsync-replica-write.result   | 10 ++++++----
 test/replication/gh-5195-qsync-replica-write.test.lua |  7 +++----
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/test/replication/gh-5195-qsync-replica-write.result b/test/replication/gh-5195-qsync-replica-write.result
index 3999e8f5e..b47359fde 100644
--- a/test/replication/gh-5195-qsync-replica-write.result
+++ b/test/replication/gh-5195-qsync-replica-write.result
@@ -96,10 +96,12 @@ test_run:wait_downstream(replica_id, {status='follow'})
  | - true
  | ...
 test_run:wait_cond(function()                                                   \
-        local info = box.info.replication[replica_id]                           \
-        local lsn = info.downstream.vclock[replica_id]                          \
-        return lsn and lsn >= replica_lsn                                       \
-end)                                                                            \
+        local lsn = box.info.replication[replica_id].downstream.vclock          \
+        return (lsn and lsn[replica_id] and lsn[replica_id] >= replica_lsn)     \
+    end) or box.info
+ | ---
+ | - true
+ | ...
 
 box.cfg{replication_synchro_quorum = 2}
  | ---
diff --git a/test/replication/gh-5195-qsync-replica-write.test.lua b/test/replication/gh-5195-qsync-replica-write.test.lua
index 2b9909e21..7843ad6c9 100644
--- a/test/replication/gh-5195-qsync-replica-write.test.lua
+++ b/test/replication/gh-5195-qsync-replica-write.test.lua
@@ -41,10 +41,9 @@ replica_id = test_run:get_server_id('replica')
 replica_lsn = test_run:get_lsn('replica', replica_id)
 test_run:wait_downstream(replica_id, {status='follow'})
 test_run:wait_cond(function()                                                   \
-        local info = box.info.replication[replica_id]                           \
-        local lsn = info.downstream.vclock[replica_id]                          \
-        return lsn and lsn >= replica_lsn                                       \
-end)                                                                            \
+        local lsn = box.info.replication[replica_id].downstream.vclock          \
+        return (lsn and lsn[replica_id] and lsn[replica_id] >= replica_lsn)     \
+    end) or box.info
 
 box.cfg{replication_synchro_quorum = 2}
 test_run:wait_cond(function() return f:status() == 'dead' end)
-- 
2.17.1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH v1] test: flaky replication/gh-5195-qsync-*
  2020-09-07 12:13 [Tarantool-patches] [PATCH v1] test: flaky replication/gh-5195-qsync-* Alexander V. Tikhonov
@ 2020-09-09 15:45 ` Serge Petrenko
  2020-09-09 21:12 ` Vladislav Shpilevoy
  1 sibling, 0 replies; 5+ messages in thread
From: Serge Petrenko @ 2020-09-09 15:45 UTC (permalink / raw)
  To: Alexander V. Tikhonov, Kirill Yukhin, Vladislav Shpilevoy
  Cc: tarantool-patches


07.09.2020 15:13, Alexander V. Tikhonov пишет:
> On heavy loaded hosts found the following issue:
>
>     box.cfg{replication_synchro_quorum = 2}
>      | ---
>    + | - error: '[string "test_run:wait_cond(function()                ..."]:1: attempt to
>    + |     index field ''vclock'' (a nil value)'
>      | ...
>     test_run:wait_cond(function() return f:status() == 'dead' end)
>      | ---
>    - | - true
>    + | - false
>      | ...
>     ok, err
>      | ---
>    - | - true
>    - | - [2]
>    + | - null
>    + | - null
>      | ...
>     box.space.sync:select{}
>      | ---
>
> It happened because replication vclock field was not exist at the moment
> of its check. To fix the issue, vclock field had to be waited to be
> available using test_run:wait_cond() routine.
>
> Closes #5230
> ---
>
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5230-fix-qsync-write-5195
> Issue: https://github.com/tarantool/tarantool/issues/5230
>
>   test/replication/gh-5195-qsync-replica-write.result   | 10 ++++++----
>   test/replication/gh-5195-qsync-replica-write.test.lua |  7 +++----
>   2 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/test/replication/gh-5195-qsync-replica-write.result b/test/replication/gh-5195-qsync-replica-write.result
> index 3999e8f5e..b47359fde 100644
> --- a/test/replication/gh-5195-qsync-replica-write.result
> +++ b/test/replication/gh-5195-qsync-replica-write.result
> @@ -96,10 +96,12 @@ test_run:wait_downstream(replica_id, {status='follow'})
>    | - true
>    | ...
>   test_run:wait_cond(function()                                                   \
> -        local info = box.info.replication[replica_id]                           \
> -        local lsn = info.downstream.vclock[replica_id]                          \
> -        return lsn and lsn >= replica_lsn                                       \
> -end)                                                                            \
> +        local lsn = box.info.replication[replica_id].downstream.vclock          \
> +        return (lsn and lsn[replica_id] and lsn[replica_id] >= replica_lsn)     \
> +    end) or box.info
> + | ---
> + | - true
> + | ...
>   
>   box.cfg{replication_synchro_quorum = 2}
>    | ---
> diff --git a/test/replication/gh-5195-qsync-replica-write.test.lua b/test/replication/gh-5195-qsync-replica-write.test.lua
> index 2b9909e21..7843ad6c9 100644
> --- a/test/replication/gh-5195-qsync-replica-write.test.lua
> +++ b/test/replication/gh-5195-qsync-replica-write.test.lua
> @@ -41,10 +41,9 @@ replica_id = test_run:get_server_id('replica')
>   replica_lsn = test_run:get_lsn('replica', replica_id)
>   test_run:wait_downstream(replica_id, {status='follow'})
>   test_run:wait_cond(function()                                                   \
> -        local info = box.info.replication[replica_id]                           \
> -        local lsn = info.downstream.vclock[replica_id]                          \
> -        return lsn and lsn >= replica_lsn                                       \
> -end)                                                                            \
> +        local lsn = box.info.replication[replica_id].downstream.vclock          \
> +        return (lsn and lsn[replica_id] and lsn[replica_id] >= replica_lsn)     \
> +    end) or box.info
>   
>   box.cfg{replication_synchro_quorum = 2}
>   test_run:wait_cond(function() return f:status() == 'dead' end)
LGTM

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH v1] test: flaky replication/gh-5195-qsync-*
  2020-09-07 12:13 [Tarantool-patches] [PATCH v1] test: flaky replication/gh-5195-qsync-* Alexander V. Tikhonov
  2020-09-09 15:45 ` Serge Petrenko
@ 2020-09-09 21:12 ` Vladislav Shpilevoy
  2020-09-10  8:13   ` Alexander V. Tikhonov
  1 sibling, 1 reply; 5+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-09 21:12 UTC (permalink / raw)
  To: Alexander V. Tikhonov, Kirill Yukhin, Serge Petrenko; +Cc: tarantool-patches

Thanks for the patch!

See 3 comments below.

On 07.09.2020 14:13, Alexander V. Tikhonov wrote:
> On heavy loaded hosts found the following issue:
> 
>    box.cfg{replication_synchro_quorum = 2}
>     | ---
>   + | - error: '[string "test_run:wait_cond(function()                ..."]:1: attempt to
>   + |     index field ''vclock'' (a nil value)'

1. How is that output possible? box.cfg has nothing to do with wait_cond
nor with vclocks. The diff looks broken, misplaced.

>     | ...
>    test_run:wait_cond(function() return f:status() == 'dead' end)
>     | ---
>   - | - true
>   + | - false
>     | ...
>    ok, err
>     | ---
>   - | - true
>   - | - [2]
>   + | - null
>   + | - null
>     | ...
>    box.space.sync:select{}
>     | ---
> 
> It happened because replication vclock field was not exist at the moment
> of its check. To fix the issue, vclock field had to be waited to be
> available using test_run:wait_cond() routine.

2. But in the fix you don't wait for it. You just added one more level
of checking inside the existing wait_cond.

> Closes #5230
> ---
> 
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5230-fix-qsync-write-5195
> Issue: https://github.com/tarantool/tarantool/issues/5230
> 
>  test/replication/gh-5195-qsync-replica-write.result   | 10 ++++++----
>  test/replication/gh-5195-qsync-replica-write.test.lua |  7 +++----
>  2 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/test/replication/gh-5195-qsync-replica-write.result b/test/replication/gh-5195-qsync-replica-write.result
> index 3999e8f5e..b47359fde 100644
> --- a/test/replication/gh-5195-qsync-replica-write.result
> +++ b/test/replication/gh-5195-qsync-replica-write.result
> @@ -96,10 +96,12 @@ test_run:wait_downstream(replica_id, {status='follow'})
>   | - true
>   | ...
>  test_run:wait_cond(function()                                                   \
> -        local info = box.info.replication[replica_id]                           \
> -        local lsn = info.downstream.vclock[replica_id]                          \
> -        return lsn and lsn >= replica_lsn                                       \
> -end)                                                                            \
> +        local lsn = box.info.replication[replica_id].downstream.vclock          \

3. It is not lsn, it is vclock now. Also I don't think you need to inline 'info'
variable, it was perfectly fine.

> +        return (lsn and lsn[replica_id] and lsn[replica_id] >= replica_lsn)     \
> +    end) or box.info
> + | ---
> + | - true
> + | ...

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH v1] test: flaky replication/gh-5195-qsync-*
  2020-09-09 21:12 ` Vladislav Shpilevoy
@ 2020-09-10  8:13   ` Alexander V. Tikhonov
  2020-09-10 19:21     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 5+ messages in thread
From: Alexander V. Tikhonov @ 2020-09-10  8:13 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi Vlad, thanks for the review. I've corrected the commit message and
variable name. Also commented your review.

On Wed, Sep 09, 2020 at 11:12:40PM +0200, Vladislav Shpilevoy wrote:
> Thanks for the patch!
> 
> See 3 comments below.
> 
> On 07.09.2020 14:13, Alexander V. Tikhonov wrote:
> > On heavy loaded hosts found the following issue:
> > 
> >    box.cfg{replication_synchro_quorum = 2}
> >     | ---
> >   + | - error: '[string "test_run:wait_cond(function()                ..."]:1: attempt to
> >   + |     index field ''vclock'' (a nil value)'
> 
> 1. How is that output possible? box.cfg has nothing to do with wait_cond
> nor with vclocks. The diff looks broken, misplaced.

Right, the error message from test-run was not informative and correct,
but to compare it with output of the new issues it should be saved in
commit message as it occurred. Also I've added the really caused issue
command wait_cond() from the test to the commit message.
> 
> >     | ...
> >    test_run:wait_cond(function() return f:status() == 'dead' end)
> >     | ---
> >   - | - true
> >   + | - false
> >     | ...
> >    ok, err
> >     | ---
> >   - | - true
> >   - | - [2]
> >   + | - null
> >   + | - null
> >     | ...
> >    box.space.sync:select{}
> >     | ---
> > 
> > It happened because replication vclock field was not exist at the moment
> > of its check. To fix the issue, vclock field had to be waited to be
> > available using test_run:wait_cond() routine.
> 
> 2. But in the fix you don't wait for it. You just added one more level
> of checking inside the existing wait_cond.

Right, but this check avoid of the fail, due to wait_cond() checks vclock
existence and denies use it till it will be available.

> 
> > Closes #5230
> > ---
> > 
> > Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5230-fix-qsync-write-5195
> > Issue: https://github.com/tarantool/tarantool/issues/5230
> > 
> >  test/replication/gh-5195-qsync-replica-write.result   | 10 ++++++----
> >  test/replication/gh-5195-qsync-replica-write.test.lua |  7 +++----
> >  2 files changed, 9 insertions(+), 8 deletions(-)
> > 
> > diff --git a/test/replication/gh-5195-qsync-replica-write.result b/test/replication/gh-5195-qsync-replica-write.result
> > index 3999e8f5e..b47359fde 100644
> > --- a/test/replication/gh-5195-qsync-replica-write.result
> > +++ b/test/replication/gh-5195-qsync-replica-write.result
> > @@ -96,10 +96,12 @@ test_run:wait_downstream(replica_id, {status='follow'})
> >   | - true
> >   | ...
> >  test_run:wait_cond(function()                                                   \
> > -        local info = box.info.replication[replica_id]                           \
> > -        local lsn = info.downstream.vclock[replica_id]                          \
> > -        return lsn and lsn >= replica_lsn                                       \
> > -end)                                                                            \
> > +        local lsn = box.info.replication[replica_id].downstream.vclock          \
> 
> 3. It is not lsn, it is vclock now. Also I don't think you need to inline 'info'
> variable, it was perfectly fine.

Right, the variable name was not correct - fixed. 'info' variable wasn't
need to be set due to used only once.

> 
> > +        return (lsn and lsn[replica_id] and lsn[replica_id] >= replica_lsn)     \
> > +    end) or box.info
> > + | ---
> > + | - true
> > + | ...

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH v1] test: flaky replication/gh-5195-qsync-*
  2020-09-10  8:13   ` Alexander V. Tikhonov
@ 2020-09-10 19:21     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 5+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-10 19:21 UTC (permalink / raw)
  To: Alexander V. Tikhonov; +Cc: tarantool-patches

>>> diff --git a/test/replication/gh-5195-qsync-replica-write.result b/test/replication/gh-5195-qsync-replica-write.result
>>> index 3999e8f5e..b47359fde 100644
>>> --- a/test/replication/gh-5195-qsync-replica-write.result
>>> +++ b/test/replication/gh-5195-qsync-replica-write.result
>>> @@ -96,10 +96,12 @@ test_run:wait_downstream(replica_id, {status='follow'})
>>>   | - true
>>>   | ...
>>>  test_run:wait_cond(function()                                                   \
>>> -        local info = box.info.replication[replica_id]                           \
>>> -        local lsn = info.downstream.vclock[replica_id]                          \
>>> -        return lsn and lsn >= replica_lsn                                       \
>>> -end)                                                                            \
>>> +        local lsn = box.info.replication[replica_id].downstream.vclock          \
>>
>> 3. It is not lsn, it is vclock now. Also I don't think you need to inline 'info'
>> variable, it was perfectly fine.
> 
> Right, the variable name was not correct - fixed. 'info' variable wasn't
> need to be set due to used only once.

That is not about number of usages. It is about code being easier to read
by avoiding too big one-liners.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-09-10 19:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-07 12:13 [Tarantool-patches] [PATCH v1] test: flaky replication/gh-5195-qsync-* Alexander V. Tikhonov
2020-09-09 15:45 ` Serge Petrenko
2020-09-09 21:12 ` Vladislav Shpilevoy
2020-09-10  8:13   ` Alexander V. Tikhonov
2020-09-10 19:21     ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox