Tarantool development patches archive
From: Serge Petrenko <sergepetrenko@tarantool.org>
To: "Alexander V. Tikhonov" <avtikhon@tarantool.org>,
	Kirill Yukhin <kyukhin@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v1] test: flaky replication/status.test.lua status
Date: Wed, 9 Sep 2020 18:41:11 +0300
Message-ID: <23d93ca8-c632-aab5-7e57-44b6f8c3a93e@tarantool.org>
In-Reply-To: <87a0a297a6c954f37bed6a79f81949a87196a218.1599440334.git.avtikhon@tarantool.org>


07.09.2020 04:00, Alexander V. Tikhonov wrote:
> On heavily loaded hosts the following 3 issues were found:
>
> line 174:
>
>   [026] --- replication/status.result	Thu Jun 11 12:07:39 2020
>   [026] +++ replication/status.reject	Sun Jun 14 03:20:21 2020
>   [026] @@ -174,15 +174,17 @@
>   [026]  ...
>   [026]  replica.downstream.status == 'follow'
>   [026]  ---
>   [026] -- true
>   [026] +- false
>   [026]  ...
>
> It happened because the replication downstream status was checked too
> early. To give the downstream time to reach the needed 'follow' state,
> the test now waits for it using the test_run:wait_downstream() routine.
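
The idea of waiting instead of checking once can be sketched in plain Lua. This is a minimal illustration, not the test-run implementation: `wait_until` and `downstream_status` are hypothetical names, and the downstream is simulated so that it reaches 'follow' only on the third poll.

```lua
-- Hypothetical helper (not part of test-run): poll `cond` until it
-- returns a truthy value or `max_polls` attempts have been used up.
local function wait_until(cond, max_polls)
    for attempt = 1, max_polls do
        if cond() then
            return true, attempt
        end
    end
    return false, max_polls
end

-- Simulated downstream: the status is nil for the first two polls
-- and becomes 'follow' from the third poll on.
local polls = 0
local function downstream_status()
    polls = polls + 1
    return polls >= 3 and 'follow' or nil
end

-- A one-shot comparison would have seen nil here; polling succeeds.
local ok, attempts = wait_until(function()
    return downstream_status() == 'follow'
end, 10)
print(ok, attempts)
```

A single comparison made before the third poll would yield false, which matches the failure seen at line 174.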
>
> line 178:
>
> [024] --- replication/status.result	Mon Sep  7 00:22:52 2020
> [024] +++ replication/status.reject	Mon Sep  7 00:36:01 2020
> [024] @@ -178,11 +178,13 @@
> [024]  ...
> [024]  replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> [024]  ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[master_id] =..."]:1: attempt to
> [024] +    index field ''vclock'' (a nil value)'
> [024]  ...
> [024]  replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> [024]  ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[replica_id] ..."]:1: attempt to
> [024] +    index field ''vclock'' (a nil value)'
> [024]  ...
> [024]  --
> [024]  -- Replica
>
> It happened because the replication vclock field did not exist yet at the
> moment of the check. To fix the issue, the test waits for the vclock field
> to become available using the test_run:wait_cond() routine. The downstream
> replication data also has to be read at that same moment.
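
The nil guard described above can be sketched without a running instance. Here `info` is a hypothetical stand-in for box.info and `vclocks_match` is an illustrative name; the point is that the downstream data is read inside the condition, so a vclock that has not appeared yet gives false instead of an indexing error.

```lua
-- Hypothetical stand-in for box.info; in the real test these values
-- come from the running Tarantool instance.
local info = {
    vclock = {[1] = 5, [2] = 0},
    -- The downstream is known, but its vclock has not arrived yet.
    replication = {[2] = {downstream = {status = 'follow'}}},
}

-- Read the downstream vclock at the moment of the check and compare
-- it only if it exists, so a missing field never raises an error.
local function vclocks_match(info, master_id, replica_id)
    local r = info.replication[replica_id].downstream.vclock
    return r ~= nil and info.vclock ~= nil
        and r[master_id] == info.vclock[master_id]
        and r[replica_id] == info.vclock[replica_id]
end

local before = vclocks_match(info, 1, 2)
-- Simulate the vclock arriving from the replica.
info.replication[2].downstream.vclock = {[1] = 5, [2] = 0}
local after = vclocks_match(info, 1, 2)
print(before, after)
```

A condition written this way can be passed to a polling routine, since it returns a plain boolean in both states.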
>
> line 224:
>
> [014] --- replication/status.result	Fri Jul  3 04:29:56 2020
> [014] +++ replication/status.reject	Mon Sep  7 00:17:30 2020
> [014] @@ -224,7 +224,7 @@
> [014]  ...
> [014]  master.upstream.status == "follow"
> [014]  ---
> [014] -- true
> [014] +- false
> [014]  ...
> [014]  master.upstream.lag < 1
> [014]  ---
>
> It happened because the replication upstream status was checked too
> early. To give the upstream time to reach the needed 'follow' state,
> the test now waits for it using the test_run:wait_upstream() routine.
>
> Removed the test from the 'fragile' list of the test_run tool so that it
> can run in parallel.
>
> Closes #5110
> ---
>
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5110-repl-status-174
> Issue: https://github.com/tarantool/tarantool/issues/5110
>
>   test/replication/status.result   | 16 +++++++++-------
>   test/replication/status.test.lua | 13 +++++++++----
>   test/replication/suite.ini       |  1 -
>   3 files changed, 18 insertions(+), 12 deletions(-)
>
> diff --git a/test/replication/status.result b/test/replication/status.result
> index a86f48774..d5addbc80 100644
> --- a/test/replication/status.result
> +++ b/test/replication/status.result
> @@ -172,15 +172,17 @@ replica.upstream == nil
>   ---
>   - true
>   ...
> -replica.downstream.status == 'follow'
> +test_run:wait_downstream(replica_id, {status = 'follow'})
>   ---
>   - true
>   ...
> -replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> ----
> -- true
> -...
> -replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> +-- wait for the replication vclock
> +test_run:wait_cond(function()                    \
> +    local r = box.info.replication[replica_id].downstream.vclock \
> +    return (r ~= nil and box.info.vclock ~= nil and \
> +            r[master_id] == box.info.vclock[master_id] and \
> +            r[replica_id] == box.info.vclock[replica_id]) \
> +    end) or box.info
>   ---
>   - true
>   ...
> @@ -222,7 +224,7 @@ master.uuid == box.space._cluster:get(master_id)[2]
>   ---
>   - true
>   ...
> -master.upstream.status == "follow"
> +test_run:wait_upstream(master_id, {status = 'follow'})
>   ---
>   - true
>   ...
> diff --git a/test/replication/status.test.lua b/test/replication/status.test.lua
> index 090968172..6006ce9cf 100644
> --- a/test/replication/status.test.lua
> +++ b/test/replication/status.test.lua
> @@ -64,9 +64,14 @@ replica.uuid == box.space._cluster:get(replica_id)[2]
>   -- replica.lsn == box.info.vclock[replica_id]
>   replica.lsn == 0
>   replica.upstream == nil
> -replica.downstream.status == 'follow'
> -replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> -replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> +test_run:wait_downstream(replica_id, {status = 'follow'})
> +-- wait for the replication vclock
> +test_run:wait_cond(function()                    \
> +    local r = box.info.replication[replica_id].downstream.vclock \
> +    return (r ~= nil and box.info.vclock ~= nil and \
> +            r[master_id] == box.info.vclock[master_id] and \
> +            r[replica_id] == box.info.vclock[replica_id]) \
> +    end) or box.info
>   
>   --
>   -- Replica
> @@ -83,7 +88,7 @@ box.info.vclock[master_id] == 2
>   master = box.info.replication[master_id]
>   master.id == master_id
>   master.uuid == box.space._cluster:get(master_id)[2]
> -master.upstream.status == "follow"
> +test_run:wait_upstream(master_id, {status = 'follow'})
>   master.upstream.lag < 1
>   master.upstream.idle < 1
>   master.upstream.peer:match("unix/")
> diff --git a/test/replication/suite.ini b/test/replication/suite.ini
> index ab9c3dabd..9bba9d125 100644
> --- a/test/replication/suite.ini
> +++ b/test/replication/suite.ini
> @@ -23,4 +23,3 @@ fragile = errinj.test.lua            ; gh-3870
>             box_set_replication_stress.test.lua     ; gh-4992 gh-4986
>             gh-4605-empty-password.test.lua         ; gh-5030
>             anon.test.lua              ; gh-5058
> -          status.test.lua            ; gh-5110


LGTM

-- 
Serge Petrenko
