From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: Alexander Turenko <alexander.turenko@tarantool.org>
Cc: avtikhon <avtikhon@tarantool.org>, tarantool-patches@freelists.org
Subject: Re: [PATCH v3] Use SIGKILL to stop server replica
Date: Thu, 2 May 2019 12:51:29 +0300 [thread overview]
Message-ID: <20190502095129.dhufqnds2f3duwxa@esperanza> (raw)
In-Reply-To: <20190502013945.rgo5qjwgzv7p3wlg@tkn_work_nb>
On Thu, May 02, 2019 at 04:39:45AM +0300, Alexander Turenko wrote:
> Vladimir, can you please verify my description of the fix? I have put it
> at end of the email. I see that my terminology differs from your: I
> describe things w/o terms 'advance gc', 'recovering xlog' and so on
> because I don't sure I can use them appropriately. Instead I tried to
> write it in terms of externally visible behaviour: 'remove xlog',
> 'reading xlog' and so on.
>
> My variant of the description:
>
> test: fix replication/gc flaky failures
>
> Two problems are fixed here. The first one is about correctness of the
> test case. The second is about flaky failures.
>
> About correctness. The test case contains the following lines:
>
> | test_run:cmd("switch replica")
> | -- Unblock the replica and break replication.
> | box.error.injection.set("ERRINJ_WAL_DELAY", false)
> | box.cfg{replication = {}}
>
> Usually rows are applied and the new vclock is sent to the master before
> replication will be disabled. So the master removes old xlog before the
> replica restart and the next case tests nothing.
>
> This commit uses the new test-run's ability to stop a tarantool instance
> with a custom signal and stops the replica with SIGKILL w/o dropping
> ERRINJ_WAL_DELAY. This change fixes the race between applying rows and
> disabling replication and so makes the test case correct.
>
> About flaky failures. They were look like so:
>
> | [029] --- replication/gc.result Mon Apr 15 14:58:09 2019
> | [029] +++ replication/gc.reject Tue Apr 16 09:17:47 2019
> | [029] @@ -290,7 +290,12 @@
> | [029] ...
> | [029] wait_xlog(1) or fio.listdir('./master')
> | [029] ---
> | [029] -- true
> | [029] +- - 00000000000000000305.vylog
> | [029] + - 00000000000000000305.xlog
> | [029] + - '512'
> | [029] + - 00000000000000000310.xlog
> | [029] + - 00000000000000000310.vylog
> | [029] + - 00000000000000000310.snap
> | [029] ...
> | [029] -- Stop the replica.
> | [029] test_run:cmd("stop server replica")
> | <...next cases could have induced mismathes too...>
>
> The reason of the fail is that a replica applied all rows from the old
> xlog, but didn't sent an ACK with a new vclock to a master, because the
> replication was disabled before that. The master stops relay and keeps
> the old xlog. When the replica starts again it subscribes with the
> vclock value that instructs a relay to open the new xlog.
>
> Tarantool can remove an old xlog just after a replica's ACK when
> observes that the xlog was fully read by all replicas. But tarantool
> does not remove xlogs when a replica is subscribed. This is not a big
> problem, because such 'stuck' xlog file will be removed with a next xlog
> removal.
>
> There was the attempt to fix this behaviour and remove old xlogs at
> subscribe, see the following commits:
>
> * b5b4809cf2e6d48230eb9e4301eac188b080e0f4 ('replication: update replica
> gc state on subscribe');
> * 766cd3e1015f6f76460a748c37212fb4c8791500 ('Revert "replication: update
> replica gc state on subscribe"').
>
> Anyway, this commit fixes this flaky failures, because stops the replica
> before applying rows from the old xlog. So when the replica starts it
> continues reading from the old xlog and the xlog file will be removed
> when will be fully read.
>
> Closes #4162
Looks good to me, thanks.
next prev parent reply other threads:[~2019-05-02 9:51 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-01 18:15 [tarantool-patches] " avtikhon
2019-05-02 1:39 ` Alexander Turenko
2019-05-02 9:51 ` Vladimir Davydov [this message]
2019-05-02 22:41 ` Alexander Turenko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190502095129.dhufqnds2f3duwxa@esperanza \
--to=vdavydov.dev@gmail.com \
--cc=alexander.turenko@tarantool.org \
--cc=avtikhon@tarantool.org \
--cc=tarantool-patches@freelists.org \
--subject='Re: [PATCH v3] Use SIGKILL to stop server replica' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox