[tarantool-patches] Re: [PATCH v2] Use SIGKILL to stop server replica

Alexander Turenko alexander.turenko at tarantool.org
Tue Apr 30 05:58:04 MSK 2019


The change of the test case itself is okay for me, but I have some
comments around. See below.

WBR, Alexander Turenko.

On Mon, Apr 29, 2019 at 10:07:05PM +0300, avtikhon wrote:
> Used the signal option set to SIGKILL to stop server replica
> routine to be able to stop the replica imediately to imitate

Typo: imediately -> immediately.

> the replica crash and, then, wake up.
> Just 'stop server replica' (SIGTERM) is not sufficient to stop
> a tarantool instance when ERRINJ_WAL_DELAY is set, because
> "tarantool" thread wait for paused "wal" thread infinitely.
> Changed server stop routine to to kill routine to be able

Typo: to to -> to.

> to use SIGKILL instead of SIGTERM to the replica server. In
> this way the server replica will be killed immediately and
> *.xlog files will be removed as it has to be.

I would clarify that here you solves two problems: the incorrect test
case and flaky failures. Also it worth to mention an issue about the
flaky failures.

> Close #4162

And, I guess, #4034.

> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-4162-stop-kill
> Issue: https://github.com/tarantool/tarantool/issues/4162

The patch contains many changes that are not related to the test case
itself. Please, factor them out or, better, just drop (if they don't
affect a behaviour of the test).

This will make the patch more clear for future readers. And, to be
honest, I don't think that there is any real difference between 10
seconds and 60 seconds for our test cases: each of them should be quite
short even when a system is under heavy load. Don't sure about a
swapping system, though; but unlikely we can do something with timeouts
tweaking for this case in general.

If these changes are really matter it worth to look at a reason and fix
separately. So we'll sure we don't just pollute git blame output.

> -test_run:cmd("switch replica")
> +-- Imitate the replica crash and, then, wake up.
> +-- Just 'stop server replica' (SIGTERM) is not sufficient to stop
> +-- a tarantool instance when ERRINJ_WAL_DELAY is set, because
> +-- "tarantool" thread wait for paused "wal" thread infinitely.
> +test_run:cmd("kill server replica")

Please, update according to test-run changes: stop server replica with
signal=SIGKILL (right?).

>  -- Check that garbage collection removed the snapshot once
>  -- the replica released the corresponding checkpoint.
> -wait_gc(1) or box.info.gc()
> -wait_xlog(1) or fio.listdir('./master') -- Make sure the replica will not receive data until
> +wait_gc(1)
> +wait_xlog(1)
>  -- we test garbage collection.

See, it was the comment:

> -- Make sure the replica will not receive data until
> -- we test garbage collection.

Now only the second part remains :)




More information about the Tarantool-patches mailing list