[PATCH] Fix force_recovery on empty xlog

Vladimir Davydov vdavydov.dev at gmail.com
Mon Jan 29 13:42:14 MSK 2018


On Mon, Jan 29, 2018 at 12:52:39PM +0300, Konstantin Belyavskiy wrote:

> diff --git a/src/box/recovery.cc b/src/box/recovery.cc
> index 281ac1838..cef7f11bd 100644
> --- a/src/box/recovery.cc
> +++ b/src/box/recovery.cc

> @@ -330,20 +332,24 @@ recovery_finalize(struct recovery *r, struct xstream *stream)
>  	recovery_close_log(r);
>  
>  	/*
> -	 * Check that the last xlog file has rows.
> +	 * Delete the last xlog file if either:
> +	 *  - file is corrupted
> +	 *  - file has zero size
>  	 */
> -	if (vclockset_last(&r->wal_dir.index) != NULL &&
> -	    vclock_sum(&r->vclock) ==
> -	    vclock_sum(vclockset_last(&r->wal_dir.index))) {
> -		/*
> -		 * Delete the last empty xlog file.
> -		 */
> +	if (vclockset_last(&r->wal_dir.index) != NULL) {
>  		char *name = xdir_format_filename(&r->wal_dir,
>  						  vclock_sum(&r->vclock),
>  						  NONE);
> -		if (unlink(name) != 0) {
> -			tnt_raise(SystemError, "%s: failed to unlink file",
> -				  name);
> +		struct stat st;
> +		if ((stat(name, &st) == 0 && st.st_size == 0) ||
> +		    vclock_sum(&r->vclock) ==
> +		    vclock_sum(vclockset_last(&r->wal_dir.index))) {

I don't understand why you check the file size if you intend to delete
the last xlog file in case we failed to recover any records from it.
Doesn't the latter imply the former?

Anyway, deletion of a non-empty file doesn't sound right to me.

> +			say_info("delete corrupted xlog %s", name);
> +			if (unlink(name) != 0) {
> +				tnt_raise(SystemError,
> +					  "%s: failed to unlink file",
> +					  name);
> +			}
>  		}
>  	}
>  }

> diff --git a/test/xlog/skip_empty_and_delete_last_corrupted_xlog_on_recovery.test.lua b/test/xlog/skip_empty_and_delete_last_corrupted_xlog_on_recovery.test.lua
> new file mode 100644
> index 000000000..8e8ebe126
> --- /dev/null
> +++ b/test/xlog/skip_empty_and_delete_last_corrupted_xlog_on_recovery.test.lua

Please call the test, xlog/force_recovery.test.lua. It's less verbose
but still clear, and we won't have to rename it if we add yet another
test case.

> @@ -0,0 +1,42 @@
> +#!/usr/bin/env tarantool
> +
> +env = require('test_run')
> +test_run = env.new()
> +
> +box.cfg{}
> +
> +test_run:cmd('create server test with script = "xlog/force_recovery.lua"')
> +
> +test_run:cmd("start server test")
> +test_run:cmd("switch test")
> +box.space._schema:replace({'test'})
> +test_run:cmd("switch default")
> +test_run:cmd("stop server test")
> +
> +test_run:cmd("start server test")
> +test_run:cmd("switch test")
> +box.space._schema:replace({'lost'})
> +test_run:cmd("switch default")
> +test_run:cmd("stop server test")
> +
> +test_run:cmd("start server test")
> +test_run:cmd("switch test")
> +box.space._schema:replace({'tost'})
> +test_run:cmd("switch default")
> +test_run:cmd("stop server test")
> +
> +os.execute("rm force_recovery/00000000000000000001.xlog")
> +os.execute("touch force_recovery/00000000000000000001.xlog")

We have the fio module for working with files.

> +
> +test_run:cmd("start server test")
> +test_run:cmd("switch test")
> +box.space._schema:replace({'last'})
> +test_run:cmd("switch default")
> +test_run:cmd("stop server test")
> +
> +os.execute("rm force_recovery/00000000000000000003.xlog")
> +os.execute("touch force_recovery/00000000000000000003.xlog")
> +
> +test_run:cmd("start server test")
> +test_run:cmd("switch test")
> +box.space._schema:replace({'list'})

AFAICS you only test recovery in case the last xlog file is empty.
If the last xlog file has a corrupted header, it doesn't seem to work
as expected. Try running this script:

    box.cfg{}
    s = box.schema.space.create('test')
    _ = s:create_index('pk')
    box.snapshot()
    s:insert{1}
    -- Corrupt the last xlog file.
    fio = require 'fio'
    path = fio.pathjoin(box.cfg.wal_dir, string.format('%020d.xlog', box.info.lsn - 1))
    f = fio.open(path, {'O_WRONLY'})
    f:write('DEAD')
    f:close()
    os.exit(0)

Tarantool does recover if force_recovery is set, but any attempt to
write to the database fails with:

    tarantool> box.space.test:insert{2}
    2018-01-29 13:36:20.520 [8308] wal/101/main xlog.c:701 !> SystemError file './00000000000000000003.xlog' already exists: File exists
    2018-01-29 13:36:20.520 [8308] main/101/interactive txn.c:271 E> ER_WAL_IO: Failed to write to disk
    ---
    - error: Failed to write to disk
    ...

I think we should fix this, too.



More information about the Tarantool-patches mailing list