[PATCH v3] xlog: Remove inprogress files on start

Vladimir Davydov vdavydov.dev at gmail.com
Wed Jun 20 15:08:53 MSK 2018


On Tue, Jun 19, 2018 at 04:14:11PM +0300, Ilya Markov wrote:
> When tarantool crashes during writing to vylog, index, run or snapshot,
> inprogress files remain. But garbage collector doesn't take into account
> these files. So they remain until they are removed manually.
> 
> Fix this with adding deletion of such files to functions memtx_engine_end_recovery,
> vinyl_engine_end_recovery and vy_run_remove_files.
> 
> Add 3 errinj which simulate the crash before renaming inprogress files.
> Moved const string inprogress_suffix to xlog.h in order to reuse it in
> vinyl run and index files removal.
> 
> Closes #3406
> ---
> branch: https://github.com/tarantool/tarantool/tree/gh-3406-remove-inprogress-files
> issue: https://github.com/tarantool/tarantool/issues/3406
> 
> Changes in v3:
> - Delete abortion of end_recovery in case if inprgress file deletion fails.
>   Replace it with diag_log.
> - Move deletion of vylog.inprogress to vy_log_end_recovery().
> - Move error new injections before relevant calls of xlog_rename().
> - Change names and error messages of error injections.
> - Removed copy&paste in error injection code and add comment where it's impossible.
> - Changed signature of xdir_collect_inprogress function.
> - Removed call xdir_collect_inprogress from vy_run_remove_files,
>   replace it with implicit call of coio_unlink.
> - Replace coio_unlink in xdir_collect_inprogress with unlink.
> - Move const string inprogress_suffix to xlog.h.
> 
>  src/box/memtx_engine.c   |   2 +
>  src/box/vy_log.c         |   7 +-
>  src/box/vy_run.c         |  40 +++++++++--
>  src/box/xlog.c           |  35 +++++++++-
>  src/box/xlog.h           |  11 ++++
>  src/errinj.h             |   3 +
>  test/box/errinj.result   | 168 +++++++++++++++++++++++++++++++++++++++++++++++
>  test/box/errinj.test.lua |  63 ++++++++++++++++++
>  8 files changed, 321 insertions(+), 8 deletions(-)
> 
> diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
> index 9a9aff5..1566d26 100644
> --- a/src/box/memtx_engine.c
> +++ b/src/box/memtx_engine.c
> @@ -303,6 +303,8 @@ memtx_engine_end_recovery(struct engine *engine)
>  		if (space_foreach(memtx_build_secondary_keys, memtx) != 0)
>  			return -1;
>  	}
> +	if (xdir_collect_inprogress(&memtx->snap_dir) < 0)
> +		diag_log();

You already log error in xdir_collect_inprogress(). No need in diag_log.
Better make xdir_collect_inprogress() return void.

>  	return 0;
>  }
> 
> diff --git a/src/box/vy_log.c b/src/box/vy_log.c
> index 51e3753..cd343b9 100644
> --- a/src/box/vy_log.c
> +++ b/src/box/vy_log.c
> @@ -852,6 +852,10 @@ vy_log_open(struct xlog *xlog)
>  	    xlog_write_row(xlog, &row) < 0)
>  		goto fail_close_xlog;
> 
> +	ERROR_INJECT(ERRINJ_VY_LOG_FILE_RENAME, {
> +		diag_set(ClientError, ER_INJECTION, "vinyl log file rename");
> +		return -1;
> +	});

You need to close xlog on error injection.

>  	if (xlog_rename(xlog) < 0)
>  		goto fail_close_xlog;
> 
> @@ -976,7 +980,8 @@ vy_log_end_recovery(void)
>  			return -1;
>  		}
>  	}
> -
> +	if (xdir_collect_inprogress(&vy_log.dir) < 0)
> +		diag_log();
>  	vy_log.recovery = NULL;
>  	return 0;
>  }
> diff --git a/src/box/vy_run.c b/src/box/vy_run.c
> index 980bc4d..96736ba 100644
> --- a/src/box/vy_run.c
> +++ b/src/box/vy_run.c
> @@ -2021,6 +2021,19 @@ vy_run_write_index(struct vy_run *run, const char *dirpath,
>  			goto fail;
>  	}
> 
> +	ERROR_INJECT(ERRINJ_VY_INDEX_FILE_RENAME, {
> +		/*
> +		 * Can't use here goto label fail,
> +		 * because we don't want to unlink file in errinj and can't move
> +		 * unlink before xlog_close call.
> +		 */

OK. I see. Remove this comment, please. It'd be OK if you had told me
that in reply to my review email.

> +		region_truncate(region, mem_used);
> +		xlog_tx_rollback(&index_xlog);
> +		xlog_close(&index_xlog, false);
> +		diag_set(ClientError, ER_INJECTION,
> +			 "vinyl index file rename");
> +		return -1;
> +	});
>  	if (xlog_tx_commit(&index_xlog) < 0 ||
>  	    xlog_flush(&index_xlog) < 0 ||
>  	    xlog_rename(&index_xlog) < 0)
> @@ -2263,6 +2276,12 @@ vy_run_writer_commit(struct vy_run_writer *writer)
>  	run->info.max_key = vy_key_dup(key);
>  	if (run->info.max_key == NULL)
>  		goto out;
> +	ERROR_INJECT(ERRINJ_VY_RUN_FILE_RENAME, {
> +		diag_set(ClientError, ER_INJECTION,
> +			 "vinyl run file rename");

> +		rc = -1;

Nit: this line is not needed, because rc is set to -1 in the beginning
of this function AFAICS.

> +		goto out;
> +	});
> 
>  	/* Sync data and link the file to the final name. */
>  	if (xlog_sync(&writer->data_xlog) < 0 ||
> @@ -2430,6 +2449,15 @@ close_err:
>  	return -1;
>  }
> 
> +static inline void
> +remove_file(const char *path, int *ret) {

Coding style.

> +	say_info("removing %s", path);
> +	if (coio_unlink(path) < 0 && errno != ENOENT) {
> +		say_syserror("error while removing %s", path);
> +		*ret = -1;
> +	}

I don't want to see "removing ..." message in the log if inprogress file
didn't exist (i.e. removing a committed run file on garbage collection).

> +}
> +
>  int
>  vy_run_remove_files(const char *dir, uint32_t space_id,
>  		    uint32_t iid, int64_t run_id)
> @@ -2439,14 +2467,14 @@ vy_run_remove_files(const char *dir, uint32_t space_id,
>  				(long long)run_id); return -1;});
>  	int ret = 0;
>  	char path[PATH_MAX];
> +	int total = 0;
>  	for (int type = 0; type < vy_file_MAX; type++) {
> -		vy_run_snprint_path(path, sizeof(path), dir,
> +		total = vy_run_snprint_path(path, sizeof(path), dir,
>  				    space_id, iid, run_id, type);
> -		say_info("removing %s", path);
> -		if (coio_unlink(path) < 0 && errno != ENOENT) {
> -			say_syserror("error while removing %s", path);
> -			ret = -1;
> -		}
> +		remove_file(path, &ret);
> +		snprintf(path + total, sizeof(path) - total, "%s",
> +			 inprogress_suffix);
> +		remove_file(path, &ret);

No, I don't really like it. I think adding inprogress files to
vy_file_type enum would be more concise and clean. I think it's
OK, because this enum is only used for formatting file names.
Something like this:

diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 8a177446..db5c018a 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -3579,6 +3579,9 @@ vinyl_engine_backup(struct engine *engine, const struct vclock *vclock,
 				continue;
 			char path[PATH_MAX];
 			for (int type = 0; type < vy_file_MAX; type++) {
+				if (type == VY_FILE_RUN_INPROGRESS ||
+				    type == VY_FILE_INDEX_INPROGRESS)
+					continue;
 				vy_run_snprint_path(path, sizeof(path),
 						    env->path,
 						    lsm_info->space_id,
diff --git a/src/box/vy_run.c b/src/box/vy_run.c
index e2edbcaa..72e06244 100644
--- a/src/box/vy_run.c
+++ b/src/box/vy_run.c
@@ -66,8 +66,10 @@ static const uint64_t vy_run_info_key_map = (1 << VY_RUN_INFO_MIN_KEY) |
 #define XLOG_META_TYPE_INDEX "INDEX"
 
 const char *vy_file_suffix[] = {
-	"index",	/* VY_FILE_INDEX */
-	"run",		/* VY_FILE_RUN */
+	"index",		/* VY_FILE_INDEX */
+	"index.inprogress",	/* VY_FILE_INDEX_INPROGRESS */
+	"run",			/* VY_FILE_RUN */
+	"run.inprogress",	/* VY_FILE_RUN_INPROGRESS */
 };
 
 /**
diff --git a/src/box/vy_run.h b/src/box/vy_run.h
index 7bafffec..5030886c 100644
--- a/src/box/vy_run.h
+++ b/src/box/vy_run.h
@@ -377,7 +377,9 @@ vy_run_rebuild_index(struct vy_run *run, const char *dir,
 
 enum vy_file_type {
 	VY_FILE_INDEX,
+	VY_FILE_INDEX_INPROGRESS,
 	VY_FILE_RUN,
+	VY_FILE_RUN_INPROGRESS,
 	vy_file_MAX,
 };
 

>  	}
>  	return ret;
>  }
> diff --git a/src/box/xlog.c b/src/box/xlog.c
> index 824ad11..7bd618e 100644
> --- a/src/box/xlog.c
> +++ b/src/box/xlog.c
> @@ -58,7 +58,6 @@ typedef uint32_t log_magic_t;
>  static const log_magic_t row_marker = mp_bswap_u32(0xd5ba0bab); /* host byte order */
>  static const log_magic_t zrow_marker = mp_bswap_u32(0xd5ba0bba); /* host byte order */
>  static const log_magic_t eof_marker = mp_bswap_u32(0xd510aded); /* host byte order */
> -static const char inprogress_suffix[] = ".inprogress";
> 
>  enum {
>  	/**
> @@ -621,6 +620,40 @@ xdir_collect_garbage(struct xdir *dir, int64_t signature, bool use_coio)
>  	return 0;
>  }
> 
> +int
> +xdir_collect_inprogress(struct xdir *xdir)
> +{
> +	const char *dirname = xdir->dirname;
> +	DIR *dh = opendir(dirname);
> +	if (dh == NULL) {
> +		if (errno == ENOENT)
> +			return 0;
> +		diag_set(SystemError, "error reading directory '%s'",
> +			 dirname);

Since you don't use the return code anyway, replace the diag with
say_syserror and make the function return void.

> +		return -1;
> +	}
> +
> +	struct dirent *dent;
> +	char path[PATH_MAX];
> +	int total = snprintf(path, sizeof(path), "%s/", dirname);
> +
> +	while ((dent = readdir(dh)) != NULL) {
> +		char *ext = strrchr(dent->d_name, '.');
> +		if (ext == NULL || strcmp(ext, inprogress_suffix) != 0)
> +			continue;
> +		strcpy(path + total, dent->d_name);

This may write beyond path if strlen(path) + strlen(d_name) >
sizeof(path).

> +		say_info("removing %s", path);
> +		int rc = unlink(path);
> +		if (rc < 0) {
> +			say_syserror("error while removing %s", path);
> +			diag_set(SystemError, "failed to unlink file '%s'",
> +				 path);
> +			return -1;
> +		}
> +	}
> +	return 0;
> +}

> diff --git a/test/box/errinj.test.lua b/test/box/errinj.test.lua
> index 3af1b74..756a8e7 100644
> --- a/test/box/errinj.test.lua
> +++ b/test/box/errinj.test.lua
> @@ -1,5 +1,7 @@
>  errinj = box.error.injection
>  net_box = require('net.box')
> +fio = require("fio")
> +fiber = require('fiber')
> 
>  space = box.schema.space.create('tweedledum')
>  index = space:create_index('primary', { type = 'hash' })
> @@ -203,6 +205,67 @@ box.snapshot()
>  errinj.set("ERRINJ_WAL_WRITE", false)
>  space:drop()
> 
> +errinj.set("ERRINJ_VY_LOG_FILE_RENAME", true)
> +s = box.schema.space.create("vinyl_test", {engine='vinyl'})
> +_ = s:create_index("primary")
> +vinyl_vylog = string.format("%s/*.vylog.inprogress", box.cfg.vinyl_dir)
> +#fio.glob(vinyl_vylog) > 0 -- true
> +
> +errinj.set("ERRINJ_SNAP_COMMIT_DELAY", true)
> +_ = fiber.create(function() box.snapshot() end)
> +memtx_snap = string.format("%s/*.snap.inprogress", box.cfg.memtx_dir)
> +while #fio.glob(memtx_snap) == 0 do fiber.sleep(0.001) end-- true
> +#fio.glob(memtx_snap) > 0 --true
> +
> +test_run:cmd("restart server default")
> +errinj = box.error.injection
> +net_box = require('net.box')
> +fio = require("fio")
> +fiber = require('fiber')
> +
> +memtx_snap = string.format("%s/*.snap.inprogress", box.cfg.memtx_dir)
> +#fio.glob(memtx_snap) == 0 -- true
> +
> +vinyl_vylog = string.format("%s/*.vylog.inprogress", box.cfg.vinyl_dir)
> +#fio.glob(vinyl_vylog) == 0 --true
> +
> +s = box.space.vinyl_test
> +vinyl_index = string.format("%s/%s/*/*.index.inprogress", box.cfg.vinyl_dir, s.id)
> +errinj.set("ERRINJ_VY_INDEX_FILE_RENAME", true)
> +-- insert big tuples in order to cause compaction without box.snapshot.
> +for i = 1, 10000 do s:insert{i, string.rep('a', 10000)} end

First, this test doesn't pass on travis.

Second, it's going to take too long, because you insert so many tuples.
Do you really need it. Why can't you use box.snapshot()?



More information about the Tarantool-patches mailing list