[PATCH v2] xlog: Remove inprogress files on start

Vladimir Davydov vdavydov.dev at gmail.com
Tue Jun 19 14:50:43 MSK 2018


On Mon, Jun 18, 2018 at 07:23:34PM +0300, Ilya Markov wrote:
> When tarantool crashes during writing to vylog, index, run or snapshot,
> inprogress files remain. But garbage collector doesn't take into account
> these files. So they remain until they are removed manually.
> 
> Fix this with adding deletion of such files to functions memtx_engine_end_recovery,
> vinyl_engine_end_recovery and vy_run_remove_files.
> 
> Add 3 errinj which simulate the crash before renaming inprogress files.
> 
> Closes #3406
> ---
> branch: https://github.com/tarantool/tarantool/tree/gh-3406-remove-inprogress-files
> issue: https://github.com/tarantool/tarantool/issues/3406

List of changes between v1 and v2 is missing, although I asked you to
add one:

On Mon, Jun 18, 2018 at 12:44:07PM +0300, Vladimir Davydov wrote:
>    When you resubmit a new version of your patch, please don't forget to
>    append the version to the subject prefix (e.g. PATCH v2), and write a
>    brief change log, e.g. see
> 
>    https://www.freelists.org/post/tarantool-patches/PATCH-v2-box-make-gc-info-public

Don't forget it next time.

> 
>  src/box/memtx_engine.c   |   2 +-
>  src/box/vinyl.c          |   1 +
>  src/box/vy_log.c         |   4 ++
>  src/box/vy_run.c         |  17 +++++
>  src/box/xlog.c           |  38 +++++++++++
>  src/box/xlog.h           |   6 ++
>  src/errinj.h             |   3 +
>  test/box/errinj.result   | 168 +++++++++++++++++++++++++++++++++++++++++++++++
>  test/box/errinj.test.lua |  63 ++++++++++++++++++
>  9 files changed, 301 insertions(+), 1 deletion(-)
> 
> diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
> index 9a9aff5..3ab60f0 100644
> --- a/src/box/memtx_engine.c
> +++ b/src/box/memtx_engine.c
> @@ -303,7 +303,7 @@ memtx_engine_end_recovery(struct engine *engine)
>  		if (space_foreach(memtx_build_secondary_keys, memtx) != 0)
>  			return -1;
>  	}
> -	return 0;
> +	return xdir_collect_inprogress(memtx->snap_dir.dirname);

No need to abort recovery if an inprogress file can't be deleted, a
warning in the log will do. Garbage collection should be best-effort.

>  }
> 
>  static struct space *
> diff --git a/src/box/vinyl.c b/src/box/vinyl.c
> index 552d42b..69646b6 100644
> --- a/src/box/vinyl.c
> +++ b/src/box/vinyl.c
> @@ -2941,6 +2941,7 @@ vinyl_engine_end_recovery(struct engine *engine)
>  	case VINYL_FINAL_RECOVERY_LOCAL:
>  		if (vy_log_end_recovery() != 0)
>  			return -1;
> +		xdir_collect_inprogress(e->path);

Removal of vylog files should be done from vy_log_end_recovery()
(for the sake of encapsulation).

>  		/*
>  		 * If the instance is shut down while a dump or
>  		 * compaction task is in progress, we'll get an
> diff --git a/src/box/vy_log.c b/src/box/vy_log.c
> index 51e3753..3519bde 100644
> --- a/src/box/vy_log.c
> +++ b/src/box/vy_log.c
> @@ -848,6 +848,10 @@ vy_log_open(struct xlog *xlog)
>  	struct vy_log_record record;
>  	vy_log_record_init(&record);
>  	record.type = VY_LOG_SNAPSHOT;
> +	ERROR_INJECT(ERRINJ_VY_LOG_RENAME, {
> +		diag_set(ClientError, ER_INJECTION, "vy_log open");
> +		return -1;
> +	});

The error injection should go right before xlog_rename(), and it should
close the xlog before returning from the function.

Also, be consistent with other error messages used on injection:

  ERRINJ_VY_READ_PAGE      "vinyl page read"
  ERRINJ_VY_INDEX_DUMP     "vinyl index dump"
  ERRINJ_VY_LOG_FLUSH      "vinyl log flush"

So for ERRINJ_VY_LOG_RENAME it should be "vinyl log rename".


>  	if (vy_log_record_encode(&record, &row) < 0 ||
>  	    xlog_write_row(xlog, &row) < 0)
>  		goto fail_close_xlog;
> diff --git a/src/box/vy_run.c b/src/box/vy_run.c
> index 980bc4d..1dda93d 100644
> --- a/src/box/vy_run.c
> +++ b/src/box/vy_run.c
> @@ -2021,6 +2021,14 @@ vy_run_write_index(struct vy_run *run, const char *dirpath,
>  			goto fail;
>  	}
> 
> +	ERROR_INJECT(ERRINJ_VY_INDEX_FILE_RENAME, {
> +		region_truncate(region, mem_used);
> +		xlog_tx_rollback(&index_xlog);
> +		xlog_close(&index_xlog, false);
> +		diag_set(ClientError, ER_INJECTION,
> +			 "vinyl index write");
> +		return -1;
> +	});

Fix the error message as per above and use labels to remove this
copy&paste.

>  	if (xlog_tx_commit(&index_xlog) < 0 ||
>  	    xlog_flush(&index_xlog) < 0 ||
>  	    xlog_rename(&index_xlog) < 0)
> @@ -2263,6 +2271,12 @@ vy_run_writer_commit(struct vy_run_writer *writer)
>  	run->info.max_key = vy_key_dup(key);
>  	if (run->info.max_key == NULL)
>  		goto out;
> +	ERROR_INJECT(ERRINJ_VY_RUN_RENAME, {
> +		region_truncate(&fiber()->gc, region_svp);
> +		diag_set(ClientError, ER_INJECTION,
> +			 "vinyl writer commit");
> +		return -1;
> +	});

Ditto.

> 
>  	/* Sync data and link the file to the final name. */
>  	if (xlog_sync(&writer->data_xlog) < 0 ||
> @@ -2439,6 +2453,9 @@ vy_run_remove_files(const char *dir, uint32_t space_id,
>  				(long long)run_id); return -1;});
>  	int ret = 0;
>  	char path[PATH_MAX];
> +	vy_index_snprint_path(path, PATH_MAX, dir, space_id, iid);
> +	if (xdir_collect_inprogress(path) < 0)
> +		return -1;

What's the point of using xdir_collect_inprogress here? It scans the
directory which is an overkill, as file names are already known. Besides
it blocks the tx thread. You have a loop removing files below. Simply
make it remove inprogress files as well.

>  	for (int type = 0; type < vy_file_MAX; type++) {
>  		vy_run_snprint_path(path, sizeof(path), dir,
>  				    space_id, iid, run_id, type);
> diff --git a/src/box/xlog.c b/src/box/xlog.c
> index 824ad11..b5cc494 100644
> --- a/src/box/xlog.c
> +++ b/src/box/xlog.c
> @@ -621,6 +621,44 @@ xdir_collect_garbage(struct xdir *dir, int64_t signature, bool use_coio)
>  	return 0;
>  }
> 
> +int
> +xdir_collect_inprogress(const char *dirname)

Judging by its signature, this function should take an xdir object.

> +{
> +	DIR *dh = opendir(dirname);
> +	if (dh == NULL) {
> +		if (errno == ENOENT)
> +			return 0;
> +		diag_set(SystemError, "error reading directory '%s'",
> +			 dirname);
> +		return -1;
> +	}
> +
> +	struct dirent *dent;
> +	char path[PATH_MAX];
> +	int total = snprintf(path, sizeof(path), "%s/", dirname);
> +	if (total < 0) {

snprintf never fails.

> +		say_syserror("error snprintf %s", dirname);
> +		diag_set(SystemError, "error snprintf'",
> +			 path);
> +		return -1;
> +	}
> +	while ((dent = readdir(dh)) != NULL) {
> +		char *ext = strrchr(dent->d_name, '.');
> +		if (ext == NULL || strcmp(ext, inprogress_suffix) != 0)
> +			continue;
> +		strcpy(path + total, dent->d_name);
> +		say_info("removing %s", path);
> +		int rc = coio_unlink(path);

readdir is blocking so why use coio_unlink?

> +		if (rc < 0) {
> +			say_syserror("error while removing %s", path);
> +			diag_set(SystemError, "failed to unlink file '%s'",
> +				 path);
> +			return -1;
> +		}
> +	}
> +	return 0;
> +}
> +
>  /* }}} */
> 
> 
> diff --git a/src/box/xlog.h b/src/box/xlog.h
> index 973910d..4faa763 100644
> --- a/src/box/xlog.h
> +++ b/src/box/xlog.h
> @@ -181,6 +181,12 @@ int
>  xdir_collect_garbage(struct xdir *dir, int64_t signature, bool use_coio);
> 
>  /**
> + * Remove .inprogress files in specified directory.
> + */
> +int
> +xdir_collect_inprogress(const char *dirname);
> +
> +/**
>   * Return LSN and vclock (unless @vclock is NULL) of the newest
>   * file in a directory or -1 if the directory is empty.
>   */
> diff --git a/src/errinj.h b/src/errinj.h
> index 78514ac..8444479 100644
> --- a/src/errinj.h
> +++ b/src/errinj.h
> @@ -111,6 +111,9 @@ struct errinj {
>  	_(ERRINJ_IPROTO_TX_DELAY, ERRINJ_BOOL, {.bparam = false}) \
>  	_(ERRINJ_LOG_ROTATE, ERRINJ_BOOL, {.bparam = false}) \
>  	_(ERRINJ_SNAP_COMMIT_DELAY, ERRINJ_BOOL, {.bparam = 0}) \
> +	_(ERRINJ_VY_LOG_RENAME, ERRINJ_BOOL, {.bparam = false}) \
> +	_(ERRINJ_VY_INDEX_FILE_RENAME, ERRINJ_BOOL, {.bparam = false}) \
> +	_(ERRINJ_VY_RUN_RENAME, ERRINJ_BOOL, {.bparam = false}) \

Error injection names are inconsistent (VY_INDEX_FILE_RENAME vs
VY_RUN_RENAME). Fix it.



More information about the Tarantool-patches mailing list