[PATCH v2] xlog: Remove inprogress files on start
Vladimir Davydov
vdavydov.dev at gmail.com
Tue Jun 19 14:50:43 MSK 2018
On Mon, Jun 18, 2018 at 07:23:34PM +0300, Ilya Markov wrote:
> When tarantool crashes during writing to vylog, index, run or snapshot,
> inprogress files remain. But garbage collector doesn't take into account
> these files. So they remain until they are removed manually.
>
> Fix this with adding deletion of such files to functions memtx_engine_end_recovery,
> vinyl_engine_end_recovery and vy_run_remove_files.
>
> Add 3 errinj which simulate the crash before renaming inprogress files.
>
> Closes #3406
> ---
> branch: https://github.com/tarantool/tarantool/tree/gh-3406-remove-inprogress-files
> issue: https://github.com/tarantool/tarantool/issues/3406
List of changes between v1 and v2 is missing, although I asked you to
add one:
On Mon, Jun 18, 2018 at 12:44:07PM +0300, Vladimir Davydov wrote:
> When you resubmit a new version of your patch, please don't forget to
> append the version to the subject prefix (e.g. PATCH v2), and write a
> brief change log, e.g. see
>
> https://www.freelists.org/post/tarantool-patches/PATCH-v2-box-make-gc-info-public
Don't forget it next time.
>
> src/box/memtx_engine.c | 2 +-
> src/box/vinyl.c | 1 +
> src/box/vy_log.c | 4 ++
> src/box/vy_run.c | 17 +++++
> src/box/xlog.c | 38 +++++++++++
> src/box/xlog.h | 6 ++
> src/errinj.h | 3 +
> test/box/errinj.result | 168 +++++++++++++++++++++++++++++++++++++++++++++++
> test/box/errinj.test.lua | 63 ++++++++++++++++++
> 9 files changed, 301 insertions(+), 1 deletion(-)
>
> diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
> index 9a9aff5..3ab60f0 100644
> --- a/src/box/memtx_engine.c
> +++ b/src/box/memtx_engine.c
> @@ -303,7 +303,7 @@ memtx_engine_end_recovery(struct engine *engine)
> if (space_foreach(memtx_build_secondary_keys, memtx) != 0)
> return -1;
> }
> - return 0;
> + return xdir_collect_inprogress(memtx->snap_dir.dirname);
No need to abort recovery if an inprogress file can't be deleted, a
warning in the log will do. Garbage collection should be best-effort.
> }
>
> static struct space *
> diff --git a/src/box/vinyl.c b/src/box/vinyl.c
> index 552d42b..69646b6 100644
> --- a/src/box/vinyl.c
> +++ b/src/box/vinyl.c
> @@ -2941,6 +2941,7 @@ vinyl_engine_end_recovery(struct engine *engine)
> case VINYL_FINAL_RECOVERY_LOCAL:
> if (vy_log_end_recovery() != 0)
> return -1;
> + xdir_collect_inprogress(e->path);
Removal of vylog files should be done from vy_log_end_recovery()
(for the sake of encapsulation).
> /*
> * If the instance is shut down while a dump or
> * compaction task is in progress, we'll get an
> diff --git a/src/box/vy_log.c b/src/box/vy_log.c
> index 51e3753..3519bde 100644
> --- a/src/box/vy_log.c
> +++ b/src/box/vy_log.c
> @@ -848,6 +848,10 @@ vy_log_open(struct xlog *xlog)
> struct vy_log_record record;
> vy_log_record_init(&record);
> record.type = VY_LOG_SNAPSHOT;
> + ERROR_INJECT(ERRINJ_VY_LOG_RENAME, {
> + diag_set(ClientError, ER_INJECTION, "vy_log open");
> + return -1;
> + });
The error injection should go right before xlog_rename(), and it should
close the xlog before returning from the function.
Also, be consistent with other error messages used on injection:
ERRINJ_VY_READ_PAGE "vinyl page read"
ERRINJ_VY_INDEX_DUMP "vinyl index dump"
ERRINJ_VY_LOG_FLUSH "vinyl log flush"
So for ERRINJ_VY_LOG_RENAME it should be "vinyl log rename".
> if (vy_log_record_encode(&record, &row) < 0 ||
> xlog_write_row(xlog, &row) < 0)
> goto fail_close_xlog;
> diff --git a/src/box/vy_run.c b/src/box/vy_run.c
> index 980bc4d..1dda93d 100644
> --- a/src/box/vy_run.c
> +++ b/src/box/vy_run.c
> @@ -2021,6 +2021,14 @@ vy_run_write_index(struct vy_run *run, const char *dirpath,
> goto fail;
> }
>
> + ERROR_INJECT(ERRINJ_VY_INDEX_FILE_RENAME, {
> + region_truncate(region, mem_used);
> + xlog_tx_rollback(&index_xlog);
> + xlog_close(&index_xlog, false);
> + diag_set(ClientError, ER_INJECTION,
> + "vinyl index write");
> + return -1;
> + });
Fix the error message as per above and use labels to remove this
copy&paste.
> if (xlog_tx_commit(&index_xlog) < 0 ||
> xlog_flush(&index_xlog) < 0 ||
> xlog_rename(&index_xlog) < 0)
> @@ -2263,6 +2271,12 @@ vy_run_writer_commit(struct vy_run_writer *writer)
> run->info.max_key = vy_key_dup(key);
> if (run->info.max_key == NULL)
> goto out;
> + ERROR_INJECT(ERRINJ_VY_RUN_RENAME, {
> + region_truncate(&fiber()->gc, region_svp);
> + diag_set(ClientError, ER_INJECTION,
> + "vinyl writer commit");
> + return -1;
> + });
Ditto.
>
> /* Sync data and link the file to the final name. */
> if (xlog_sync(&writer->data_xlog) < 0 ||
> @@ -2439,6 +2453,9 @@ vy_run_remove_files(const char *dir, uint32_t space_id,
> (long long)run_id); return -1;});
> int ret = 0;
> char path[PATH_MAX];
> + vy_index_snprint_path(path, PATH_MAX, dir, space_id, iid);
> + if (xdir_collect_inprogress(path) < 0)
> + return -1;
What's the point of using xdir_collect_inprogress here? It scans the
directory which is an overkill, as file names are already known. Besides
it blocks the tx thread. You have a loop removing files below. Simply
make it remove inprogress files as well.
> for (int type = 0; type < vy_file_MAX; type++) {
> vy_run_snprint_path(path, sizeof(path), dir,
> space_id, iid, run_id, type);
> diff --git a/src/box/xlog.c b/src/box/xlog.c
> index 824ad11..b5cc494 100644
> --- a/src/box/xlog.c
> +++ b/src/box/xlog.c
> @@ -621,6 +621,44 @@ xdir_collect_garbage(struct xdir *dir, int64_t signature, bool use_coio)
> return 0;
> }
>
> +int
> +xdir_collect_inprogress(const char *dirname)
Judging by its signature, this function should take an xdir object.
> +{
> + DIR *dh = opendir(dirname);
> + if (dh == NULL) {
> + if (errno == ENOENT)
> + return 0;
> + diag_set(SystemError, "error reading directory '%s'",
> + dirname);
> + return -1;
> + }
> +
> + struct dirent *dent;
> + char path[PATH_MAX];
> + int total = snprintf(path, sizeof(path), "%s/", dirname);
> + if (total < 0) {
snprintf never fails.
> + say_syserror("error snprintf %s", dirname);
> + diag_set(SystemError, "error snprintf'",
> + path);
> + return -1;
> + }
> + while ((dent = readdir(dh)) != NULL) {
> + char *ext = strrchr(dent->d_name, '.');
> + if (ext == NULL || strcmp(ext, inprogress_suffix) != 0)
> + continue;
> + strcpy(path + total, dent->d_name);
> + say_info("removing %s", path);
> + int rc = coio_unlink(path);
readdir is blocking so why use coio_unlink?
> + if (rc < 0) {
> + say_syserror("error while removing %s", path);
> + diag_set(SystemError, "failed to unlink file '%s'",
> + path);
> + return -1;
> + }
> + }
> + return 0;
> +}
> +
> /* }}} */
>
>
> diff --git a/src/box/xlog.h b/src/box/xlog.h
> index 973910d..4faa763 100644
> --- a/src/box/xlog.h
> +++ b/src/box/xlog.h
> @@ -181,6 +181,12 @@ int
> xdir_collect_garbage(struct xdir *dir, int64_t signature, bool use_coio);
>
> /**
> + * Remove .inprogress files in specified directory.
> + */
> +int
> +xdir_collect_inprogress(const char *dirname);
> +
> +/**
> * Return LSN and vclock (unless @vclock is NULL) of the newest
> * file in a directory or -1 if the directory is empty.
> */
> diff --git a/src/errinj.h b/src/errinj.h
> index 78514ac..8444479 100644
> --- a/src/errinj.h
> +++ b/src/errinj.h
> @@ -111,6 +111,9 @@ struct errinj {
> _(ERRINJ_IPROTO_TX_DELAY, ERRINJ_BOOL, {.bparam = false}) \
> _(ERRINJ_LOG_ROTATE, ERRINJ_BOOL, {.bparam = false}) \
> _(ERRINJ_SNAP_COMMIT_DELAY, ERRINJ_BOOL, {.bparam = 0}) \
> + _(ERRINJ_VY_LOG_RENAME, ERRINJ_BOOL, {.bparam = false}) \
> + _(ERRINJ_VY_INDEX_FILE_RENAME, ERRINJ_BOOL, {.bparam = false}) \
> + _(ERRINJ_VY_RUN_RENAME, ERRINJ_BOOL, {.bparam = false}) \
Error injection names are inconsistent (VY_INDEX_FILE_RENAME vs
VY_RUN_RENAME). Fix it.
More information about the Tarantool-patches
mailing list