From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Tue, 19 Jun 2018 14:50:43 +0300 From: Vladimir Davydov Subject: Re: [PATCH v2] xlog: Remove inprogress files on start Message-ID: <20180619115043.u7ptoxanydzywg75@esperanza> References: <20180618094407.ue6ll2yazn4dsgq5@esperanza> <1529339014-22688-1-git-send-email-imarkov@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1529339014-22688-1-git-send-email-imarkov@tarantool.org> To: Ilya Markov Cc: georgy@tarantool.org, tarantool-patches@freelists.org List-ID: On Mon, Jun 18, 2018 at 07:23:34PM +0300, Ilya Markov wrote: > When tarantool crashes during writing to vylog, index, run or snapshot, > inprogress files remain. But garbage collector doesn't take into account > these files. So they remain until they are removed manually. > > Fix this with adding deletion of such files to functions memtx_engine_end_recovery, > vinyl_engine_end_recovery and vy_run_remove_files. > > Add 3 errinj which simulate the crash before renaming inprogress files. > > Closes #3406 > --- > branch: https://github.com/tarantool/tarantool/tree/gh-3406-remove-inprogress-files > issue: https://github.com/tarantool/tarantool/issues/3406 List of changes between v1 and v2 is missing, although I asked you to add one: On Mon, Jun 18, 2018 at 12:44:07PM +0300, Vladimir Davydov wrote: > When you resubmit a new version of your patch, please don't forget to > append the version to the subject prefix (e.g. PATCH v2), and write a > brief change log, e.g. see > > https://www.freelists.org/post/tarantool-patches/PATCH-v2-box-make-gc-info-public Don't forget it next time. > > src/box/memtx_engine.c | 2 +- > src/box/vinyl.c | 1 + > src/box/vy_log.c | 4 ++ > src/box/vy_run.c | 17 +++++ > src/box/xlog.c | 38 +++++++++++ > src/box/xlog.h | 6 ++ > src/errinj.h | 3 + > test/box/errinj.result | 168 +++++++++++++++++++++++++++++++++++++++++++++++ > test/box/errinj.test.lua | 63 ++++++++++++++++++ > 9 files changed, 301 insertions(+), 1 deletion(-) > > diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c > index 9a9aff5..3ab60f0 100644 > --- a/src/box/memtx_engine.c > +++ b/src/box/memtx_engine.c > @@ -303,7 +303,7 @@ memtx_engine_end_recovery(struct engine *engine) > if (space_foreach(memtx_build_secondary_keys, memtx) != 0) > return -1; > } > - return 0; > + return xdir_collect_inprogress(memtx->snap_dir.dirname); No need to abort recovery if an inprogress file can't be deleted, a warning in the log will do. Garbage collection should be best-effort. > } > > static struct space * > diff --git a/src/box/vinyl.c b/src/box/vinyl.c > index 552d42b..69646b6 100644 > --- a/src/box/vinyl.c > +++ b/src/box/vinyl.c > @@ -2941,6 +2941,7 @@ vinyl_engine_end_recovery(struct engine *engine) > case VINYL_FINAL_RECOVERY_LOCAL: > if (vy_log_end_recovery() != 0) > return -1; > + xdir_collect_inprogress(e->path); Removal of vylog files should be done from vy_log_end_recovery() (for the sake of encapsulation). > /* > * If the instance is shut down while a dump or > * compaction task is in progress, we'll get an > diff --git a/src/box/vy_log.c b/src/box/vy_log.c > index 51e3753..3519bde 100644 > --- a/src/box/vy_log.c > +++ b/src/box/vy_log.c > @@ -848,6 +848,10 @@ vy_log_open(struct xlog *xlog) > struct vy_log_record record; > vy_log_record_init(&record); > record.type = VY_LOG_SNAPSHOT; > + ERROR_INJECT(ERRINJ_VY_LOG_RENAME, { > + diag_set(ClientError, ER_INJECTION, "vy_log open"); > + return -1; > + }); The error injection should go right before xlog_rename(), and it should close the xlog before returning from the function. Also, be consistent with other error messages used on injection: ERRINJ_VY_READ_PAGE "vinyl page read" ERRINJ_VY_INDEX_DUMP "vinyl index dump" ERRINJ_VY_LOG_FLUSH "vinyl log flush" So for ERRINJ_VY_LOG_RENAME it should be "vinyl log rename". > if (vy_log_record_encode(&record, &row) < 0 || > xlog_write_row(xlog, &row) < 0) > goto fail_close_xlog; > diff --git a/src/box/vy_run.c b/src/box/vy_run.c > index 980bc4d..1dda93d 100644 > --- a/src/box/vy_run.c > +++ b/src/box/vy_run.c > @@ -2021,6 +2021,14 @@ vy_run_write_index(struct vy_run *run, const char *dirpath, > goto fail; > } > > + ERROR_INJECT(ERRINJ_VY_INDEX_FILE_RENAME, { > + region_truncate(region, mem_used); > + xlog_tx_rollback(&index_xlog); > + xlog_close(&index_xlog, false); > + diag_set(ClientError, ER_INJECTION, > + "vinyl index write"); > + return -1; > + }); Fix the error message as per above and use labels to remove this copy&paste. > if (xlog_tx_commit(&index_xlog) < 0 || > xlog_flush(&index_xlog) < 0 || > xlog_rename(&index_xlog) < 0) > @@ -2263,6 +2271,12 @@ vy_run_writer_commit(struct vy_run_writer *writer) > run->info.max_key = vy_key_dup(key); > if (run->info.max_key == NULL) > goto out; > + ERROR_INJECT(ERRINJ_VY_RUN_RENAME, { > + region_truncate(&fiber()->gc, region_svp); > + diag_set(ClientError, ER_INJECTION, > + "vinyl writer commit"); > + return -1; > + }); Ditto. > > /* Sync data and link the file to the final name. */ > if (xlog_sync(&writer->data_xlog) < 0 || > @@ -2439,6 +2453,9 @@ vy_run_remove_files(const char *dir, uint32_t space_id, > (long long)run_id); return -1;}); > int ret = 0; > char path[PATH_MAX]; > + vy_index_snprint_path(path, PATH_MAX, dir, space_id, iid); > + if (xdir_collect_inprogress(path) < 0) > + return -1; What's the point of using xdir_collect_inprogress here? It scans the directory which is an overkill, as file names are already known. Besides it blocks the tx thread. You have a loop removing files below. Simply make it remove inprogress files as well. > for (int type = 0; type < vy_file_MAX; type++) { > vy_run_snprint_path(path, sizeof(path), dir, > space_id, iid, run_id, type); > diff --git a/src/box/xlog.c b/src/box/xlog.c > index 824ad11..b5cc494 100644 > --- a/src/box/xlog.c > +++ b/src/box/xlog.c > @@ -621,6 +621,44 @@ xdir_collect_garbage(struct xdir *dir, int64_t signature, bool use_coio) > return 0; > } > > +int > +xdir_collect_inprogress(const char *dirname) Judging by its signature, this function should take an xdir object. > +{ > + DIR *dh = opendir(dirname); > + if (dh == NULL) { > + if (errno == ENOENT) > + return 0; > + diag_set(SystemError, "error reading directory '%s'", > + dirname); > + return -1; > + } > + > + struct dirent *dent; > + char path[PATH_MAX]; > + int total = snprintf(path, sizeof(path), "%s/", dirname); > + if (total < 0) { snprintf never fails. > + say_syserror("error snprintf %s", dirname); > + diag_set(SystemError, "error snprintf'", > + path); > + return -1; > + } > + while ((dent = readdir(dh)) != NULL) { > + char *ext = strrchr(dent->d_name, '.'); > + if (ext == NULL || strcmp(ext, inprogress_suffix) != 0) > + continue; > + strcpy(path + total, dent->d_name); > + say_info("removing %s", path); > + int rc = coio_unlink(path); readdir is blocking so why use coio_unlink? > + if (rc < 0) { > + say_syserror("error while removing %s", path); > + diag_set(SystemError, "failed to unlink file '%s'", > + path); > + return -1; > + } > + } > + return 0; > +} > + > /* }}} */ > > > diff --git a/src/box/xlog.h b/src/box/xlog.h > index 973910d..4faa763 100644 > --- a/src/box/xlog.h > +++ b/src/box/xlog.h > @@ -181,6 +181,12 @@ int > xdir_collect_garbage(struct xdir *dir, int64_t signature, bool use_coio); > > /** > + * Remove .inprogress files in specified directory. > + */ > +int > +xdir_collect_inprogress(const char *dirname); > + > +/** > * Return LSN and vclock (unless @vclock is NULL) of the newest > * file in a directory or -1 if the directory is empty. > */ > diff --git a/src/errinj.h b/src/errinj.h > index 78514ac..8444479 100644 > --- a/src/errinj.h > +++ b/src/errinj.h > @@ -111,6 +111,9 @@ struct errinj { > _(ERRINJ_IPROTO_TX_DELAY, ERRINJ_BOOL, {.bparam = false}) \ > _(ERRINJ_LOG_ROTATE, ERRINJ_BOOL, {.bparam = false}) \ > _(ERRINJ_SNAP_COMMIT_DELAY, ERRINJ_BOOL, {.bparam = 0}) \ > + _(ERRINJ_VY_LOG_RENAME, ERRINJ_BOOL, {.bparam = false}) \ > + _(ERRINJ_VY_INDEX_FILE_RENAME, ERRINJ_BOOL, {.bparam = false}) \ > + _(ERRINJ_VY_RUN_RENAME, ERRINJ_BOOL, {.bparam = false}) \ Error injection names are inconsistent (VY_INDEX_FILE_RENAME vs VY_RUN_RENAME). Fix it.