[Tarantool-patches] [PATCH v2 2/3] Fix test box-tap/cfg.test.lua on openSuSE
Alexander V. Tikhonov
avtikhon at tarantool.org
Fri Aug 14 15:42:10 MSK 2020
Hi Alexander, thanks a lot for a deep investigation. I've used your
patch and rewrote the commit message. Also I've made there all your
suggestions.
On Fri, Aug 14, 2020 at 05:44:42AM +0300, Alexander Turenko wrote:
> > Fix test box-tap/cfg.test.lua on openSuSE
> >
> > Found that test fails on its subtest checking:
> > "gh-2872 bootstrap is aborted if vinyl_dir
> > contains vylog files left from previous runs"
> >
> > Debugging src/box/xlog.c found that all checks
> > were correct, but at function:
> > src/box/vy_log.c:vy_log_bootstrap()
> > the check on of the errno on ENOENT blocked the
> > negative return from function:
> > src/box/xlog.c:xdir_scan()
> >
> > Found that errno was already set to ENOENT before
> > the xdir_scan() call. To fix the issue the errno
> > must be clean before the call to xdir_scan, because
> > we are interesting in it only from this function.
> > The same issue fixed at function:
> > src/box/vy_log.c:vy_log_begin_recovery()
> >
> > Closes #4594
> > Needed for #4562
>
> My wish is to find the following information in the commit message:
>
> - The header: the subsystem prefix (vinyl) and what is fixed (vinyl_dir
> existence check at bootstrap).
> - What was wrong and becomes right (high-level or from a user
> perspective): when memtx_dir is not exists, but vinyl_dir exists and
> errno is set to ENOENT, box configuration succeeds, however it is
> should not.
> - What is the reason of the wrong behaviour: not all failure paths in
> xdir_scan() are set errno, but the caller assumes it.
> - From where the problem appears: openSUSE and box-tap/cfg.test.lua.
>
> > ---
> > src/box/vy_log.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
>
> > diff --git a/src/box/vy_log.c b/src/box/vy_log.c
> > index 311985c72..86599fd15 100644
> > --- a/src/box/vy_log.c
> > +++ b/src/box/vy_log.c
> > @@ -1014,6 +1014,7 @@ vy_log_rebootstrap(void)
> > int
> > vy_log_bootstrap(void)
> > {
> > + errno = 0;
> > if (xdir_scan(&vy_log.dir) < 0 && errno != ENOENT)
> > return -1;
>
> You spotted the problem right.
>
> A bit more context:
>
> Usual C convention is to report success or failure using a return
> value and set errno at any error. So a caller usually just checks a
> return value and if it means a failure (usually -1), it checks errno
> to determine an exact reason.
>
> Usual convention in tarantool is a bit different: we use a special
> diagnostics area to report a reason of a failure.
>
> Not all failure paths of xdir_scan() sets errno (including our
> 'invalid instance UUID' case), so we cannot be sure that errno is
> not remains unchanged after a failure of the function.
>
> So, again, your fix is correct.
>
> However the approach with checking errno against ENOENT (No such file or
> directory) is not good (with or without your patch). For example:
>
> - What if xdir_scan() would be changed in future and, say, some call
> will rewrite errno after the opendir() call?
> - What if some other call inside xdir_scan() will set ENOENT: say,
> open() in xdir_open_cursor() due to some race?
>
> We lean on implementation details of the callee, not its contract. I
> think this way is too fragile and we should either:
>
> - check whether the directory exists before xdir_scan() call
> - or pass a flag to xdir_scan() whether the directory should exist.
>
> The latter looks better for me: it does not lead to code duplication.
> See the proposed diff below the email.
>
> BTW, there is the simple way to test the problem on any OS:
>
> | diff --git a/test/box-tap/cfg.test.lua b/test/box-tap/cfg.test.lua
> | index 569b5f463..2eda2f3c2 100755
> | --- a/test/box-tap/cfg.test.lua
> | +++ b/test/box-tap/cfg.test.lua
> | @@ -392,6 +392,8 @@ s:create_index('pk')
> | os.exit(0)
> | ]], vinyl_dir))
> | code = string.format([[
> | +local errno = require('errno')
> | +errno(errno.ENOENT)
> | box.cfg{vinyl_dir = '%s'}
> | os.exit(0)
> | ]], vinyl_dir)
>
> Of course, we should not add it right to this test case, let's copy it
> somewhere and add separately.
>
> ----
>
> diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
> index dfd6fce6e..9f079a6b5 100644
> --- a/src/box/memtx_engine.c
> +++ b/src/box/memtx_engine.c
> @@ -992,7 +992,7 @@ memtx_engine_new(const char *snap_dirname, bool force_recovery,
> &xlog_opts_default);
> memtx->snap_dir.force_recovery = force_recovery;
>
> - if (xdir_scan(&memtx->snap_dir) != 0)
> + if (xdir_scan(&memtx->snap_dir, true) != 0)
> goto fail;
>
> /*
> diff --git a/src/box/recovery.cc b/src/box/recovery.cc
> index d1a503cfc..cd33e7635 100644
> --- a/src/box/recovery.cc
> +++ b/src/box/recovery.cc
> @@ -121,7 +121,7 @@ void
> recovery_scan(struct recovery *r, struct vclock *end_vclock,
> struct vclock *gc_vclock)
> {
> - xdir_scan_xc(&r->wal_dir);
> + xdir_scan_xc(&r->wal_dir, true);
>
> if (xdir_last_vclock(&r->wal_dir, end_vclock) < 0 ||
> vclock_compare(end_vclock, &r->vclock) < 0) {
> @@ -307,7 +307,7 @@ recover_remaining_wals(struct recovery *r, struct xstream *stream,
> struct vclock *clock;
>
> if (scan_dir)
> - xdir_scan_xc(&r->wal_dir);
> + xdir_scan_xc(&r->wal_dir, true);
>
> if (xlog_cursor_is_open(&r->cursor)) {
> /* If there's a WAL open, recover from it first. */
> diff --git a/src/box/vy_log.c b/src/box/vy_log.c
> index 311985c72..da3c50e87 100644
> --- a/src/box/vy_log.c
> +++ b/src/box/vy_log.c
> @@ -1014,7 +1014,7 @@ vy_log_rebootstrap(void)
> int
> vy_log_bootstrap(void)
> {
> - if (xdir_scan(&vy_log.dir) < 0 && errno != ENOENT)
> + if (xdir_scan(&vy_log.dir, false) < 0)
> return -1;
> if (xdir_last_vclock(&vy_log.dir, &vy_log.last_checkpoint) >= 0)
> return vy_log_rebootstrap();
> @@ -1036,7 +1036,7 @@ vy_log_begin_recovery(const struct vclock *vclock)
> * because vinyl might not be even in use. Complain only
> * on an attempt to write a vylog.
> */
> - if (xdir_scan(&vy_log.dir) < 0 && errno != ENOENT)
> + if (xdir_scan(&vy_log.dir, false) < 0)
> return NULL;
>
> if (xdir_last_vclock(&vy_log.dir, &vy_log.last_checkpoint) < 0) {
> diff --git a/src/box/wal.c b/src/box/wal.c
> index d8c92aa36..2b894d680 100644
> --- a/src/box/wal.c
> +++ b/src/box/wal.c
> @@ -559,7 +559,7 @@ wal_enable(void)
> * existing WAL files. Required for garbage collection,
> * see wal_collect_garbage().
> */
> - if (xdir_scan(&writer->wal_dir))
> + if (xdir_scan(&writer->wal_dir, true))
> return -1;
>
> /* Open the most recent WAL file. */
> diff --git a/src/box/xlog.c b/src/box/xlog.c
> index 6ccd3d68d..74f761994 100644
> --- a/src/box/xlog.c
> +++ b/src/box/xlog.c
> @@ -511,13 +511,15 @@ xdir_open_cursor(struct xdir *dir, int64_t signature,
> * @return nothing.
> */
> int
> -xdir_scan(struct xdir *dir)
> +xdir_scan(struct xdir *dir, bool is_dir_required)
> {
> DIR *dh = opendir(dir->dirname); /* log dir */
> int64_t *signatures = NULL; /* log file names */
> size_t s_count = 0, s_capacity = 0;
>
> if (dh == NULL) {
> + if (!is_dir_required && errno == ENOENT)
> + return 0;
> diag_set(SystemError, "error reading directory '%s'",
> dir->dirname);
> return -1;
> diff --git a/src/box/xlog.h b/src/box/xlog.h
> index 9ffce598b..3400eb75f 100644
> --- a/src/box/xlog.h
> +++ b/src/box/xlog.h
> @@ -187,7 +187,7 @@ xdir_destroy(struct xdir *dir);
> * snapshot or scan through all logs.
> */
> int
> -xdir_scan(struct xdir *dir);
> +xdir_scan(struct xdir *dir, bool is_dir_required);
>
> /**
> * Check that a directory exists and is writable.
> @@ -821,9 +821,9 @@ xdir_open_cursor(struct xdir *dir, int64_t signature,
> #include "exception.h"
>
> static inline void
> -xdir_scan_xc(struct xdir *dir)
> +xdir_scan_xc(struct xdir *dir, bool is_dir_required)
> {
> - if (xdir_scan(dir) == -1)
> + if (xdir_scan(dir, is_dir_required) == -1)
> diag_raise();
> }
>
More information about the Tarantool-patches
mailing list