From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp44.i.mail.ru (smtp44.i.mail.ru [94.100.177.104]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 12471430408 for ; Fri, 14 Aug 2020 15:42:12 +0300 (MSK) Date: Fri, 14 Aug 2020 15:42:10 +0300 From: "Alexander V. Tikhonov" Message-ID: <20200814124210.GA4609@hpalx> References: <99ba534fc775257849ba5b3d9b6bdd5711c817cb.1594126705.git.avtikhon@tarantool.org> <20200814024442.4lgthzt576txghch@tkn_work_nb> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200814024442.4lgthzt576txghch@tkn_work_nb> Subject: Re: [Tarantool-patches] [PATCH v2 2/3] Fix test box-tap/cfg.test.lua on openSuSE List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexander Turenko Cc: tarantool-patches@dev.tarantool.org Hi Alexander, thanks a lot for a deep investigation. I've used your patch and rewrote the commit message. Also I've made there all your suggestions. On Fri, Aug 14, 2020 at 05:44:42AM +0300, Alexander Turenko wrote: > > Fix test box-tap/cfg.test.lua on openSuSE > > > > Found that test fails on its subtest checking: > > "gh-2872 bootstrap is aborted if vinyl_dir > > contains vylog files left from previous runs" > > > > Debugging src/box/xlog.c found that all checks > > were correct, but at function: > > src/box/vy_log.c:vy_log_bootstrap() > > the check on of the errno on ENOENT blocked the > > negative return from function: > > src/box/xlog.c:xdir_scan() > > > > Found that errno was already set to ENOENT before > > the xdir_scan() call. To fix the issue the errno > > must be clean before the call to xdir_scan, because > > we are interesting in it only from this function. > > The same issue fixed at function: > > src/box/vy_log.c:vy_log_begin_recovery() > > > > Closes #4594 > > Needed for #4562 > > My wish is to find the following information in the commit message: > > - The header: the subsystem prefix (vinyl) and what is fixed (vinyl_dir > existence check at bootstrap). > - What was wrong and becomes right (high-level or from a user > perspective): when memtx_dir is not exists, but vinyl_dir exists and > errno is set to ENOENT, box configuration succeeds, however it is > should not. > - What is the reason of the wrong behaviour: not all failure paths in > xdir_scan() are set errno, but the caller assumes it. > - From where the problem appears: openSUSE and box-tap/cfg.test.lua. > > > --- > > src/box/vy_log.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > > diff --git a/src/box/vy_log.c b/src/box/vy_log.c > > index 311985c72..86599fd15 100644 > > --- a/src/box/vy_log.c > > +++ b/src/box/vy_log.c > > @@ -1014,6 +1014,7 @@ vy_log_rebootstrap(void) > > int > > vy_log_bootstrap(void) > > { > > + errno = 0; > > if (xdir_scan(&vy_log.dir) < 0 && errno != ENOENT) > > return -1; > > You spotted the problem right. > > A bit more context: > > Usual C convention is to report success or failure using a return > value and set errno at any error. So a caller usually just checks a > return value and if it means a failure (usually -1), it checks errno > to determine an exact reason. > > Usual convention in tarantool is a bit different: we use a special > diagnostics area to report a reason of a failure. > > Not all failure paths of xdir_scan() sets errno (including our > 'invalid instance UUID' case), so we cannot be sure that errno is > not remains unchanged after a failure of the function. > > So, again, your fix is correct. > > However the approach with checking errno against ENOENT (No such file or > directory) is not good (with or without your patch). For example: > > - What if xdir_scan() would be changed in future and, say, some call > will rewrite errno after the opendir() call? > - What if some other call inside xdir_scan() will set ENOENT: say, > open() in xdir_open_cursor() due to some race? > > We lean on implementation details of the callee, not its contract. I > think this way is too fragile and we should either: > > - check whether the directory exists before xdir_scan() call > - or pass a flag to xdir_scan() whether the directory should exist. > > The latter looks better for me: it does not lead to code duplication. > See the proposed diff below the email. > > BTW, there is the simple way to test the problem on any OS: > > | diff --git a/test/box-tap/cfg.test.lua b/test/box-tap/cfg.test.lua > | index 569b5f463..2eda2f3c2 100755 > | --- a/test/box-tap/cfg.test.lua > | +++ b/test/box-tap/cfg.test.lua > | @@ -392,6 +392,8 @@ s:create_index('pk') > | os.exit(0) > | ]], vinyl_dir)) > | code = string.format([[ > | +local errno = require('errno') > | +errno(errno.ENOENT) > | box.cfg{vinyl_dir = '%s'} > | os.exit(0) > | ]], vinyl_dir) > > Of course, we should not add it right to this test case, let's copy it > somewhere and add separately. > > ---- > > diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c > index dfd6fce6e..9f079a6b5 100644 > --- a/src/box/memtx_engine.c > +++ b/src/box/memtx_engine.c > @@ -992,7 +992,7 @@ memtx_engine_new(const char *snap_dirname, bool force_recovery, > &xlog_opts_default); > memtx->snap_dir.force_recovery = force_recovery; > > - if (xdir_scan(&memtx->snap_dir) != 0) > + if (xdir_scan(&memtx->snap_dir, true) != 0) > goto fail; > > /* > diff --git a/src/box/recovery.cc b/src/box/recovery.cc > index d1a503cfc..cd33e7635 100644 > --- a/src/box/recovery.cc > +++ b/src/box/recovery.cc > @@ -121,7 +121,7 @@ void > recovery_scan(struct recovery *r, struct vclock *end_vclock, > struct vclock *gc_vclock) > { > - xdir_scan_xc(&r->wal_dir); > + xdir_scan_xc(&r->wal_dir, true); > > if (xdir_last_vclock(&r->wal_dir, end_vclock) < 0 || > vclock_compare(end_vclock, &r->vclock) < 0) { > @@ -307,7 +307,7 @@ recover_remaining_wals(struct recovery *r, struct xstream *stream, > struct vclock *clock; > > if (scan_dir) > - xdir_scan_xc(&r->wal_dir); > + xdir_scan_xc(&r->wal_dir, true); > > if (xlog_cursor_is_open(&r->cursor)) { > /* If there's a WAL open, recover from it first. */ > diff --git a/src/box/vy_log.c b/src/box/vy_log.c > index 311985c72..da3c50e87 100644 > --- a/src/box/vy_log.c > +++ b/src/box/vy_log.c > @@ -1014,7 +1014,7 @@ vy_log_rebootstrap(void) > int > vy_log_bootstrap(void) > { > - if (xdir_scan(&vy_log.dir) < 0 && errno != ENOENT) > + if (xdir_scan(&vy_log.dir, false) < 0) > return -1; > if (xdir_last_vclock(&vy_log.dir, &vy_log.last_checkpoint) >= 0) > return vy_log_rebootstrap(); > @@ -1036,7 +1036,7 @@ vy_log_begin_recovery(const struct vclock *vclock) > * because vinyl might not be even in use. Complain only > * on an attempt to write a vylog. > */ > - if (xdir_scan(&vy_log.dir) < 0 && errno != ENOENT) > + if (xdir_scan(&vy_log.dir, false) < 0) > return NULL; > > if (xdir_last_vclock(&vy_log.dir, &vy_log.last_checkpoint) < 0) { > diff --git a/src/box/wal.c b/src/box/wal.c > index d8c92aa36..2b894d680 100644 > --- a/src/box/wal.c > +++ b/src/box/wal.c > @@ -559,7 +559,7 @@ wal_enable(void) > * existing WAL files. Required for garbage collection, > * see wal_collect_garbage(). > */ > - if (xdir_scan(&writer->wal_dir)) > + if (xdir_scan(&writer->wal_dir, true)) > return -1; > > /* Open the most recent WAL file. */ > diff --git a/src/box/xlog.c b/src/box/xlog.c > index 6ccd3d68d..74f761994 100644 > --- a/src/box/xlog.c > +++ b/src/box/xlog.c > @@ -511,13 +511,15 @@ xdir_open_cursor(struct xdir *dir, int64_t signature, > * @return nothing. > */ > int > -xdir_scan(struct xdir *dir) > +xdir_scan(struct xdir *dir, bool is_dir_required) > { > DIR *dh = opendir(dir->dirname); /* log dir */ > int64_t *signatures = NULL; /* log file names */ > size_t s_count = 0, s_capacity = 0; > > if (dh == NULL) { > + if (!is_dir_required && errno == ENOENT) > + return 0; > diag_set(SystemError, "error reading directory '%s'", > dir->dirname); > return -1; > diff --git a/src/box/xlog.h b/src/box/xlog.h > index 9ffce598b..3400eb75f 100644 > --- a/src/box/xlog.h > +++ b/src/box/xlog.h > @@ -187,7 +187,7 @@ xdir_destroy(struct xdir *dir); > * snapshot or scan through all logs. > */ > int > -xdir_scan(struct xdir *dir); > +xdir_scan(struct xdir *dir, bool is_dir_required); > > /** > * Check that a directory exists and is writable. > @@ -821,9 +821,9 @@ xdir_open_cursor(struct xdir *dir, int64_t signature, > #include "exception.h" > > static inline void > -xdir_scan_xc(struct xdir *dir) > +xdir_scan_xc(struct xdir *dir, bool is_dir_required) > { > - if (xdir_scan(dir) == -1) > + if (xdir_scan(dir, is_dir_required) == -1) > diag_raise(); > } >