[Tarantool-patches] [PATCH v2 2/3] Fix test box-tap/cfg.test.lua on openSuSE

Alexander Turenko alexander.turenko at tarantool.org
Fri Aug 14 05:44:42 MSK 2020


> Fix test box-tap/cfg.test.lua on openSuSE
>
> Found that test fails on its subtest checking:
>   "gh-2872 bootstrap is aborted if vinyl_dir
>    contains vylog files left from previous runs"
> 
> Debugging src/box/xlog.c found that all checks
> were correct, but at function:
>   src/box/vy_log.c:vy_log_bootstrap()
> the check on of the errno on ENOENT blocked the
> negative return from function:
>   src/box/xlog.c:xdir_scan()
> 
> Found that errno was already set to ENOENT before
> the xdir_scan() call. To fix the issue the errno
> must be clean before the call to xdir_scan, because
> we are interesting in it only from this function.
> The same issue fixed at function:
>   src/box/vy_log.c:vy_log_begin_recovery()
> 
> Closes #4594
> Needed for #4562

My wish is to find the following information in the commit message:

- The header: the subsystem prefix (vinyl) and what is fixed (vinyl_dir
  existence check at bootstrap).
- What was wrong and becomes right (high-level or from a user
  perspective): when memtx_dir is not exists, but vinyl_dir exists and
  errno is set to ENOENT, box configuration succeeds, however it is
  should not.
- What is the reason of the wrong behaviour: not all failure paths in
  xdir_scan() are set errno, but the caller assumes it.
- From where the problem appears: openSUSE and box-tap/cfg.test.lua.

> ---
>  src/box/vy_log.c | 2 ++
>  1 file changed, 2 insertions(+)
> 

> diff --git a/src/box/vy_log.c b/src/box/vy_log.c
> index 311985c72..86599fd15 100644
> --- a/src/box/vy_log.c
> +++ b/src/box/vy_log.c
> @@ -1014,6 +1014,7 @@ vy_log_rebootstrap(void)
>  int
>  vy_log_bootstrap(void)
>  {
> +	errno = 0;
>  	if (xdir_scan(&vy_log.dir) < 0 && errno != ENOENT)
>  		return -1;

You spotted the problem right.

A bit more context:

    Usual C convention is to report success or failure using a return
    value and set errno at any error. So a caller usually just checks a
    return value and if it means a failure (usually -1), it checks errno
    to determine an exact reason.

    Usual convention in tarantool is a bit different: we use a special
    diagnostics area to report a reason of a failure.

    Not all failure paths of xdir_scan() sets errno (including our
    'invalid instance UUID' case), so we cannot be sure that errno is
    not remains unchanged after a failure of the function.

So, again, your fix is correct.

However the approach with checking errno against ENOENT (No such file or
directory) is not good (with or without your patch). For example:

- What if xdir_scan() would be changed in future and, say, some call
  will rewrite errno after the opendir() call?
- What if some other call inside xdir_scan() will set ENOENT: say,
  open() in xdir_open_cursor() due to some race?

We lean on implementation details of the callee, not its contract.  I
think this way is too fragile and we should either:

- check whether the directory exists before xdir_scan() call
- or pass a flag to xdir_scan() whether the directory should exist.

The latter looks better for me: it does not lead to code duplication.
See the proposed diff below the email.

BTW, there is the simple way to test the problem on any OS:

 | diff --git a/test/box-tap/cfg.test.lua b/test/box-tap/cfg.test.lua
 | index 569b5f463..2eda2f3c2 100755
 | --- a/test/box-tap/cfg.test.lua
 | +++ b/test/box-tap/cfg.test.lua
 | @@ -392,6 +392,8 @@ s:create_index('pk')
 |  os.exit(0)
 |  ]], vinyl_dir))
 |  code = string.format([[
 | +local errno = require('errno')
 | +errno(errno.ENOENT)
 |  box.cfg{vinyl_dir = '%s'}
 |  os.exit(0)
 |  ]], vinyl_dir)

Of course, we should not add it right to this test case, let's copy it
somewhere and add separately.

----

diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index dfd6fce6e..9f079a6b5 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -992,7 +992,7 @@ memtx_engine_new(const char *snap_dirname, bool force_recovery,
 		    &xlog_opts_default);
 	memtx->snap_dir.force_recovery = force_recovery;
 
-	if (xdir_scan(&memtx->snap_dir) != 0)
+	if (xdir_scan(&memtx->snap_dir, true) != 0)
 		goto fail;
 
 	/*
diff --git a/src/box/recovery.cc b/src/box/recovery.cc
index d1a503cfc..cd33e7635 100644
--- a/src/box/recovery.cc
+++ b/src/box/recovery.cc
@@ -121,7 +121,7 @@ void
 recovery_scan(struct recovery *r, struct vclock *end_vclock,
 	      struct vclock *gc_vclock)
 {
-	xdir_scan_xc(&r->wal_dir);
+	xdir_scan_xc(&r->wal_dir, true);
 
 	if (xdir_last_vclock(&r->wal_dir, end_vclock) < 0 ||
 	    vclock_compare(end_vclock, &r->vclock) < 0) {
@@ -307,7 +307,7 @@ recover_remaining_wals(struct recovery *r, struct xstream *stream,
 	struct vclock *clock;
 
 	if (scan_dir)
-		xdir_scan_xc(&r->wal_dir);
+		xdir_scan_xc(&r->wal_dir, true);
 
 	if (xlog_cursor_is_open(&r->cursor)) {
 		/* If there's a WAL open, recover from it first. */
diff --git a/src/box/vy_log.c b/src/box/vy_log.c
index 311985c72..da3c50e87 100644
--- a/src/box/vy_log.c
+++ b/src/box/vy_log.c
@@ -1014,7 +1014,7 @@ vy_log_rebootstrap(void)
 int
 vy_log_bootstrap(void)
 {
-	if (xdir_scan(&vy_log.dir) < 0 && errno != ENOENT)
+	if (xdir_scan(&vy_log.dir, false) < 0)
 		return -1;
 	if (xdir_last_vclock(&vy_log.dir, &vy_log.last_checkpoint) >= 0)
 		return vy_log_rebootstrap();
@@ -1036,7 +1036,7 @@ vy_log_begin_recovery(const struct vclock *vclock)
 	 * because vinyl might not be even in use. Complain only
 	 * on an attempt to write a vylog.
 	 */
-	if (xdir_scan(&vy_log.dir) < 0 && errno != ENOENT)
+	if (xdir_scan(&vy_log.dir, false) < 0)
 		return NULL;
 
 	if (xdir_last_vclock(&vy_log.dir, &vy_log.last_checkpoint) < 0) {
diff --git a/src/box/wal.c b/src/box/wal.c
index d8c92aa36..2b894d680 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -559,7 +559,7 @@ wal_enable(void)
 	 * existing WAL files. Required for garbage collection,
 	 * see wal_collect_garbage().
 	 */
-	if (xdir_scan(&writer->wal_dir))
+	if (xdir_scan(&writer->wal_dir, true))
 		return -1;
 
 	/* Open the most recent WAL file. */
diff --git a/src/box/xlog.c b/src/box/xlog.c
index 6ccd3d68d..74f761994 100644
--- a/src/box/xlog.c
+++ b/src/box/xlog.c
@@ -511,13 +511,15 @@ xdir_open_cursor(struct xdir *dir, int64_t signature,
  * @return nothing.
  */
 int
-xdir_scan(struct xdir *dir)
+xdir_scan(struct xdir *dir, bool is_dir_required)
 {
 	DIR *dh = opendir(dir->dirname);        /* log dir */
 	int64_t *signatures = NULL;             /* log file names */
 	size_t s_count = 0, s_capacity = 0;
 
 	if (dh == NULL) {
+		if (!is_dir_required && errno == ENOENT)
+			return 0;
 		diag_set(SystemError, "error reading directory '%s'",
 			  dir->dirname);
 		return -1;
diff --git a/src/box/xlog.h b/src/box/xlog.h
index 9ffce598b..3400eb75f 100644
--- a/src/box/xlog.h
+++ b/src/box/xlog.h
@@ -187,7 +187,7 @@ xdir_destroy(struct xdir *dir);
  * snapshot or scan through all logs.
  */
 int
-xdir_scan(struct xdir *dir);
+xdir_scan(struct xdir *dir, bool is_dir_required);
 
 /**
  * Check that a directory exists and is writable.
@@ -821,9 +821,9 @@ xdir_open_cursor(struct xdir *dir, int64_t signature,
 #include "exception.h"
 
 static inline void
-xdir_scan_xc(struct xdir *dir)
+xdir_scan_xc(struct xdir *dir, bool is_dir_required)
 {
-	if (xdir_scan(dir) == -1)
+	if (xdir_scan(dir, is_dir_required) == -1)
 		diag_raise();
 }



More information about the Tarantool-patches mailing list