[Tarantool-patches] [PATCH v5 1/2] vinyl: fix check vinyl_dir existence at bootstrap

Alexander Turenko alexander.turenko at tarantool.org
Wed Aug 19 00:10:14 MSK 2020


Thanks for the update! Now we can discuss the message.

I left comments, where something confuses me in the message and the
test.

WBR, Alexander Turenko.

On Mon, Aug 17, 2020 at 08:29:14AM +0300, Alexander V. Tikhonov wrote:
> During implementation of openSUSE got failed box-tap/cfg.test.lua test.

Implementation of openSUSE... build?

> Found that when memtx_dir didn't exist and vinyl_dir existed and also
> errno was set to ENOENT, box configuration succeeded, but it shouldn't.
> Reason of this wrong behaviour was that not all of the failure paths in
> xdir_scan() set errno, but the caller assumed it.
> 
> Debugging src/box/xlog.c found that all checks were correct, but at:
> 
>   src/box/vy_log.c:vy_log_bootstrap()
>   src/box/vy_log.c:vy_log_begin_recovery()
> 
> the checks on of the errno on ENOENT blocked the negative return from:
> 
>   src/box/xlog.c:xdir_scan()

'blocked negative return' is not clear for me. I guess that you want to
express the following two points:

- A negative return value is not considered as an error when errno is
  set to ENOENT.
- The idea of this check is to handle the situation when vinyl_dir is
  not exists.

I would rephrase it.

> 
> Found that errno was already set to ENOENT before the xdir_scan() call.
> To fix the issue the errno could be clean before the call to xdir_scan,
> because we are interesting in it only from xdir_scan function.

Strictly speaking, the reason is different: because xdir_scan() may
return a negative value and leave errno unchanged.

> 
> After discussions found that there were alternative better solution to
> fix it. The fix with resetting errno was not good because xdir_scan()
> was not system call in real and some internal routines could set it
> to ENOENT itself, so it couldn't be controled from outside of function.

'internal routines could set it to ENOENT itself' -- I would say that
something that is not vinyl_dir existence check.

One more point: I don't see how it may happen except due to a race:

- a file name is obtained from readdir(),
- the file is deleted by another process,
- we open the file and fail with ENOENT.

And I'm not sure the race is possible. I want to say that I'm unable to
state that 'internal routines could set it to ENOENT': I don't know, in
fact. But the implementation can be changed in a future and there is no
guarantee that returning with -1 and ENOENT means only lack of
vinyl_dir.

I mean, I would highlight that the sentence does not assert that the
situation is possible with the current implementation. Just that we have
no corresponding guarantee.

Typo: was not system call -> is not a system call

Typo: controled -> controlled.

> 
> To be sure in behaviour of the changing errno decided to pass a flag to
> xdir_scan() if the directory should exist.

'behaviour of the changing errno' is vague. I guess the idea of the
sentence is that the variant with the flag is better, because (I guess)
it is more explicit and should be less fragile.

> 
> Closes #4594
> Needed for #4562
> 
> Co-authored-by: Alexander Turenko <alexander.turenko at tarantool.org>
> ---
> 
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-4562-suse-pack-full-ci
> Issue: https://github.com/tarantool/tarantool/issues/4562

> diff --git a/test/box-tap/gh-4562-errno-at-xdir_scan.test.lua b/test/box-tap/gh-4562-errno-at-xdir_scan.test.lua
> new file mode 100755
> index 000000000..cbf7b1f35
> --- /dev/null
> +++ b/test/box-tap/gh-4562-errno-at-xdir_scan.test.lua
> @@ -0,0 +1,47 @@
> +#!/usr/bin/env tarantool
> +
> +local tap = require('tap')
> +local test = tap.test('cfg')
> +local fio = require('fio')
> +test:plan(1)
> +
> +local tarantool_bin = arg[-1]
> +local PANIC = 256
> +
> +function run_script(code)

Nit: It is the rule of thumb to don't fill the global namespace with
fields (variables and functions) if you actually don't need it.

> +    local dir = fio.tempdir()
> +    local script_path = fio.pathjoin(dir, 'script.lua')
> +    local script = fio.open(script_path, {'O_CREAT', 'O_WRONLY', 'O_APPEND'},
> +        tonumber('0777', 8))
> +    script:write(code)
> +    script:write("\nos.exit(0)")
> +    script:close()
> +    local cmd = [[/bin/sh -c 'cd "%s" && "%s" ./script.lua 2> /dev/null']]
> +    local res = os.execute(string.format(cmd, dir, tarantool_bin))
> +    fio.rmtree(dir)
> +    return res
> +end
> +
> +--
> +-- gh-4594: when memtx_dir is not exists, but vinyl_dir exists and
> +-- errno is set to ENOENT, box configuration succeeds, however it
> +-- should not
> +--
> +vinyl_dir = fio.tempdir()

Same here, it would be good to use 'local' here.

> +run_script(string.format([[
> +box.cfg{vinyl_dir = '%s'}
> +s = box.schema.space.create('test', {engine = 'vinyl'})
> +s:create_index('pk')
> +os.exit(0)
> +]], vinyl_dir))
> +code = string.format([[

Same here.

> +local errno = require('errno')
> +errno(errno.ENOENT)
> +box.cfg{vinyl_dir = '%s'}
> +os.exit(0)
> +]], vinyl_dir)
> +test:is(run_script(code), PANIC, "bootstrap with ENOENT from non-empty vinyl_dir")
> +fio.rmtree(vinyl_dir)

I propose to clarify the test steps a bit:

 | vinyl_dir = <...>
 |
 | -- Fill vinyl_dir.
 | run_script(<...>)
 |
 | -- Verify the case described above.
 | local code = <...>
 | test:is(<...>)
 |
 | -- Remove vinyl_dir.
 | fio.rmtree(vinyl_dir)

> +
> +test:check()
> +os.exit(0)

I suggest to highlight a test status in the exit code (it follows the
Lua Style Guide proposal [1]):

 | os.exit(test:check() and 0 or 1)

[1]: https://github.com/tarantool/doc/issues/1004


More information about the Tarantool-patches mailing list