[Tarantool-patches] [PATCH v1] test: fix flaky box/on_shutdown.test.lua on asan

Igor Munkin imun at tarantool.org
Wed Sep 9 15:04:07 MSK 2020


Sasha,

Thanks for the patch! I'll just dump my thoughts regarding the issue, as
you asked offline.

I'm by no means a test-run expert, so I can only comment from the
Tarantool Lua sockets and Lua GC side.

A <socket> connection releases its descriptor in one of two ways:
* explicitly, via a <socket_close>[1] call
* implicitly, via the <gc_socket_t> __gc metamethod[2]

I see no explicit close for the socket created within the <test_run.cmd>
function, so I guess you are facing the latter case.

I guess your solution fixes the issue: with <fiber.sleep> the GC
machinery only *might* call the corresponding __gc metamethod by the
time the sleep returns. With a <collectgarbage> call, however, the GC
engine makes a full collection cycle and the "dead" sock object is
released (strictly speaking, at least its __gc metamethod is called).
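
To make the difference between the two release paths concrete, here is
a minimal sketch. It is a Python analogue run on CPython, not Tarantool
Lua code: FakeSocket is a hypothetical stand-in, close() plays the role
of the explicit close, __del__ plays the role of the __gc metamethod,
and gc.collect() plays the role of a forced full collection. The
self-reference is only there so that the object cannot be freed by
anything except the collector.

```python
import gc

released = []  # names of hypothetical sockets whose descriptor was released


class FakeSocket:
    """Hypothetical stand-in for a socket object; not Tarantool's socket API."""

    def __init__(self, name):
        self.name = name
        self.closed = False
        self.cycle = self  # self-reference: only the cycle collector can free it

    def close(self):
        # Explicit release path: the descriptor is freed right here.
        if not self.closed:
            self.closed = True
            released.append(self.name)

    def __del__(self):
        # Implicit release path: runs only when the collector finalizes us.
        self.close()


gc.disable()  # no automatic collections, so the timing below is explicit

explicit = FakeSocket("explicit")
explicit.close()
after_close = list(released)

FakeSocket("implicit")  # dropped without close(); kept alive by its own cycle
before_collect = list(released)

gc.collect()  # forced full collection finalizes the unreachable object
after_collect = list(released)

gc.enable()
```

The explicitly closed object is released at once, while the abandoned
one stays unreleased until a collection actually runs. Waiting for some
time gives no such guarantee; forcing a collection does.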

I have several questions for you:
* How does test-run handle these commands and connections?
* Why do other start/stop actions in this test chunk stay unaffected?

I believe you need to dig a bit deeper to find the exact root cause.
Then you can describe the issue more precisely. Feel free to ask me
about any Lua behaviour you find strange or mystifying.
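
As a reference point for that digging, the sketch below reproduces only
the OS-level condition behind an "Address already in use" error on a
unix socket: binding a path that a previous owner has not released yet.
It is plain Python with a hypothetical temporary path, and it says
nothing about which step in test-run or Tarantool keeps the real
proxy.control path busy.

```python
import errno
import os
import socket
import tempfile

# Hypothetical path for the demonstration; not the real proxy.control location.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.control")

# The "previous instance": it binds the path and keeps listening on it.
first = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
first.bind(path)
first.listen(1)


def try_bind(p):
    """Return None if bind succeeds, otherwise the errno of the failure."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.bind(p)
        return None
    except OSError as e:
        return e.errno
    finally:
        s.close()


# The "restarted instance" tries to bind while the old owner is still there.
while_held = try_bind(path)

# The old owner releases the descriptor and the path.
first.close()
os.unlink(path)
after_release = try_bind(path)

# Clean up the file left by the successful bind and the temporary directory.
os.unlink(path)
os.rmdir(workdir)
```

Here while_held is errno.EADDRINUSE and after_release is None, so the
question for the real test is who still holds the path at restart time
and when exactly it lets go.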

On 31.08.20, Alexander V. Tikhonov wrote:
> Found that box/on_shutdown.test.lua test fails on asan build with:
> 
>   2020-08-26 09:04:06.750 [42629] main/102/on_shutdown [string "_ = box.ctl.on_shutdown(function() log.warn("..."]:1 W> on_shutdown 5
>   Starting instance proxy...
>   Run console at unix/:/tnt/test/var/001_box/proxy.control
>   Start failed: builtin/box/console.lua:865: failed to create server unix/:/tnt/test/var/001_box/proxy.control: Address already in use
> 
> It happened on ASAN build, because server stop routine
> 
>   test-run/lib/preprocessor.py:TestState.server_stop() ->
>     test-run/lib/tarantool_server.py:TarantoolServer.stop()
> 
> needs to free the proxy.control socket created by
> 
>   test-run/lib/preprocessor.py:TestState.server_start() ->
>     tarantoolctl:process_local()
> 
> On some builds like ASAN server stop routine needs more time to free
> the 'proxy.control' socket. So instead of time delay before server
> restart need to use garbage collector to be sure that it will be freed.
> 
> Closes #5260
> Part of #4360
> ---
> 
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5260-on-shutdown-test
> Issue: https://github.com/tarantool/tarantool/issues/5260
> Issue: https://github.com/tarantool/tarantool/issues/4360
> 

<snipped>

> 

[1]: https://github.com/tarantool/tarantool/blob/master/src/lua/socket.lua#L108L119
[2]: https://github.com/tarantool/tarantool/blob/master/src/lua/socket.lua#L64L72

-- 
Best regards,
IM

