Tarantool development patches archive
* [Tarantool-patches] [PATCH v1] test: fix flaky box/on_shutdown.test.lua on asan
@ 2020-08-31  8:15 Alexander V. Tikhonov
  2020-09-09 12:04 ` Igor Munkin
  0 siblings, 1 reply; 3+ messages in thread
From: Alexander V. Tikhonov @ 2020-08-31  8:15 UTC
  To: Kirill Yukhin, Igor Munkin; +Cc: tarantool-patches, Alexander Turenko

Found that box/on_shutdown.test.lua test fails on asan build with:

  2020-08-26 09:04:06.750 [42629] main/102/on_shutdown [string "_ = box.ctl.on_shutdown(function() log.warn("..."]:1 W> on_shutdown 5
  Starting instance proxy...
  Run console at unix/:/tnt/test/var/001_box/proxy.control
  Start failed: builtin/box/console.lua:865: failed to create server unix/:/tnt/test/var/001_box/proxy.control: Address already in use

It happened on ASAN build, because server stop routine

  test-run/lib/preprocessor.py:TestState.server_stop() ->
    test-run/lib/tarantool_server.py:TarantoolServer.stop()

needs to free the proxy.control socket created by

  test-run/lib/preprocessor.py:TestState.server_start() ->
    tarantoolctl:process_local()

On some builds, such as ASAN, the server stop routine needs more time to
free the 'proxy.control' socket. So instead of a time delay before the
server restart, run the garbage collector to make sure the socket is freed.

Closes #5260
Part of #4360
---

Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5260-on-shutdown-test
Issue: https://github.com/tarantool/tarantool/issues/5260
Issue: https://github.com/tarantool/tarantool/issues/4360

 test/box/on_shutdown.result   | 7 ++++++-
 test/box/on_shutdown.skipcond | 7 -------
 test/box/on_shutdown.test.lua | 6 +++++-
 3 files changed, 11 insertions(+), 9 deletions(-)
 delete mode 100644 test/box/on_shutdown.skipcond

diff --git a/test/box/on_shutdown.result b/test/box/on_shutdown.result
index ccbdf45cb..cf5c36c8d 100644
--- a/test/box/on_shutdown.result
+++ b/test/box/on_shutdown.result
@@ -133,8 +133,13 @@ test_run:eval("test", "_ = fiber.new(function() os.exit() while true do end end)
 ---
 - []
 ...
-fiber.sleep(0.1)
+-- On some builds like ASAN server stop routine needs more time to free
+-- the 'proxy.control' socket created by tarantoolctl:process_local(),
+-- check gh-5260. So instead of time delay before server restart need
+-- to use garbage collector to be sure that it will be freed.
+collectgarbage('collect')
 ---
+- 0
 ...
 -- The server should be already stopped by os.exit(),
 -- but start doesn't work without a prior call to stop.
diff --git a/test/box/on_shutdown.skipcond b/test/box/on_shutdown.skipcond
deleted file mode 100644
index e46fd1088..000000000
--- a/test/box/on_shutdown.skipcond
+++ /dev/null
@@ -1,7 +0,0 @@
-import os
-
-# Disabled at ASAN build due to issue #4360.
-if os.getenv("ASAN") == 'ON':
-    self.skip = 1
-
-# vim: set ft=python:
diff --git a/test/box/on_shutdown.test.lua b/test/box/on_shutdown.test.lua
index 2a9143404..5437443b3 100644
--- a/test/box/on_shutdown.test.lua
+++ b/test/box/on_shutdown.test.lua
@@ -54,7 +54,11 @@ test_run:cmd("switch default")
 -- instance to make sure test_run doesn't lose connection to the
 -- shutting down instance.
 test_run:eval("test", "_ = fiber.new(function() os.exit() while true do end end)")
-fiber.sleep(0.1)
+-- On some builds like ASAN server stop routine needs more time to free
+-- the 'proxy.control' socket created by tarantoolctl:process_local(),
+-- check gh-5260. So instead of time delay before server restart need
+-- to use garbage collector to be sure that it will be freed.
+collectgarbage('collect')
 -- The server should be already stopped by os.exit(),
 -- but start doesn't work without a prior call to stop.
 test_run:cmd("stop server test")
-- 
2.17.1


* Re: [Tarantool-patches] [PATCH v1] test: fix flaky box/on_shutdown.test.lua on asan
  2020-08-31  8:15 [Tarantool-patches] [PATCH v1] test: fix flaky box/on_shutdown.test.lua on asan Alexander V. Tikhonov
@ 2020-09-09 12:04 ` Igor Munkin
  0 siblings, 0 replies; 3+ messages in thread
From: Igor Munkin @ 2020-09-09 12:04 UTC
  To: Alexander V. Tikhonov; +Cc: tarantool-patches, Alexander Turenko

Sasha,

Thanks for the patch! I'm just dumping my thoughts on the issue, as you
asked offline.

I'm no test-run expert at all, so I can only comment from the Tarantool
Lua sockets and Lua GC side.

A <socket> connection releases its descriptor in two ways:
* via an explicit <socket_close>[1] call
* implicitly, using the <gc_socket_t> __gc metamethod[2]

I see no explicit close for the socket created within the <test_run.cmd>
function, so I guess you hit the latter case.

I guess your solution fixes the issue: with <fiber.sleep>, the GC
machinery only *might* call the corresponding __gc metamethod by the time
the sleep returns. With a <collectgarbage> call, however, the GC engine
makes a full collection cycle and the "dead" sock object is released
(strictly speaking, at least its __gc metamethod is called).
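
The difference between the two release paths can be sketched outside
Tarantool. The following is a Python analogy, not Tarantool or test-run
code: FakeSocket, explicit_release and implicit_release are hypothetical
names, and the self-reference is an assumption added so that CPython's
reference counting cannot free the object on its own, which mimics an
unreferenced Lua object waiting for a collector run.

```python
import gc


class FakeSocket:
    """Stand-in for a socket object; records when its descriptor is freed."""

    def __init__(self, log):
        self._log = log
        self._closed = False
        # Self-reference: plain refcounting can never free this object, so,
        # like an unreferenced Lua object, it waits for a collector run.
        self._self = self

    def close(self):
        # Explicit path: release right away, exactly once.
        if not self._closed:
            self._closed = True
            self._log.append("descriptor released")

    def __del__(self):
        # Implicit path: the finalizer runs only when the collector gets here.
        self.close()


def explicit_release():
    log = []
    FakeSocket(log).close()
    return log


def implicit_release():
    log = []
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections out of the demonstration
    try:
        FakeSocket(log)      # dropped without close()
        before = list(log)   # snapshot taken before any collection
        gc.collect()         # a full collection runs the finalizer
        after = list(log)
    finally:
        if was_enabled:
            gc.enable()
    return before, after
```

In this sketch the explicit path releases the descriptor at a known
point, while the implicit path releases it only once a collection
happens, which is why a fixed sleep gives no guarantee and a forced
full collection does.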

I have several questions for you:
* How does test-run handle these commands and connections?
* Why do other start/stop actions in this test chunk stay unaffected?

I believe you need to dig a bit deeper to find the exact root cause. Then
you can describe the issue more precisely. Feel free to ask me about any
Lua behaviour that you find strange/mystifying.
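
One possible way to narrow the root cause down is to check, at the moment
the restart fails, whether the proxy.control path is still served by a
live listener or is only a leftover file. A minimal Python sketch,
assuming nothing beyond standard Unix socket semantics;
probe_unix_socket and wait_until_released are hypothetical helpers and
not part of test-run:

```python
import socket
import time


def probe_unix_socket(path, timeout=1.0):
    """Return 'free', 'stale' or 'in use' for a Unix socket path."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
    except FileNotFoundError:
        return "free"      # nothing exists at this path
    except ConnectionRefusedError:
        return "stale"     # a file is left behind, but nobody is listening
    except (socket.timeout, BlockingIOError):
        return "in use"    # a listener exists but its queue is busy
    finally:
        client.close()
    return "in use"        # connect() succeeded, so a live listener owns the path


def wait_until_released(path, timeout=5.0, interval=0.05):
    """Poll until no live listener owns the path; False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe_unix_socket(path) != "in use":
            return True
        time.sleep(interval)
    return False
```

Polling on an observable condition like this, instead of sleeping for a
fixed interval, would also show how long the socket actually stays busy
on an ASAN build.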

On 31.08.20, Alexander V. Tikhonov wrote:
> Found that box/on_shutdown.test.lua test fails on asan build with:
> 
>   2020-08-26 09:04:06.750 [42629] main/102/on_shutdown [string "_ = box.ctl.on_shutdown(function() log.warn("..."]:1 W> on_shutdown 5
>   Starting instance proxy...
>   Run console at unix/:/tnt/test/var/001_box/proxy.control
>   Start failed: builtin/box/console.lua:865: failed to create server unix/:/tnt/test/var/001_box/proxy.control: Address already in use
> 
> It happened on ASAN build, because server stop routine
> 
>   test-run/lib/preprocessor.py:TestState.server_stop() ->
>     test-run/lib/tarantool_server.py:TarantoolServer.stop()
> 
> needs to free the proxy.control socket created by
> 
>   test-run/lib/preprocessor.py:TestState.server_start() ->
>     tarantoolctl:process_local()
> 
> On some builds, such as ASAN, the server stop routine needs more time to
> free the 'proxy.control' socket. So instead of a time delay before the
> server restart, run the garbage collector to make sure the socket is freed.
> 
> Closes #5260
> Part of #4360
> ---
> 
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5260-on-shutdown-test
> Issue: https://github.com/tarantool/tarantool/issues/5260
> Issue: https://github.com/tarantool/tarantool/issues/4360
> 

<snipped>

> 

[1]: https://github.com/tarantool/tarantool/blob/master/src/lua/socket.lua#L108-L119
[2]: https://github.com/tarantool/tarantool/blob/master/src/lua/socket.lua#L64-L72

-- 
Best regards,
IM


* [Tarantool-patches] [PATCH v1] test: fix flaky box/on_shutdown.test.lua on asan
@ 2020-08-26 14:24 Alexander V. Tikhonov
  0 siblings, 0 replies; 3+ messages in thread
From: Alexander V. Tikhonov @ 2020-08-26 14:24 UTC
  To: Kirill Yukhin, Alexander Turenko; +Cc: tarantool-patches

Found that box/on_shutdown.test.lua test fails on asan build with:

  2020-08-26 09:04:06.750 [42629] main/102/on_shutdown [string "_ = box.ctl.on_shutdown(function() log.warn("..."]:1 W> on_shutdown 5
  Starting instance proxy...
  Run console at unix/:/tnt/test/var/001_box/proxy.control
  Start failed: builtin/box/console.lua:865: failed to create server unix/:/tnt/test/var/001_box/proxy.control: Address already in use

It happened on ASAN build, because server stop routine

  test-run/lib/preprocessor.py:TestState.server_stop() ->
    test-run/lib/tarantool_server.py:TarantoolServer.stop()

needs some delay to free the proxy.control socket created by

  test-run/lib/preprocessor.py:TestState.server_start() ->
    tarantoolctl:process_local()

To fix the issue added fiber.sleep() to give the needed delay.

Closes #5260
Part of #4360
---

Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5260-on-shutdown-test
Issue: https://github.com/tarantool/tarantool/issues/5260
Issue: https://github.com/tarantool/tarantool/issues/4360

 test/box/on_shutdown.result   | 5 +++++
 test/box/on_shutdown.skipcond | 7 -------
 test/box/on_shutdown.test.lua | 3 +++
 3 files changed, 8 insertions(+), 7 deletions(-)
 delete mode 100644 test/box/on_shutdown.skipcond

diff --git a/test/box/on_shutdown.result b/test/box/on_shutdown.result
index ccbdf45cb..ea3f8ca4a 100644
--- a/test/box/on_shutdown.result
+++ b/test/box/on_shutdown.result
@@ -142,6 +142,11 @@ test_run:cmd("stop server test")
 ---
 - true
 ...
+-- On ASAN build server stop needs some delay to free the proxy.control
+-- socket created by tarantoolctl:process_local(), check gh-5260.
+fiber.sleep(0.1)
+---
+...
 test_run:cmd("start server test")
 ---
 - true
diff --git a/test/box/on_shutdown.skipcond b/test/box/on_shutdown.skipcond
deleted file mode 100644
index e46fd1088..000000000
--- a/test/box/on_shutdown.skipcond
+++ /dev/null
@@ -1,7 +0,0 @@
-import os
-
-# Disabled at ASAN build due to issue #4360.
-if os.getenv("ASAN") == 'ON':
-    self.skip = 1
-
-# vim: set ft=python:
diff --git a/test/box/on_shutdown.test.lua b/test/box/on_shutdown.test.lua
index 2a9143404..91ff36f55 100644
--- a/test/box/on_shutdown.test.lua
+++ b/test/box/on_shutdown.test.lua
@@ -58,6 +58,9 @@ fiber.sleep(0.1)
 -- The server should be already stopped by os.exit(),
 -- but start doesn't work without a prior call to stop.
 test_run:cmd("stop server test")
+-- On ASAN build server stop needs some delay to free the proxy.control
+-- socket created by tarantoolctl:process_local(), check gh-5260.
+fiber.sleep(0.1)
 test_run:cmd("start server test")
 test_run:wait_log('test', 'on_shutdown 5', nil, 30, {noreset=true})
 -- make sure we exited because of os.exit(), not a signal.
-- 
2.17.1
