From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp55.i.mail.ru (smtp55.i.mail.ru [217.69.128.35]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id BE021469711 for ; Mon, 18 May 2020 14:44:12 +0300 (MSK) From: Alexander Turenko Date: Mon, 18 May 2020 14:42:58 +0300 Message-Id: <411df46a59315e582f39fd9526b7f95bdd023a3f.1589799677.git.alexander.turenko@tarantool.org> In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [Tarantool-patches] [PATCH 2/2] test: popen: fix popen test 'hang' under test-run List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy , "Alexander V . Tikhonov" Cc: tarantool-patches@dev.tarantool.org, Alexander Turenko killpg() on Mac OS may don't deliver a signal to a process: it seems that there is a race when a process is just forked. It means that popen_handle:close() may leave a process alive, when `opts.setsid` and `opts.group_signal` are set. There is simple reproducer, which does not leave alive `sleep 120` processes on Linux, but does it on Mac OS (within three-four runs in rows): | #include | #include | #include | | int | main() | { | char *child_argv[] = { | "/bin/sh", | "-c", | "sleep 120", | NULL, | }; | pid_t pid; | int fd[2]; | pipe(fd); | fcntl(fd[0], F_SETFD, FD_CLOEXEC); | fcntl(fd[1], F_SETFD, FD_CLOEXEC); | | if ((pid = fork()) == 0) { | /* Child. */ | close(fd[0]); | setpgrp(); | for (int i = 0; i < 10; ++i) { | /* Proceed with killpg. */ | if (i == 5) | close(fd[1]); | if (fork() == 0) { | /* Child. */ | execve("/bin/sh", child_argv, NULL); | } | } | } else { | /* Parent. */ | close(fd[1]); | char c; | read(fd[0], &c, 1); | killpg(pid, SIGKILL); | } | return 0; | } Compile it (`cc test.c -o test`) and run several times: $ for i in $(seq 1 1000); do \ echo $i; \ ./test \ && ps -o pid,pgid,command -ax | grep [s]leep \ && break; \ done This is the reason why `sleep 120` process may be alive even when the whole test passes. test-run captures stdout and stderr of a 'core = app' test and waits EOF on them. If a child process inherit one of them or both, the fd is still open for writing and so EOF situation will not appear until `sleep 120` will exit. This commit doesn't try to overcome the root of the problem, but close stdout and stderr for the child process that may not be killed / exited in time. Aside of this, updated found Mac OS peculiars in API comments of C and Lua popen modules. Fixes #4938 @TarantoolBot document Title: popen: add note re group signaling on Mac OS Copyed from the popen_handle:signal() updated description: > Note: Mac OS may don't deliver a signal to a process in a group when > opts.setsid and opts.group_signal are set. It seems there is a race > here: when a process is just forked it may be not signaled. Copyed from the popen_handle:close() updated description: > Details about signaling: > > <...> > - There are peculiars in group signaling on Mac OS, > @see popen_handle:signal() for details. Follows up https://github.com/tarantool/doc/issues/1248 --- src/lib/core/popen.c | 5 +++++ src/lua/popen.c | 7 +++++++ test/app-tap/popen.test.lua | 24 +++++++++++++++++++++++- 3 files changed, 35 insertions(+), 1 deletion(-) diff --git a/src/lib/core/popen.c b/src/lib/core/popen.c index da4b89757..3b5fd1062 100644 --- a/src/lib/core/popen.c +++ b/src/lib/core/popen.c @@ -683,6 +683,11 @@ popen_state_str(unsigned int state) * signal to the process group even when ..._SETSID and * ..._GROUP_SIGNAL are set. * + * Mac OS may don't deliver a signal to a processes in a group + * when ..._SETSID and ..._GROUP_SIGNAL are set. It seems there + * is a race here: when a process is just forked it may be not + * signaled. + * * Return 0 at success or -1 at failure (and set a diag). * * Possible errors: diff --git a/src/lua/popen.c b/src/lua/popen.c index 471964ee6..18c8282f1 100644 --- a/src/lua/popen.c +++ b/src/lua/popen.c @@ -1502,6 +1502,11 @@ lbox_popen_shell(struct lua_State *L) * Note: The module offers popen.signal.SIG* constants, because * some signals have different numbers on different platforms. * + * Note: Mac OS may don't deliver a signal to a process in a + * group when opts.setsid and opts.group_signal are set. It + * seems there is a race here: when a process is just forked it + * may be not signaled. + * * Raise an error on incorrect parameters: * * - IllegalParams: an incorrect handle parameter. @@ -2207,6 +2212,8 @@ lbox_popen_info(struct lua_State *L) * - The signal is sent to a process or a grocess group depending * of opts.group_signal. (@see lbox_popen_new() for details of * group signaling). + * - There are peculiars in group signaling on Mac OS, + * @see lbox_popen_signal() for details. * * Resources are released disregarding of whether a signal * sending succeeds: fds are closed, memory is released, diff --git a/test/app-tap/popen.test.lua b/test/app-tap/popen.test.lua index e1aef9ef9..3b6d9469f 100755 --- a/test/app-tap/popen.test.lua +++ b/test/app-tap/popen.test.lua @@ -269,7 +269,29 @@ local function test_read_chunk(test) local payload = 'hello' local script = ('printf "%s"; sleep 120'):format(payload) - local ph = popen.shell(script, 'r') + + -- It is important to close stderr here. + -- + -- killpg() on Mac OS may don't deliver a signal to a process + -- that is just forked at killpg() call. So :close() at end of + -- the test does not guarantee that 'sleep 120' will be + -- signaled. + -- + -- test-run captures stdout and stderr for a 'core = app' test + -- (like this one) using pipes and waits EOF on its side for + -- both. If `sleep 120` process inherits stderr, it will not + -- close it and the file descriptor will remain open for + -- writing. In this case EOF situation will not occur on the + -- reading end of pipe even despite that the test process + -- itself already exited. + local ph = popen.new({script}, { + shell = true, + setsid = true, + group_signal = true, + stdin = popen.opts.CLOSE, + stdout = popen.opts.PIPE, + stderr = popen.opts.CLOSE, + }) -- When a first byte is available, read() should return all -- bytes arrived at the time. -- 2.25.0