[Tarantool-patches] [PATCH 2/2] test: popen: fix popen test 'hang' under test-run

Alexander Turenko alexander.turenko at tarantool.org
Mon May 18 14:42:58 MSK 2020


killpg() on Mac OS may don't deliver a signal to a process: it seems
that there is a race when a process is just forked. It means that
popen_handle:close() may leave a process alive, when `opts.setsid` and
`opts.group_signal` are set.

There is simple reproducer, which does not leave alive `sleep 120`
processes on Linux, but does it on Mac OS (within three-four runs in
rows):

 | #include <signal.h>
 | #include <unistd.h>
 | #include <fcntl.h>
 |
 | int
 | main()
 | {
 | 	char *child_argv[] = {
 | 		"/bin/sh",
 | 		"-c",
 | 		"sleep 120",
 | 		NULL,
 | 	};
 | 	pid_t pid;
 | 	int fd[2];
 | 	pipe(fd);
 | 	fcntl(fd[0], F_SETFD, FD_CLOEXEC);
 | 	fcntl(fd[1], F_SETFD, FD_CLOEXEC);
 |
 | 	if ((pid = fork()) == 0) {
 | 		/* Child. */
 | 		close(fd[0]);
 | 		setpgrp();
 | 		for (int i = 0; i < 10; ++i) {
 | 			/* Proceed with killpg. */
 | 			if (i == 5)
 | 				close(fd[1]);
 | 			if (fork() == 0) {
 | 				/* Child. */
 | 				execve("/bin/sh", child_argv, NULL);
 | 			}
 | 		}
 | 	} else {
 | 		/* Parent. */
 | 		close(fd[1]);
 | 		char c;
 | 		read(fd[0], &c, 1);
 | 		killpg(pid, SIGKILL);
 | 	}
 | 	return 0;
 | }

Compile it (`cc test.c -o test`) and run several times:

$ for i in $(seq 1 1000); do                     \
    echo $i;                                     \
    ./test                                       \
    && ps -o pid,pgid,command -ax | grep [s]leep \
    && break;                                    \
done

This is the reason why `sleep 120` process may be alive even when the
whole test passes.

test-run captures stdout and stderr of a 'core = app' test and waits EOF
on them. If a child process inherit one of them or both, the fd is still
open for writing and so EOF situation will not appear until `sleep 120`
will exit.

This commit doesn't try to overcome the root of the problem, but close
stdout and stderr for the child process that may not be killed / exited
in time.

Aside of this, updated found Mac OS peculiars in API comments of C and
Lua popen modules.

Fixes #4938

@TarantoolBot document
Title: popen: add note re group signaling on Mac OS

Copyed from the popen_handle:signal() updated description:

> Note: Mac OS may don't deliver a signal to a process in a group when
> opts.setsid and opts.group_signal are set. It seems there is a race
> here: when a process is just forked it may be not signaled.

Copyed from the popen_handle:close() updated description:

> Details about signaling:
>
> <...>
> - There are peculiars in group signaling on Mac OS,
>   @see popen_handle:signal() for details.

Follows up https://github.com/tarantool/doc/issues/1248
---
 src/lib/core/popen.c        |  5 +++++
 src/lua/popen.c             |  7 +++++++
 test/app-tap/popen.test.lua | 24 +++++++++++++++++++++++-
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/src/lib/core/popen.c b/src/lib/core/popen.c
index da4b89757..3b5fd1062 100644
--- a/src/lib/core/popen.c
+++ b/src/lib/core/popen.c
@@ -683,6 +683,11 @@ popen_state_str(unsigned int state)
  * signal to the process group even when ..._SETSID and
  * ..._GROUP_SIGNAL are set.
  *
+ * Mac OS may don't deliver a signal to a processes in a group
+ * when ..._SETSID and ..._GROUP_SIGNAL are set. It seems there
+ * is a race here: when a process is just forked it may be not
+ * signaled.
+ *
  * Return 0 at success or -1 at failure (and set a diag).
  *
  * Possible errors:
diff --git a/src/lua/popen.c b/src/lua/popen.c
index 471964ee6..18c8282f1 100644
--- a/src/lua/popen.c
+++ b/src/lua/popen.c
@@ -1502,6 +1502,11 @@ lbox_popen_shell(struct lua_State *L)
  * Note: The module offers popen.signal.SIG* constants, because
  * some signals have different numbers on different platforms.
  *
+ * Note: Mac OS may don't deliver a signal to a process in a
+ * group when opts.setsid and opts.group_signal are set. It
+ * seems there is a race here: when a process is just forked it
+ * may be not signaled.
+ *
  * Raise an error on incorrect parameters:
  *
  * - IllegalParams:    an incorrect handle parameter.
@@ -2207,6 +2212,8 @@ lbox_popen_info(struct lua_State *L)
  * - The signal is sent to a process or a grocess group depending
  *   of opts.group_signal. (@see lbox_popen_new() for details of
  *   group signaling).
+ * - There are peculiars in group signaling on Mac OS,
+ *   @see lbox_popen_signal() for details.
  *
  * Resources are released disregarding of whether a signal
  * sending succeeds: fds are closed, memory is released,
diff --git a/test/app-tap/popen.test.lua b/test/app-tap/popen.test.lua
index e1aef9ef9..3b6d9469f 100755
--- a/test/app-tap/popen.test.lua
+++ b/test/app-tap/popen.test.lua
@@ -269,7 +269,29 @@ local function test_read_chunk(test)
 
     local payload = 'hello'
     local script = ('printf "%s"; sleep 120'):format(payload)
-    local ph = popen.shell(script, 'r')
+
+    -- It is important to close stderr here.
+    --
+    -- killpg() on Mac OS may don't deliver a signal to a process
+    -- that is just forked at killpg() call. So :close() at end of
+    -- the test does not guarantee that 'sleep 120' will be
+    -- signaled.
+    --
+    -- test-run captures stdout and stderr for a 'core = app' test
+    -- (like this one) using pipes and waits EOF on its side for
+    -- both. If `sleep 120` process inherits stderr, it will not
+    -- close it and the file descriptor will remain open for
+    -- writing. In this case EOF situation will not occur on the
+    -- reading end of pipe even despite that the test process
+    -- itself already exited.
+    local ph = popen.new({script}, {
+        shell = true,
+        setsid = true,
+        group_signal = true,
+        stdin = popen.opts.CLOSE,
+        stdout = popen.opts.PIPE,
+        stderr = popen.opts.CLOSE,
+    })
 
     -- When a first byte is available, read() should return all
     -- bytes arrived at the time.
-- 
2.25.0



More information about the Tarantool-patches mailing list