From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-lf1-f68.google.com (mail-lf1-f68.google.com [209.85.167.68])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 5957946970F
	for ; Thu, 28 Nov 2019 23:45:40 +0300 (MSK)
Received: by mail-lf1-f68.google.com with SMTP id l18so2908024lfc.1
	for ; Thu, 28 Nov 2019 12:45:40 -0800 (PST)
From: Cyrill Gorcunov
Date: Thu, 28 Nov 2019 23:45:08 +0300
Message-Id: <20191128204512.19732-2-gorcunov@gmail.com>
In-Reply-To: <20191128204512.19732-1-gorcunov@gmail.com>
References: <20191128204512.19732-1-gorcunov@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH 1/5] popen: Introduce a backend engine
List-Id: Tarantool development patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: tml

In this patch we introduce a popen backend engine which provides
a way to execute external programs and communicate with their
stdin/stdout/stderr streams.

It is possible to run a child process with:

 a) completely closed stdX descriptors
 b) /dev/null descriptors provided for the appropriate stdX
 c) a new transport passed into the child (currently we use pipes
    for this, but may extend it to tty/sockets)
 d) stdX inherited from the parent, iow do nothing

On tarantool start we create the @popen_pids_map hash which maps
the PIDs of created processes to popen_handle structures; this
structure keeps everything needed to control and communicate with
the children. The hash allows us to find a child process quickly
from inside of a signal handler.

Each handle is linked into the @popen_head list, which is needed
to be able to destroy children processes in the exit procedure
(ie when we exit tarantool and need to clean up the resources
used).

Every new process is born by a vfork() call - we can't use fork()
because of the at_fork handlers in libeio which cause a deadlock
on an internal mutex. Thus the caller waits until vfork() finishes
its work and runs exec (or exits with an error). The good news is
that we are supposed to create new processes from a separate coio
thread, thus other threads are not affected.

Because children processes run without any limitations, they can
exit by themselves or can be killed by some third party (say, a
user of the hw node), so we need to watch their state, which is
done by setting a hook with the ev_child_start helper. This helper
allows us to catch SIGCHLD when a child has exited or been
signaled and to unregister it from the pool of currently running
children. Note that libev calls wait() itself and reaps zombie
children.

Such specifics force us to use signal blocking as a sync point to
keep children pids consistent (without signal blocking a pid could
simply be reused in between, and we could not tell whether the pid
we're tracking belongs to our child process or to some other
process in the system).

This engine provides the following API:

 - popen_create -- to create a new child process
 - popen_destroy -- to release the resources occupied and terminate
   a child process
 - popen_kill -- to kill a child process; note that this routine
   doesn't wait for process termination, it simply sends the
   SIGKILL signal
 - popen_wstatus -- to fetch the current system dependent status of
   a child process
 - popen_stat -- to fetch statistics of a child process

Known issues:

 - environment variables are flushed to zero; should we provide a
   way to adjust them (via options) or inherit them instead?

 - popen_kill always sends SIGKILL; should we provide a portable
   way to customize signal sending (say, symbolic names for
   signals, passed here)?

 - in native mode we don't do additional processing of arguments,
   thus only a plain name of an elf executable will work; we should
   provide a way to pass argv explicitly or parse @command for
   arguments by hand;

 - on linux based systems we use the pipe2 system call with the
   O_CLOEXEC flag so that two concurrent popen_create calls would
   not affect each other through pipe inheritance (currently we
   don't have a case where concurrent calls could be done as far
   as I know, still it is better to be on the safe side from the
   beginning);

 - there are some files (such as xlog) which tarantool opens for
   its own needs without setting the O_CLOEXEC flag, so they get
   propagated to a child process; on linux based systems we use
   the close_inherited_fds helper which walks over the opened files
   of a process and closes them, but for other targets like MachO
   or FreeBSD this helper is a no-op simply because I don't have
   such machines to experiment with; we should investigate this in
   more detail later, once the base code is merged in;

 - we need to consider a case where we will be using piping for
   descriptors (for example we might be writing into stdin of a
   child from another pipe; for this we could use the splice()
   syscall, which should be much faster than copying the data
   between processes by hand). Still the question is -- do we
   really need it? Since we use internal flags in the popen handle
   it should not be a big problem to extend this interface.

Part of #4031

Signed-off-by: Cyrill Gorcunov
---
 src/lib/core/CMakeLists.txt |    1 +
 src/lib/core/popen.c        | 1204 +++++++++++++++++++++++++++++++++++
 src/lib/core/popen.h        |  135 ++++
 src/main.cc                 |    4 +
 4 files changed, 1344 insertions(+)
 create mode 100644 src/lib/core/popen.c
 create mode 100644 src/lib/core/popen.h

diff --git a/src/lib/core/CMakeLists.txt b/src/lib/core/CMakeLists.txt
index e60b5199e..0f080061e 100644
--- a/src/lib/core/CMakeLists.txt
+++ b/src/lib/core/CMakeLists.txt
@@ -15,6 +15,7 @@ set(core_sources
     coio.cc
     coio_task.c
     coio_file.c
+    popen.c
     coio_buf.cc
     fio.c
     exception.cc
diff --git a/src/lib/core/popen.c b/src/lib/core/popen.c
new file mode 100644
index 000000000..460d3746c
--- /dev/null
+++ b/src/lib/core/popen.c
@@ -0,0 +1,1204 @@
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#include "popen.h"
+#include "assoc.h"
+#include "say.h"
+
+/* Maps children pids to popen_handle */
+static struct mh_i32ptr_t *popen_pids_map = NULL;
+
+/* All popen handles to be able to cleanup them on exit */
+static RLIST_HEAD(popen_head);
+
+/* /dev/null to be used inside children if requested */
+static int dev_null_fd_ro = -1;
+static int dev_null_fd_wr = -1;
+
+/* To block SIGCHLD delivery when need a sync point */
+static sigset_t popen_blockmask;
+
+/*
+ * If something really unexpected happens and we are
+ * no longer able to unblock SIGCHLD, then instead of
+ * exiting with an error in the middle of program work
+ * we disable new popen openings, leaving the user
+ * a way to shut down without losing in-memory data.
+ */
+static bool popen_blockmask_broken = false;
+
+/**
+ * popen_register - register popen handle in a pids map
+ * @handle: a handle to register
+ */
+static void
+popen_register(struct popen_handle *handle)
+{
+	struct mh_i32ptr_node_t node = {
+		.key = handle->pid,
+		.val = handle,
+	};
+	say_debug("popen: register %d", handle->pid);
+	mh_i32ptr_put(popen_pids_map, &node, NULL, NULL);
+}
+
+/**
+ * popen_find - find popen handle by its pid
+ * @pid: pid of a child process
+ *
+ * Returns a handle if found or NULL otherwise.
+ */
+static struct popen_handle *
+popen_find(pid_t pid)
+{
+	mh_int_t k = mh_i32ptr_find(popen_pids_map, pid, NULL);
+	if (k == mh_end(popen_pids_map))
+		return NULL;
+	return mh_i32ptr_node(popen_pids_map, k)->val;
+}
+
+/**
+ * popen_unregister - remove popen handle from a pids map
+ * @handle: a handle to remove
+ */
+static void
+popen_unregister(struct popen_handle *handle)
+{
+	struct mh_i32ptr_node_t node = {
+		.key = handle->pid,
+		.val = NULL,
+	};
+	say_debug("popen: unregister %d", handle->pid);
+	mh_i32ptr_remove(popen_pids_map, &node, NULL);
+}
+
+/**
+ * handle_alloc - allocate new popen handle with flags specified
+ * @flags: flags to be used
+ *
+ * Everything else initialized to default values.
+ *
+ * Returns pointer to a new popen or NULL on error.
+ */
+static struct popen_handle *
+handle_alloc(unsigned int flags)
+{
+	struct popen_handle *handle;
+
+	handle = malloc(sizeof(*handle));
+	if (!handle) {
+		say_syserror("popen: Can't allocate handle");
+		return NULL;
+	}
+
+	handle->wstatus = 0;
+	handle->pid = -1;
+	handle->flags = flags;
+
+	rlist_create(&handle->list);
+
+	/* all fds to -1 */
+	memset(handle->fds, 0xff, sizeof(handle->fds));
+
+	say_debug("popen: allocated %p", handle);
+	return handle;
+}
+
+/**
+ * handle_free - free memory allocated for a handle
+ * @handle: a handle to free
+ *
+ * Just to match handle_alloc().
+ */
+static void
+handle_free(struct popen_handle *handle)
+{
+	say_debug("popen: handle %p free", handle);
+	free(handle);
+}
+
+/**
+ * popen_may_io - test if handle can run io operation
+ * @handle: popen handle
+ * @idx: index of a file descriptor to operate on
+ * @io_flags: popen_flag_bits flags
+ *
+ * Returns true if IO is allowed and false otherwise
+ * (setting an appropriate @errno).
+ */
+static inline bool
+popen_may_io(struct popen_handle *handle, unsigned int idx,
+	     unsigned int io_flags)
+{
+	if (!handle) {
+		errno = ESRCH;
+		return false;
+	}
+
+	if (!(io_flags & handle->flags)) {
+		errno = EINVAL;
+		return false;
+	}
+
+	if (handle->fds[idx] < 0) {
+		errno = EPIPE;
+		return false;
+	}
+	return true;
+}
+
+/**
+ * popen_may_pidop - test if handle is valid for pid related operations
+ * @handle: popen handle
+ *
+ * This is a shortcut to test if the handle is not nil and still
+ * has a valid child process.
+ *
+ * Returns true if ops are allowed and false otherwise
+ * (setting an appropriate @errno).
+ */
+static inline bool
+popen_may_pidop(struct popen_handle *handle)
+{
+	if (!handle || handle->pid == -1) {
+		errno = ESRCH;
+		return false;
+	}
+	return true;
+}
+
+/**
+ * popen_stat - fill popen object statistics
+ * @handle: popen handle
+ * @st: destination popen_stat to fill
+ *
+ * Returns 0 on success, -1 otherwise.
+ */
+int
+popen_stat(struct popen_handle *handle, struct popen_stat *st)
+{
+	if (!handle) {
+		errno = ESRCH;
+		return -1;
+	}
+
+	st->pid = handle->pid;
+	st->flags = handle->flags;
+
+	static_assert(lengthof(st->fds) == lengthof(handle->fds),
+		      "Statistics fds are screwed");
+
+	memcpy(st->fds, handle->fds, sizeof(handle->fds));
+
+	return 0;
+}
+
+/**
+ * popen_write - write data to the child stdin
+ * @handle: popen handle
+ * @buf: data to write
+ * @count: number of bytes to write
+ * @flags: a flag representing stdin peer
+ *
+ * Returns number of bytes written or -1 on error.
+ */
+ssize_t
+popen_write(struct popen_handle *handle, void *buf,
+	    size_t count, unsigned int flags)
+{
+	if (!popen_may_io(handle, STDIN_FILENO, flags))
+		return -1;
+
+	say_debug("popen: %d: write idx %d",
+		  handle->pid, STDIN_FILENO);
+
+	return write(handle->fds[STDIN_FILENO], buf, count);
+}
+
+/**
+ * popen_wait_read - wait for data appear on a child's peer
+ * @handle: popen handle
+ * @fd: peer fd to wait on
+ * @timeout_msecs: timeout in milliseconds
+ *
+ * Returns 1 if there is data to read, -EAGAIN if timeout happened
+ * and a negative errno value on other errors.
+ */
+static int
+popen_wait_read(struct popen_handle *handle, int fd, int timeout_msecs)
+{
+	struct pollfd pollfd = {
+		.fd = fd,
+		.events = POLLIN,
+	};
+	int ret;
+
+	ret = poll(&pollfd, 1, timeout_msecs);
+	say_debug("popen: %d: poll: ret %d fd %d events %#x revents %#x",
+		  handle->pid, ret, fd, pollfd.events, pollfd.revents);
+
+	if (ret == 1) {
+		if (pollfd.revents == POLLIN) {
+			return 1;
+		} else {
+			say_error("popen: %d: unexpected revents %#x",
+				  handle->pid, pollfd.revents);
+			return -EINVAL;
+		}
+	}
+
+	return ret < 0 ? -errno : -EAGAIN;
+}
+
+/**
+ * popen_read_timeout - read data from a child's peer with timeout
+ * @handle: popen handle
+ * @buf: destination buffer
+ * @count: number of bytes to read
+ * @flags: POPEN_FLAG_FD_STDOUT or POPEN_FLAG_FD_STDERR
+ * @timeout_msecs: time to wait in milliseconds if no
+ * data available; ignored if less or equal to zero
+ *
+ * Returns number of bytes read or -EAGAIN if @timeout_msecs expired.
+ * On other errors -1 returned and errno set accordingly.
+ */
+ssize_t
+popen_read_timeout(struct popen_handle *handle, void *buf,
+		   size_t count, unsigned int flags,
+		   int timeout_msecs)
+{
+	int idx, ret;
+
+	idx = flags & POPEN_FLAG_FD_STDOUT ?
+		STDOUT_FILENO : STDERR_FILENO;
+
+	if (!popen_may_io(handle, idx, flags))
+		return -1;
+
+	say_debug("popen: %d: read idx %d fds %d timeout_msecs %d",
+		  handle->pid, idx, handle->fds[idx], timeout_msecs);
+
+	if (timeout_msecs > 0) {
+		ret = popen_wait_read(handle, handle->fds[idx],
+				      timeout_msecs);
+		if (ret < 0) {
+			if (ret != -EAGAIN) {
+				errno = -ret;
+				say_syserror("popen: %d: data wait failed",
+					     handle->pid);
+			}
+			return ret;
+		}
+	}
+
+	return read(handle->fds[idx], buf, count);
+}
+
+/**
+ * wstatus_str - encode signal status into human readable form
+ * @buf: destination buffer
+ * @size: buffer size
+ * @wstatus: status to encode
+ *
+ * Operates on S_DEBUG level only simply because snprintf
+ * is pretty heavy in performance.
+ *
+ * Returns pointer to @buf with encoded string.
+ */
+static char *
+wstatus_str(char *buf, size_t size, int wstatus)
+{
+	static const char fmt[] =
+		"wstatus %#x exited %d status %d "
+		"signaled %d wtermsig %d "
+		"stopped %d stopsig %d "
+		"coredump %d continued %d";
+
+	assert(size > 0);
+
+	if (say_log_level_is_enabled(S_DEBUG)) {
+		snprintf(buf, size, fmt, wstatus,
+			 WIFEXITED(wstatus),
+			 WIFEXITED(wstatus) ?
+			 WEXITSTATUS(wstatus) : -1,
+			 WIFSIGNALED(wstatus),
+			 WIFSIGNALED(wstatus) ?
+			 WTERMSIG(wstatus) : -1,
+			 WIFSTOPPED(wstatus),
+			 WIFSTOPPED(wstatus) ?
+			 WSTOPSIG(wstatus) : -1,
+			 WCOREDUMP(wstatus),
+			 WIFCONTINUED(wstatus));
+	}
+
+	return buf;
+}
+
+/**
+ * __wstatus_str - shortcut to wstatus_str with static buffer
+ * @wstatus: status to encode
+ *
+ * Returns pointer to a buffer with encoded message.
+ * Note this function uses the local static buffer thus
+ * should not be called in parallel.
+ */
+static char *
+__wstatus_str(int wstatus)
+{
+	static char buf[128];
+	return wstatus_str(buf, sizeof(buf), wstatus);
+}
+
+/**
+ * popen_notify_sigchld - notify popen subsystem about SIGCHLD event
+ * @pid: PID of a process which changed its state
+ * @wstatus: signal status of a process
+ *
+ * The function is called from global SIGCHLD watcher in libev so
+ * we need to figure out if it is our process which has possibly
+ * been terminated.
+ *
+ * Note that libev calls wait() itself so we don't need to do
+ * further processing and reap children.
+ */
+static void
+popen_notify_sigchld(pid_t pid, int wstatus)
+{
+	struct popen_handle *handle;
+	static char buf[128];
+
+	say_debug("popen: sigchld notify %d (%s)",
+		  pid, wstatus_str(buf, sizeof(buf), wstatus));
+
+	handle = popen_find(pid);
+	if (!handle)
+		return;
+
+	handle->wstatus = wstatus;
+	if (WIFEXITED(wstatus) || WIFSIGNALED(wstatus)) {
+		assert(handle->pid == pid);
+		/*
+		 * libev calls waitpid itself so
+		 * we don't have to wait here.
+		 */
+		popen_unregister(handle);
+		/*
+		 * Since SIGCHLD may come to us not
+		 * due to exit/kill reason (consider
+		 * a case when someone stopped a child
+		 * process) we should continue watching
+		 * state changes, thus we stop monitoring
+		 * dead children only.
+		 */
+		say_debug("popen: ev_child_stop %d", handle->pid);
+		ev_child_stop(EV_DEFAULT_ &handle->ev_sigchld);
+		handle->pid = -1;
+	}
+}
+
+/**
+ * ev_sigchld_cb - handle SIGCHLD from a child process.
+ * @w: a child exited
+ * @revents: unused
+ */
+static void
+ev_sigchld_cb(EV_P_ ev_child *w, int revents)
+{
+	(void)revents;
+	ev_child_stop(EV_A_ w);
+
+	/*
+	 * The reason for a separate helper is that
+	 * we might need to notify more subsystems
+	 * in future.
+	 */
+	popen_notify_sigchld(w->rpid, w->rstatus);
+}
+
+/**
+ * popen_sigchld_block - block SIGCHLD
+ * @oldmask: a pointer where to save an old signal mask
+ *
+ * This routine is a serialization point: we use signal blocking
+ * to prevent concurrent access to popen handle from external
+ * users which may kill programs by hand at any moment.
+ *
+ * Returns 0 on success, -1 otherwise.
+ */
+static int
+popen_sigchld_block(sigset_t *oldmask)
+{
+	if (unlikely(popen_blockmask_broken)) {
+		return 0;
+	} else if (sigprocmask(SIG_BLOCK, &popen_blockmask, oldmask)) {
+		say_syserror("popen: Can't block SIGCHLD");
+		return -1;
+	}
+	return 0;
+}
+
+/**
+ * popen_sigchld_unblock - unblock SIGCHLD
+ * @oldmask: a pointer to a mask to restore
+ *
+ * Returns 0 on success, -1 otherwise.
+ */
+static int
+popen_sigchld_unblock(sigset_t *oldmask)
+{
+	if (unlikely(popen_blockmask_broken)) {
+		return 0;
+	} else if (sigprocmask(SIG_SETMASK, oldmask, NULL)) {
+		say_syserror("popen: Can't unblock SIGCHLD");
+		/*
+		 * This is a critical issue but give users
+		 * an opportunity to shutdown.
+		 */
+		say_crit("popen: Signal handling is broken, "
+			 "please consider restarting the program.");
+		popen_blockmask_broken = true;
+		return -1;
+	}
+	return 0;
+}
+
+/**
+ * popen_wstatus_blocked - fetch popen child process wait status
+ * @handle: popen handle to inspect
+ * @wstatus: status to be filled if process exited
+ * @early: early bootstrap testing, don't print error if true
+ *
+ * The SIGCHLD must be blocked.
+ *
+ * Returns 1 if process changed its state filling
+ * optional @wstatus, 0 if process is still running
+ * and -1 on error.
+ */
+static inline int
+popen_wstatus_blocked(struct popen_handle *handle,
+		      int *wstatus, bool early)
+{
+	int pid;
+
+	if (!popen_may_pidop(handle)) {
+		if (handle && wstatus)
+			*wstatus = handle->wstatus;
+		/*
+		 * Here is a trick if @handle is passed
+		 * and its pid = -1 it means we already
+		 * obtained sigchld so caller is interested
+		 * in child status way after the child is
+		 * finished.
+		 */
+		return handle ? 1 : -1;
+	}
+
+	pid = waitpid(handle->pid, &handle->wstatus, WNOHANG);
+	if (pid == -1 && !early) {
+		say_syserror("popen: Unable to wait pid %d (%s)",
+			     handle->pid, __wstatus_str(handle->wstatus));
+	} else if (pid > 0) {
+		if (wstatus)
+			*wstatus = handle->wstatus;
+		pid = 1;
+	}
+
+	return pid;
+}
+
+
+/**
+ * popen_wstatus - fetch popen child process wait status
+ * @handle: popen handle to inspect
+ * @wstatus: status to be filled if process exited
+ *
+ * Returns 1 if process changed its state filling
+ * optional @wstatus, 0 if process is still running
+ * and -1 on error.
+ */
+int
+popen_wstatus(struct popen_handle *handle, int *wstatus)
+{
+	sigset_t oldmask;
+	int ret;
+
+	/*
+	 * The pid in handle might be already killed
+	 * by external signal or via natural exit of
+	 * a program, so need to block.
+	 */
+	if (popen_sigchld_block(&oldmask))
+		return -1;
+	ret = popen_wstatus_blocked(handle, wstatus, false);
+	popen_sigchld_unblock(&oldmask);
+
+	return ret;
+}
+
+/**
+ * popen_kill_blocked - kills a child process with signals blocked
+ * @handle: popen handle
+ *
+ * The SIGCHLD must be blocked.
+ *
+ * Returns 0 if child has been killed, -1 otherwise.
+ */
+static inline int
+popen_kill_blocked(struct popen_handle *handle)
+{
+	int ret;
+
+	/*
+	 * A child may be killed or exited already.
+	 */
+	if (popen_may_pidop(handle)) {
+		say_debug("popen: killpg %d", handle->pid);
+		ret = killpg(handle->pid, SIGKILL);
+		if (ret < 0) {
+			say_syserror("popen: Unable to kill %d",
+				     handle->pid);
+		}
+	} else
+		ret = -1;
+
+	return ret;
+}
+
+/**
+ * popen_kill - kills a child process associated with popen handle
+ * @handle: popen handle
+ *
+ * Returns 0 if child has been killed, -1 otherwise.
+ */
+int
+popen_kill(struct popen_handle *handle)
+{
+	sigset_t oldmask;
+	int ret;
+
+	if (popen_sigchld_block(&oldmask))
+		return -1;
+	ret = popen_kill_blocked(handle);
+
+	popen_sigchld_unblock(&oldmask);
+	return ret;
+}
+
+/**
+ * popen_destroy_blocked - destroy a popen handle
+ * @handle: a popen handle to destroy
+ *
+ * The function kills a child process,
+ * closes all fds, removes the handle from
+ * the living list and finally frees the handle.
+ *
+ * The SIGCHLD must be blocked or the handle
+ * must be not registered yet.
+ *
+ * Returns 0 on success, -1 otherwise.
+ */
+static inline int
+popen_destroy_blocked(struct popen_handle *handle)
+{
+	size_t i;
+
+	if (popen_kill_blocked(handle) && errno != ESRCH)
+		return -1;
+
+	for (i = 0; i < lengthof(handle->fds); i++) {
+		if (handle->fds[i] != -1)
+			close(handle->fds[i]);
+	}
+
+	/*
+	 * We won't be watching this child anymore if
+	 * kill signal is not yet delivered.
+	 */
+	if (handle->pid != -1) {
+		say_debug("popen: ev_child_stop %d", handle->pid);
+		ev_child_stop(EV_DEFAULT_ &handle->ev_sigchld);
+	}
+
+	rlist_del(&handle->list);
+	handle_free(handle);
+	return 0;
+}
+
+/**
+ * popen_destroy - destroy a popen handle
+ * @handle: pointer to a popen handle
+ *
+ * The function kills a child process associated with the
+ * popen handle, closes all pipes and frees memory.
+ *
+ * After this call the popen object is no longer usable.
+ *
+ * Returns 0 on success, -1 otherwise.
+ */
+int
+popen_destroy(struct popen_handle *handle)
+{
+	sigset_t oldmask;
+	int ret;
+
+	if (!handle) {
+		errno = ESRCH;
+		return -1;
+	}
+
+	if (popen_sigchld_block(&oldmask))
+		return -1;
+	ret = popen_destroy_blocked(handle);
+	popen_sigchld_unblock(&oldmask);
+	return ret;
+}
+
+/**
+ * create_pipe - create cloexec pipe
+ * @pfd: pipe ends to setup
+ *
+ * Returns 0 on success, -1 on error.
+ */
+static int
+create_pipe(int pfd[2])
+{
+#ifdef TARGET_OS_LINUX
+	if (pipe2(pfd, O_CLOEXEC)) {
+		say_syserror("popen: Can't create pipe2");
+		return -1;
+	}
+#else
+	if (pipe(pfd)) {
+		say_syserror("popen: Can't create pipe");
+		return -1;
+	}
+	if (fcntl(pfd[0], F_SETFD, FD_CLOEXEC) ||
+	    fcntl(pfd[1], F_SETFD, FD_CLOEXEC)) {
+		int saved_errno = errno;
+		say_syserror("popen: Can't set cloexec on pipe");
+		close(pfd[0]), pfd[0] = -1;
+		close(pfd[1]), pfd[1] = -1;
+		errno = saved_errno;
+		return -1;
+	}
+#endif
+	return 0;
+}
+
+/**
+ * close_inherited_fds - close inherited file descriptors
+ * @skip_fds: an array of descriptors which should
+ * be kept opened
+ * @nr_skip_fds: number of elements in @skip_fds
+ *
+ * Returns 0 on success, -1 otherwise.
+ */
+static int
+close_inherited_fds(int *skip_fds, size_t nr_skip_fds)
+{
+#ifdef TARGET_OS_LINUX
+	static const char path[] = "/proc/self/fd";
+	struct dirent *de;
+	int fd_no, fd_dir;
+	DIR *dir;
+	size_t i;
+
+	dir = opendir(path);
+	if (!dir) {
+		say_syserror("popen: fdin: Can't open %s", path);
+		return -1;
+	}
+	fd_dir = dirfd(dir);
+
+	for (de = readdir(dir); de; de = readdir(dir)) {
+		if (!strcmp(de->d_name, ".") ||
+		    !strcmp(de->d_name, ".."))
+			continue;
+
+		fd_no = atoi(de->d_name);
+
+		if (fd_no == fd_dir)
+			continue;
+
+		/* We don't expect many numbers here */
+		for (i = 0; i < nr_skip_fds; i++) {
+			if (fd_no == skip_fds[i]) {
+				fd_no = -1;
+				break;
+			}
+		}
+
+		if (fd_no == -1)
+			continue;
+
+		if (close(fd_no)) {
+			int saved_errno = errno;
+			say_syserror("popen: fdin: Can't close %d", fd_no);
+			closedir(dir);
+			errno = saved_errno;
+			return -1;
+		}
+	}
+
+	if (closedir(dir)) {
+		say_syserror("popen: fdin: Can't close %s", path);
+		return -1;
+	}
+#else
+	/* FIXME: What about FreeBSD/MachO? */
+	(void)skip_fds;
+	(void)nr_skip_fds;
+#endif
+	return 0;
+}
+
+/**
+ * popen_create - Create new popen handle
+ * @command: a command to run inside child process
+ * @flags: child pipe ends specification
+ *
+ * This function creates a new child process and passes it
+ * pipe ends to communicate with child's stdin/stdout/stderr
+ * depending on @flags. Where @flags could be the bitwise or
+ * for the following values:
+ *
+ *	POPEN_FLAG_FD_STDIN - to write to stdin
+ *	POPEN_FLAG_FD_STDOUT - to read from stdout
+ *	POPEN_FLAG_FD_STDERR - to read from stderr
+ *
+ * When need to pass /dev/null descriptor into a child
+ * the following values can be used:
+ *
+ *	POPEN_FLAG_FD_STDIN_DEVNULL
+ *	POPEN_FLAG_FD_STDOUT_DEVNULL
+ *	POPEN_FLAG_FD_STDERR_DEVNULL
+ *
+ * These flags do have no effect if appropriate POPEN_FLAG_FD_STDx
+ * flags are set.
+ *
+ * To completely close the descriptors the
+ * following values can be used:
+ *
+ *	POPEN_FLAG_FD_STDIN_CLOSE
+ *	POPEN_FLAG_FD_STDOUT_CLOSE
+ *	POPEN_FLAG_FD_STDERR_CLOSE
+ *
+ * These flags have no effect if the corresponding
+ * POPEN_FLAG_FD_STDx flags are set.
+ *
+ * If none of POPEN_FLAG_FD_STDx flags are specified the child
+ * process will run with all files inherited from a parent.
+ *
+ * By default the @command is executed directly. To execute
+ * @command via "sh -c" use the POPEN_FLAG_SHELL flag.
+ *
+ * Returns pointer to new popen handle on success,
+ * otherwise NULL returned.
+ */
+struct popen_handle *
+popen_create(const char *command, unsigned int flags)
+{
+	struct popen_handle *handle = NULL;
+
+	char * const argv_native[] = {
+		(char *)command, NULL,
+	};
+	char * const argv_sh[] = {
+		(char *)"sh", (char *)"-c",
+		(char *)command, NULL,
+	};
+	/*
+	 * FIXME: Need to pass env in arguments?
+	 * Better discuss with a team.
+	 */
+	char * const envp[] = { NULL };
+
+	int pfd[POPEN_FLAG_FD_STDEND_BIT][2] = {
+		{-1, -1}, {-1, -1}, {-1, -1},
+	};
+
+	int saved_errno;
+	int ret = -1;
+	pid_t pid;
+	size_t i;
+
+	static const struct {
+		unsigned int mask;
+		unsigned int mask_devnull;
+		unsigned int mask_close;
+		int fileno;
+		int *dev_null_fd;
+		int parent_idx;
+		int child_idx;
+	} pfd_map[POPEN_FLAG_FD_STDEND_BIT] = {
+		{
+			.mask = POPEN_FLAG_FD_STDIN,
+			.mask_devnull = POPEN_FLAG_FD_STDIN_DEVNULL,
+			.mask_close = POPEN_FLAG_FD_STDIN_CLOSE,
+			.fileno = STDIN_FILENO,
+			.dev_null_fd = &dev_null_fd_ro,
+			.parent_idx = 1,
+			.child_idx = 0,
+		}, {
+			.mask = POPEN_FLAG_FD_STDOUT,
+			.mask_devnull = POPEN_FLAG_FD_STDOUT_DEVNULL,
+			.mask_close = POPEN_FLAG_FD_STDOUT_CLOSE,
+			.fileno = STDOUT_FILENO,
+			.dev_null_fd = &dev_null_fd_wr,
+			.parent_idx = 0,
+			.child_idx = 1,
+		}, {
+			.mask = POPEN_FLAG_FD_STDERR,
+			.mask_devnull = POPEN_FLAG_FD_STDERR_DEVNULL,
+			.mask_close = POPEN_FLAG_FD_STDERR_CLOSE,
+			.fileno = STDERR_FILENO,
+			.dev_null_fd = &dev_null_fd_wr,
+			.parent_idx = 0,
+			.child_idx = 1,
+		},
+	};
+	/*
+	 * At max we could be skipping each pipe end
+	 * plus dev/null variants.
+	 */
+	int skip_fds[POPEN_FLAG_FD_STDEND_BIT * 2 + 2];
+	size_t nr_skip_fds = 0;
+
+	sigset_t oldmask;
+
+	say_debug("popen: command \"%s\" flags %#x", command, flags);
+
+	if (!command) {
+		errno = EINVAL;
+		say_syserror("popen: No command provided");
+		return NULL;
+	}
+
+	/*
+	 * If sometime earlier we've been unable
+	 * to unblock signals don't allow to create
+	 * new pipes, the system is unstable.
+	 */
+	if (unlikely(popen_blockmask_broken)) {
+		errno = EINVAL;
+		say_error("popen: Service unavailable");
+		return NULL;
+	}
+
+	static_assert(STDIN_FILENO == 0 &&
+		      STDOUT_FILENO == 1 &&
+		      STDERR_FILENO == 2,
+		      "stdin/out/err are not posix compatible");
+
+	static_assert(lengthof(pfd) == lengthof(pfd_map),
+		      "Pipes number does not map to fd bits");
+
+	static_assert(POPEN_FLAG_FD_STDIN_BIT == STDIN_FILENO &&
+		      POPEN_FLAG_FD_STDOUT_BIT == STDOUT_FILENO &&
+		      POPEN_FLAG_FD_STDERR_BIT == STDERR_FILENO,
+		      "Popen flags do not match stdX");
+
+	handle = handle_alloc(flags);
+	if (!handle)
+		return NULL;
+
+	skip_fds[nr_skip_fds++] = dev_null_fd_ro;
+	skip_fds[nr_skip_fds++] = dev_null_fd_wr;
+	assert(nr_skip_fds <= lengthof(skip_fds));
+
+	for (i = 0; i < lengthof(pfd_map); i++) {
+		if (flags & pfd_map[i].mask) {
+			if (create_pipe(pfd[i]))
+				goto out_err;
+
+			skip_fds[nr_skip_fds++] = pfd[i][0];
+			skip_fds[nr_skip_fds++] = pfd[i][1];
+			assert(nr_skip_fds <= lengthof(skip_fds));
+
+			say_debug("popen: created pipe %zu [%d:%d]",
+				  i, pfd[i][0], pfd[i][1]);
+		} else if (!(flags & pfd_map[i].mask_devnull) &&
+			   !(flags & pfd_map[i].mask_close)) {
+			skip_fds[nr_skip_fds++] = pfd_map[i].fileno;
+
+			say_debug("popen: inherit fd %d",
+				  pfd_map[i].fileno);
+		}
+	}
+
+	/*
+	 * Need to block signals so we won't hit
+	 * a race where child process exit early
+	 * and this pid will get reused by someone
+	 * else (remember the libev wait() by self).
+	 */
+	if (popen_sigchld_block(&oldmask)) {
+		say_syserror("popen: Unable to block SIGCHLD");
+		goto out_err;
+	}
+
+	/*
+	 * We have to use vfork here because libev has own
+	 * at_fork helpers with mutex, so we will have double
+	 * lock here and stuck forever otherwise.
+	 *
+	 * The good news is that this affects coio only, the
+	 * other tarantool threads are not waiting for
+	 * vfork to complete.
+	 */
+	handle->pid = vfork();
+	if (handle->pid < 0) {
+		goto out_err_unblock;
+	} else if (handle->pid == 0) {
+		/*
+		 * The documentation for libev says that
+		 * each new fork should call ev_loop_fork(EV_DEFAULT)
+		 * on every new child process, but we're going
+		 * to execute new binary anyway thus everything
+		 * related to OS resources will be eliminated except
+		 * file descriptors we use for piping. Thus don't
+		 * do anything special.
+		 */
+
+		/*
+		 * We have to be a session leader otherwise
+		 * won't be able to kill a group of children.
+		 */
+		ret = setsid();
+		if (ret < 0)
+			_exit(errno);
+
+		ret = close_inherited_fds(skip_fds, nr_skip_fds);
+		if (ret)
+			_exit(errno);
+
+		for (i = 0; !ret && i < lengthof(pfd_map); i++) {
+			int fileno = pfd_map[i].fileno;
+			if (flags & pfd_map[i].mask) {
+				int child_idx = pfd_map[i].child_idx;
+
+				/* put child peer end at known place */
+				if (dup2(pfd[i][child_idx], fileno) < 0) {
+					ret = errno;
+					continue;
+				}
+
+				/* parent's pipe no longer needed */
+				if (close(pfd[i][0])) {
+					ret = errno;
+					continue;
+				} else if (close(pfd[i][1])) {
+					ret = errno;
+					continue;
+				}
+			} else {
+				/* Use /dev/null if requested */
+				if (flags & pfd_map[i].mask_devnull) {
+					if (dup2(*pfd_map[i].dev_null_fd, fileno) < 0) {
+						ret = errno;
+						continue;
+					}
+				}
+
+				/* Or close the destination completely */
+				if (flags & pfd_map[i].mask_close) {
+					if (close(fileno) && errno != EBADF) {
+						ret = errno;
+						continue;
+					}
+				}
+
+				/* Otherwise inherit from a parent */
+			}
+		}
+
+		if (close(dev_null_fd_ro))
+			ret = errno;
+		else if (close(dev_null_fd_wr))
+			ret = errno;
+
+		if (!ret) {
+			if (flags & POPEN_FLAG_SHELL)
+				ret = execve(_PATH_BSHELL, argv_sh, envp);
+			else
+				ret = execve(command, argv_native, envp);
+		}
+		_exit(ret);
+		unreachable();
+	}
+
+	for (i = 0; i < lengthof(pfd_map); i++) {
+		if (flags & pfd_map[i].mask) {
+			int parent_idx = pfd_map[i].parent_idx;
+			int child_idx = pfd_map[i].child_idx;
+
+			handle->fds[i] = pfd[i][parent_idx];
+			say_debug("popen: keep pipe %zu [%d]",
+				  i, handle->fds[i]);
+
+			if (close(pfd[i][child_idx]))
+				goto out_err_unblock;
+
+			pfd[i][child_idx] = -1;
+		}
+	}
+
+	pid = popen_wstatus_blocked(handle, NULL, true);
+	if (pid == -1) {
+		say_debug("popen: Child %d exited early",
+			  handle->pid);
+		handle->pid = -1;
+	} else if (pid == 1) {
+		bool exited = WIFEXITED(handle->wstatus);
+		bool signaled = WIFSIGNALED(handle->wstatus);
+
+		if (exited || signaled) {
+			say_debug("popen: Child %d %s with %d",
+				  handle->pid, exited ? "exited" : "signaled",
+				  exited ? WEXITSTATUS(handle->wstatus) :
+				  WTERMSIG(handle->wstatus));
+			handle->pid = -1;
+		}
+	}
+
+	/*
+	 * Link it into global list for force
+	 * cleanup on exit.
+	 */
+	rlist_add(&popen_head, &handle->list);
+
+	if (handle->pid != -1) {
+		/*
+		 * To watch when a child get exited.
+		 */
+		popen_register(handle);
+
+		say_debug("popen: ev_child_start %d", handle->pid);
+		ev_child_init(&handle->ev_sigchld, ev_sigchld_cb, handle->pid, 0);
+		ev_child_start(EV_DEFAULT_ &handle->ev_sigchld);
+	}
+
+	say_debug("popen: created child %d", handle->pid);
+
+	popen_sigchld_unblock(&oldmask);
+	return handle;
+
+out_err_unblock:
+	saved_errno = errno;
+	popen_sigchld_unblock(&oldmask);
+	errno = saved_errno;
+out_err:
+	saved_errno = errno;
+	popen_destroy(handle);
+	for (i = 0; i < lengthof(pfd); i++) {
+		if (pfd[i][0] != -1)
+			close(pfd[i][0]);
+		if (pfd[i][1] != -1)
+			close(pfd[i][1]);
+	}
+	errno = saved_errno;
+	return NULL;
+}
+
+/**
+ * popen_init - initialize popen subsystem
+ *
+ * Allocates resource needed for popen management.
+ */
+void
+popen_init(void)
+{
+	static const int flags = O_CLOEXEC;
+	static const char dev_null_path[] = "/dev/null";
+
+	say_debug("popen: initialize");
+	popen_pids_map = mh_i32ptr_new();
+
+	dev_null_fd_ro = open(dev_null_path, O_RDONLY | flags);
+	if (dev_null_fd_ro < 0)
+		goto out_err;
+	dev_null_fd_wr = open(dev_null_path, O_WRONLY | flags);
+	if (dev_null_fd_wr < 0)
+		goto out_err;
+
+	/*
+	 * FIXME: We should allocate them somewhere
+	 * after STDERR_FILENO so the child would be
+	 * able to find these fd numbers not occupied.
+	 * Other option is to use unix scm and send
+	 * them to the child on demand.
+	 *
+	 * For now left as is since we don't close
+	 * our main stdX descriptors and they are
+	 * inherited when we call first vfork.
+	 */
+	if (dev_null_fd_ro <= STDERR_FILENO ||
+	    dev_null_fd_wr <= STDERR_FILENO) {
+		say_error("popen: /dev/null %d %d numbers are too low",
+			  dev_null_fd_ro, dev_null_fd_wr);
+	}
+
+	sigemptyset(&popen_blockmask);
+	sigaddset(&popen_blockmask, SIGCHLD);
+	return;
+
+out_err:
+	say_syserror("popen: Can't open %s", dev_null_path);
+	if (dev_null_fd_ro >= 0)
+		close(dev_null_fd_ro);
+	if (dev_null_fd_wr >= 0)
+		close(dev_null_fd_wr);
+	mh_i32ptr_delete(popen_pids_map);
+	exit(EXIT_FAILURE);
+}
+
+/**
+ * popen_fini - finalize popen subsystem
+ *
+ * Kills all running children and frees resources.
+ */
+void
+popen_fini(void)
+{
+	struct popen_handle *handle, *tmp;
+	sigset_t oldmask;
+
+	say_debug("popen: finalize");
+
+	close(dev_null_fd_ro);
+	close(dev_null_fd_wr);
+	dev_null_fd_ro = -1;
+	dev_null_fd_wr = -1;
+
+	if (popen_sigchld_block(&oldmask))
+		return;
+
+	rlist_foreach_entry_safe(handle, &popen_head, list, tmp) {
+		/*
+		 * If children are still running we should move
+		 * them out of the pool and kill them all then.
+		 * Note though that we don't do an explicit wait
+		 * here, OS will reap them anyway, just release
+		 * the memory occupied for popen handles.
+		 */
+		if (popen_may_pidop(handle))
+			popen_unregister(handle);
+		popen_destroy_blocked(handle);
+	}
+
+	popen_sigchld_unblock(&oldmask);
+
+	if (popen_pids_map) {
+		mh_i32ptr_delete(popen_pids_map);
+		popen_pids_map = NULL;
+	}
+}
diff --git a/src/lib/core/popen.h b/src/lib/core/popen.h
new file mode 100644
index 000000000..471607662
--- /dev/null
+++ b/src/lib/core/popen.h
@@ -0,0 +1,135 @@
+#ifndef TARANTOOL_LIB_CORE_POPEN_H_INCLUDED
+#define TARANTOOL_LIB_CORE_POPEN_H_INCLUDED
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#include
+#include
+
+#include
+
+#include "third_party/tarantool_ev.h"
+
+enum popen_flag_bits {
+	POPEN_FLAG_NONE			= (0 << 0),
+
+	/*
+	 * Which fd we should handle.
+	 */
+	POPEN_FLAG_FD_STDIN_BIT		= 0,
+	POPEN_FLAG_FD_STDIN		= (1 << POPEN_FLAG_FD_STDIN_BIT),
+
+	POPEN_FLAG_FD_STDOUT_BIT	= 1,
+	POPEN_FLAG_FD_STDOUT		= (1 << POPEN_FLAG_FD_STDOUT_BIT),
+
+	POPEN_FLAG_FD_STDERR_BIT	= 2,
+	POPEN_FLAG_FD_STDERR		= (1 << POPEN_FLAG_FD_STDERR_BIT),
+
+	/*
+	 * Number of bits occupied for stdX descriptors.
+	 */
+	POPEN_FLAG_FD_STDEND_BIT	= POPEN_FLAG_FD_STDERR_BIT + 1,
+
+	/*
+	 * Instead of inheriting fds from a parent
+	 * use /dev/null.
+	 */
+	POPEN_FLAG_FD_STDIN_DEVNULL_BIT	= 3,
+	POPEN_FLAG_FD_STDIN_DEVNULL	= (1 << POPEN_FLAG_FD_STDIN_DEVNULL_BIT),
+	POPEN_FLAG_FD_STDOUT_DEVNULL_BIT= 4,
+	POPEN_FLAG_FD_STDOUT_DEVNULL	= (1 << POPEN_FLAG_FD_STDOUT_DEVNULL_BIT),
+	POPEN_FLAG_FD_STDERR_DEVNULL_BIT= 5,
+	POPEN_FLAG_FD_STDERR_DEVNULL	= (1 << POPEN_FLAG_FD_STDERR_DEVNULL_BIT),
+
+	/*
+	 * Instead of inheriting fds from a parent
+	 * close fds completely.
+	 */
+	POPEN_FLAG_FD_STDIN_CLOSE_BIT	= 6,
+	POPEN_FLAG_FD_STDIN_CLOSE	= (1 << POPEN_FLAG_FD_STDIN_CLOSE_BIT),
+	POPEN_FLAG_FD_STDOUT_CLOSE_BIT	= 7,
+	POPEN_FLAG_FD_STDOUT_CLOSE	= (1 << POPEN_FLAG_FD_STDOUT_CLOSE_BIT),
+	POPEN_FLAG_FD_STDERR_CLOSE_BIT	= 8,
+	POPEN_FLAG_FD_STDERR_CLOSE	= (1 << POPEN_FLAG_FD_STDERR_CLOSE_BIT),
+
+	/*
+	 * Call exec directly or via shell.
+	 */
+	POPEN_FLAG_SHELL_BIT		= 9,
+	POPEN_FLAG_SHELL		= (1 << POPEN_FLAG_SHELL_BIT),
+};
+
+/**
+ * struct popen_handle - an instance of popen object
+ *
+ * @pid: pid of a child process
+ * @wstatus: exit status of a child process
+ * @ev_sigchld: needed by the libev to watch children
+ * @list: link in the global @popen_head list
+ * @flags: popen_flag_bits
+ * @fds: std(in|out|err)
+ */
+struct popen_handle {
+	pid_t			pid;
+	int			wstatus;
+	ev_child		ev_sigchld;
+	struct rlist		list;
+	unsigned int		flags;
+	int			fds[POPEN_FLAG_FD_STDEND_BIT];
+};
+
+/**
+ * struct popen_stat - popen object statistics
+ *
+ * @pid: pid of a child process
+ * @flags: popen_flag_bits
+ * @fds: std(in|out|err)
+ *
+ * This is a short version of struct popen_handle which should
+ * be used by external code and which should be changed/extended
+ * with extreme caution since it is used in Lua code. Consider it
+ * an API for external modules.
+ */
+struct popen_stat {
+	pid_t			pid;
+	unsigned int		flags;
+	int			fds[POPEN_FLAG_FD_STDEND_BIT];
+};
+
+extern int
+popen_stat(struct popen_handle *handle, struct popen_stat *st);
+
+extern ssize_t
+popen_write(struct popen_handle *handle, void *buf,
+	    size_t count, unsigned int flags);
+
+extern ssize_t
+popen_read_timeout(struct popen_handle *handle, void *buf,
+		   size_t count, unsigned int flags,
+		   int timeout_msecs);
+
+extern int
+popen_wstatus(struct popen_handle *handle, int *wstatus);
+
+extern int
+popen_kill(struct popen_handle *handle);
+
+extern int
+popen_destroy(struct popen_handle *handle);
+
+extern struct popen_handle *
+popen_create(const char *command, unsigned int flags);
+
+extern void
+popen_init(void);
+
+extern void
+popen_fini(void);
+
+#if defined(__cplusplus)
+} /* extern "C" */
+#endif
+
+#endif /* TARANTOOL_LIB_CORE_POPEN_H_INCLUDED */
diff --git a/src/main.cc b/src/main.cc
index 0ff2213b6..a9999b47d 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -77,6 +77,7 @@
 #include "box/session.h"
 #include "systemd.h"
 #include
"crypto/crypto.h" +#include "core/popen.h" static pid_t master_pid = getpid(); static struct pidfh *pid_file_handle; @@ -614,6 +615,8 @@ tarantool_free(void) title_free(main_argc, main_argv); + popen_fini(); + /* unlink pidfile. */ if (pid_file_handle != NULL && pidfile_remove(pid_file_handle) == -1) say_syserror("failed to remove pid file '%s'", pid_file); @@ -801,6 +804,7 @@ main(int argc, char **argv) exception_init(); fiber_init(fiber_cxx_invoke); + popen_init(); coio_init(); coio_enable(); signal_init(); -- 2.20.1
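
For readers of the flag layout in popen.h: the three stdX flag groups (pipe, /dev/null, close) each occupy three consecutive bits in stdin/stdout/stderr order, so the handling of one stream can be derived from its index. Below is a minimal sketch of that decoding, not code from the patch. The bit numbers are copied from the enum above; `enum stdx_mode` and `stdx_mode_of()` are hypothetical names introduced only for illustration, and the precedence applied when several bits are set for the same stream is an assumption.

```c
#include <assert.h>

/* Bit numbers copied from enum popen_flag_bits in the patch. */
enum {
	POPEN_FLAG_FD_STDIN_BIT		= 0,
	POPEN_FLAG_FD_STDEND_BIT	= 3,
	POPEN_FLAG_FD_STDIN_DEVNULL_BIT	= 3,
	POPEN_FLAG_FD_STDIN_CLOSE_BIT	= 6,
};

/* Hypothetical type, not part of the patch. */
enum stdx_mode {
	STDX_INHERIT,	/* no bit set: the child inherits the parent's fd */
	STDX_PIPE,	/* POPEN_FLAG_FD_STDx */
	STDX_DEVNULL,	/* POPEN_FLAG_FD_STDx_DEVNULL */
	STDX_CLOSE,	/* POPEN_FLAG_FD_STDx_CLOSE */
};

/*
 * Hypothetical helper: decode how stream @idx (0 = stdin,
 * 1 = stdout, 2 = stderr) is to be set up for the given @flags.
 * The check order (pipe, then /dev/null, then close) is an
 * assumption, the patch does not state a precedence.
 */
static enum stdx_mode
stdx_mode_of(unsigned int flags, int idx)
{
	assert(idx >= 0 && idx < POPEN_FLAG_FD_STDEND_BIT);

	if (flags & (1u << (POPEN_FLAG_FD_STDIN_BIT + idx)))
		return STDX_PIPE;
	if (flags & (1u << (POPEN_FLAG_FD_STDIN_DEVNULL_BIT + idx)))
		return STDX_DEVNULL;
	if (flags & (1u << (POPEN_FLAG_FD_STDIN_CLOSE_BIT + idx)))
		return STDX_CLOSE;
	return STDX_INHERIT;
}
```

With this layout a caller could loop over the three stream indexes instead of testing nine flags by name; whether popen_create should reject conflicting bits for one stream is left open here.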