[Tarantool-patches] [PATCH 2/2] popen: add popen Lua module

Igor Munkin imun at tarantool.org
Sun Apr 19 15:38:45 MSK 2020


Sasha,

Thanks for the patch! I left several comments regarding a docbot request
with API description and usage, though they can be freely ignored.

On 18.04.20, Alexander Turenko wrote:
> Co-developed-by: Cyrill Gorcunov <gorcunov at gmail.com>
> Co-developed-by: Igor Munkin <imun at tarantool.org>
> 
> Fixes #4031
> 
> @TarantoolBot document
> Title: popen module
> 

<snipped>

> 
> Module functions
> ================
> 
> The `popen` module provides two functions to create that named popen
> object `popen.shell` which is the similar to libc `popen` syscall and

Typo: I guess a colon is missing here.

> `popen.new` to create popen object with more specific options.
> 
> `popen.shell(command[, mode]) -> handle, err`
> ---------------------------------------------
> 
> Execute a shell command.
> 
> @param command  a command to run, mandatory
> @param mode     communication mode, optional
>                 'w'    to use ph:write()
>                 'r'    to use ph:read()
>                 'R'    to use ph:read({stderr = true})
>                 nil    inherit parent's std* file descriptors
> 
> Several mode characters can be set together: 'rw', 'rRw', etc.
> 
> This function is just shortcut for popen.new({command}, opts)
> with opts.{shell,setsid,group_signal} set to `true` and
> and opts.{stdin,stdout,stderr} set based on `mode` parameter.
> 
> All std* streams are inherited from parent by default if it is
> not changed using mode: 'r' for stdout, 'R' for stderr, 'w' for
> stdin.
> 
> Raise an error on incorrect parameters:
> 
> - IllegalParams: incorrect type or value of a parameter.
> 
> Return a popen handle on success.
> 
> Return `nil, err` on a failure.
> @see popen.new() for possible reasons.
> 
> Example:
> 
>  | local popen = require('popen')
>  |
>  | local ph = popen.shell('date', 'r')
>  | local date = ph:read():rstrip()

Minor: IMHO <rstrip> call might lead to a misuse, considering there is
no word about it in description below. I guess you can drop it.

>  | ph:close()
>  | print(date)
> 
> Execute 'sh -c date' command, read the output and close the
> popen object.
> 
> `popen.new(argv[, opts]) -> handle, err`
> ----------------------------------------
> 
> Execute a child program in a new process.
> 
> @param argv  an array of a program to run with
>              command line options, mandatory;
>              absolute path to the program is required

Absolute path is not required when shell option is set to <true>.

> 
> @param opts  table of options
> 
> @param opts.stdin   action on STDIN_FILENO
> @param opts.stdout  action on STDOUT_FILENO
> @param opts.stderr  action on STDERR_FILENO
> 
> File descriptor actions:
> 
>     popen.opts.INHERIT  (== 'inherit') [default]
>                         inherit the fd from the parent
>     popen.opts.DEVNULL  (== 'devnull')
>                         open /dev/null on the fd
>     popen.opts.CLOSE    (== 'close')
>                         close the fd
>     popen.opts.PIPE     (== 'pipe')
>                         feed data from/to the fd to parent
>                         using a pipe
> 
> @param opts.env  an array of environment variables to
>                  be used inside a process;
>                  - when is not set then the current
>                    environment is inherited;
>                  - if set then the environment will be
>                    replaced
>                  - if set to an empty array then the
>                    environment will be dropped

Minor: I looks more convenient to sort the lines above the following
way:
* when is not set <...>
* if set to an empty table <...>
* if set to a non-empty table <...>

> 
> @param opts.shell            (boolean, default: false)
>        true                  run a child process via
>                              'sh -c "${opts.argv}"'
>        false                 call the executable directly
> 
> @param opts.setsid           (boolean, default: false)
>        true                  start executable inside a new
>                              session
>        false                 don't do that

Minor: I reword it in more clear way below:
| @param opts.setsid           (boolean, default: false)
|        true                  run the program in a new
|                              session
|        false                 run the program in the
|                              tarantool instance's
|                              session

> 
> @param opts.close_fds        (boolean, default: true)
>        true                  close all inherited fds from a
>                              parent
>        false                 don't do that
> 
> @param opts.restore_signals  (boolean, default: true)
>        true                  reset all signal actions
>                              modified in parent's process
>        false                 inherit changed actions
> 
> @param opts.group_signal     (boolean, default: false)
>        true                  send signal to a child process
>                              group (only when opts.setsid is
>                              enabled)
>        false                 send signal to a child process
>                              only
> 
> @param opts.keep_child       (boolean, default: false)
>        true                  don't send SIGKILL to a child
>                              process at freeing (by :close()
>                              or Lua GC)
>        false                 send SIGKILL to a child process
>                              (or a process group if
>                              opts.group_signal is enabled) at
>                              :close() or collecting of the
>                              handle by Lua GC
> 
> The returned handle is subject to garbage collection, but
> it may be freed explicitly using :close(). SIGKILL is sent
> to the child process at :close() / GC if it is alive and
> opts.keep_child is not set.

Minor: Let's adjust the terminology. What do you mean speaking about
"handle"? If it's a GC object, then there is no way to release it
manually or "explicitly", only via platform GC. If it's a GC object
*payload* then it's not managed by Lua GC and need to be explicitly
released via special method or the corresponding __gc metamethod.

> 
> It is recommended to use opts.setsid + opts.group_signal
> if a child process may spawn its own childs and they all
> should be killed together.
> 
> Note: A signal will not be sent if the child process is
> already dead: otherwise we might kill another process that
> occupies the same PID later. This means that if the child
> process dies before its own childs, the function will not
> send a signal to the process group even when opts.setsid and
> opts.group_signal are set.
> 
> Use os.environ() to pass copy of current environment with
> several replacements (see example 2 below).
> 
> Raise an error on incorrect parameters:
> 
> - IllegalParams: incorrect type or value of a parameter.
> - IllegalParams: group signal is set, while setsid is not.
> 
> Return a popen handle on success.
> 
> Return `nil, err` on a failure. Possible reasons:
> 
> - SystemError: dup(), fcntl(), pipe(), vfork() or close()
>                fails in the parent process.
> - SystemError: (temporary restriction) the parent process
>                has closed stdin, stdout or stderr.
> - OutOfMemory: unable to allocate the handle or a temporary
>                buffer.
> 
> Example 1:
> 
>  | local popen = require('popen')
>  |
>  | local ph = popen.new({'/bin/date'}, {
>  |     stdout = popen.opts.PIPE,
>  | })
>  | local date = ph:read():rstrip()
>  | ph:close()
>  | print(date) -- Thu 16 Apr 2020 01:40:56 AM MSK
> 
> Execute 'date' command, read the result and close the
> popen object.

Minor: Again, no mention about rstrip.

> 
> Example 2:
> 
>  | local popen = require('popen')
>  |
>  | local env = os.environ()
>  | env['FOO'] = 'bar'
>  |
>  | local ph = popen.new({'echo "${FOO}"'}, {
>  |     stdout = popen.opts.PIPE,
>  |     shell = true,
>  |     env = env,
>  | })
>  | local res = ph:read():rstrip()
>  | ph:close()
>  | print(res) -- bar
> 
> It is quite similar to the previous one, but sets the
> environment variable and uses shell builtin 'echo' to
> show it.
> 
> Example 3:
> 
>  | local popen = require('popen')
>  |
>  | local ph = popen.new({'echo hello >&2'}, { -- !!
>  |     stderr = popen.opts.PIPE,              -- !!
>  |     shell = true,
>  | })
>  | local res = ph:read({stderr = true}):rstrip()
>  | ph:close()
>  | print(res) -- hello
> 
> This example demonstrates how to capture child's stderr.
> 
> Example 4:
> 
>  | local function call_jq(stdin, filter)

Minor: <stdin> look a bit misleading. What about <input>, <json>, etc?

>  |     -- Start jq process, connect to stdin, stdout and stderr.
>  |     local jq_argv = {'/usr/bin/jq', '-M', '--unbuffered', filter}
>  |     local ph, err = popen.new(jq_argv, {
>  |         stdin = popen.opts.PIPE,
>  |         stdout = popen.opts.PIPE,
>  |         stderr = popen.opts.PIPE,
>  |     })
>  |     if ph == nil then return nil, err end
>  |
>  |     -- Write input data to child's stdin and send EOF.
>  |     local ok, err = ph:write(stdin)
>  |     if not ok then return nil, err end
>  |     ph:shutdown({stdin = true})
>  |
>  |     -- Read everything until EOF.
>  |     local chunks = {}
>  |     while true do
>  |         local chunk, err = ph:read()
>  |         if chunk == nil then
>  |             ph:close()
>  |             return nil, err
>  |         end
>  |         if chunk == '' then break end -- EOF
>  |         table.insert(chunks, chunk)
>  |     end
>  |
>  |     -- Read diagnostics from stderr if any.
>  |     local err = ph:read({stderr = true})
>  |     if err ~= '' then
>  |         ph:close()
>  |         return nil, err
>  |     end
>  |
>  |     -- Glue all chunks, strip trailing newline.

Side note: This comment is definitely needed near <rstrip> calls above.

>  |     return table.concat(chunks):rstrip()
>  | end
> 
> Demonstrates how to run a stream program (like `grep`, `sed`
> and so), write to its stdin and read from its stdout.
> 
> The example assumes that input data are small enough to fit
> a pipe buffer (typically 64 KiB, but depends on a platform
> and its configuration). It will stuck in :write() for large
> data. How to handle this case: call :read() in a loop in
> another fiber (start it before a first :write()).
> 
> If a process writes large text to stderr, it may fill out
> stderr pipe buffer and block in write(2, ...). So we need to
> read stderr too in another fiber to handle this case.

Typo: s/stderr too in another fiber/stderr in another fiber too/.

> 
> Handle methods
> ==============
> 
> `popen_handle:read([opts]) -> str, err`
> ---------------------------------------
> 
> Read data from a child peer.
> 
> @param handle        handle of a child process
> @param opts          an options table
> @param opts.stdout   whether to read from stdout, boolean
>                      (default: true)
> @param opts.stderr   whether to read from stderr, boolean
>                      (default: false)
> @param opts.timeout  time quota in seconds
>                      (default: 100 years)
> 
> Read data from stdout or stderr streams with @a timeout.
> By default it reads from stdout. Set @a opts.stderr to
> `true` to read from stderr.

If only <stderr> is set to true, is stdout output also captured?

> 
> Raise an error on incorrect parameters or when the fiber is
> cancelled:
> 
> - IllegalParams:    incorrect type or value of a parameter.
> - IllegalParams:    called on a closed handle.
> - IllegalParams:    opts.stdout and opts.stderr are set both
> - IllegalParams:    a requested IO operation is not supported
>                     by the handle (stdout / stderr is not
>                     piped).
> - IllegalParams:    attempt to operate on a closed file
>                     descriptor.
> - FiberIsCancelled: cancelled by an outside code.
> 
> Return a string on success, an empty string at EOF.
> 
> Return `nil, err` on a failure. Possible reasons:
> 
> - SocketError: an IO error occurs at read().

SocketError on pipe looks ridiculous to me. Maybe an IO one?

> - TimedOut:    @a timeout quota is exceeded.
> - OutOfMemory: no memory space for a buffer to read into.
> - LuajitError: ("not enough memory"): no memory space for
>                the Lua string.
> 

<snipped>

> 
> `popen_handle:shutdown(opts) -> true`

Minor: Considering the method description, <eof> looks more convenient
name here. On the other hand <eof> name might be misguiding one
considering feof(3) standart C function.

> ------------------------------------------
> 
> Close parent's ends of std* fds.
> 
> @param handle        handle of a child process
> @param opts          an options table
> @param opts.stdin    close parent's end of stdin, boolean
> @param opts.stdout   close parent's end of stdout, boolean
> @param opts.stderr   close parent's end of stderr, boolean
> 
> The main reason to use this function is to send EOF to
> child's stdin. However parent's end of stdout / stderr
> may be closed too.
> 
> The function does not fail on already closed fds (idempotence).
> However it fails on attempt to close the end of a pipe that was
> never exist. In other words, only those std* options that
> were set to popen.opts.PIPE at a handle creation may be used
> here (for popen.shell: 'r' corresponds to stdout, 'R' to stderr
> and 'w' to stdin).
> 
> The function does not close any fds on a failure: either all
> requested fds are closed or neither of them.
> 
> Example:
> 
>  | local popen = require('popen')
>  |
>  | local ph = popen.shell('sed s/foo/bar/', 'rw')
>  | ph:write('lorem foo ipsum')
>  | ph:shutdown({stdin = true})
>  | local res = ph:read()
>  | ph:close()
>  | print(res) -- lorem bar ipsum
> 
> Raise an error on incorrect parameters:
> 
> - IllegalParams:  an incorrect handle parameter.
> - IllegalParams:  called on a closed handle.
> - IllegalParams:  neither stdin, stdout nor stderr is choosen.
> - IllegalParams:  a requested IO operation is not supported
>                   by the handle (one of std* is not piped).
> 
> Return `true` on success.
> 

<snipped>

> 
> `popen_handle:info() -> res, err`

<err> is never returned.

> ---------------------------------
> 
> Return information about popen handle.
> 
> @param handle  a handle of a child process
> 
> Raise an error on incorrect parameters:
> 
> - IllegalParams: an incorrect handle parameter.
> - IllegalParams: called on a closed handle.
> 
> Return information about the handle in the following
> format:
> 
>     {
>         pid = <number> or <nil>,
>         command = <string>,
>         opts = <table>,
>         status = <table>,
>         stdin = one-of(
>             popen.stream.OPENED (== 'opened'),
>             popen.stream.CLOSED (== 'closed'),
>             nil,
>         ),
>         stdout = one-of(
>             popen.stream.OPENED (== 'opened'),
>             popen.stream.CLOSED (== 'closed'),
>             nil,
>         ),
>         stderr = one-of(
>             popen.stream.OPENED (== 'opened'),
>             popen.stream.CLOSED (== 'closed'),
>             nil,
>         ),
>     }
> 
> `pid` is a process id of the process when it is alive,
> otherwise `pid` is nil.
> 
> `command` is a concatenation of space separated arguments
> that were passed to execve(). Multiword arguments are quoted.
> Quotes inside arguments are not escaped.
> 
> `opts` is a table of handle options in the format of
> popen.new() `opts` parameter. `opts.env` is not shown here,
> because the environment variables map is not stored in a
> handle.
> 
> `status` is a table that represents a process status in the
> following format:
> 
>     {
>         state = one-of(
>             popen.state.ALIVE    (== 'alive'),
>             popen.state.EXITED   (== 'exited'),
>             popen.state.SIGNALED (== 'signaled'),
>         )
> 
>         -- Present when `state` is 'exited'.
>         exit_code = <number>,
> 
>         -- Present when `state` is 'signaled'.
>         signo = <number>,
>         signame = <string>,
>     }
> 
> `stdin`, `stdout`, `stderr` reflect status of parent's end
> of a piped stream. When a stream is not piped the field is
> not present (`nil`). When it is piped, the status may be
> one of the following:
> 
> - popen.stream.OPENED  (== 'opened')
> - popen.stream.CLOSED  (== 'closed')
> 
> The status may be changed from 'opened' to 'closed'
> by :shutdown({std... = true}) call.
> 
> Example 1 (tarantool console):
> 
>  | tarantool> require('popen').new({'/usr/bin/touch', '/tmp/foo'})
>  | ---
>  | - command: /usr/bin/touch /tmp/foo
>  |   status:
>  |     state: alive
>  |   opts:
>  |     stdout: inherit
>  |     stdin: inherit
>  |     group_signal: false
>  |     keep_child: false
>  |     close_fds: true
>  |     restore_signals: true
>  |     shell: false
>  |     setsid: false
>  |     stderr: inherit
>  |   pid: 9499
>  | ...
> 
> Example 2 (tarantool console):
> 
>  | tarantool> require('popen').shell('grep foo', 'wrR')
>  | ---
>  | - stdout: opened
>  |   command: sh -c 'grep foo'
>  |   stderr: opened
>  |   status:
>  |     state: alive
>  |   stdin: opened
>  |   opts:
>  |     stdout: pipe
>  |     stdin: pipe
>  |     group_signal: true
>  |     keep_child: false
>  |     close_fds: true
>  |     restore_signals: true
>  |     shell: true
>  |     setsid: true
>  |     stderr: pipe
>  |   pid: 10497
>  | ...
> 
> `popen_handle:wait() -> res, err`

<err> is never returned.

> ---------------------------------
> 
> Wait until a child process get exited or signaled.
> 
> @param handle  a handle of process to wait
> 
> Raise an error on incorrect parameters or when the fiber is
> cancelled:
> 
> - IllegalParams:    an incorrect handle parameter.
> - IllegalParams:    called on a closed handle.
> - FiberIsCancelled: cancelled by an outside code.
> 
> Return a process status table (the same as ph.status and
> ph.info().status). @see popen_handle:info() for the format
> of the table.
> 

<snipped>

> ---

<snipped>

> -- 
> 2.25.0
> 

-- 
Best regards,
IM


More information about the Tarantool-patches mailing list