[Tarantool-patches] [PATCH 2/2] popen: add popen Lua module

Alexander Turenko alexander.turenko at tarantool.org
Mon Apr 20 01:24:34 MSK 2020


Thanks for the careful proofreading!

Aside of changes below I changed `popen.stream.OPENED (== 'opened')` to
`popen.stream.OPEN (== 'open')` to match english grammar. Same for
statuses -> status (yep, 'statuses' is allowed in English, but it is
ugly).

Force-pushed Totktonada/gh-4031-popen-13-full-ci.

WBR, Alexander Turenko.

----

> > The `popen` module provides two functions to create that named popen
> > object `popen.shell` which is the similar to libc `popen` syscall and
> 
> Typo: I guess a colon is missing here.

Thanks! Fixed.

> > Example:
> > 
> >  | local popen = require('popen')
> >  |
> >  | local ph = popen.shell('date', 'r')
> >  | local date = ph:read():rstrip()
> 
> Minor: IMHO <rstrip> call might lead to a misuse, considering there is
> no word about it in description below. I guess you can drop it.

It is quite minor IMHO, but I don't mind.

Added the explanation to .shell() function description in the code and
the commit message:

diff --git a/src/lua/popen.c b/src/lua/popen.c
index 40f4343eb..8ae01b3f1 100644
--- a/src/lua/popen.c
+++ b/src/lua/popen.c
@@ -1408,6 +1396,18 @@ err:
  *
  * Execute 'sh -c date' command, read the output and close the
  * popen object.
+ *
+ * Unix defines a text file as a sequence of lines, each ends
+ * with the newline symbol. The same convention is usually
+ * applied for a text output of a command (so when it is
+ * redirected to a file, the file will be correct).
+ *
+ * However internally an application usually operates on
+ * strings, which are NOT newline terminated (e.g. literals
+ * for error messages). The newline is usually added right
+ * before a string is written to the outside world (stdout,
+ * console or log). :rstrip() in the example above is shown
+ * for this sake.
  */
 static int
 lbox_popen_shell(struct lua_State *L)

I left .new() description intact: it should be described once and
.shell() going first in the documentation request (commit message).

> > Execute a child program in a new process.
> > 
> > @param argv  an array of a program to run with
> >              command line options, mandatory;
> >              absolute path to the program is required
> 
> Absolute path is not required when shell option is set to <true>.

Good catch!

Added to the comment in the code and to the commit message:

diff --git a/src/lua/popen.c b/src/lua/popen.c
index 2aa0a5851..bf06bbfa7 100644
--- a/src/lua/popen.c
+++ b/src/lua/popen.c
@@ -1114,6 +1114,7 @@ luaT_popen_parse_mode(struct lua_State *L, int idx)
  * @param argv  an array of a program to run with
  *              command line options, mandatory;
  *              absolute path to the program is required
+ *              when @a opts.shell is false (default)
  *
  * @param opts  table of options
  *

> > @param opts.env  an array of environment variables to
> >                  be used inside a process;
> >                  - when is not set then the current
> >                    environment is inherited;
> >                  - if set then the environment will be
> >                    replaced
> >                  - if set to an empty array then the
> >                    environment will be dropped
> 
> Minor: I looks more convenient to sort the lines above the following
> way:
> * when is not set <...>
> * if set to an empty table <...>
> * if set to a non-empty table <...>

I don't mind.

Reordered in the code and the commit message.

Also I rephrased the description a bit to highlight the fact that the
parameter is key-value mapping, not an array.

diff --git a/src/lua/popen.c b/src/lua/popen.c
index bf06bbfa7..f17e3faf1 100644
--- a/src/lua/popen.c
+++ b/src/lua/popen.c
@@ -1134,14 +1134,16 @@ luaT_popen_parse_mode(struct lua_State *L, int idx)
  *                         feed data from/to the fd to parent
  *                         using a pipe
  *
- * @param opts.env  an array of environment variables to
- *                  be used inside a process;
+ * @param opts.env  a table of environment variables to
+ *                  be used inside a process; key is a
+ *                  variable name, value is a variable
+ *                  value.
  *                  - when is not set then the current
  *                    environment is inherited;
+ *                  - if set to an empty table then the
+ *                    environment will be dropped
  *                  - if set then the environment will be
  *                    replaced
- *                  - if set to an empty array then the
- *                    environment will be dropped
  *
  * @param opts.shell            (boolean, default: false)
  *        true                  run a child process via

> > @param opts.setsid           (boolean, default: false)
> >        true                  start executable inside a new
> >                              session
> >        false                 don't do that
> 
> Minor: I reword it in more clear way below:
> | @param opts.setsid           (boolean, default: false)
> |        true                  run the program in a new
> |                              session
> |        false                 run the program in the
> |                              tarantool instance's
> |                              session

So I would add "...and process group". Changed:

diff --git a/src/lua/popen.c b/src/lua/popen.c
index f17e3faf1..ec8c95662 100644
--- a/src/lua/popen.c
+++ b/src/lua/popen.c
@@ -1151,9 +1151,11 @@ luaT_popen_parse_mode(struct lua_State *L, int idx)
  *        false                 call the executable directly
  *
  * @param opts.setsid           (boolean, default: false)
- *        true                  start executable inside a new
+ *        true                  run the program in a new
  *                              session
- *        false                 don't do that
+ *        false                 run the program in the
+ *                              tarantool instance's
+ *                              session and process group
  *
  * @param opts.close_fds        (boolean, default: true)
  *        true                  close all inherited fds from a

> > The returned handle is subject to garbage collection, but
> > it may be freed explicitly using :close(). SIGKILL is sent
> > to the child process at :close() / GC if it is alive and
> > opts.keep_child is not set.
> 
> Minor: Let's adjust the terminology. What do you mean speaking about
> "handle"? If it's a GC object, then there is no way to release it
> manually or "explicitly", only via platform GC. If it's a GC object
> *payload* then it's not managed by Lua GC and need to be explicitly
> released via special method or the corresponding __gc metamethod.

My intention is to give two ideas here:

* Feel free to use explicit :close() or lean on Lua GC.
* :close() / GC handler kills the child process.

But being strict in terms and wording is goal for me too, so I reword
the paragraph to eliminate phrases, which requires a GC object / a GC
object payload as an object / a subject.

Changed in the code and the commit:

diff --git a/src/lua/popen.c b/src/lua/popen.c
index ec8c95662..31b4bc2b0 100644
--- a/src/lua/popen.c
+++ b/src/lua/popen.c
@@ -1184,10 +1184,11 @@ luaT_popen_parse_mode(struct lua_State *L, int idx)
  *                              :close() or collecting of the
  *                              handle by Lua GC
  *
- * The returned handle is subject to garbage collection, but
- * it may be freed explicitly using :close(). SIGKILL is sent
- * to the child process at :close() / GC if it is alive and
- * opts.keep_child is not set.
+ * The returned handle provides :close() method to explicitly
+ * release all occupied resources (including the child process
+ * itself if @a opts.keep_child is not set). However if the
+ * method is not called for a handle during its lifetime, the
+ * same freeing actions will be triggered by Lua GC.
  *
  * It is recommended to use opts.setsid + opts.group_signal
  * if a child process may spawn its own childs and they all

> > Example 1:
> > 
> >  | local popen = require('popen')
> >  |
> >  | local ph = popen.new({'/bin/date'}, {
> >  |     stdout = popen.opts.PIPE,
> >  | })
> >  | local date = ph:read():rstrip()
> >  | ph:close()
> >  | print(date) -- Thu 16 Apr 2020 01:40:56 AM MSK
> > 
> > Execute 'date' command, read the result and close the
> > popen object.
> 
> Minor: Again, no mention about rstrip.

I think once is enough. Added the popen.shell() description.

> > Example 4:
> > 
> >  | local function call_jq(stdin, filter)
> 
> Minor: <stdin> look a bit misleading. What about <input>, <json>, etc?

Okay, 'input'.

diff --git a/src/lua/popen.c b/src/lua/popen.c
index 31b4bc2b0..bf79a5d29 100644
--- a/src/lua/popen.c
+++ b/src/lua/popen.c
@@ -1270,7 +1270,7 @@ luaT_popen_parse_mode(struct lua_State *L, int idx)
  *
  * Example 4:
  *
- *  | local function call_jq(stdin, filter)
+ *  | local function call_jq(input, filter)
  *  |     -- Start jq process, connect to stdin, stdout and stderr.
  *  |     local jq_argv = {'/usr/bin/jq', '-M', '--unbuffered', filter}
  *  |     local ph, err = popen.new(jq_argv, {
@@ -1281,7 +1281,7 @@ luaT_popen_parse_mode(struct lua_State *L, int idx)
  *  |     if ph == nil then return nil, err end
  *  |
  *  |     -- Write input data to child's stdin and send EOF.
- *  |     local ok, err = ph:write(stdin)
+ *  |     local ok, err = ph:write(input)
  *  |     if not ok then return nil, err end
  *  |     ph:shutdown({stdin = true})
  *  |

> >  |     -- Read diagnostics from stderr if any.
> >  |     local err = ph:read({stderr = true})
> >  |     if err ~= '' then
> >  |         ph:close()
> >  |         return nil, err
> >  |     end
> >  |
> >  |     -- Glue all chunks, strip trailing newline.
> 
> Side note: This comment is definitely needed near <rstrip> calls above.

Added once, to popen.shell() example:

diff --git a/src/lua/popen.c b/src/lua/popen.c
index bf79a5d29..6cfb268a1 100644
--- a/src/lua/popen.c
+++ b/src/lua/popen.c
@@ -1395,9 +1395,16 @@ err:
  *
  *  | local popen = require('popen')
  *  |
+ *  | -- Run the program and save its handle.
  *  | local ph = popen.shell('date', 'r')
+ *  |
+ *  | -- Read program's output, strip trailing newline.
  *  | local date = ph:read():rstrip()
+ *  |
+ *  | -- Free resources. The process is killed (but 'date'
+ *  | -- exits itself anyway).
  *  | ph:close()
+ *  |
  *  | print(date)
  *
  * Execute 'sh -c date' command, read the output and close the

> > If a process writes large text to stderr, it may fill out
> > stderr pipe buffer and block in write(2, ...). So we need to
> > read stderr too in another fiber to handle this case.
> 
> Typo: s/stderr too in another fiber/stderr in another fiber too/.

I want to avoid possible understanding as 'read stderr here and in
another fiber too'. I can just remove 'too'. I also changed 'block' to
'stuck':

diff --git a/src/lua/popen.c b/src/lua/popen.c
index 6cfb268a1..6835286c7 100644
--- a/src/lua/popen.c
+++ b/src/lua/popen.c
@@ -1318,8 +1318,8 @@ luaT_popen_parse_mode(struct lua_State *L, int idx)
  * another fiber (start it before a first :write()).
  *
  * If a process writes large text to stderr, it may fill out
- * stderr pipe buffer and block in write(2, ...). So we need to
- * read stderr too in another fiber to handle this case.
+ * stderr pipe buffer and stuck in write(2, ...). So we need
+ * to read stderr in a separate fiber to handle this case.
  */
 static int
 lbox_popen_new(struct lua_State *L)

> > Read data from stdout or stderr streams with @a timeout.
> > By default it reads from stdout. Set @a opts.stderr to
> > `true` to read from stderr.
> 
> If only <stderr> is set to true, is stdout output also captured?

Nope. There is an error about it in the list below, but, yep, it worth
to highlight it. Added the note:

diff --git a/src/lua/popen.c b/src/lua/popen.c
index 6835286c7..a071065b4 100644
--- a/src/lua/popen.c
+++ b/src/lua/popen.c
@@ -1647,6 +1647,9 @@ lbox_popen_wait(struct lua_State *L)
  * By default it reads from stdout. Set @a opts.stderr to
  * `true` to read from stderr.
  *
+ * It is not possible to read from stdout and stderr both in
+ * one call. Set either @a opts.stdout or @a opts.stderr.
+ *
  * Raise an error on incorrect parameters or when the fiber is
  * cancelled:
  *

> > Return `nil, err` on a failure. Possible reasons:
> > 
> > - SocketError: an IO error occurs at read().
> 
> SocketError on pipe looks ridiculous to me. Maybe an IO one?

It comes from coio.cc. It would be good to refactor it someday, but not
tonight :)

> > `popen_handle:shutdown(opts) -> true`
> 
> Minor: Considering the method description, <eof> looks more convenient
> name here. On the other hand <eof> name might be misguiding one
> considering feof(3) standart C function.

:close() for ph.{stdin,stdout,stderr} would be ideal, but we don't have
something like separate read / write streams now. We have bidirectional
handle for reading and writing, so the nearest counterpart would be
socket's shutdown().

More on this: closing of stdin is EOF for a child, but closing of stdout
/ stderr will lead to SIGPIPE (+EPIPE in errno) at write() in the child.

I think the name we have now is good.

> > `popen_handle:info() -> res, err`
> 
> <err> is never returned.

Good catch!

Fixed.

> > `popen_handle:wait() -> res, err`
> 
> <err> is never returned.

Thanks!

Fixed.


More information about the Tarantool-patches mailing list