From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp34.i.mail.ru (smtp34.i.mail.ru [94.100.177.94]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 68B5546971A for ; Fri, 13 Dec 2019 05:50:51 +0300 (MSK) Date: Fri, 13 Dec 2019 05:50:43 +0300 From: Alexander Turenko Message-ID: <20191213025043.yxkcsojewdejqbnh@tkn_work_nb> References: <20191128204512.19732-1-gorcunov@gmail.com> <20191128204512.19732-2-gorcunov@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20191128204512.19732-2-gorcunov@gmail.com> Subject: Re: [Tarantool-patches] [PATCH 1/5] popen: Introduce a backend engine List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Cyrill Gorcunov Cc: tml On Thu, Nov 28, 2019 at 11:45:08PM +0300, Cyrill Gorcunov wrote: > In the patch we introduce popen backend engine which provides > a way to execute external programs and communicate with their > stdin/stdout/stderr streams. I'll past common notes here (in v1 patchset), because a discussion is already started here. CCed Igor Munkin. As far as I know he also shows into this patchset. ---- I propose to split the discussion into the following threads: - fork/exec/signals (subtopic: Lua API) - io (subtopic: Lua API) - other (say, whether to use our diagnotics area or just errno in the backend) IO -- (For ones who has better knowledge around tarantool code base then me: skip this section and read 'Pre-proposal for Lua API for our pipes'; this is more a bunch of notes I made for myself re IO APIs we have, but they can be useful for someone else.) Let's discuss the latter topic. I'll return to the first one later. We have several libraries / primitives related to input / output: sio, libev / coio, eio. I'm not very familiar with them, just looked around docs and sources and tried to organize my understanding. Please, correct me if misunderstood some details. ## sio sio is the very thin wrapper around BSD sockets, which exposes errors using our diagnostics area (rather then errno) and draws a line between EWOULDBLOCK (and similar) errors from non-transient ones. We basically interested in sio_read(), sio_write() and sio_wouldblock(). ## libev libev deserves its own description, but in context of this discussion it is a wrapper around epoll() that allows to register a callback that will be called when a file descriptor becomes readable or writeable (or a timeout occurs). We basically use it in this way (more complex examples are iproto and swim): | // Get a cord local (means usually thread local) event loop. | struct ev_loop *loop = loop(); | // Initialize a watcher. | struct ev_io watcher; | ev_io_init(&watcher, callback, fd, EV_READ); // or EV_WRITE, or both | // Add the file descriptor to epoll. | ev_io_start(loop, &watcher); | // In callback (reading for example). | int rc = sio_read(<...>); | if (rc < 0 && ! sio_wouldblock(errno)) { | // handle an error | } else if (rc == 0) { // EOF | ev_io_stop(loop, &watcher); | // handle EOF | } else { | // handle actual read | } So it is natural to use libev + sio for a nonblocking file descriptor to work with it asynchronously, I would even say in event based way. We can also start a watcher on demand, when sio_wouldblock() returns true. ## coio (parts that do not use a thread pool under hood) We however need more: integrate our reads and writes with fibers to yield until an event (readability, writeability or timeout) occurs. It seems that coio_*() functions are about this. We have two functions for reading and writing and bunch of simple wrappers around them: coio_read_ahead_timeout() and coio_writev_timeout(). They stitch sio_*() and ev_io_*() functions to perform reads and writes. When an io operation would block, they registers a watcher and yields currect fiber. The watcher callback just schedules the fiber back. It also worth to mention coio_wait(fd, events, timeout), which yields until the file descriptor becomes readable / writeable (as asked in 'events') or the timeout occurs. We widely use this API, but it worth to highlight fio and socket Lua modules as examples. ## libeio + coio_task.h + coio_file.h It is basically a thread pool that allows to execute a task asynchronously. We integrate it with our fibers in coio_task.h API that provides general purpose coio_call() to perform arbitrary call in a thread (and yield until it'll end) and coio_getaddrinfo() to perform async DNS resolving. We also have a bunch of file oriented coio_*() wrappers that calls eio and yield until an operation ends (see coio_file.h). Here we potentially interested in coio_read() and coio_write(). However my understanding is that this API is more for file IO rather then pipes, because thread pool looks as overkill to perform async read or write of a pipe. I propose to consider coio_read_ahead_timeout(), coio_writev_timeout() (or wrappers) and coio_wait() to work with pipes. I looked at libuv docs, and they use its thread pool only for file operations, getaddrinfo() and user provided tasks: http://docs.libuv.org/en/v1.x/design.html Aside of that I was wondered why we even need multiple threads to perform file IO better and ends with the guess that a disc controller may schedule operations better, when there are more requested operations in-fly, while reads and writes that got back to us with EWOULDBLOCK are not 'in-fly' in this sense (a process may decided to don't retry them). So, we need more threads just to block more writes (or reads) when working with files :) My source is just this SO answer: https://stackoverflow.com/a/8157313/1598057 and my rich imagination. Pre-proposal for Lua API for our pipes -------------------------------------- I want to share some kind of sketchy notes around API for parent ends of stdin / stdout / stderr pipes. It seems that with our coio functions (ones that are not backed with libeio) we can just expose file descriptors in non-blocking mode from our backend and everything else can be handled in a quite simple way from, say, 'pipe' Lua module (the name is to be discussed). This drive us to the following points: * Backend just set fds as non-blocking and expose them. No read / write code are needed at all. * We can expose such file descriptor in Lua as 'pipe' object (or, maybe better, as 'reading stream' and 'writing stream' objects). * We need to call coio_*() read/write/wait functions Lua API implementation and this is basically all. I propose to implement relatively high-level API for read: * For binary protocols a user likely want to wait until N bytes will arrive (or timeout occurs). * For text protocol a user likely want to read until a delimiter (say a newline), or until a timeout; the delimiter can be multibyte. This more or less reflects socket_object:read() API, see https://www.tarantool.io/en/doc/2.2/reference/reference_lua/socket/#socket-read Don't sure however that the API should be exactly same; need to dig a bit more and understand it better, the doc is quite shetchy. Maybe we even can extract relevant parts of the code and use them for read streams and sockets both. It looks natural, because a socket is the case of a read stream. tarantool/luatest offers a shortcut to start a fiber for reading from a pipe. We should consider such shortcut too for running interactive commands (I mean, to read and write 'simultaneously'). We should consider also to allow a user to pass a buffer optionally as in fio_handle:read() (and return a Lua string if it is not passed). This provides the way to eliminate allocating of Lua strings for each read and so perform reads in more performant way. ---- I'll return to process handles and related Lua API separately. There are some thoughts around too :) WBR, Alexander Turenko.