[Tarantool-patches] [PATCH] socket: fix error while closing socket.tcp_server

Wed May 13 17:21:25 MSK 2020

TL;DR: I would ask you a case that will show the difference you mention.

The same, but with more words, below.

WBR, Alexander Turenko.

> > > @@ -1082,6 +1085,9 @@ local function tcp_server_loop(server, s, addr)
> > >       fiber.name(format("%s/%s:%s", server.name, addr.host, addr.port), {truncate = true})
> > >       log.info("started")
> > >       while socket_readable(s) do
> > > +        if s._server_socket_closed then
> > > +            break
> > > +        end
> >
> > Can we use the same check `<s>._gc_socket.fd >= 0` as in check_socket()
> > (we can wrap it to a function like socket_is_closed())?
> >
> > It seems that using the same way for determining whether a socket is
> > closed owuld simplify the code a bit.
> >
> > WBR, Alexander Turenko.
> >
> If I understand correctly, in this case we must not only check whether
> the socket is closed or not, but also check whether it was closed
> without errors by checking errno. In fact, we come to my first patchset,

Execution path is the following:

 | tcp_server()
 |     local s, addr = tcp_server_bind(...)
 |     if not s then <...> end
 |     <s is correct socket here>
 |     fiber.create(tcp_server_loop, server, s, addr)
 |         while socket_readable(s) do -- it is yield
 |             <a check should be added here>
 |             local sc, from = socket_accept(s)
 |             <...>
 |         end

What can appear with <s> when we're in yield? It can be closed or
mangled in some bad way by a user (say, `socket._gc_socket = {}`). The
latter is invalid usage, so we should handle only closing the socket.

The next question is what condition we should use to check whether a
socket is closed. Look at check_socket() and socket_close() functions
(marked your new code with '+' marks):

 | local function check_socket(socket)
 |     local gc_socket = type(socket) == 'table' and socket._gc_socket
 |     if ffi.istype(gc_socket_t, gc_socket) then
 |         local fd = gc_socket.fd
 |         if fd >= 0 then
 |             return fd
 |         else
 |             error("attempt to use closed socket")
 |         end
 |     else
 |         local msg = "Usage: socket:method()"
 |         if socket ~= nil then msg = msg .. ", called with non-socket" end
 |         error(msg)
 |     end
 | end

 | local gc_socket_sentinel = ffi.new(gc_socket_t, { fd = -1 })
 |
 | local function socket_close(socket)
 |     local fd = check_socket(socket)
 +     if socket._server_socket_closed ~= nil then
 +         socket._server_socket_closed = true
 +     end
 |     socket._errno = nil
 |     local r = ffi.C.coio_close(fd)
 |     -- .fd is const to prevent tampering
 |     ffi.copy(socket._gc_socket, gc_socket_sentinel, ffi.sizeof(gc_socket_t))
 |     if r ~= 0 then
 |         socket._errno = boxerrno()
 |         return false
 |     end
 |     return true
 | end

check_socket() use `<s>._gc_socket.fd >= 0` to determine whether socket
is closed. I don't see a reason to add another way to determine it. But
I would extract the check into a function:

 | local function socket_is_closed(socket)
 |     return socket._gc_socket.fd < 0
 | end

If I miss some case, please, show reproducer code, which reveals it.

> where I disable throwing an exception from the check_socket(). Disabling

Does not look the same: check_socket() is used everywhere in the API
functions.

> throwing an exception is the right way because now our API doesn't
> correspond to the documentation (I will create a ticket for this). But
> fixing a socket check leads to big changes in the code that are not
> related to the problem and will not pass the review. I can wrap the flag

We discussed Leonid's question externally. It is about raising an error
for a closed socket handle. I would consider closed socket as an illegal
parameter and raise an error as for any other illegal parameter error.
At least until someone will show that a correct code may pass a closed
socket to a socket module function / a handle method due to some
external influence (when other end of socket becomes closed or so).

Our documentation usually miss to mention that an illegal parameter
error is raised, not returned. This is not good, but it is usual
convention for our modules in fact.

> check in a method, but then all sockets should have a flag and it's look
> like overkill (this will duplicate the functionality of the
> check_socket() function after fix).

Since check_socket() change you propose (don't raise an error) is
questionable I cannot take it as the argument.

> In addition, we don't check EV_ERROR and it seems to me that this can
> also lead to the fact that the check you proposed may be insufficient.

To be honest, I don't get how it is related to the question we discuss,
but anyway.

>From http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod

 | EV_ERROR
 |
 | An unspecified error has occurred, the watcher has been stopped. This
 | might happen because the watcher could not be properly started
 | because libev ran out of memory, a file descriptor was found to be
 | closed or any other problem. Libev considers these application bugs.
 |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 |
 | You best act on it by reporting the problem and somehow coping with
 | the watcher being stopped. Note that well-written programs should not
 | receive an error ever, so when your watcher receives it, this usually
 | indicates a bug in your program.
 |
 | Libev will usually signal a few "dummy" events together with an
 | error, for example it might indicate that a fd is readable or
 | writable, and if your callbacks is well-written it can just attempt
 |                                                        ^^^^^^^^^^^^
 | the operation and cope with the error from read() or write(). This
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | will not work in multi-threaded programs, though, as the fd could
 | already be closed and reused for another thing, so beware.

So, first: it should not appear at all. Second, we can just read() /
write() and look at the return value + errno.