[tarantool-patches] [PATCH v2] Fix lua socket polling in case of a spurious wakeup

Vladimir Davydov vdavydov.dev at gmail.com
Fri Oct 5 15:54:00 MSK 2018


On Mon, Sep 24, 2018 at 10:53:04PM +0300, Georgy Kirichenko wrote:
> socket_writable/socket_readable handles socket.iowait spurious wakeup
> until event is happened or timeout is exceeded.
> 
> https://github.com/tarantool/tarantool/issues/3344
> https://github.com/tarantool/tarantool/tree/g.kirichenko/gh-3344-socket-io-spurios-wakeup
> 
> Changes in v2:
>  - Fixes according to Vlad's review

Nit: please put the links and the change log after ---, otherwise it
looks like they are a part of the commit message.

Also, according to our guideline, one should prefix the subject line
with a subsystem name, when it makes sense. In your case it should be

  socket: fix polling in case of spurious wakeup

> 
> Fixes: #3344

Nit: should be

Closes #3344

> ---
>  src/lua/socket.lua       | 52 ++++++++++++++-----------------
>  test/app/socket.result   | 66 ++++++++++++++++++++++++++++++++++++++++
>  test/app/socket.test.lua | 28 +++++++++++++++++
>  3 files changed, 117 insertions(+), 29 deletions(-)
> 
> diff --git a/src/lua/socket.lua b/src/lua/socket.lua
> index 8f5e2926b..02c93381b 100644
> --- a/src/lua/socket.lua
> +++ b/src/lua/socket.lua
> @@ -332,12 +332,17 @@ local function do_wait(self, what, timeout)
>      self._errno = nil
>      timeout = timeout or TIMEOUT_INFINITY
>  
> -    local res = internal.iowait(fd, what, timeout)
> -    if res == 0 then
> -        self._errno = boxerrno.ETIMEDOUT
> -        return 0
> +    local deadline = fiber.clock() + timeout
> +
> +    while fiber.clock() <= deadline do
> +        local events = internal.iowait(fd, what, deadline - fiber.clock())
> +        fiber.testcancel()
> +        if events ~= 0 and events ~= '' then
> +            return events
> +        end
>      end
> -    return res
> +    self._errno = boxerrno.ETIMEDOUT
> +    return 0
>  end
>  
>  local function socket_readable(self, timeout)
> @@ -666,7 +671,7 @@ local function read(self, limit, timeout, check, ...)
>          return data
>      end
>  
> -    while timeout > 0 do
> +    while timeout >= 0 do
>          local started = fiber.clock()
>  
>          assert(rbuf:size() < limit)
> @@ -693,15 +698,9 @@ local function read(self, limit, timeout, check, ...)
>              return nil
>          end
>  
> -        if not socket_readable(self, timeout) then
> -            return nil
> -        end
> -        if timeout <= 0 then
> -            break
> -        end
> +        socket_readable(self, timeout)
>          timeout = timeout - ( fiber.clock() - started )
>      end
> -    self._errno = boxerrno.ETIMEDOUT
>      return nil

I guess it's nitpicking, but calling socket_readable() without checking
what it returns looks confusing to me :-(

May be rewrite it with

  repeat
      local started = fiber.clock()
      ...
      timeout = timeout - (fiber.clock() - started)
  until not socket_readable(self, timeout)
  return nil

?

>  end
>  
> @@ -740,7 +739,7 @@ local function socket_write(self, octets, timeout)
>      end
>  
>      local started = fiber.clock()
> -    while true do
> +    while timeout >= 0 do

By the look at how 'timeout' is updated below, 'started' should be set
inside the loop. Looks like a bug, which is worth fixing in the scope of
this patch.

>          local written = syswrite(self, p, e - p)
>          if written == 0 then
>              return p - s -- eof
> @@ -754,11 +753,10 @@ local function socket_write(self, octets, timeout)
>              return nil
>          end
>  
> +        socket_writable(self, timeout)
>          timeout = timeout - (fiber.clock() - started)
> -        if timeout <= 0 or not socket_writable(self, timeout) then
> -            break
> -        end

Again, I'd rewrite this with repeat-until:

  repeat
      local started = fiber.clock()
      ...
      timeout = timeout - (fiber.clock() - started)
  until not socket_writable(self, timeout)
  return p > s and p - s or nil

>      end
> +    return p - s

I think this function should return nil on timeout in case nothing was
written. Why did you change that?

>  end
>  
>  local function socket_send(self, octets, flags)
> @@ -997,16 +995,12 @@ local function socket_tcp_connect(s, address, port, timeout)
>      -- Wait until the connection is established or ultimately fails.
>      -- In either condition the socket becomes writable. To tell these
>      -- conditions appart SO_ERROR must be consulted (man connect).
> -    if socket_writable(s, timeout) then
> +    local deadline = timeout + fiber.clock()
> +    if socket_writable(s, deadline - fiber.clock()) then

What's the point of using 'deadline' here?

>          s._errno = socket_getsockopt(s, 'SOL_SOCKET', 'SO_ERROR')
> -    else
> -        s._errno = boxerrno.ETIMEDOUT
> -    end
> -    if s._errno ~= 0 then
> -        return nil
> +        return s._errno == 0 or nil
>      end
> -    -- Connected
> -    return true
> +    return nil

Again, nitpicking. I understand why you removed '_errno = ETIMEDOUT',
but why change the rest? It looked pretty clean.

>  end
>  
>  local function tcp_connect(host, port, timeout)
> @@ -1387,7 +1381,7 @@ end
>  local function lsocket_tcp_accept(self)
>      check_socket(self)
>      local deadline = fiber.clock() + (self.timeout or TIMEOUT_INFINITY)
> -    repeat
> +    while socket_readable(self, deadline - fiber.clock()) do
>          local client = socket_accept(self)
>          if client then
>              setmetatable(client, lsocket_tcp_client_mt)
> @@ -1397,7 +1391,7 @@ local function lsocket_tcp_accept(self)
>          if not errno_is_transient[errno] then
>              break
>          end
> -    until not socket_readable(self, deadline - fiber.clock())
> +    end

Why did you replace repeat-until with while-do? Now, we will call iowait
even if the socket is ready. That is, it was one syscall for the fast
path (wait-less) before this change, now it's two. That said, this
change looks rather controversial to me, and is definitely out of the
scope of this patch. Let's please leave it as is.

>      return nil, socket_error(self)
>  end
>  
> @@ -1452,7 +1446,7 @@ local function lsocket_tcp_receive(self, pattern, prefix)
>      elseif pattern == "*a" then
>          local result = { prefix }
>          local deadline = fiber.clock() + (self.timeout or TIMEOUT_INFINITY)
> -        repeat
> +        while socket_readable(self, deadline - fiber.clock()) do
>              local data = socket_sysread(self)
>              if data == nil then
>                  if not errno_is_transient[self._errno] then
> @@ -1463,7 +1457,7 @@ local function lsocket_tcp_receive(self, pattern, prefix)
>              else
>                  table.insert(result, data)
>              end
> -        until not socket_readable(self, deadline - fiber.clock())
> +        end

Ditto.

>          if #result == 1 then
>              return nil, 'closed', table.concat(result)
>          end



More information about the Tarantool-patches mailing list