[tarantool-patches] Re: [PATCH] Pass max_total_connections parameter to httpclient

Alexander Turenko alexander.turenko at tarantool.org
Wed May 8 18:09:42 MSK 2019


The patch and the description looks okay for me.

Please, add docbot ( https://github.com/tarantool/docbot ) comment into
the commit or the issue. I think that certain recommendations for users
would be valuable result of your investigations.

Maybe all details you described in the commit message is not worth to
place in our documentation, but the following points looks important:

* How to achieve good performance in case of arbitrary host requests and in
  case of requests to the same small set of hosts: whether it worth to
  leave default settings or how to calculate good values (maybe just
  your observations if you don't have strict vision on this).
* What to check if a performance is degraded or there is such suspicion:
  at least it worth to mention that CentOS 7 ships curl 7.29 by default
  (AFAIR): so max_total_connections will be ignored and so we'll fall
  into the situation with decraged performance you described below.

See also three comments below.

WBR, Alexander Turenko.

> httpc: add MAX_TOTAL_CONNECTIONS option binding
> 
> Right now there is only one option which is configurable for http
> client. That is CURLMOPT_MAXCONNECTS. It can be setup like this:
> 
> > httpc = require('http.client').new({max_connections = 16})
> 
> Basically, this option tells curl to maintain this many connections in
> the cache during client instance lifetime. Caching connections are very
> usefull when user requests mostly same hosts.

Typo: usefull -> useful.

> 
> When connections cache is full and all of them are waiting for response
> and new request comes in, curl creates a new connection, starts request
> and then drops first available connection to keep connections cache size
> right.
> 
> There is one side effect, that when tcp connection is closed, system
> actually updates its state to TIME_WAIT. Then for some time resources
> for this socket can't be reused (usually 60 seconds).
> 
> When user wants to do lots of requests simultaneously (to the same
> host), curl ends up creating and dropping lots of connections, which is
> not very efficient. When this load is high enough, sockets won't be able
> to recover from TIME_WAIT because of timeout and system may run out of
> available sockets which results in performance reduce. And user right
> now cannot control or limit this behaviour.
> 
> The solution is to add a new binding for CURLMOPT_MAX_TOTAL_CONNECTIONS
> option. This option tells curl to hold a new connection until
> there is one available (request is finished). Only after that curl will
> either drop and create new connection or reuse an old one.
> 
> This patch bypasses this option into curl instance. It defaults to -1
> which means that there is no limit. To create a client with this option
> setup, user needs to set max_total_connections option like this:
> 
> > httpc = require('http.client').new({max_connections = 8,
>                                       max_total_connections = 8})
> 
> In general this options usually usefull when doing requests mostly to
> the same hosts. Other way, defaults should be enough.

Typo: usefull -> useful.

> 
> Option CURLMOPT_MAX_TOTAL_CONNECTIONS was added from 7.30.0 version, so
> if curl version is under 7.30.0, this option is simply ignored.
> https://curl.haxx.se/changes.html#7_30_0
> 
> Also, this patch adjusts the default for CURLMOPT_MAX_CONNECTS option to
> 0 which means that for every new easy handle curl will enlarge its max
> cache size by 4. See this option docs for more
> https://curl.haxx.se/libcurl/c/CURLMOPT_MAXCONNECTS.html

Please add 'Fixes #xxxx' mark or kinda to auto-close the issue.

On Mon, May 06, 2019 at 06:20:26AM -0700, Илья Конюхов wrote:
> Hi,
> 
> Thanks for your detailed review.
> 
> 
> I added couple simple changes to the patch (see diff from original patch
> below) and pushed it to the tarantool/tarantool repo. See
> branch runsfor/gh-3945-pass-new-opt-to-httpc.
> 
> Also I had rewrite the whole commit message trying to be more specific.
> 
> > First, why do you change max_connections from 5 to 8? I'm not against,
> > but any change should have a clear reason. It also worth to mention that
> > it possibly was set to 5 by mistake, because the default value of
> > another option (CURLOPT_MAXCONNECTS) is 5.
> 
> I agree with your comments about options defaults, so I changed defaults to
> match documentation. I was not able to find any reasons why max_connections
> option was default to 5. It was originally added in tarantool/curl repo
> without any comments and then migrate into tarantool.
> 
> ---
> diff --git a/src/lua/httpc.lua b/src/lua/httpc.lua
> index ef0efa4bf..50ff91a36 100644
> --- a/src/lua/httpc.lua
> +++ b/src/lua/httpc.lua
> @@ -41,8 +41,8 @@ local curl_mt
>  --
>  --  Parameters:
>  --
> ---  max_connections -  Maximum number of entries in the connection cache */
> ---  max_total_connections -  Maximum number of active connections */
> +--  max_connections -  Maximum number of entries in the connection cache
> +--  max_total_connections -  Maximum number of active connections
>  --
>  --  Returns:
>  --  curl object or raise error()
> @@ -52,8 +52,8 @@ local http_new = function(opts)
> 
>      opts = opts or {}
> 
> -    opts.max_connections = opts.max_connections or 8
> -    opts.max_total_connections = opts.max_total_connections or 8
> +    opts.max_connections = opts.max_connections or -1
> +    opts.max_total_connections = opts.max_total_connections or 0
> 
>      local curl = driver.new(opts.max_connections,
> opts.max_total_connections)
>      return setmetatable({ curl = curl, }, curl_mt )
> 




More information about the Tarantool-patches mailing list