Tarantool development patches archive
 help / color / mirror / Atom feed
* [tarantool-patches] [PATCH] Pass max_total_connections parameter to httpclient
@ 2019-04-17 14:51 Ilya Konyukhov
  2019-04-22  2:38 ` [tarantool-patches] " Alexander Turenko
  2019-05-14 14:24 ` Kirill Yukhin
  0 siblings, 2 replies; 6+ messages in thread
From: Ilya Konyukhov @ 2019-04-17 14:51 UTC (permalink / raw)
  To: tarantool-patches; +Cc: totktonada.ru, Ilya Konyukhov

There are some usecases when current http client may leak sockets (due
TIME_WAIT timeout). The reason is that the `max_connection` option of
tarantool's http client (actually is a binding for CURLMOPT_MAXCONNECTS
option in libcurl) may not be enough for some situations.

This option sets up a size of connection cache, which curl maintains
during its lifetime. When this cache is full (i.e. all connections are
waiting for a response from a server), newly created request would
create a new connection to handle it. In that case right after old
connection got response, curl has to close that connection to keep up an
appropriate number of connections in cache. That results in a new socket
in TIME_WAIT state (for 60 seconds by default).

This is actually easy to calculate. Imagine we have http client with
default 8 maximum connections in cache. Also lets say we have 16384
sockets available in the system. We need to make requests to external
system which responses in 100ms in average. So, if we start making
request every 11ms (100/9), every finished request will be followed by
curl closing that socket (switches to TIME_WAIT). Result is about 90
wasted sockets per second and additional overhead for socket creation
for each new request.

Also if we do more than 274 requests per second, we will likely to be
out of available sockets in 1 minute.

The solution is to add another binding for max_total_connections option,
which will let curl hold new requests, before old ones have finished.
It also adds more control over sockets resources.

-

New opt is ignored if curl version is lower 7.30.0
Option CURLMOPT_MAX_TOTAL_CONNECTIONS was added from 7.30.0 version
https://curl.haxx.se/changes.html#7_30_0

-

Fixes #3945

https://github.com/RunsFor/tarantool/tree/master
http://github.com/tarantool/tarantool/issues/3945

---

Changes:
- Added new option (max_total_connections) to http client (curl version >= 7.30.0)
- Increased default connection buffer from 5 to 8

 src/curl.c        | 7 ++++++-
 src/curl.h        | 2 +-
 src/httpc.c       | 4 ++--
 src/httpc.h       | 2 +-
 src/lua/httpc.c   | 3 ++-
 src/lua/httpc.lua | 8 +++++---
 6 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/src/curl.c b/src/curl.c
index b85a4553b..354e9615e 100644
--- a/src/curl.c
+++ b/src/curl.c
@@ -229,7 +229,7 @@ curl_multi_sock_cb(CURL *easy, curl_socket_t fd, int what, void *envp,
 
 
 int
-curl_env_create(struct curl_env *env, long max_conns)
+curl_env_create(struct curl_env *env, long max_conns, long max_total_conns)
 {
 	memset(env, 0, sizeof(*env));
 	mempool_create(&env->sock_pool, &cord()->slabc,
@@ -252,6 +252,11 @@ curl_env_create(struct curl_env *env, long max_conns)
 	curl_multi_setopt(env->multi, CURLMOPT_SOCKETDATA, (void *) env);
 
 	curl_multi_setopt(env->multi, CURLMOPT_MAXCONNECTS, max_conns);
+#if LIBCURL_VERSION_NUM >= 0x071e00
+	curl_multi_setopt(env->multi, CURLMOPT_MAX_TOTAL_CONNECTIONS, max_total_conns);
+#else
+	(void) max_total_conns;
+#endif
 
 	return 0;
 
diff --git a/src/curl.h b/src/curl.h
index cf9163664..ea50afe3e 100644
--- a/src/curl.h
+++ b/src/curl.h
@@ -87,7 +87,7 @@ struct curl_request {
  * @retval -1 on error, check diag
  */
 int
-curl_env_create(struct curl_env *env, long max_conns);
+curl_env_create(struct curl_env *env, long max_conns, long max_total_conns);
 
 /**
  * Destroy HTTP client environment
diff --git a/src/httpc.c b/src/httpc.c
index b673ec3e8..7e35f8fc0 100644
--- a/src/httpc.c
+++ b/src/httpc.c
@@ -82,13 +82,13 @@ curl_easy_header_cb(char *buffer, size_t size, size_t nitems, void *ctx)
 }
 
 int
-httpc_env_create(struct httpc_env *env, int max_conns)
+httpc_env_create(struct httpc_env *env, int max_conns, int max_total_conns)
 {
 	memset(env, 0, sizeof(*env));
 	mempool_create(&env->req_pool, &cord()->slabc,
 			sizeof(struct httpc_request));
 
-	return curl_env_create(&env->curl_env, max_conns);
+	return curl_env_create(&env->curl_env, max_conns, max_total_conns);
 }
 
 void
diff --git a/src/httpc.h b/src/httpc.h
index 821c73955..41e75515a 100644
--- a/src/httpc.h
+++ b/src/httpc.h
@@ -76,7 +76,7 @@ struct httpc_env {
  * @retval -1 on error, check diag
  */
 int
-httpc_env_create(struct httpc_env *ctx, int max_conns);
+httpc_env_create(struct httpc_env *ctx, int max_conns, int max_total_conns);
 
 /**
  * Destroy HTTP client environment
diff --git a/src/lua/httpc.c b/src/lua/httpc.c
index 706b9d90b..51133df12 100644
--- a/src/lua/httpc.c
+++ b/src/lua/httpc.c
@@ -368,7 +368,8 @@ luaT_httpc_new(lua_State *L)
 		return luaL_error(L, "lua_newuserdata failed: httpc_env");
 
 	long max_conns = luaL_checklong(L, 1);
-	if (httpc_env_create(ctx, max_conns) != 0)
+	long max_total_conns = luaL_checklong(L, 2);
+	if (httpc_env_create(ctx, max_conns, max_total_conns) != 0)
 		return luaT_error(L);
 
 	luaL_getmetatable(L, DRIVER_LUA_UDATA_NAME);
diff --git a/src/lua/httpc.lua b/src/lua/httpc.lua
index cd44b6054..8566db88d 100644
--- a/src/lua/httpc.lua
+++ b/src/lua/httpc.lua
@@ -41,7 +41,8 @@ local curl_mt
 --
 --  Parameters:
 --
---  max_connectionss -  Maximum number of entries in the connection cache */
+--  max_connections -  Maximum number of entries in the connection cache */
+--  max_total_connections -  Maximum number of active connections */
 --
 --  Returns:
 --  curl object or raise error()
@@ -51,9 +52,10 @@ local http_new = function(opts)
 
     opts = opts or {}
 
-    opts.max_connections = opts.max_connections or 5
+    opts.max_connections = opts.max_connections or 8
+    opts.max_total_connections = opts.max_total_connections or 8
 
-    local curl = driver.new(opts.max_connections)
+    local curl = driver.new(opts.max_connections, opts.max_total_connections)
     return setmetatable({ curl = curl, }, curl_mt )
 end
 
-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tarantool-patches] Re: [PATCH] Pass max_total_connections parameter to httpclient
  2019-04-17 14:51 [tarantool-patches] [PATCH] Pass max_total_connections parameter to httpclient Ilya Konyukhov
@ 2019-04-22  2:38 ` Alexander Turenko
  2019-05-06 13:20   ` Илья Конюхов
  2019-05-14 14:24 ` Kirill Yukhin
  1 sibling, 1 reply; 6+ messages in thread
From: Alexander Turenko @ 2019-04-22  2:38 UTC (permalink / raw)
  To: Ilya Konyukhov; +Cc: tarantool-patches

Thanks for the investigation and the patch!

I propose to polish the commit message to make your results more clear
for anybody who will be interested and my review is mostly about the
wording.

Re process: It is better to create a branch within tarantool/tarantool
repository named nickname/gh-xxxx-short-description. (It will be easier
to pull your changes or push some fixups upward.)

Also, please, add my @tarantool.org email to To or CC when ask for
review; then I'll not miss your email.

Re code: It is okay, but several things need to be clarified (in the
commit message).

First, why do you change max_connections from 5 to 8? I'm not against,
but any change should have a clear reason. It also worth to mention that
it possibly was set to 5 by mistake, because the default value of
another option (CURLOPT_MAXCONNECTS) is 5.

Second. AFAIU, libcurl connections caching is useful when requests are
typically going to the same hosts. So I think we need to state
explicitly that we assume this case before show calculations about
amount of TIME_WAIT sockets and so on.

It also worth to mention that tarantool's http client configures libcurl
to optimize the case when connections are mainly going to the same hosts
and show the way to configure it for requests for arbitrary hosts: set
max_connections to -1 and max_total_connections to 0 (see lib/multi.c in
curl sources).

> There are some usecases when current http client may leak sockets (due
> TIME_WAIT timeout). The reason is that the `max_connection` option of
> tarantool's http client (actually is a binding for CURLMOPT_MAXCONNECTS
> option in libcurl) may not be enough for some situations.

Typo: usecases -> use cases.

'Leak' is not accurate term. 'opens many new connections and eventually
stuck when exhausted available ports range' -- is this better?  However
AFAIR the problem that was arose in the issue in not about TIME_WAIT
sockets. Correct me if I'm wrong, but it seems that we should describe
these two problems separatelly.

I understand that in this paragraph you try to describe thing 'in
common', but here a reader will have many questions: 'some use cases' --
which? 'may not be enough' -- for what? 'some situations' -- which?

If you need to describe how curl operates with cached and non-cached
requests with CURLMOPT_MAXCONNECTS and CURLMOPT_MAX_TOTAL_CONNECTIONS
options set to specific values then do it first. When you make this
clear it would be easy to describe which problems are possible and how
they can be handled using max_connections and max_total_connections
options.

My common suggestions when writing technical texts are try hard to avoid
non-specific descriptions ('some' word is the red flag) and avoid
forward references.

> This option sets up a size of connection cache, which curl maintains
> during its lifetime. When this cache is full (i.e. all connections are
> waiting for a response from a server), newly created request would
> create a new connection to handle it. In that case right after old
> connection got response, curl has to close that connection to keep up an
> appropriate number of connections in cache. That results in a new socket
> in TIME_WAIT state (for 60 seconds by default).
> 
> This is actually easy to calculate. Imagine we have http client with
> default 8 maximum connections in cache. Also lets say we have 16384
> sockets available in the system. We need to make requests to external
> system which responses in 100ms in average. So, if we start making
> request every 11ms (100/9), every finished request will be followed by
> curl closing that socket (switches to TIME_WAIT). Result is about 90
> wasted sockets per second and additional overhead for socket creation
> for each new request.
> 
> Also if we do more than 274 requests per second, we will likely to be
> out of available sockets in 1 minute.
> 
> The solution is to add another binding for max_total_connections option,
> which will let curl hold new requests, before old ones have finished.
> It also adds more control over sockets resources.

I don't see this description in the commit message on your branch.
Forgot to update?

Also we need to add a docbot request (see
https://github.com/tarantool/docbot) and it would be good to move this
description into the request to pass it to the documentation team when
the commit will land to the master.

> 
> -
> 
> New opt is ignored if curl version is lower 7.30.0
> Option CURLMOPT_MAX_TOTAL_CONNECTIONS was added from 7.30.0 version
> https://curl.haxx.se/changes.html#7_30_0
> 
> -
> 
> Fixes #3945
> 


> https://github.com/RunsFor/tarantool/tree/master
> http://github.com/tarantool/tarantool/issues/3945

Place a text that in not part of a commit message under '---' mark. It
will allow to apply your commit with `git am` and, more important, don't
confuse a reviewer what is part of a commit message and what is not.

> diff --git a/src/lua/httpc.lua b/src/lua/httpc.lua
> index cd44b6054..8566db88d 100644
> --- a/src/lua/httpc.lua
> +++ b/src/lua/httpc.lua
> @@ -41,7 +41,8 @@ local curl_mt
>  --
>  --  Parameters:
>  --
> ---  max_connectionss -  Maximum number of entries in the connection cache */
> +--  max_connections -  Maximum number of entries in the connection cache */
> +--  max_total_connections -  Maximum number of active connections */

'*/' here is just the typo.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tarantool-patches] Re: [PATCH] Pass max_total_connections parameter to httpclient
  2019-04-22  2:38 ` [tarantool-patches] " Alexander Turenko
@ 2019-05-06 13:20   ` Илья Конюхов
  2019-05-08 15:09     ` Alexander Turenko
  0 siblings, 1 reply; 6+ messages in thread
From: Илья Конюхов @ 2019-05-06 13:20 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko

[-- Attachment #1: Type: text/plain, Size: 7244 bytes --]

Hi,

Thanks for your detailed review.


I added couple simple changes to the patch (see diff from original patch
below) and pushed it to the tarantool/tarantool repo. See
branch runsfor/gh-3945-pass-new-opt-to-httpc.

Also I had rewrite the whole commit message trying to be more specific.

> First, why do you change max_connections from 5 to 8? I'm not against,
> but any change should have a clear reason. It also worth to mention that
> it possibly was set to 5 by mistake, because the default value of
> another option (CURLOPT_MAXCONNECTS) is 5.

I agree with your comments about options defaults, so I changed defaults to
match documentation. I was not able to find any reasons why max_connections
option was default to 5. It was originally added in tarantool/curl repo
without any comments and then migrate into tarantool.

---
diff --git a/src/lua/httpc.lua b/src/lua/httpc.lua
index ef0efa4bf..50ff91a36 100644
--- a/src/lua/httpc.lua
+++ b/src/lua/httpc.lua
@@ -41,8 +41,8 @@ local curl_mt
 --
 --  Parameters:
 --
---  max_connections -  Maximum number of entries in the connection cache */
---  max_total_connections -  Maximum number of active connections */
+--  max_connections -  Maximum number of entries in the connection cache
+--  max_total_connections -  Maximum number of active connections
 --
 --  Returns:
 --  curl object or raise error()
@@ -52,8 +52,8 @@ local http_new = function(opts)

     opts = opts or {}

-    opts.max_connections = opts.max_connections or 8
-    opts.max_total_connections = opts.max_total_connections or 8
+    opts.max_connections = opts.max_connections or -1
+    opts.max_total_connections = opts.max_total_connections or 0

     local curl = driver.new(opts.max_connections,
opts.max_total_connections)
     return setmetatable({ curl = curl, }, curl_mt )

On 22 April 2019 at 05:38:19, Alexander Turenko (
alexander.turenko@tarantool.org) wrote:

Thanks for the investigation and the patch!

I propose to polish the commit message to make your results more clear
for anybody who will be interested and my review is mostly about the
wording.

Re process: It is better to create a branch within tarantool/tarantool
repository named nickname/gh-xxxx-short-description. (It will be easier
to pull your changes or push some fixups upward.)

Also, please, add my @tarantool.org email to To or CC when ask for
review; then I'll not miss your email.

Re code: It is okay, but several things need to be clarified (in the
commit message).

First, why do you change max_connections from 5 to 8? I'm not against,
but any change should have a clear reason. It also worth to mention that
it possibly was set to 5 by mistake, because the default value of
another option (CURLOPT_MAXCONNECTS) is 5.

Second. AFAIU, libcurl connections caching is useful when requests are
typically going to the same hosts. So I think we need to state
explicitly that we assume this case before show calculations about
amount of TIME_WAIT sockets and so on.

It also worth to mention that tarantool's http client configures libcurl
to optimize the case when connections are mainly going to the same hosts
and show the way to configure it for requests for arbitrary hosts: set
max_connections to -1 and max_total_connections to 0 (see lib/multi.c in
curl sources).

> There are some usecases when current http client may leak sockets (due
> TIME_WAIT timeout). The reason is that the `max_connection` option of
> tarantool's http client (actually is a binding for CURLMOPT_MAXCONNECTS
> option in libcurl) may not be enough for some situations.

Typo: usecases -> use cases.

'Leak' is not accurate term. 'opens many new connections and eventually
stuck when exhausted available ports range' -- is this better? However
AFAIR the problem that was arose in the issue in not about TIME_WAIT
sockets. Correct me if I'm wrong, but it seems that we should describe
these two problems separatelly.

I understand that in this paragraph you try to describe thing 'in
common', but here a reader will have many questions: 'some use cases' --
which? 'may not be enough' -- for what? 'some situations' -- which?

If you need to describe how curl operates with cached and non-cached
requests with CURLMOPT_MAXCONNECTS and CURLMOPT_MAX_TOTAL_CONNECTIONS
options set to specific values then do it first. When you make this
clear it would be easy to describe which problems are possible and how
they can be handled using max_connections and max_total_connections
options.

My common suggestions when writing technical texts are try hard to avoid
non-specific descriptions ('some' word is the red flag) and avoid
forward references.

> This option sets up a size of connection cache, which curl maintains
> during its lifetime. When this cache is full (i.e. all connections are
> waiting for a response from a server), newly created request would
> create a new connection to handle it. In that case right after old
> connection got response, curl has to close that connection to keep up an
> appropriate number of connections in cache. That results in a new socket
> in TIME_WAIT state (for 60 seconds by default).
>
> This is actually easy to calculate. Imagine we have http client with
> default 8 maximum connections in cache. Also lets say we have 16384
> sockets available in the system. We need to make requests to external
> system which responses in 100ms in average. So, if we start making
> request every 11ms (100/9), every finished request will be followed by
> curl closing that socket (switches to TIME_WAIT). Result is about 90
> wasted sockets per second and additional overhead for socket creation
> for each new request.
>
> Also if we do more than 274 requests per second, we will likely to be
> out of available sockets in 1 minute.
>
> The solution is to add another binding for max_total_connections option,
> which will let curl hold new requests, before old ones have finished.
> It also adds more control over sockets resources.

I don't see this description in the commit message on your branch.
Forgot to update?

Also we need to add a docbot request (see
https://github.com/tarantool/docbot) and it would be good to move this
description into the request to pass it to the documentation team when
the commit will land to the master.

>
> -
>
> New opt is ignored if curl version is lower 7.30.0
> Option CURLMOPT_MAX_TOTAL_CONNECTIONS was added from 7.30.0 version
> https://curl.haxx.se/changes.html#7_30_0
>
> -
>
> Fixes #3945
>


> https://github.com/RunsFor/tarantool/tree/master
> http://github.com/tarantool/tarantool/issues/3945

Place a text that in not part of a commit message under '---' mark. It
will allow to apply your commit with `git am` and, more important, don't
confuse a reviewer what is part of a commit message and what is not.

> diff --git a/src/lua/httpc.lua b/src/lua/httpc.lua
> index cd44b6054..8566db88d 100644
> --- a/src/lua/httpc.lua
> +++ b/src/lua/httpc.lua
> @@ -41,7 +41,8 @@ local curl_mt
> --
> -- Parameters:
> --
> --- max_connectionss - Maximum number of entries in the connection cache
*/
> +-- max_connections - Maximum number of entries in the connection cache
*/
> +-- max_total_connections - Maximum number of active connections */

'*/' here is just the typo.

[-- Attachment #2: Type: text/html, Size: 12255 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tarantool-patches] Re: [PATCH] Pass max_total_connections parameter to httpclient
  2019-05-06 13:20   ` Илья Конюхов
@ 2019-05-08 15:09     ` Alexander Turenko
  2019-05-12 14:16       ` Alexander Turenko
  0 siblings, 1 reply; 6+ messages in thread
From: Alexander Turenko @ 2019-05-08 15:09 UTC (permalink / raw)
  To: Илья
	Конюхов
  Cc: tarantool-patches

The patch and the description looks okay for me.

Please, add docbot ( https://github.com/tarantool/docbot ) comment into
the commit or the issue. I think that certain recommendations for users
would be valuable result of your investigations.

Maybe all details you described in the commit message is not worth to
place in our documentation, but the following points looks important:

* How to achieve good performance in case of arbitrary host requests and in
  case of requests to the same small set of hosts: whether it worth to
  leave default settings or how to calculate good values (maybe just
  your observations if you don't have strict vision on this).
* What to check if a performance is degraded or there is such suspicion:
  at least it worth to mention that CentOS 7 ships curl 7.29 by default
  (AFAIR): so max_total_connections will be ignored and so we'll fall
  into the situation with decraged performance you described below.

See also three comments below.

WBR, Alexander Turenko.

> httpc: add MAX_TOTAL_CONNECTIONS option binding
> 
> Right now there is only one option which is configurable for http
> client. That is CURLMOPT_MAXCONNECTS. It can be setup like this:
> 
> > httpc = require('http.client').new({max_connections = 16})
> 
> Basically, this option tells curl to maintain this many connections in
> the cache during client instance lifetime. Caching connections are very
> usefull when user requests mostly same hosts.

Typo: usefull -> useful.

> 
> When connections cache is full and all of them are waiting for response
> and new request comes in, curl creates a new connection, starts request
> and then drops first available connection to keep connections cache size
> right.
> 
> There is one side effect, that when tcp connection is closed, system
> actually updates its state to TIME_WAIT. Then for some time resources
> for this socket can't be reused (usually 60 seconds).
> 
> When user wants to do lots of requests simultaneously (to the same
> host), curl ends up creating and dropping lots of connections, which is
> not very efficient. When this load is high enough, sockets won't be able
> to recover from TIME_WAIT because of timeout and system may run out of
> available sockets which results in performance reduce. And user right
> now cannot control or limit this behaviour.
> 
> The solution is to add a new binding for CURLMOPT_MAX_TOTAL_CONNECTIONS
> option. This option tells curl to hold a new connection until
> there is one available (request is finished). Only after that curl will
> either drop and create new connection or reuse an old one.
> 
> This patch bypasses this option into curl instance. It defaults to -1
> which means that there is no limit. To create a client with this option
> setup, user needs to set max_total_connections option like this:
> 
> > httpc = require('http.client').new({max_connections = 8,
>                                       max_total_connections = 8})
> 
> In general this options usually usefull when doing requests mostly to
> the same hosts. Other way, defaults should be enough.

Typo: usefull -> useful.

> 
> Option CURLMOPT_MAX_TOTAL_CONNECTIONS was added from 7.30.0 version, so
> if curl version is under 7.30.0, this option is simply ignored.
> https://curl.haxx.se/changes.html#7_30_0
> 
> Also, this patch adjusts the default for CURLMOPT_MAX_CONNECTS option to
> 0 which means that for every new easy handle curl will enlarge its max
> cache size by 4. See this option docs for more
> https://curl.haxx.se/libcurl/c/CURLMOPT_MAXCONNECTS.html

Please add 'Fixes #xxxx' mark or kinda to auto-close the issue.

On Mon, May 06, 2019 at 06:20:26AM -0700, Илья Конюхов wrote:
> Hi,
> 
> Thanks for your detailed review.
> 
> 
> I added couple simple changes to the patch (see diff from original patch
> below) and pushed it to the tarantool/tarantool repo. See
> branch runsfor/gh-3945-pass-new-opt-to-httpc.
> 
> Also I had rewrite the whole commit message trying to be more specific.
> 
> > First, why do you change max_connections from 5 to 8? I'm not against,
> > but any change should have a clear reason. It also worth to mention that
> > it possibly was set to 5 by mistake, because the default value of
> > another option (CURLOPT_MAXCONNECTS) is 5.
> 
> I agree with your comments about options defaults, so I changed defaults to
> match documentation. I was not able to find any reasons why max_connections
> option was default to 5. It was originally added in tarantool/curl repo
> without any comments and then migrate into tarantool.
> 
> ---
> diff --git a/src/lua/httpc.lua b/src/lua/httpc.lua
> index ef0efa4bf..50ff91a36 100644
> --- a/src/lua/httpc.lua
> +++ b/src/lua/httpc.lua
> @@ -41,8 +41,8 @@ local curl_mt
>  --
>  --  Parameters:
>  --
> ---  max_connections -  Maximum number of entries in the connection cache */
> ---  max_total_connections -  Maximum number of active connections */
> +--  max_connections -  Maximum number of entries in the connection cache
> +--  max_total_connections -  Maximum number of active connections
>  --
>  --  Returns:
>  --  curl object or raise error()
> @@ -52,8 +52,8 @@ local http_new = function(opts)
> 
>      opts = opts or {}
> 
> -    opts.max_connections = opts.max_connections or 8
> -    opts.max_total_connections = opts.max_total_connections or 8
> +    opts.max_connections = opts.max_connections or -1
> +    opts.max_total_connections = opts.max_total_connections or 0
> 
>      local curl = driver.new(opts.max_connections,
> opts.max_total_connections)
>      return setmetatable({ curl = curl, }, curl_mt )
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tarantool-patches] Re: [PATCH] Pass max_total_connections parameter to httpclient
  2019-05-08 15:09     ` Alexander Turenko
@ 2019-05-12 14:16       ` Alexander Turenko
  0 siblings, 0 replies; 6+ messages in thread
From: Alexander Turenko @ 2019-05-12 14:16 UTC (permalink / raw)
  To: Kirill Yukhin
  Cc: tarantool-patches,
	Илья
	Конюхов

Now docbot comment is added (see [1]), typos are fixed. The code is
okay. LGTM.

Kirill, please, proceed.

https://github.com/tarantool/tarantool/issues/3945
https://github.com/tarantool/tarantool/tree/runsfor/gh-3945-pass-new-opt-to-httpc

[1]: https://github.com/tarantool/tarantool/issues/3945#issuecomment-491539435

On Wed, May 08, 2019 at 06:09:42PM +0300, Alexander Turenko wrote:
> The patch and the description looks okay for me.
> 
> Please, add docbot ( https://github.com/tarantool/docbot ) comment into
> the commit or the issue. I think that certain recommendations for users
> would be valuable result of your investigations.
> 
> Maybe all details you described in the commit message is not worth to
> place in our documentation, but the following points looks important:
> 
> * How to achieve good performance in case of arbitrary host requests and in
>   case of requests to the same small set of hosts: whether it worth to
>   leave default settings or how to calculate good values (maybe just
>   your observations if you don't have strict vision on this).
> * What to check if a performance is degraded or there is such suspicion:
>   at least it worth to mention that CentOS 7 ships curl 7.29 by default
>   (AFAIR): so max_total_connections will be ignored and so we'll fall
>   into the situation with decraged performance you described below.
> 
> See also three comments below.
> 
> WBR, Alexander Turenko.
> 
> > httpc: add MAX_TOTAL_CONNECTIONS option binding
> > 
> > Right now there is only one option which is configurable for http
> > client. That is CURLMOPT_MAXCONNECTS. It can be setup like this:
> > 
> > > httpc = require('http.client').new({max_connections = 16})
> > 
> > Basically, this option tells curl to maintain this many connections in
> > the cache during client instance lifetime. Caching connections are very
> > usefull when user requests mostly same hosts.
> 
> Typo: usefull -> useful.
> 
> > 
> > When connections cache is full and all of them are waiting for response
> > and new request comes in, curl creates a new connection, starts request
> > and then drops first available connection to keep connections cache size
> > right.
> > 
> > There is one side effect, that when tcp connection is closed, system
> > actually updates its state to TIME_WAIT. Then for some time resources
> > for this socket can't be reused (usually 60 seconds).
> > 
> > When user wants to do lots of requests simultaneously (to the same
> > host), curl ends up creating and dropping lots of connections, which is
> > not very efficient. When this load is high enough, sockets won't be able
> > to recover from TIME_WAIT because of timeout and system may run out of
> > available sockets which results in performance reduce. And user right
> > now cannot control or limit this behaviour.
> > 
> > The solution is to add a new binding for CURLMOPT_MAX_TOTAL_CONNECTIONS
> > option. This option tells curl to hold a new connection until
> > there is one available (request is finished). Only after that curl will
> > either drop and create new connection or reuse an old one.
> > 
> > This patch bypasses this option into curl instance. It defaults to -1
> > which means that there is no limit. To create a client with this option
> > setup, user needs to set max_total_connections option like this:
> > 
> > > httpc = require('http.client').new({max_connections = 8,
> >                                       max_total_connections = 8})
> > 
> > In general this options usually usefull when doing requests mostly to
> > the same hosts. Other way, defaults should be enough.
> 
> Typo: usefull -> useful.
> 
> > 
> > Option CURLMOPT_MAX_TOTAL_CONNECTIONS was added from 7.30.0 version, so
> > if curl version is under 7.30.0, this option is simply ignored.
> > https://curl.haxx.se/changes.html#7_30_0
> > 
> > Also, this patch adjusts the default for CURLMOPT_MAX_CONNECTS option to
> > 0 which means that for every new easy handle curl will enlarge its max
> > cache size by 4. See this option docs for more
> > https://curl.haxx.se/libcurl/c/CURLMOPT_MAXCONNECTS.html
> 
> Please add 'Fixes #xxxx' mark or kinda to auto-close the issue.
> 
> On Mon, May 06, 2019 at 06:20:26AM -0700, Илья Конюхов wrote:
> > Hi,
> > 
> > Thanks for your detailed review.
> > 
> > 
> > I added couple simple changes to the patch (see diff from original patch
> > below) and pushed it to the tarantool/tarantool repo. See
> > branch runsfor/gh-3945-pass-new-opt-to-httpc.
> > 
> > Also I had rewrite the whole commit message trying to be more specific.
> > 
> > > First, why do you change max_connections from 5 to 8? I'm not against,
> > > but any change should have a clear reason. It also worth to mention that
> > > it possibly was set to 5 by mistake, because the default value of
> > > another option (CURLOPT_MAXCONNECTS) is 5.
> > 
> > I agree with your comments about options defaults, so I changed defaults to
> > match documentation. I was not able to find any reasons why max_connections
> > option was default to 5. It was originally added in tarantool/curl repo
> > without any comments and then migrate into tarantool.
> > 
> > ---
> > diff --git a/src/lua/httpc.lua b/src/lua/httpc.lua
> > index ef0efa4bf..50ff91a36 100644
> > --- a/src/lua/httpc.lua
> > +++ b/src/lua/httpc.lua
> > @@ -41,8 +41,8 @@ local curl_mt
> >  --
> >  --  Parameters:
> >  --
> > ---  max_connections -  Maximum number of entries in the connection cache */
> > ---  max_total_connections -  Maximum number of active connections */
> > +--  max_connections -  Maximum number of entries in the connection cache
> > +--  max_total_connections -  Maximum number of active connections
> >  --
> >  --  Returns:
> >  --  curl object or raise error()
> > @@ -52,8 +52,8 @@ local http_new = function(opts)
> > 
> >      opts = opts or {}
> > 
> > -    opts.max_connections = opts.max_connections or 8
> > -    opts.max_total_connections = opts.max_total_connections or 8
> > +    opts.max_connections = opts.max_connections or -1
> > +    opts.max_total_connections = opts.max_total_connections or 0
> > 
> >      local curl = driver.new(opts.max_connections,
> > opts.max_total_connections)
> >      return setmetatable({ curl = curl, }, curl_mt )
> > 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tarantool-patches] Re: [PATCH] Pass max_total_connections parameter to httpclient
  2019-04-17 14:51 [tarantool-patches] [PATCH] Pass max_total_connections parameter to httpclient Ilya Konyukhov
  2019-04-22  2:38 ` [tarantool-patches] " Alexander Turenko
@ 2019-05-14 14:24 ` Kirill Yukhin
  1 sibling, 0 replies; 6+ messages in thread
From: Kirill Yukhin @ 2019-05-14 14:24 UTC (permalink / raw)
  To: tarantool-patches; +Cc: totktonada.ru, Ilya Konyukhov

Hello,

On 17 Apr 17:51, Ilya Konyukhov wrote:
> There are some usecases when current http client may leak sockets (due
> TIME_WAIT timeout). The reason is that the `max_connection` option of
> tarantool's http client (actually is a binding for CURLMOPT_MAXCONNECTS
> option in libcurl) may not be enough for some situations.
> 
> This option sets up a size of connection cache, which curl maintains
> during its lifetime. When this cache is full (i.e. all connections are
> waiting for a response from a server), newly created request would
> create a new connection to handle it. In that case right after old
> connection got response, curl has to close that connection to keep up an
> appropriate number of connections in cache. That results in a new socket
> in TIME_WAIT state (for 60 seconds by default).
> 
> This is actually easy to calculate. Imagine we have http client with
> default 8 maximum connections in cache. Also lets say we have 16384
> sockets available in the system. We need to make requests to external
> system which responses in 100ms in average. So, if we start making
> request every 11ms (100/9), every finished request will be followed by
> curl closing that socket (switches to TIME_WAIT). Result is about 90
> wasted sockets per second and additional overhead for socket creation
> for each new request.
> 
> Also if we do more than 274 requests per second, we will likely to be
> out of available sockets in 1 minute.
> 
> The solution is to add another binding for max_total_connections option,
> which will let curl hold new requests, before old ones have finished.
> It also adds more control over sockets resources.

I've checked your patch into 1.10, 2.1 and master.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-05-14 14:24 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-17 14:51 [tarantool-patches] [PATCH] Pass max_total_connections parameter to httpclient Ilya Konyukhov
2019-04-22  2:38 ` [tarantool-patches] " Alexander Turenko
2019-05-06 13:20   ` Илья Конюхов
2019-05-08 15:09     ` Alexander Turenko
2019-05-12 14:16       ` Alexander Turenko
2019-05-14 14:24 ` Kirill Yukhin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox