From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id B5FA22DB9F for ; Mon, 6 May 2019 09:20:28 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id WRpq7NMFl409 for ; Mon, 6 May 2019 09:20:28 -0400 (EDT) Received: from mail-vs1-f68.google.com (mail-vs1-f68.google.com [209.85.217.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 75AE82C8AA for ; Mon, 6 May 2019 09:20:28 -0400 (EDT) Received: by mail-vs1-f68.google.com with SMTP id z145so8109605vsc.0 for ; Mon, 06 May 2019 06:20:28 -0700 (PDT) From: =?UTF-8?B?0JjQu9GM0Y8g0JrQvtC90Y7RhdC+0LI=?= In-Reply-To: <20190422023809.rip63gv6sqejlojq@tkn_work_nb> References: <20190417145139.59554-1-runsfor@gmail.com> <20190422023809.rip63gv6sqejlojq@tkn_work_nb> MIME-Version: 1.0 Date: Mon, 6 May 2019 06:20:26 -0700 Message-ID: Subject: [tarantool-patches] Re: [PATCH] Pass max_total_connections parameter to httpclient Content-Type: multipart/alternative; boundary="000000000000b83d3b058837f6e2" Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-Help: List-Unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-Subscribe: List-Owner: List-post: List-Archive: To: tarantool-patches@freelists.org Cc: Alexander Turenko --000000000000b83d3b058837f6e2 Content-Type: text/plain; charset="UTF-8" Hi, Thanks for your detailed review. I added couple simple changes to the patch (see diff from original patch below) and pushed it to the tarantool/tarantool repo. See branch runsfor/gh-3945-pass-new-opt-to-httpc. Also I had rewrite the whole commit message trying to be more specific. > First, why do you change max_connections from 5 to 8? I'm not against, > but any change should have a clear reason. It also worth to mention that > it possibly was set to 5 by mistake, because the default value of > another option (CURLOPT_MAXCONNECTS) is 5. I agree with your comments about options defaults, so I changed defaults to match documentation. I was not able to find any reasons why max_connections option was default to 5. It was originally added in tarantool/curl repo without any comments and then migrate into tarantool. --- diff --git a/src/lua/httpc.lua b/src/lua/httpc.lua index ef0efa4bf..50ff91a36 100644 --- a/src/lua/httpc.lua +++ b/src/lua/httpc.lua @@ -41,8 +41,8 @@ local curl_mt -- -- Parameters: -- --- max_connections - Maximum number of entries in the connection cache */ --- max_total_connections - Maximum number of active connections */ +-- max_connections - Maximum number of entries in the connection cache +-- max_total_connections - Maximum number of active connections -- -- Returns: -- curl object or raise error() @@ -52,8 +52,8 @@ local http_new = function(opts) opts = opts or {} - opts.max_connections = opts.max_connections or 8 - opts.max_total_connections = opts.max_total_connections or 8 + opts.max_connections = opts.max_connections or -1 + opts.max_total_connections = opts.max_total_connections or 0 local curl = driver.new(opts.max_connections, opts.max_total_connections) return setmetatable({ curl = curl, }, curl_mt ) On 22 April 2019 at 05:38:19, Alexander Turenko ( alexander.turenko@tarantool.org) wrote: Thanks for the investigation and the patch! I propose to polish the commit message to make your results more clear for anybody who will be interested and my review is mostly about the wording. Re process: It is better to create a branch within tarantool/tarantool repository named nickname/gh-xxxx-short-description. (It will be easier to pull your changes or push some fixups upward.) Also, please, add my @tarantool.org email to To or CC when ask for review; then I'll not miss your email. Re code: It is okay, but several things need to be clarified (in the commit message). First, why do you change max_connections from 5 to 8? I'm not against, but any change should have a clear reason. It also worth to mention that it possibly was set to 5 by mistake, because the default value of another option (CURLOPT_MAXCONNECTS) is 5. Second. AFAIU, libcurl connections caching is useful when requests are typically going to the same hosts. So I think we need to state explicitly that we assume this case before show calculations about amount of TIME_WAIT sockets and so on. It also worth to mention that tarantool's http client configures libcurl to optimize the case when connections are mainly going to the same hosts and show the way to configure it for requests for arbitrary hosts: set max_connections to -1 and max_total_connections to 0 (see lib/multi.c in curl sources). > There are some usecases when current http client may leak sockets (due > TIME_WAIT timeout). The reason is that the `max_connection` option of > tarantool's http client (actually is a binding for CURLMOPT_MAXCONNECTS > option in libcurl) may not be enough for some situations. Typo: usecases -> use cases. 'Leak' is not accurate term. 'opens many new connections and eventually stuck when exhausted available ports range' -- is this better? However AFAIR the problem that was arose in the issue in not about TIME_WAIT sockets. Correct me if I'm wrong, but it seems that we should describe these two problems separatelly. I understand that in this paragraph you try to describe thing 'in common', but here a reader will have many questions: 'some use cases' -- which? 'may not be enough' -- for what? 'some situations' -- which? If you need to describe how curl operates with cached and non-cached requests with CURLMOPT_MAXCONNECTS and CURLMOPT_MAX_TOTAL_CONNECTIONS options set to specific values then do it first. When you make this clear it would be easy to describe which problems are possible and how they can be handled using max_connections and max_total_connections options. My common suggestions when writing technical texts are try hard to avoid non-specific descriptions ('some' word is the red flag) and avoid forward references. > This option sets up a size of connection cache, which curl maintains > during its lifetime. When this cache is full (i.e. all connections are > waiting for a response from a server), newly created request would > create a new connection to handle it. In that case right after old > connection got response, curl has to close that connection to keep up an > appropriate number of connections in cache. That results in a new socket > in TIME_WAIT state (for 60 seconds by default). > > This is actually easy to calculate. Imagine we have http client with > default 8 maximum connections in cache. Also lets say we have 16384 > sockets available in the system. We need to make requests to external > system which responses in 100ms in average. So, if we start making > request every 11ms (100/9), every finished request will be followed by > curl closing that socket (switches to TIME_WAIT). Result is about 90 > wasted sockets per second and additional overhead for socket creation > for each new request. > > Also if we do more than 274 requests per second, we will likely to be > out of available sockets in 1 minute. > > The solution is to add another binding for max_total_connections option, > which will let curl hold new requests, before old ones have finished. > It also adds more control over sockets resources. I don't see this description in the commit message on your branch. Forgot to update? Also we need to add a docbot request (see https://github.com/tarantool/docbot) and it would be good to move this description into the request to pass it to the documentation team when the commit will land to the master. > > - > > New opt is ignored if curl version is lower 7.30.0 > Option CURLMOPT_MAX_TOTAL_CONNECTIONS was added from 7.30.0 version > https://curl.haxx.se/changes.html#7_30_0 > > - > > Fixes #3945 > > https://github.com/RunsFor/tarantool/tree/master > http://github.com/tarantool/tarantool/issues/3945 Place a text that in not part of a commit message under '---' mark. It will allow to apply your commit with `git am` and, more important, don't confuse a reviewer what is part of a commit message and what is not. > diff --git a/src/lua/httpc.lua b/src/lua/httpc.lua > index cd44b6054..8566db88d 100644 > --- a/src/lua/httpc.lua > +++ b/src/lua/httpc.lua > @@ -41,7 +41,8 @@ local curl_mt > -- > -- Parameters: > -- > --- max_connectionss - Maximum number of entries in the connection cache */ > +-- max_connections - Maximum number of entries in the connection cache */ > +-- max_total_connections - Maximum number of active connections */ '*/' here is just the typo. --000000000000b83d3b058837f6e2 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable =
Hi,

Thanks for your deta= iled review.


I added couple simple changes to t= he patch (see diff from original patch below) and pushed it to the tarantoo= l/tarantool repo. See branch=C2=A0runsfor/gh-3945-pass-new-opt-to-httpc.

Also I had= rewrite the whole commit message trying to be more specific.

>=C2=A0First,=C2=A0why=C2=A0do=C2=A0you=C2=A0change= =C2=A0max_connections=C2=A0from=C2=A05=C2=A0to=C2=A08?=C2=A0I'm=C2=A0no= t=C2=A0against,
> = but=C2=A0any=C2=A0change=C2=A0should=C2=A0have=C2=A0a=C2=A0clear=C2=A0reaso= n.=C2=A0It=C2=A0also=C2=A0worth=C2=A0to=C2=A0mention=C2=A0that
> it=C2=A0possibly=C2=A0was=C2=A0set=C2=A0to=C2= =A05=C2=A0by=C2=A0mistake,=C2=A0because=C2=A0the=C2=A0default=C2=A0value=C2= =A0of
<= span style=3D"color:rgb(51,51,51);font-family:"Helvetica Neue",He= lvetica,Arial,sans-serif;font-size:14px">> another=C2=A0option=C2=A0(CUR= LOPT_MAXCONNECTS)=C2=A0is=C2=A05.
=
I agree with your comments about= options defaults, so I changed defaults to match documentation. I was not = able to find any reasons why max_connections option was default to 5. It wa= s originally added in tarantool/curl repo without any comments and then mig= rate into tarantool.

---
diff --git a/src/lua/httpc.lua b/src/lua/httpc.l= ua
index ef0efa4bf..50ff91a36 100644
--- a/src/lua/httpc.lua
+++ b/src/lua/ht= tpc.lua
@@ -41,8 +41,8 @@ local curl_mt
=C2=A0--
=C2=A0-- =C2=A0Parameters:
=C2= =A0--
--- =C2=A0max_connections - =C2=A0Maximum number of entries in the con= nection cache */
--- =C2=A0max_total_connections - =C2=A0Maximum number of a= ctive connections */
+-- =C2=A0max_connections - =C2=A0Maximum number of ent= ries in the connection cache
+-- =C2=A0max_total_connections - =C2=A0Maximum= number of active connections
=C2=A0--
=C2=A0-- =C2=A0Returns:
=C2=A0-- =C2=A0= curl object or raise error()
@@ -52,8 +52,8 @@ local http_new =3D function(o= pts)

=C2=A0 =C2=A0 =C2=A0opts =3D opts or {}

- =C2=A0 =C2=A0opts.max_= connections =3D opts.max_connections or 8
- =C2=A0 =C2=A0opts.max_total_conn= ections =3D opts.max_total_connections or 8
+ =C2=A0 =C2=A0opts.max_connecti= ons =3D opts.max_connections or -1
+ =C2=A0 =C2=A0opts.max_total_connections= =3D opts.max_total_connections or 0

=C2=A0 =C2=A0 =C2=A0local curl =3D = driver.new(opts.max_connections, opts.max_total_connections)
=C2=A0 =C2=A0 = =C2=A0return setmetatable({ curl =3D curl, }, curl_mt )
<= div class=3D"gmail_signature">

On 22 Apri= l 2019 at 05:38:19, Alexander Turenko (alexander.turenko@tarantool.org) wrote:

Thanks for the = investigation and the patch!

I propose to polish the commit message to make your results more clear
for anybody who will be interested and my review is mostly about the
wording.

Re process: It is better to create a branch within tarantool/tarantool
repository named nickname/gh-xxxx-short-description. (It will be easier
to pull your changes or push some fixups upward.)

Also, please, add my @tarantool.org email to To or CC when ask for
review; then I'll not miss your email.

Re code: It is okay, but several things need to be clarified (in the
commit message).

First, why do you change max_connections from 5 to 8? I'm not again= st,
but any change should have a clear reason. It also worth to mention tha= t
it possibly was set to 5 by mistake, because the default value of
another option (CURLOPT_MAXCONNECTS) is 5.

Second. AFAIU, libcurl connections caching is useful when requests are
typically going to the same hosts. So I think we need to state
explicitly that we assume this case before show calculations about
amount of TIME_WAIT sockets and so on.

It also worth to mention that tarantool's http client configures li= bcurl
to optimize the case when connections are mainly going to the same host= s
and show the way to configure it for requests for arbitrary hosts: set
max_connections to -1 and max_total_connections to 0 (see lib/multi.c i= n
curl sources).

> There are some usecases when current http client may leak sockets = (due
> TIME_WAIT timeout). The reason is that the `max_connection` option= of
> tarantool's http client (actually is a binding for CURLMOPT_MA= XCONNECTS
> option in libcurl) may not be enough for some situations.

Typo: usecases -> use cases.

'Leak' is not accurate term. 'opens many new connections an= d eventually
stuck when exhausted available ports range' -- is this better? How= ever
AFAIR the problem that was arose in the issue in not about TIME_WAIT
sockets. Correct me if I'm wrong, but it seems that we should descr= ibe
these two problems separatelly.

I understand that in this paragraph you try to describe thing 'in
common', but here a reader will have many questions: 'some use = cases' --
which? 'may not be enough' -- for what? 'some situations= 9; -- which?

If you need to describe how curl operates with cached and non-cached
requests with CURLMOPT_MAXCONNECTS and CURLMOPT_MAX_TOTAL_CONNECTIONS
options set to specific values then do it first. When you make this
clear it would be easy to describe which problems are possible and how
they can be handled using max_connections and max_total_connections
options.

My common suggestions when writing technical texts are try hard to avoi= d
non-specific descriptions ('some' word is the red flag) and avo= id
forward references.

> This option sets up a size of connection cache, which curl maintai= ns
> during its lifetime. When this cache is full (i.e. all connections= are
> waiting for a response from a server), newly created request would
> create a new connection to handle it. In that case right after old
> connection got response, curl has to close that connection to keep= up an
> appropriate number of connections in cache. That results in a new = socket
> in TIME_WAIT state (for 60 seconds by default).
> =20
> This is actually easy to calculate. Imagine we have http client wi= th
> default 8 maximum connections in cache. Also lets say we have 1638= 4
> sockets available in the system. We need to make requests to exter= nal
> system which responses in 100ms in average. So, if we start making
> request every 11ms (100/9), every finished request will be followe= d by
> curl closing that socket (switches to TIME_WAIT). Result is about = 90
> wasted sockets per second and additional overhead for socket creat= ion
> for each new request.
> =20
> Also if we do more than 274 requests per second, we will likely to= be
> out of available sockets in 1 minute.
> =20
> The solution is to add another binding for max_total_connections o= ption,
> which will let curl hold new requests, before old ones have finish= ed.
> It also adds more control over sockets resources.

I don't see this description in the commit message on your branch.
Forgot to update?

Also we need to add a docbot request (see
https://github.com/tara= ntool/docbot) and it would be good to move this
description into the request to pass it to the documentation team when
the commit will land to the master.

> =20
> -
> =20
> New opt is ignored if curl version is lower 7.30.0
> Option CURLMOPT_MAX_TOTAL_CONNECTIONS was added from 7.30.0 versio= n
> https://curl.= haxx.se/changes.html#7_30_0
> =20
> -
> =20
> Fixes #3945
> =20


> https= ://github.com/RunsFor/tarantool/tree/master
> http= ://github.com/tarantool/tarantool/issues/3945

Place a text that in not part of a commit message under '---' m= ark. It
will allow to apply your commit with `git am` and, more important, don&= #39;t
confuse a reviewer what is part of a commit message and what is not.

> diff --git a/src/lua/httpc.lua b/src/lua/httpc.lua
> index cd44b6054..8566db88d 100644
> --- a/src/lua/httpc.lua
> +++ b/src/lua/httpc.lua
> @@ -41,7 +41,8 @@ local curl_mt
> --
> -- Parameters:
> --
> --- max_connectionss - Maximum number of entries in the connecti= on cache */
> +-- max_connections - Maximum number of entries in the connectio= n cache */
> +-- max_total_connections - Maximum number of active connections= */

'*/' here is just the typo.
--000000000000b83d3b058837f6e2--