Tarantool development patches archive
 help / color / mirror / Atom feed
* [PATCH] httpc: fix redirection on libcurl-7.30 and older
@ 2019-07-08 16:08 Alexander Turenko
  2019-07-25 12:46 ` Vladimir Davydov
  0 siblings, 1 reply; 3+ messages in thread
From: Alexander Turenko @ 2019-07-08 16:08 UTC (permalink / raw)
  To: Vladimir Davydov, Daniel Stenberg; +Cc: Alexander Turenko, tarantool-patches

libcurl-7.30 and older have a problem with handling redirection. libcurl
postpones removing from an internal hash table and invoking
CURLMOPT_SOCKETFUNCTION callback to remove a closed socket. If timings
are so that a new socket is opened before the old one is fully removed
and they have the same file descriptors, CURLMOPT_SOCKETFUNCTION
callback will not be invoked for the new socket. See
curl_easy_sock_opt_cb() comment for more information.

The described situation will likely occur during redirection (say, at
'302 Found' HTTP response) when a request is machine-local. The request
will hang in the case.

The issue was fixed in libcurl-7.31: see the [discussion][1], the [bug
1248][2] and the [fix][3] (after that libcurl react on socket close
immediately). This commit provides a workaround for old libcurl
versions.

[1]: https://curl.haxx.se/mail/lib-2013-03/0045.html
[2]: https://sourceforge.net/p/curl/bugs/1248/
[3]: https://github.com/curl/curl/commit/88c5c63ffc3312a8c8471b48a44ec5f50420f2e3

Fixes #4180.
---

The patch provides workaround for old libcurl problem that was fixed in
libcurl-7.31. This is needed, because, say, CentOS 7 has libcurl-7.29 in
its repositories.

Vladimir, can you please review my patch?

I also added Daniel Stenberg to this discussion as author of the commit,
which fixes this problem in libcurl.

Daniel, I would appreciate if you can take glance to comments in this
commit and give me pointers if I did some wrong assumptions about
libcurl. Thanks in advance!

issue: https://github.com/tarantool/tarantool/issues/4180
branch: https://github.com/tarantool/tarantool/tree/Totktonada/gh-4180-httpc-fix-redirection-full-ci

 src/curl.c                        | 168 ++++++++++++++++++++++++++++++
 src/curl.h                        |  40 +++++++
 test/app-tap/http_client.test.lua |   8 +-
 3 files changed, 211 insertions(+), 5 deletions(-)

diff --git a/src/curl.c b/src/curl.c
index a41c65e4c..dc3de2948 100644
--- a/src/curl.c
+++ b/src/curl.c
@@ -37,6 +37,32 @@
 #include "fiber.h"
 #include "errinj.h"
 
+#if LIBCURL_PROBLEMATIC_REDIRECTION
+/*
+ * We need a hash table to track ev_io watchers on our end,
+ * because libcurl-7.30 and older doing it incorrectly in the case
+ * when a newly open socket has the same file descriptor as the
+ * old one.
+ *
+ * See curl_easy_sock_opt_cb() for more information.
+ */
+#define mh_name _curl_watchers
+#define mh_key_t int
+#define mh_node_t struct ev_io *
+#define mh_arg_t void *
+#define mh_hash(node, arg) ((*node)->fd)
+#define mh_hash_key(key, arg) (key)
+#define mh_cmp(node_a, node_b, arg) ((*node_a)->fd - (*node_b)->fd)
+#define mh_cmp_key(key, node, arg) ((key) - (*node)->fd)
+#define MH_SOURCE 1
+#include "salad/mhash.h"
+
+/* CURL_SOCKOPT_OK was introduced in libcurl-7.21.5. */
+#ifndef CURL_SOCKOPT_OK
+#define CURL_SOCKOPT_OK 0
+#endif
+#endif /* LIBCURL_PROBLEMATIC_REDIRECTION */
+
 /**
  * Process events
  */
@@ -181,6 +207,12 @@ curl_multi_sock_cb(CURL *easy, curl_socket_t fd, int what, void *envp,
 	if (what == CURL_POLL_REMOVE) {
 		say_debug("curl %p: remove fd=%d", env, fd);
 		assert(watcher != NULL);
+#if LIBCURL_PROBLEMATIC_REDIRECTION
+		/* Remove a watcher from the hash table. */
+		mh_int_t node = mh_curl_watchers_find(env->watchers, fd, NULL);
+		assert(node != mh_end(env->watchers));
+		mh_curl_watchers_del(env->watchers, node, NULL);
+#endif
 		ev_io_stop(loop(), watcher);
 		++env->stat.sockets_deleted;
 		mempool_free(&env->sock_pool, watcher);
@@ -198,6 +230,20 @@ curl_multi_sock_cb(CURL *easy, curl_socket_t fd, int what, void *envp,
 		watcher->data = env;
 		++env->stat.sockets_added;
 		curl_multi_assign(env->multi, fd, watcher);
+#if LIBCURL_PROBLEMATIC_REDIRECTION
+		/* Add a watcher to the hash table. */
+		struct ev_io **old_watcher_ptr = NULL;
+		mh_int_t node = mh_curl_watchers_put(
+			env->watchers, (const struct ev_io **) &watcher,
+			&old_watcher_ptr, NULL);
+		if (node == mh_end(env->watchers)) {
+			diag_set(OutOfMemory, 0, "mh_curl_watchers_put",
+				 "watcher");
+			return -1;
+		}
+		assert(old_watcher_ptr == NULL);
+		(void) old_watcher_ptr;
+#endif
 		say_debug("curl %p: add fd=%d", env, fd);
 	}
 
@@ -218,6 +264,95 @@ curl_multi_sock_cb(CURL *easy, curl_socket_t fd, int what, void *envp,
 	return 0;
 }
 
+#if LIBCURL_PROBLEMATIC_REDIRECTION
+/**
+ * libcurl callback for CURLOPT_SOCKOPTFUNCTION
+ * @see https://curl.haxx.se/libcurl/c/CURLOPT_SOCKOPTFUNCTION.html
+ *
+ * When libcurl handles redirection (say, '302 Found' HTTP
+ * response) it closes an old socket and open a new one (at least
+ * in some cases). libcurl-7.30 and older does not proceed with
+ * closed socket immediately: it postpones removing the closed
+ * socket from an internal hash and also postpones invoking an
+ * CURLMOPT_SOCKETFUNCTION callback (curl_multi_sock_cb()) with
+ * CURL_POLL_REMOVE. When a new socket is opened it is possible
+ * that those actions are not done yet.
+ *
+ * The new socket likely will get the same file descriptor as the
+ * old one. libcurl checks its internal hash by a file descriptor
+ * and judge that a socket is already opened. If needed events (
+ * CURL_POLL_IN / CURL_POLL_OUT) are the same as ones that was set
+ * for the old socket, then libcurl will just wait for events. If
+ * they are not the same, then libcurl will call
+ * curl_multi_sock_cb() with a ev_io watcher that has empty epoll
+ * set (a socket is removed from an epoll set when closed). Both
+ * cases lead to hanging of the HTTP request.
+ *
+ * This callback is necessary to work this behaviour around. When
+ * a new socket is opened it checks whether a watcher for such
+ * file descriptor exists. If so the fuction removes it and
+ * registers a new watcher for the new socket.
+ *
+ * See gh-4180 for links and more information.
+ */
+static int
+curl_easy_sock_opt_cb(void *envp, curl_socket_t fd, curlsocktype purpose)
+{
+	struct curl_env *env = (struct curl_env *) envp;
+
+	/*
+	 * libcurl calls the callback with CURLSOCKTYPE_IPCXN to
+	 * set options for actively created connections. Other
+	 * events are not possible here (at least at the moment of
+	 * libcurl-7.65.1), but if future versions of libcurl will
+	 * add more 'purposes', they are likely will unrelevant
+	 * for us. Ignore them.
+	 */
+	if (purpose != CURLSOCKTYPE_IPCXN)
+		return CURL_SOCKOPT_OK;
+
+	/*
+	 * If there are no old watcher with the same file
+	 * descriptor as the new one within the same curl multi
+	 * handle, nothing to do here.
+	 */
+	mh_int_t node = mh_curl_watchers_find(env->watchers, fd, NULL);
+	if (node == mh_end(env->watchers))
+		return CURL_SOCKOPT_OK;
+
+	/*
+	 * At this point we have an old watcher that was not
+	 * properly removed and a new watcher that will not be
+	 * added by curl itself due to the bug (because they have
+	 * the same file descriptor).
+	 */
+
+	struct ev_io *watcher = *mh_curl_watchers_node(env->watchers, node);
+	assert(watcher != NULL);
+
+	/*
+	 * libcurl will not ask us to poll for certain events if
+	 * it judges that a watcher already polls for them. So we
+	 * need to subscribe to the same events as an old watcher
+	 * did. See singlesocket() in curl's lib/multi.c.
+	 */
+	const int events = watcher->events;
+	const int action = ((events & EV_READ  ? CURL_POLL_IN  : 0) |
+			    (events & EV_WRITE ? CURL_POLL_OUT : 0));
+
+	/*
+	 * Remove an old watcher and register a new one.
+	 *
+	 * Note: An easy handle is not used in
+	 * curl_multi_sock_cb(), so we pass NULL here as the first
+	 * argument.
+	 */
+	curl_multi_sock_cb(NULL, fd, CURL_POLL_REMOVE, env, watcher);
+	curl_multi_sock_cb(NULL, fd, action, env, NULL);
+
+	return CURL_SOCKOPT_OK;
+}
+#endif /* LIBCURL_PROBLEMATIC_REDIRECTION */
 
 int
 curl_env_create(struct curl_env *env, long max_conns, long max_total_conns)
@@ -232,6 +367,15 @@ curl_env_create(struct curl_env *env, long max_conns, long max_total_conns)
 		goto error_exit;
 	}
 
+#if LIBCURL_PROBLEMATIC_REDIRECTION
+	env->watchers = mh_curl_watchers_new();
+	if (env->watchers == NULL) {
+		diag_set(OutOfMemory, 0, "mh_curl_socket_table_new",
+			 "watchers");
+		goto error_exit;
+	}
+#endif
+
 	ev_init(&env->timer_event, curl_timer_cb);
 	env->timer_event.data = (void *) env;
 	curl_multi_setopt(env->multi, CURLMOPT_TIMERFUNCTION,
@@ -260,6 +404,23 @@ void
 curl_env_destroy(struct curl_env *env)
 {
 	assert(env);
+
+#if LIBCURL_PROBLEMATIC_REDIRECTION
+	/*
+	 * If there are registered ev_io watchers, they will be
+	 * freed with the mempool destroying, but we should stop
+	 * them and account in statistics.
+	 */
+	mh_int_t node;
+	struct ev_io *watcher;
+	mh_foreach(env->watchers, node) {
+		watcher = *mh_curl_watchers_node(env->watchers, node);
+		curl_multi_sock_cb(NULL, watcher->fd, CURL_POLL_REMOVE, env,
+				   watcher);
+	}
+	mh_curl_watchers_delete(env->watchers);
+#endif
+
 	if (env->multi != NULL)
 		curl_multi_cleanup(env->multi);
 
@@ -298,6 +459,13 @@ curl_execute(struct curl_request *curl_request, struct curl_env *env,
 	if (mcode != CURLM_OK)
 		goto curl_merror;
 	ERROR_INJECT_YIELD(ERRINJ_HTTP_RESPONSE_ADD_WAIT);
+
+#if LIBCURL_PROBLEMATIC_REDIRECTION
+	curl_easy_setopt(curl_request->easy, CURLOPT_SOCKOPTFUNCTION,
+			 curl_easy_sock_opt_cb);
+	curl_easy_setopt(curl_request->easy, CURLOPT_SOCKOPTDATA, env);
+#endif
+
 	/* Don't wait on a cond if request has already failed or finished. */
 	if (curl_request->code == CURLE_OK && curl_request->in_progress) {
 		++env->stat.active_requests;
diff --git a/src/curl.h b/src/curl.h
index ea50afe3e..6c8d52cfb 100644
--- a/src/curl.h
+++ b/src/curl.h
@@ -39,6 +39,39 @@
 #include "diag.h"
 #include "fiber_cond.h"
 
+/*
+ * The macro determines whether we need to workaround a libcurl
+ * bug with tracking open sockets. libcurl-7.30 and older are
+ * affected.
+ *
+ * We check for a libcurl version at compile time and enable the
+ * workaround when it is less then 7.30. Operating systems where a
+ * default libcurl version is less then that will receive a
+ * tarantool package with the workaround enabled, but everything
+ * will work even if a user will update libcurl to more recent
+ * binary compatible version (at the moment it is any above
+ * 7.16.0). The memory and performance overhead should be
+ * negligible for most cases.
+ *
+ * For operating systems where a default libcurl version is 7.31
+ * or newer we build packages w/o this workaround.
+ *
+ * The bug can lead to 'hanging' of a request when a server
+ * responds with HTTP 3xx code (depends on timings, but often
+ * occurs with machine-local requests). See
+ * curl_easy_sock_opt_cb() in src/curl.c for more information.
+ *
+ * The workaround uses CURLOPT_SOCKOPTFUNCTION, so we also check
+ * for the minimum version where it is supported: libcurl-7.16.0.
+ */
+#define LIBCURL_PROBLEMATIC_REDIRECTION (LIBCURL_VERSION_NUM >= 0x071000 && \
+					 LIBCURL_VERSION_NUM < 0x071f00)
+
+/* Forward declarations. */
+#if LIBCURL_PROBLEMATIC_REDIRECTION
+struct mh_curl_watchers_t;
+#endif
+
 /**
  * CURL Statistics
  */
@@ -60,6 +93,13 @@ struct curl_env {
 	struct ev_timer timer_event;
 	/** Statistics. */
 	struct curl_stat stat;
+#if LIBCURL_PROBLEMATIC_REDIRECTION
+	/**
+	 * Hash table of ev_io watchers with file descriptors as
+	 * keys.
+	 */
+	struct mh_curl_watchers_t *watchers;
+#endif
 };
 
 /**
diff --git a/test/app-tap/http_client.test.lua b/test/app-tap/http_client.test.lua
index 680c78b35..2271b9e9c 100755
--- a/test/app-tap/http_client.test.lua
+++ b/test/app-tap/http_client.test.lua
@@ -62,7 +62,7 @@ local function stop_server(test, server)
 end
 
 local function test_http_client(test, url, opts)
-    test:plan(11)
+    test:plan(12)
 
     -- gh-4136: confusing httpc usage error message
     local ok, err = pcall(client.request, client)
@@ -85,10 +85,9 @@ local function test_http_client(test, url, opts)
     local r = client.request('GET', url, nil, opts)
     test:is(r.status, 200, 'request')
 
-    -- XXX: enable after resolving of gh-4180: httpc: redirects
-    -- are broken with libcurl-7.30 and older
-    --[[
     -- gh-4119: specify whether to follow 'Location' header
+    -- gh-4180: httpc: redirects are broken with libcurl-7.30 and
+    -- older
     test:test('gh-4119: follow location', function(test)
         test:plan(7)
         local endpoint = 'redirect'
@@ -111,7 +110,6 @@ local function test_http_client(test, url, opts)
         test:is(r.body, 'redirecting', 'do not follow location: body')
         test:is(r.headers['location'], '/', 'do not follow location: header')
     end)
-    ]]--
 end
 
 --
-- 
2.22.0

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-08-08 13:16 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-08 16:08 [PATCH] httpc: fix redirection on libcurl-7.30 and older Alexander Turenko
2019-07-25 12:46 ` Vladimir Davydov
2019-08-08 13:16   ` Alexander Turenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox