Tarantool development patches archive
* [tarantool-patches] [PATCH] Don't throw an exception in a replication handler
@ 2018-08-27 13:28 Georgy Kirichenko
  2018-08-27 15:36 ` Vladimir Davydov
  2018-08-28 10:16 ` Vladimir Davydov
  0 siblings, 2 replies; 5+ messages in thread
From: Georgy Kirichenko @ 2018-08-27 13:28 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Georgy Kirichenko

It is an error to throw an exception out of a cbus message handler
because it breaks cbus message delivery. In the case of replication,
throwing an exception prevents iproto from closing the replication
socket.

Fixes 3642
---
Changes in v3:
  - Move rlimit ffi bindings to separate file

Changes in v2:
  - Test fixes (setrlimit, formatting)
  - Test result file included

https://github.com/tarantool/tarantool/issues/3642
https://github.com/tarantool/tarantool/tree/g.kirichenko/gh-3642-fix-replication-socket-leak
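
A minimal sketch of the failure mode described in the commit message,
written as a toy model rather than Tarantool code. Everything in it
(ToySocketError, deliver, the two handlers) is hypothetical; it only
illustrates why a handler that rethrows can keep a later "close the
socket" step from running, while a handler that returns lets delivery
continue:

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for a socket error; not Tarantool's SocketError.
struct ToySocketError : std::runtime_error {
	using std::runtime_error::runtime_error;
};

// Variant 1: rethrows, so the exception escapes the handler.
void handler_rethrow()
{
	try {
		throw ToySocketError("peer closed the connection");
	} catch (const ToySocketError &) {
		throw;
	}
}

// Variant 2: swallows the socket error and returns normally.
void handler_return()
{
	try {
		throw ToySocketError("peer closed the connection");
	} catch (const ToySocketError &) {
		return;
	}
}

// Toy two-step route: run the handler, then the cleanup step that would
// close the socket. Returns true if the cleanup step was reached.
bool deliver(const std::function<void()> &handler)
{
	bool socket_closed = false;
	try {
		handler();
		socket_closed = true; // second hop of the toy route
	} catch (const std::exception &) {
		// The handler threw, so the second hop was skipped.
	}
	return socket_closed;
}
```

In this model deliver(handler_rethrow) returns false and
deliver(handler_return) returns true.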

 src/box/iproto.cc               |  2 +-
 test/replication/lua/rlimit.lua | 33 ++++++++++++++
 test/replication/misc.result    | 76 +++++++++++++++++++++++++++++++++
 test/replication/misc.test.lua  | 33 ++++++++++++++
 test/replication/suite.ini      |  2 +-
 5 files changed, 144 insertions(+), 2 deletions(-)
 create mode 100644 test/replication/lua/rlimit.lua

diff --git a/src/box/iproto.cc b/src/box/iproto.cc
index ab7b42169..984c6df44 100644
--- a/src/box/iproto.cc
+++ b/src/box/iproto.cc
@@ -1424,7 +1424,7 @@ tx_process_join_subscribe(struct cmsg *m)
 			unreachable();
 		}
 	} catch (SocketError *e) {
-		throw; /* don't write error response to prevent SIGPIPE */
+		return; /* don't write error response to prevent SIGPIPE */
 	} catch (Exception *e) {
 		iproto_write_error(con->input.fd, e, ::schema_version,
 				   msg->header.sync);
diff --git a/test/replication/lua/rlimit.lua b/test/replication/lua/rlimit.lua
new file mode 100644
index 000000000..c61b18a07
--- /dev/null
+++ b/test/replication/lua/rlimit.lua
@@ -0,0 +1,33 @@
+
+ffi = require('ffi')
+ffi.cdef([[
+typedef long rlim_t;
+struct rlimit {
+    rlim_t rlim_cur;  /* Soft limit */
+    rlim_t rlim_max;  /* Hard limit (ceiling for rlim_cur) */
+};
+int getrlimit(int resource, struct rlimit *rlim);
+int setrlimit(int resource, const struct rlimit *rlim);
+]])
+
+return {
+    RLIMIT_CPU = 0,
+    RLIMIT_FSIZE = 1,
+    RLIMIT_DATA = 2,
+    RLIMIT_STACK = 3,
+    RLIMIT_CORE = 4,
+    RLIMIT_RSS = 5,
+    RLIMIT_NPROC = 6,
+    RLIMIT_NOFILE = 7,
+    RLIMIT_MEMLOCK = 8,
+    RLIMIT_AS = 9,
+    limit = function()
+        return ffi.new('struct rlimit')
+    end,
+    getrlimit = function (id, limit)
+        ffi.C.getrlimit(id, limit)
+    end,
+    setrlimit = function (id, limit)
+        ffi.C.setrlimit(id, limit)
+    end,
+}
diff --git a/test/replication/misc.result b/test/replication/misc.result
index 76e7fd5ee..a407abab5 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -229,6 +229,82 @@ test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
 ---
 ...
+rlimit = require('rlimit')
+---
+...
+lim = rlimit.limit()
+---
+...
+rlimit.getrlimit(rlimit.RLIMIT_NOFILE, lim)
+---
+...
+old_fno = lim.rlim_cur
+---
+...
+lim.rlim_cur = 64
+---
+...
+rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
+---
+...
+test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
+---
+- true
+...
+test_run:cmd(string.format('start server sock'))
+---
+- true
+...
+test_run:cmd('switch sock')
+---
+- true
+...
+test_run = require('test_run').new()
+---
+...
+fiber = require('fiber')
+---
+...
+test_run:cmd("setopt delimiter ';'")
+---
+- true
+...
+for i = 1, 64 do
+    local replication = box.cfg.replication
+    box.cfg{replication = {}}
+    box.cfg{replication = replication}
+    while box.info.replication[1].upstream.status ~= 'follow' do
+        fiber.sleep(0.0001)
+    end
+end;
+---
+...
+test_run:cmd("setopt delimiter ''");
+---
+- true
+...
+box.info.replication[1].upstream.status
+---
+- follow
+...
+test_run:cmd('switch default')
+---
+- true
+...
+lim.rlim_cur = old_fno
+---
+...
+rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
+---
+...
+test_run:cmd('stop server sock')
+---
+- true
+...
+test_run:cmd('cleanup server sock')
+---
+- true
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index c60adf5a5..b23607eb8 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -90,6 +90,39 @@ box.space.space1:drop()
 test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
 
+rlimit = require('rlimit')
+lim = rlimit.limit()
+rlimit.getrlimit(rlimit.RLIMIT_NOFILE, lim)
+old_fno = lim.rlim_cur
+lim.rlim_cur = 64
+rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
+
+test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
+test_run:cmd(string.format('start server sock'))
+test_run:cmd('switch sock')
+test_run = require('test_run').new()
+fiber = require('fiber')
+test_run:cmd("setopt delimiter ';'")
+for i = 1, 64 do
+    local replication = box.cfg.replication
+    box.cfg{replication = {}}
+    box.cfg{replication = replication}
+    while box.info.replication[1].upstream.status ~= 'follow' do
+        fiber.sleep(0.0001)
+    end
+end;
+test_run:cmd("setopt delimiter ''");
+
+box.info.replication[1].upstream.status
+
+test_run:cmd('switch default')
+
+lim.rlim_cur = old_fno
+rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
+
+test_run:cmd('stop server sock')
+test_run:cmd('cleanup server sock')
+
 box.schema.user.revoke('guest', 'replication')
 
 --
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index b489add58..8b4db9c72 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -5,6 +5,6 @@ description = tarantool/box, replication
 disabled = consistent.test.lua
 release_disabled = catch.test.lua errinj.test.lua gc.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua
 config = suite.cfg
-lua_libs = lua/fast_replica.lua
+lua_libs = lua/fast_replica.lua lua/rlimit.lua
 long_run = prune.test.lua
 is_parallel = False
-- 
2.18.0

* Re: [tarantool-patches] [PATCH] Don't throw an exception in a replication handler
  2018-08-27 13:28 [tarantool-patches] [PATCH] Don't throw an exception in a replication handler Georgy Kirichenko
@ 2018-08-27 15:36 ` Vladimir Davydov
  2018-08-28 10:16 ` Vladimir Davydov
  1 sibling, 0 replies; 5+ messages in thread
From: Vladimir Davydov @ 2018-08-27 15:36 UTC (permalink / raw)
  To: Georgy Kirichenko; +Cc: tarantool-patches

It turned out that the tests don't pass after this patch if run like this:

  ./test-run -j -1 replication/

because, for some reason, test-run keeps the log files of dead instances
open even after cleanup. I filed an issue for test-run:

  https://github.com/tarantool/test-run/issues/117

Can't push this until it is resolved.

* Re: [tarantool-patches] [PATCH] Don't throw an exception in a replication handler
  2018-08-27 13:28 [tarantool-patches] [PATCH] Don't throw an exception in a replication handler Georgy Kirichenko
  2018-08-27 15:36 ` Vladimir Davydov
@ 2018-08-28 10:16 ` Vladimir Davydov
  1 sibling, 0 replies; 5+ messages in thread
From: Vladimir Davydov @ 2018-08-28 10:16 UTC (permalink / raw)
  To: Georgy Kirichenko; +Cc: tarantool-patches

On Mon, Aug 27, 2018 at 04:28:09PM +0300, Georgy Kirichenko wrote:
> +    RLIMIT_CPU = 0,
> +    RLIMIT_FSIZE = 1,
> +    RLIMIT_DATA = 2,
> +    RLIMIT_STACK = 3,
> +    RLIMIT_CORE = 4,
> +    RLIMIT_RSS = 5,
> +    RLIMIT_NPROC = 6,
> +    RLIMIT_NOFILE = 7,
> +    RLIMIT_MEMLOCK = 8,
> +    RLIMIT_AS = 9,

It turns out that OS X uses different constants, e.g. RLIMIT_NOFILE is 8,
not 7. Please fix.
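
One way to double-check the numeric values before hard-coding them in
the Lua table is to read them from the system header on each target
platform. A minimal C++ sketch, assuming a POSIX system; the helper
name rlimit_constants is made up for illustration, and only the
identifiers that POSIX defines are listed:

```cpp
#include <sys/resource.h>

#include <string>
#include <utility>
#include <vector>

// Collect the values of the POSIX RLIMIT_* identifiers exactly as the
// system header defines them on the platform this is compiled for.
// Hypothetical helper, not part of the patch.
std::vector<std::pair<std::string, int>> rlimit_constants()
{
	return {
		{"RLIMIT_CPU", static_cast<int>(RLIMIT_CPU)},
		{"RLIMIT_FSIZE", static_cast<int>(RLIMIT_FSIZE)},
		{"RLIMIT_DATA", static_cast<int>(RLIMIT_DATA)},
		{"RLIMIT_STACK", static_cast<int>(RLIMIT_STACK)},
		{"RLIMIT_CORE", static_cast<int>(RLIMIT_CORE)},
		{"RLIMIT_NOFILE", static_cast<int>(RLIMIT_NOFILE)},
		{"RLIMIT_AS", static_cast<int>(RLIMIT_AS)},
	};
}
```

Printing each pair on Linux and on OS X shows which entries differ
between the two, RLIMIT_NOFILE being one of them.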

* Re: [tarantool-patches] [PATCH] Don't throw an exception in a replication handler
  2018-08-23 16:00 Georgy Kirichenko
@ 2018-08-23 19:57 ` Vladimir Davydov
  0 siblings, 0 replies; 5+ messages in thread
From: Vladimir Davydov @ 2018-08-23 19:57 UTC (permalink / raw)
  To: Georgy Kirichenko; +Cc: tarantool-patches

On Thu, Aug 23, 2018 at 07:00:31PM +0300, Georgy Kirichenko wrote:
> It is an error to throw an exception out of a cbus message handler
> because it breaks cbus message delivery. In the case of replication,
> throwing an exception prevents iproto from closing the replication
> socket.
> 
> Fixes 3642
> ---
> Branch:
> https://github.com/tarantool/tarantool/tree/g.kirichenko/gh-3642-fix-replication-socket-leak
> Issue: https://github.com/tarantool/tarantool/issues/3642

Please don't prefix branch and issue names with 'Branch' and 'Issue',
because it's pretty clear which is which without them.

The ticket doesn't have a milestone assigned. Please assign one
and rebase your branch on the latest 1.9 or 1.10, depending on
the milestone.

>  src/box/iproto.cc              |  2 +-
>  test/replication/misc.test.lua | 10 ++++++++++
>  2 files changed, 11 insertions(+), 1 deletion(-)

You seem to have forgotten to update the result file.

> 
> diff --git a/src/box/iproto.cc b/src/box/iproto.cc
> index 0b92c316e..df32e4f2b 100644
> --- a/src/box/iproto.cc
> +++ b/src/box/iproto.cc
> @@ -1412,7 +1412,7 @@ tx_process_join_subscribe(struct cmsg *m)
>  			unreachable();
>  		}
>  	} catch (SocketError *e) {
> -		throw; /* don't write error response to prevent SIGPIPE */
> +		return; /* don't write error response to prevent SIGPIPE */
>  	} catch (Exception *e) {
>  		iproto_write_error(con->input.fd, e, ::schema_version,
>  				   msg->header.sync);
> diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
> index 850579769..32ab07924 100644
> --- a/test/replication/misc.test.lua
> +++ b/test/replication/misc.test.lua
> @@ -79,4 +79,14 @@ box.space.space1:drop()
>  test_run:cmd("switch default")
>  test_run:drop_cluster(SERVERS)
>  
> +test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
> +test_run:cmd(string.format('start server sock'))
> +test_run:cmd('switch sock')
> +fiber = require('fiber')
> +k = tonumber(io.popen('ulimit -n'):read())
> +for i = 2, k > 1024 and 1 or k + 20 do local replication = box.cfg.replication box.cfg{replication = {}} box.cfg{replication = replication} while box.info.replication[1].upstream.status ~= 'follow' do fiber.sleep(0.0001) end end

This line's definitely worth splitting.

This case adds another 3 seconds to the replication/misc run time,
which seems to be a bit too much. Besides, it isn't tested properly
by Travis, because Travis has an fd limit > 1024. I think you should
lower the ulimit in the test. Please try to call setrlimit via Lua ffi.
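
For reference, the lower-then-restore pattern looks like this when
written directly against the POSIX API. This is a C++ sketch, not the
Lua ffi code requested above; NofileLimitGuard and
current_nofile_soft_limit are hypothetical names, and the sketch
assumes the saved soft limit can be set again without extra
privileges:

```cpp
#include <sys/resource.h>

#include <stdexcept>

// Read the current soft limit on open file descriptors.
rlim_t current_nofile_soft_limit()
{
	struct rlimit lim;
	if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
		throw std::runtime_error("getrlimit failed");
	return lim.rlim_cur;
}

// Lowers the RLIMIT_NOFILE soft limit for the lifetime of the object
// and puts the saved limits back in the destructor.
class NofileLimitGuard {
public:
	explicit NofileLimitGuard(rlim_t soft_limit)
	{
		if (getrlimit(RLIMIT_NOFILE, &saved_) != 0)
			throw std::runtime_error("getrlimit failed");
		struct rlimit lowered = saved_;
		// The soft limit may not exceed the hard limit.
		if (soft_limit > saved_.rlim_max)
			soft_limit = saved_.rlim_max;
		lowered.rlim_cur = soft_limit;
		if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
			throw std::runtime_error("setrlimit failed");
	}

	~NofileLimitGuard()
	{
		// Best effort: a destructor must not throw.
		(void) setrlimit(RLIMIT_NOFILE, &saved_);
	}

	NofileLimitGuard(const NofileLimitGuard &) = delete;
	NofileLimitGuard &operator=(const NofileLimitGuard &) = delete;

private:
	struct rlimit saved_;
};
```

Unlike the thin wrappers a test might use, this version checks the
return values of getrlimit and setrlimit, so a failed call does not go
unnoticed.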

> +test_run:cmd('switch default')
> +test_run:cmd('stop server sock')
> +test_run:cmd('cleanup server sock')
> +
>  box.schema.user.revoke('guest', 'replication')

* [tarantool-patches] [PATCH] Don't throw an exception in a replication handler
@ 2018-08-23 16:00 Georgy Kirichenko
  2018-08-23 19:57 ` Vladimir Davydov
  0 siblings, 1 reply; 5+ messages in thread
From: Georgy Kirichenko @ 2018-08-23 16:00 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Georgy Kirichenko

It is an error to throw an exception out of a cbus message handler
because it breaks cbus message delivery. In the case of replication,
throwing an exception prevents iproto from closing the replication
socket.

Fixes 3642
---
Branch:
https://github.com/tarantool/tarantool/tree/g.kirichenko/gh-3642-fix-replication-socket-leak
Issue: https://github.com/tarantool/tarantool/issues/3642
 src/box/iproto.cc              |  2 +-
 test/replication/misc.test.lua | 10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/box/iproto.cc b/src/box/iproto.cc
index 0b92c316e..df32e4f2b 100644
--- a/src/box/iproto.cc
+++ b/src/box/iproto.cc
@@ -1412,7 +1412,7 @@ tx_process_join_subscribe(struct cmsg *m)
 			unreachable();
 		}
 	} catch (SocketError *e) {
-		throw; /* don't write error response to prevent SIGPIPE */
+		return; /* don't write error response to prevent SIGPIPE */
 	} catch (Exception *e) {
 		iproto_write_error(con->input.fd, e, ::schema_version,
 				   msg->header.sync);
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index 850579769..32ab07924 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -79,4 +79,14 @@ box.space.space1:drop()
 test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
 
+test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
+test_run:cmd(string.format('start server sock'))
+test_run:cmd('switch sock')
+fiber = require('fiber')
+k = tonumber(io.popen('ulimit -n'):read())
+for i = 2, k > 1024 and 1 or k + 20 do local replication = box.cfg.replication box.cfg{replication = {}} box.cfg{replication = replication} while box.info.replication[1].upstream.status ~= 'follow' do fiber.sleep(0.0001) end end
+test_run:cmd('switch default')
+test_run:cmd('stop server sock')
+test_run:cmd('cleanup server sock')
+
 box.schema.user.revoke('guest', 'replication')
-- 
2.18.0
