<HTML><BODY><div>Hi!<br><br>Sent v8 of the patch. It includes improvement to fix the problem.<blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;">Понедельник, 21 декабря 2020, 19:20 +03:00 от Vladislav Shpilevoy <v.shpilevoy@tarantool.org>:<br> <div id=""><div class="js-helper js-readmsg-msg"><div><div id="style_16085676201292008503_BODY">Hi! Thanks for the fixes!<br><br>Unfortunately, it seems the patch does not help. Probably it did earlier,<br>but now the descriptors are not closed.<br><br>The example you use in the ticket is not really hard enough - you<br>use only one connection, and you treat 'fetch_schema' as success.<br>Also you assume the connection won't be re-created repeatedly by the<br>external code.<br><br>To test it more thoroughly I used these scripts:<br><br>Instance 1:<br><br>box.cfg({listen = 3301})<br>box.schema.user.grant('guest', 'super')<br>while true do os.execute("sleep 1000") end<br><br>I used big sleep so as the instance wouldn't have a chance to handle<br>the pending connection requests.<br><br>Instance 2:<br><br>fiber = require('fiber')<br>netbox = require('net.box')<br>host = 'localhost:3301'<br>count = 0<br><br>function many_connect()<br>while true do<br>local c = netbox.connect(host, {wait_connected = false})<br>while c.state ~= 'fetch_schema' do fiber.yield() end<br>count = count + 1<br>c:close()<br>end<br>end<br><br>f1 = fiber.new(many_connect)<br>f2 = fiber.new(many_connect)<br>f3 = fiber.new(many_connect)<br>f4 = fiber.new(many_connect)<br><br>I create 4 fibers to reach the descriptor limit faster. You<br>can try more or less, but with big enough sleep() timeout the<br>result will be the same. On my machine with 4 fibers I hit the<br>descriptor limit (2560 in my case) almost immediately.<br><br>Each fiber does a totally normal thing - creates a connection and<br>closes it. Lets assume, they also tried to execute some request,<br>it failed a timeout, and they decide to re-create the connection.<br><br>If the descriptors would be closed on Instance 1, these fibers<br>would work fine, and 'count' would grow until Instance 1 stops<br>the sleep(). Or until it will hit OOM due to too many heavy<br>struct iproto_connection objects, which is not addressed here, but<br>we agreed to do it separately, so it is ok for now maybe.<br><br>But what happens is that it prints<br><br>SystemError accept, called on fd 16, aka [::]:3301: Too many open files<br><br>And no new connections can be accepted. I killed Instance 2 and<br>started it again - it starts failing immediately. Which means the<br>descriptors are not closed.</div></div></div></div></blockquote></div><div>Yes, all true. I actually forgot that iproto has to be able to</div><div>close the socket by itself. Well, now it is fixed. The idea</div><div>is that we actually need to start reading immediately when</div><div>the greeting is sent. Then we only need to care about not</div><div>processing extra request on actually closed connection.</div><div>It is achieved through patching of tx_accept_msg() and</div><div>it’s callees. So in case the connections closes, the</div><div>iproto_connection_close() in invoked and it closes the socket.</div><div>When everything finally moves to the tx, all requests are being</div><div>ignored depending on the the connection state. Fixed in v8 of the patch.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> diff --git a/src/box/iproto.cc b/src/box/iproto.cc<br>> index f7330af21d..93bf869299 100644<br>> --- a/src/box/iproto.cc<br>> +++ b/src/box/iproto.cc> @@ -1938,50 +1996,63 @@ net_end_subscribe(struct cmsg *m)<br>> +<br>> +/**<br>> + * Create the session and invoke the on_connect triggers.<br>> */<br>> static void<br>> tx_process_connect(struct cmsg *m)<br>> {<br>> struct iproto_msg *msg = (struct iproto_msg *) m;<br>> struct iproto_connection *con = msg->connection;<br>> - struct obuf *out = msg->connection->tx.p_obuf;<br>> - try { /* connect. */<br>> - con->session = session_create(SESSION_TYPE_BINARY);<br>> - if (con->session == NULL)<br>> - diag_raise();<br>> - con->session->meta.connection = con;<br>> - tx_fiber_init(con->session, 0);<br>> - char *greeting = (char *) static_alloc(IPROTO_GREETING_SIZE);<br>> - /* TODO: dirty read from tx thread */<br>> - struct tt_uuid uuid = INSTANCE_UUID;<br>> - random_bytes(con->salt, IPROTO_SALT_SIZE);<br>> - greeting_encode(greeting, tarantool_version_id(), &uuid,<br>> - con->salt, IPROTO_SALT_SIZE);<br>> - obuf_dup_xc(out, greeting, IPROTO_GREETING_SIZE);<br>> - if (! rlist_empty(&session_on_connect)) {<br>> - if (session_run_on_connect_triggers(con->session) != 0)<br>> - diag_raise();<br>> - }<br>> - iproto_wpos_create(&msg->wpos, out);<br>> - } catch (Exception *e) {<br>> +<br>> + con->session = session_create(SESSION_TYPE_BINARY);<br>> + if (con->session == NULL) {<br>> tx_reply_error(msg);<br>> msg->close_connection = true;<br>> + return;<br>> + }<br>> + con->session->meta.connection = con;<br>> +<br>> + tx_fiber_init(con->session, 0);<br>> + if (! rlist_empty(&session_on_connect)) {<br><br>In the new code we don't use ' ' after unary operators.</div></div></div></div></blockquote></div><div>Fixed in v8.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> + if (session_run_on_connect_triggers(con->session) != 0) {<br>> + tx_reply_error(msg);<br>> + msg->close_connection = true;<br>> + }<br>> }<br>> }</div></div></div></div></blockquote> <div> </div><div data-signature-widget="container"><div data-signature-widget="content"><div>--<br>Ilya Kosarev</div></div></div><div> </div></div></BODY></HTML>