[Tarantool-patches] [PATCH v7 3/3] iproto: move greeting from tx thread to iproto
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Mon Dec 21 19:20:17 MSK 2020
Hi! Thanks for the fixes!
Unfortunately, it seems the patch does not help. Probably it did earlier,
but now the descriptors are not closed.
The example you use in the ticket is not really hard enough - you
use only one connection, and you treat 'fetch_schema' as success.
Also you assume the connection won't be re-created repeatedly by the
external code.
To test it more thoroughly I used these scripts:
Instance 1:
box.cfg({listen = 3301})
box.schema.user.grant('guest', 'super')
while true do os.execute("sleep 1000") end
I used big sleep so as the instance wouldn't have a chance to handle
the pending connection requests.
Instance 2:
fiber = require('fiber')
netbox = require('net.box')
host = 'localhost:3301'
count = 0
function many_connect()
while true do
local c = netbox.connect(host, {wait_connected = false})
while c.state ~= 'fetch_schema' do fiber.yield() end
count = count + 1
c:close()
end
end
f1 = fiber.new(many_connect)
f2 = fiber.new(many_connect)
f3 = fiber.new(many_connect)
f4 = fiber.new(many_connect)
I create 4 fibers to reach the descriptor limit faster. You
can try more or less, but with big enough sleep() timeout the
result will be the same. On my machine with 4 fibers I hit the
descriptor limit (2560 in my case) almost immediately.
Each fiber does a totally normal thing - creates a connection and
closes it. Lets assume, they also tried to execute some request,
it failed a timeout, and they decide to re-create the connection.
If the descriptors would be closed on Instance 1, these fibers
would work fine, and 'count' would grow until Instance 1 stops
the sleep(). Or until it will hit OOM due to too many heavy
struct iproto_connection objects, which is not addressed here, but
we agreed to do it separately, so it is ok for now maybe.
But what happens is that it prints
SystemError accept, called on fd 16, aka [::]:3301: Too many open files
And no new connections can be accepted. I killed Instance 2 and
started it again - it starts failing immediately. Which means the
descriptors are not closed.
> diff --git a/src/box/iproto.cc b/src/box/iproto.cc
> index f7330af21d..93bf869299 100644
> --- a/src/box/iproto.cc
> +++ b/src/box/iproto.cc> @@ -1938,50 +1996,63 @@ net_end_subscribe(struct cmsg *m)
> +
> +/**
> + * Create the session and invoke the on_connect triggers.
> */
> static void
> tx_process_connect(struct cmsg *m)
> {
> struct iproto_msg *msg = (struct iproto_msg *) m;
> struct iproto_connection *con = msg->connection;
> - struct obuf *out = msg->connection->tx.p_obuf;
> - try { /* connect. */
> - con->session = session_create(SESSION_TYPE_BINARY);
> - if (con->session == NULL)
> - diag_raise();
> - con->session->meta.connection = con;
> - tx_fiber_init(con->session, 0);
> - char *greeting = (char *) static_alloc(IPROTO_GREETING_SIZE);
> - /* TODO: dirty read from tx thread */
> - struct tt_uuid uuid = INSTANCE_UUID;
> - random_bytes(con->salt, IPROTO_SALT_SIZE);
> - greeting_encode(greeting, tarantool_version_id(), &uuid,
> - con->salt, IPROTO_SALT_SIZE);
> - obuf_dup_xc(out, greeting, IPROTO_GREETING_SIZE);
> - if (! rlist_empty(&session_on_connect)) {
> - if (session_run_on_connect_triggers(con->session) != 0)
> - diag_raise();
> - }
> - iproto_wpos_create(&msg->wpos, out);
> - } catch (Exception *e) {
> +
> + con->session = session_create(SESSION_TYPE_BINARY);
> + if (con->session == NULL) {
> tx_reply_error(msg);
> msg->close_connection = true;
> + return;
> + }
> + con->session->meta.connection = con;
> +
> + tx_fiber_init(con->session, 0);
> + if (! rlist_empty(&session_on_connect)) {
In the new code we don't use ' ' after unary operators.
> + if (session_run_on_connect_triggers(con->session) != 0) {
> + tx_reply_error(msg);
> + msg->close_connection = true;
> + }
> }
> }
More information about the Tarantool-patches
mailing list