[Tarantool-patches] [PATCH v7 3/3] iproto: move greeting from tx thread to iproto

Ilya Kosarev i.kosarev at tarantool.org
Tue Dec 22 03:50:10 MSK 2020


Hi!

I sent v8 of the patch. It includes an improvement that fixes the problem.
>Monday, 21 December 2020, 19:20 +03:00 from Vladislav Shpilevoy <v.shpilevoy at tarantool.org>:
> 
>Hi! Thanks for the fixes!
>
>Unfortunately, it seems the patch does not help. Probably it did earlier,
>but now the descriptors are not closed.
>
>The example you use in the ticket is not really hard enough - you
>use only one connection, and you treat 'fetch_schema' as success.
>Also you assume the connection won't be re-created repeatedly by the
>external code.
>
>To test it more thoroughly I used these scripts:
>
>Instance 1:
>
>box.cfg({listen = 3301})
>box.schema.user.grant('guest', 'super')
>while true do os.execute("sleep 1000") end
>
>I used a big sleep so that the instance wouldn't have a chance to
>handle the pending connection requests.
>
>Instance 2:
>
>fiber = require('fiber')
>netbox = require('net.box')
>host = 'localhost:3301'
>count = 0
>
>function many_connect()
>    while true do
>        local c = netbox.connect(host, {wait_connected = false})
>        while c.state ~= 'fetch_schema' do fiber.yield() end
>        count = count + 1
>        c:close()
>    end
>end
>
>f1 = fiber.new(many_connect)
>f2 = fiber.new(many_connect)
>f3 = fiber.new(many_connect)
>f4 = fiber.new(many_connect)
>
>I create 4 fibers to reach the descriptor limit faster. You
>can try more or less, but with a big enough sleep() timeout the
>result will be the same. On my machine with 4 fibers I hit the
>descriptor limit (2560 in my case) almost immediately.
>
>Each fiber does a totally normal thing - creates a connection and
>closes it. Let's assume they also tried to execute some request,
>it failed with a timeout, and they decided to re-create the connection.
>
>If the descriptors were closed on Instance 1, these fibers
>would work fine, and 'count' would grow until Instance 1 stops
>the sleep(), or until it hits OOM due to too many heavy
>struct iproto_connection objects. The latter is not addressed here,
>but we agreed to do it separately, so it is ok for now maybe.
>
>But what happens is that it prints
>
>SystemError accept, called on fd 16, aka [::]:3301: Too many open files
>
>And no new connections can be accepted. I killed Instance 2 and
>started it again - it starts failing immediately. Which means the
>descriptors are not closed.
Yes, all true. I actually forgot that iproto has to be able to
close the socket by itself. Well, now it is fixed. The idea
is that we need to start reading immediately after the greeting
is sent. Then we only need to take care not to process extra
requests on a connection that is already closed.
This is achieved by patching tx_accept_msg() and its callees.
So if the connection closes, iproto_connection_close() is
invoked and it closes the socket. When everything finally moves
to tx, all requests are ignored depending on the connection
state. Fixed in v8 of the patch.
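
To illustrate the idea only, here is a minimal sketch in C. It is not
the code from v8: the toy_* names, the state enum and the counters are
all hypothetical and exist just to show the pattern, in which the
network side closes the socket on its own and the tx side ignores
whatever requests still arrive for that connection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection states; not the real iproto definitions. */
enum toy_conn_state {
	TOY_CONN_ALIVE,
	TOY_CONN_CLOSED,
};

/* Toy stand-in for struct iproto_connection. */
struct toy_connection {
	enum toy_conn_state state;
	/* Socket descriptor, -1 once the socket has been closed. */
	int fd;
	/* Requests actually handled on the "tx" side. */
	int processed;
	/* Requests dropped because the connection was already closed. */
	int ignored;
};

/*
 * "iproto" side: release the descriptor right away, without waiting
 * for tx. Real code would call close(con->fd) here; the sketch only
 * records the state change.
 */
static void
toy_connection_close(struct toy_connection *con)
{
	assert(con != NULL);
	con->fd = -1;
	con->state = TOY_CONN_CLOSED;
}

/*
 * "tx" side: handle a request unless the connection is closed.
 * Returns true if the request was handled, false if it was ignored.
 */
static bool
toy_tx_process(struct toy_connection *con)
{
	assert(con != NULL);
	if (con->state == TOY_CONN_CLOSED) {
		con->ignored++;
		return false;
	}
	con->processed++;
	return true;
}
```

With this shape a request that reaches toy_tx_process() after
toy_connection_close() is only counted as ignored, while the
descriptor has already been given back.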
>
>> diff --git a/src/box/iproto.cc b/src/box/iproto.cc
>> index f7330af21d..93bf869299 100644
>> --- a/src/box/iproto.cc
>> +++ b/src/box/iproto.cc
>> @@ -1938,50 +1996,63 @@ net_end_subscribe(struct cmsg *m)
>> +
>> +/**
>> + * Create the session and invoke the on_connect triggers.
>> */
>> static void
>> tx_process_connect(struct cmsg *m)
>> {
>> struct iproto_msg *msg = (struct iproto_msg *) m;
>> struct iproto_connection *con = msg->connection;
>> - struct obuf *out = msg->connection->tx.p_obuf;
>> - try { /* connect. */
>> - con->session = session_create(SESSION_TYPE_BINARY);
>> - if (con->session == NULL)
>> - diag_raise();
>> - con->session->meta.connection = con;
>> - tx_fiber_init(con->session, 0);
>> - char *greeting = (char *) static_alloc(IPROTO_GREETING_SIZE);
>> - /* TODO: dirty read from tx thread */
>> - struct tt_uuid uuid = INSTANCE_UUID;
>> - random_bytes(con->salt, IPROTO_SALT_SIZE);
>> - greeting_encode(greeting, tarantool_version_id(), &uuid,
>> - con->salt, IPROTO_SALT_SIZE);
>> - obuf_dup_xc(out, greeting, IPROTO_GREETING_SIZE);
>> - if (! rlist_empty(&session_on_connect)) {
>> - if (session_run_on_connect_triggers(con->session) != 0)
>> - diag_raise();
>> - }
>> - iproto_wpos_create(&msg->wpos, out);
>> - } catch (Exception *e) {
>> +
>> + con->session = session_create(SESSION_TYPE_BINARY);
>> + if (con->session == NULL) {
>> tx_reply_error(msg);
>> msg->close_connection = true;
>> + return;
>> + }
>> + con->session->meta.connection = con;
>> +
>> + tx_fiber_init(con->session, 0);
>> + if (! rlist_empty(&session_on_connect)) {
>
>In the new code we don't use ' ' after unary operators.
Fixed in v8.
>
>> + if (session_run_on_connect_triggers(con->session) != 0) {
>> + tx_reply_error(msg);
>> + msg->close_connection = true;
>> + }
>> }
>> } 
 
 
--
Ilya Kosarev
 