[Tarantool-patches] [PATCH v6 3/3] iproto: move greeting from tx thread to iproto
Ilya Kosarev
i.kosarev at tarantool.org
Sat Dec 19 20:41:46 MSK 2020
Hi!
Thanks for your review.
Sent v7 considering your comments. Some answers below.
>Среда, 16 декабря 2020, 2:04 +03:00 от Vladislav Shpilevoy <v.shpilevoy at tarantool.org>:
>
>Hi! Thanks for the patch!
>
>See 4 comments below.
>
>On 13.12.2020 00:38, Ilya Kosarev wrote:
>> On connection, an evio service callback is invoked to accept it. The
>> next step after acceptance was to process connection to tx thread
>> through cbus. This meant that any connection interaction involves tx
>> thread even before we get to send the greeting to the client.
>> Consequently, the client might reach the instance with enormous number
>> of connections, leading to the file descriptors limit exhaustion in
>> case of unresponsive tx thread (for example, when building secondary
>> index) and extra tx_process_connects afterwords, even in case the
>> instance doesn't fail with too many open files error.
>> This patch allows iproto to accept connection and send greeting by
>> itself. Thus the connection is being established and stays in
>> fetch_schema state while tx thread is unresponsive.
>>
>> Closes #3776
>> ---
>> src/box/iproto.cc | 118 +++++++++++++-----
>> test/app/gh-4787-netbox-empty-errmsg.result | 18 ---
>> test/app/gh-4787-netbox-empty-errmsg.test.lua | 8 --
>> 3 files changed, 88 insertions(+), 56 deletions(-)
>>
>> diff --git a/src/box/iproto.cc b/src/box/iproto.cc
>> index f7330af21d..a7da115925 100644
>> --- a/src/box/iproto.cc
>> +++ b/src/box/iproto.cc
>> @@ -1091,6 +1093,46 @@ iproto_connection_on_output(ev_loop *loop, struct ev_io *watcher,
>> }
>> }
>>
>> +static int
>> +iproto_buf_flush(struct iproto_connection *con)
>
>1. The function lacks a comment. It is not so trivial why do we
>have 2 flush functions. Also maybe worth renaming it to
>iproto_connection_greeting_flush. Because it is a method of
>'iproto_connection' (thus the prefix), and works with the
>greeting only.
Right, fixed in v7 of the patch.
>
>> +{
>> + struct iovec greeting;
>> + greeting.iov_base = &con->greeting_buf;
>> + greeting.iov_len = IPROTO_GREETING_SIZE;
>> + ssize_t nwr = sio_writev(con->output.fd, &greeting, 1);
>> +
>> + if (nwr > 0) {
>> + /* Count statistics */
>> + rmean_collect(rmean_net, IPROTO_SENT, nwr);
>> + return 1;
>> + } else if (nwr < 0 && !sio_wouldblock(errno)) {
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void
>> +iproto_connection_on_greeting(ev_loop *loop, struct ev_io *watcher,
>> + int /* revents */)
>> +{
>> + struct iproto_connection *con =
>> + (struct iproto_connection *)watcher->data;
>> + int rc = iproto_buf_flush(con);
>> + if (rc <= 0) {
>> + if (rc == 0) {
>> + ev_io_start(loop, &con->output);
>> + } else {
>> + diag_log();
>> + con->state = IPROTO_CONNECTION_CLOSED;
>
>2. I don't understand how this part works. You didn't close the
>socket, and didn't delete it from libev. But IPROTO_CONNECTION_CLOSED
>clearly states
>
>/**
>* Socket was closed, a notification is sent to the TX
>* thread to close the session.
>*/
>
>The way you change it here does not match the description. AFAIU,
>the only legal function able to set this state is
>iproto_connection_close(), and it should remain so.
>
>Moreover, I see this should crash in an assertion in
>iproto_connection_close(). Because this function will be called by
>net_finish_connect() due to state being CLOSED (which is weird -
>why is close() called if the state is already CLOSED?). But inside
>it has "assert(con->state == IPROTO_CONNECTION_ALIVE)" in case the
>ev_watcher still has a descriptor - it still has, so it will go there
>any crash.
>
>Did I miss something?
Yes, my bad. It definitely needs new state. There was not going to be
an assertion fail as long as i switch it back, but it is still not good. Fixed in v7.
>
>> + }
>> + return;
>> + }
>> + ev_io_stop(con->loop, &con->output);
>> + ev_io_init(&con->output, iproto_connection_on_output,
>> + con->output.fd, EV_WRITE);
>> +}
>
>3. What if sio_writev() has written only half of the bytes?
>You didn't check and immediately proceeded to the normal
>operation mode. But the greeting wasn't sent yet. Not all.
>
>Also you don't propagate iov_base in case only a part of
>the data was sent.
>
>You can easily check it using a temporary error injection.
>Before sio_writev() try to inject iov_len =
>max(iov_len, IPROTO_GREETING_SIZE / 2) to force it to work
>in 2 iterations, and you will see the issue.
>
>Partial write is a rare thing, but it is totally real. It
>is not some kind of an impossible thing only from the books.
>Even if it is just 128 bytes.
Fixed in v7.
>
>> +
>> static struct iproto_connection *
>> iproto_connection_new(int fd)
>> {
>> @@ -1938,50 +1980,64 @@ net_end_subscribe(struct cmsg *m)
>> }
>>
>> /**
>> - * Handshake a connection: invoke the on-connect trigger
>> - * and possibly authenticate. Try to send the client an error
>> - * upon a failure.
>> + * Handshake a connection: send greeting for it.
>> + */
>> +static void
>> +iproto_process_connect(struct iproto_msg *msg)
>
>4. Why do you take iproto_msg object if the only thing you do
>with it is take the connection object? Wouldn't it be simpler
>and more understandable, if it would be iproto_connection_start()
>or something like that? With an iproto_connection * argument.
Fixed in v7.
>
>> +{
>> + struct iproto_connection *con = msg->connection;
>> + /*
>> + * INSTANCE_UUID is guaranteed to be inited before this moment.
>> + * We start listening either in local_recovery() or bootstrap().
>> + * The INSTANCE_UUID is ensured to be inited in the beginning of
>> + * both methods. In case of local_recovery() it is verified that
>> + * INSTANCE_UUID was read from the snapshot in memtx_engine_new().
>> + * In bootstrap() INSTANCE_UUID is either taken from the
>> + * instance_uuid box.cfg{} param or created on the spot.
>> + */
>> + random_bytes(con->salt, IPROTO_SALT_SIZE);
>> + greeting_encode(con->greeting_buf, tarantool_version_id(),
>> + &INSTANCE_UUID, con->salt, IPROTO_SALT_SIZE);
>> + assert(evio_has_fd(&con->output));
>> + ev_feed_event(con->loop, &con->output, EV_WRITE);
>> +}
--
Ilya Kosarev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.tarantool.org/pipermail/tarantool-patches/attachments/20201219/db7ea367/attachment.html>
More information about the Tarantool-patches
mailing list