<HTML><BODY><div><div>Hi,</div><div> </div><div>Thanks for your review!</div><div> </div><div>Sent v4 of the patch, taking into account the comments and the private discussion.</div><div> </div><div>Some answers are below.<br> </div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;">Friday, October 2, 2020, 2:45 +03:00 from Vladislav Shpilevoy <v.shpilevoy@tarantool.org>:<br> <div id=""><div class="js-helper js-readmsg-msg"><style type="text/css"></style><div><div id="style_16015959530756495254_BODY">Hi! Thanks for the patch! Very nice! Extreme niceness! Optimistic connection handling!<br><br>See 17 small minor comments below.<br><br>1. The branch fails quite hard all the iproto-related tests on my machine,<br>some with crashes. Also when I tried to run it in the console, I get<br>'Peer closed' right after netbox.connect.<br><br>However I see CI green, and I am not sure why it passes there. On my<br>machine all networking is totally broken on your branch.<br><br>I fixed it by adding this:<br><br>@@ -578,6 +578,7 @@ iproto_msg_new(struct iproto_connection *con)<br>return NULL;<br>}<br>msg->connection = con;<br>+ msg->close_connection = false;<br>rmean_collect(rmean_net, IPROTO_REQUESTS, 1);<br>return msg;<br>}<br><br>It seems it is your bug, because previously it was used only on<br>connection, and initialized only in iproto_on_accept(). Now you<br>use it on each request, but don't initialize it on each request. It must<br>have been huge luck that it didn't fail on any CI job.</div></div></div></div></blockquote></div><div>Right, my fault here. Fixed in v4.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>On 26.09.2020 00:53, Ilya Kosarev wrote:<br>> On connection, an evio service callback is invoked to accept it. The<br>> next step after acceptance was to process connection to tx thread<br>> through cbus. 
This meant that any connection interaction involves<br>> tx thread even before we get to decode what the client wants<br>> from us. Consequently, a number of problems appear. The main one<br>> is that we might get a descriptor leak in case of unresponsive tx thread<br>> (for example, when building secondary index).<br><br>2. It is not a leak really. Just too many clients waiting to be<br>processed. Descriptors were not lost completely.<br><br>As for the issue - what was the problem not to start listening until<br>recovery is complete?<br><br>And how does this patch solve it, if the connections are still kept,<br>and so their descriptors? How did you validate the problem was gone?<br>It seems the descriptors still are created, and still are not closed.<br><br>If the problem is not gone, then why does this patch do what it does?</div></div></div></div></blockquote></div><div>This topic was discussed in a voice call. Short answer here:</div><div>Previously iproto only accepted connections, without any</div><div>further actions, which led to very fast descriptor exhaustion.</div><div>We can’t avoid listening before recovery: it is needed for replication.</div><div>The patch solves it, because the connections are now greeted and</div><div>no longer keep reconnecting, wasting sockets. There is a reproducer</div><div>in the ticket, which I don't see how to include in test-run. But it shows</div><div>the difference.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div>> There are some other<br>> cases where we might not want to spend precious tx time to process<br>> the connection in case iproto can do it all alone.<br><br>3. What are these other cases? 
I can think only of SWIM in iproto thread,<br>but it is not related at all.</div></div></div></div></blockquote></div><div>The point is that we can now add more logic to request</div><div>processing in iproto, if needed, when tx is not really required to answer.</div><div>For example, see #4646.</div><div>This can also solve some problems with improper tx answers. </div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> This patch allows iproto to accept connection and send greeting by<br>> itself. The connection is initialized in tx thread when the real<br>> request comes through iproto_msg_decode(). In case request type was not<br>> recognized we can also send reply with an error without using tx. It is<br>> planned to add more iproto logic to prevent extra interaction with<br>> tx thread. This patch already to some extent solves descriptors leakage<br>> problem as far as connection establishes and stays in fetch_schema<br>> state while tx thread is unresponsive.<br>> The other user visible change is that on_connect triggers won't run on<br>> connections that don't provide any input, as reflected in<br>> bad_trigger.test.py.<br>><br>> Part of #3776<br>><br>> @TarantoolBot document<br>> Title: iproto: on_connect triggers execution<br>> Update the documentation for on_connect triggers to reflect that they<br>> are now being executed only with the first request. Though the triggers<br>> are still the first thing to be executed on a new connection. While it<br>> is quite a synthetic case to establish a connection without making<br>> any requests it is technically possible and now your triggers won't be<br>> executed in this case. Some request is required to start their<br>> execution.<br><br>4. I am not sure I understand. According to 3776 there was a problem that<br>the connections were made. So they didn't do any requests? And the
And the<br>case is not that syntetic as it seems?</div></div></div></div></blockquote></div><div>They didn’t make any requests just because they weren’t even greeted.</div><div>Instead they were reconnecting all the time. As soon as they are greeted</div><div>they are sending requests.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>What was the problem to preserve the original behaviour? For example,<br>you could make the on_connect execution delayed. TX thread could tell<br>when recovery is done, and iproto would send the pending connections to it.</div></div></div></div></blockquote></div><div>I think we don’t really need to deal with the connections that don’t send requests.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> src/box/iproto.cc | 317 +++++++++++++++++++++-----------<br>> test/box-py/bad_trigger.result | 1 +<br>> test/box-py/bad_trigger.test.py | 22 ++-<br>> test/sql/prepared.result | 4 +<br>> test/sql/prepared.test.lua | 1 +<br>> 5 files changed, 239 insertions(+), 106 deletions(-)<br>><br>> diff --git a/src/box/iproto.cc b/src/box/iproto.cc<br>> index b8f65e5ec..9f98fce86 100644<br>> --- a/src/box/iproto.cc<br>> +++ b/src/box/iproto.cc<br>> @@ -463,6 +474,7 @@ struct iproto_connection<br>> struct ev_io output;<br>> /** Logical session. */<br>> struct session *session;<br>> + bool init_failed;<br><br>5. Flags need 'is_' prefix.</div></div></div></div></blockquote></div><div>Right, ok. Done in v4.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>6. What is this? Why is it needed? Usually we leave comments<br>to each struct member. 
Otherwise it is hard to understand why<br>some of them are needed.</div></div></div></div></blockquote></div><div>Right, added the comment in v4.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>7. The field is accessed from tx thread only, and is close to<br>fields mutated from iproto thread. That leads to false-sharing<br>problem. Please, move it to iproto_connection.tx sub-struct.</div></div></div></div></blockquote></div><div>Right! Fixed in v4.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> ev_loop *loop;<br>> /**<br>> * Pre-allocated disconnect msg. Is sent right after<br>> @@ -677,9 +700,19 @@ iproto_connection_close(struct iproto_connection *con)<br>> * is done only once.<br>> */<br>> con->p_ibuf->wpos -= con->parse_size;<br>> - cpipe_push(&tx_pipe, &con->disconnect_msg);<br>> assert(con->state == IPROTO_CONNECTION_ALIVE);<br>> con->state = IPROTO_CONNECTION_CLOSED;<br>> + rlist_del(&con->in_stop_list);<br>> + if (con->session != NULL) {<br>> + cpipe_push(&tx_pipe, &con->disconnect_msg);<br>> + } else {<br>> + /*<br>> + * In case session was not created we can safely<br>> + * try to start destroy not involving tx thread.<br>> + */<br>> + iproto_connection_try_to_start_destroy(con);<br>> + }<br>> + return;<br><br>8. 
You wouldn't need the 'return' and wouldn't need to duplicate<br>'rlist_del(&con->in_stop_list);' call, if you would move<br>'rlist_del(&con->in_stop_list);' to the beginning of this function<br>from its end.</div></div></div></div></blockquote></div><div>Right, done in v4.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> } else if (con->state == IPROTO_CONNECTION_PENDING_DESTROY) {<br>> iproto_connection_try_to_start_destroy(con);<br>> } else {<br>> @@ -809,6 +842,7 @@ iproto_enqueue_batch(struct iproto_connection *con, struct ibuf *in)<br>> assert(rlist_empty(&con->in_stop_list));<br>> int n_requests = 0;<br>> bool stop_input = false;<br>> + bool obuf_in_iproto = (con->session == NULL);<br><br>9. We always name flags starting from 'is_' prefix or a similar one like 'has_',<br>'does_'. Please, follow that agreement.</div></div></div></div></blockquote></div><div>All right, added in v4.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> const char *errmsg;<br>> while (con->parse_size != 0 && !stop_input) {<br>> if (iproto_check_msg_max()) {<br>> @@ -1314,13 +1367,24 @@ iproto_msg_decode(struct iproto_msg *msg, const char **pos, const char *reqend,<br>> (uint32_t) type);<br>> goto error;<br>> }<br>> - return;<br>> + return 0;<br>> error:<br>> /** Log and send the error. */<br>> diag_log();<br>> - diag_create(&msg->diag);<br>> - diag_move(&fiber()->diag, &msg->diag);<br>> - cmsg_init(&msg->base, error_route);<br>> + if (msg->connection->session != NULL) {<br>> + diag_create(&msg->diag);<br>> + diag_move(&fiber()->diag, &msg->diag);<br>> + cmsg_init(&msg->base, error_route);<br>> + return 1;<br><br>10. Why do you need 3 return values? You only check for -1. So you<br>could return 0 here like it was before, and -1 below. 
In the<br>calling code you would check != 0 like we do everywhere.</div></div></div></div></blockquote></div><div>Well, the idea was that we have 2 different error cases. But actually</div><div>it should be: 0 if the route was inited. -1 otherwise. Now it is like that.</div><div>Fixed in v4.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> + }<br>> + /*<br>> + * In case session was not created we can process error path<br>> + * without tx thread.<br>> + */<br>> + tx_accept_wpos(msg->connection, &msg->wpos);<br>> + tx_reply_error(msg);<br><br>11. There is a naming convention, that all tx_ functions are called in<br>TX thread always. All net_ and iproto_ functions are called in IProto<br>thread. Lets not violate the rule and obfuscate the iproto.cc code<br>even more that it is. Please, come up with new names reflecting where<br>the functions are used. Or at least not specifying it as tx_.</div></div></div></div></blockquote></div><div>Well, this convention is already broken to some extent. For example,</div><div>tx_reply_error() itself calls <span style="color: rgb(8, 8, 8); font-family: "JetBrains Mono", monospace; font-size: 9.8pt;">iproto_reply_error() and</span></div><div><span style="color: rgb(8, 8, 8); font-family: "JetBrains Mono", monospace; font-size: 9.8pt;">iproto_wpos_create(). tx_process_sql() calls </span><span style="color: rgb(8, 8, 8); font-family: "JetBrains Mono", monospace; font-size: 9.8pt;">iproto_reply_sql().</span><br>Though i agree that it is a good idea to follow the convention. I decided to</div><div>introduce reply_error() and obuf_accept_wpos() in v4.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> + net_send_msg(&(msg->base));<br><br>12. The function is called 'iproto_msg_decode'. 
Let's leave it to do decoding,<br>and not send anything from it. It just violates its purpose and name.</div></div></div></div></blockquote></div><div>Right, fixed in v4. </div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> + return -1;<br>> }<br>><br>> static void<br>> @@ -1478,13 +1538,27 @@ tx_accept_wpos(struct iproto_connection *con, const struct iproto_wpos *wpos)<br>> }<br>> }<br>><br>> -static inline struct iproto_msg *<br>> -tx_accept_msg(struct cmsg *m)<br>> +static inline int<br>> +tx_accept_msg(struct cmsg *m, struct iproto_msg **msg)<br>> {<br>> - struct iproto_msg *msg = (struct iproto_msg *) m;<br>> - tx_accept_wpos(msg->connection, &msg->wpos);<br>> - tx_fiber_init(msg->connection->session, msg->header.sync);<br>> - return msg;<br>> + *msg = (struct iproto_msg *) m;<br><br>13. Why do you change the return type and add an out parameter? You<br>could just return NULL in case of an error.</div></div></div></div></blockquote></div><div>No, I can’t, because I need this msg to process the error. 
I can’t get NULL</div><div>instead.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> + /*<br>> + * In case connection init failed we don't need to try anymore.<br>> + */<br>> + if ((*msg)->connection->init_failed)<br>> + return -1;<br>> + /*<br>> + * In case session was not created we need to init connection in tx and<br>> + * create it here.<br>> + */<br>> + if ((*msg)->connection->session == NULL && tx_init_connect(*msg) != 0) {<br>> + (*msg)->connection->init_failed = true;<br>> + (*msg)->close_connection = true;<br>> + return -1;<br>> + }<br>> + tx_accept_wpos((*msg)->connection, &(*msg)->wpos);<br>> + tx_fiber_init((*msg)->connection->session, (*msg)->header.sync);<br>> + return 0;<br>> }<br>><br>> /**<br>> @@ -1507,7 +1581,14 @@ tx_reply_error(struct iproto_msg *msg)<br>> static void<br>> tx_reply_iproto_error(struct cmsg *m)<br>> {<br>> - struct iproto_msg *msg = tx_accept_msg(m);<br>> + struct iproto_msg *msg;<br>> + /*<br>> + * We don't need to check tx_accept_msg() return value here<br>> + * as far as if we might only process iproto error in tx<br>> + * in case connection session is already created and<br>> + * thus tx_accept_msg() can't fail.<br>> + */<br>> + tx_accept_msg(m, &msg);<br><br>14. Well, if it fails, you have msg == NULL, and the code below will<br>crash. 
Otherwise it should be an assertion, not just a comment.</div></div></div></div></blockquote></div><div>Right, introduced the assertion in v4.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> struct obuf *out = msg->connection->tx.p_obuf;<br>> iproto_reply_error(out, diag_last_error(&msg->diag),<br>> msg->header.sync, ::schema_version);<br>> @@ -1865,6 +1961,29 @@ tx_process_replication(struct cmsg *m)<br>> }<br>> }<br>><br>> +/**<br>> + * Check connection health and try to send an error to the client<br>> + * in case of internal connection init or on_connect trigger failure.<br>> + */<br>> +static bool<br>> +iproto_connection_fail(struct iproto_msg *msg)<br><br>15. The function is called 'connection_fail' and literally does nothing<br>when the connection is fine. Does not this name look wrong to you?</div></div></div></div></blockquote></div><div>Ok, right, it will be iproto_connection_check_valid() in v4.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> +{<br>> + if (!msg->close_connection)<br>> + return false;<br>> + struct iproto_connection *con = msg->connection;<br>> + int64_t nwr = sio_writev(con->output.fd, msg->wpos.obuf->iov,<br>> + obuf_iovcnt(msg->wpos.obuf));<br><br>16. You can't just write to a socket whenever you want. It may not be<br>writable. Please, use libev events and callbacks to send anything.<br>You can do a write only when you get EV_WRITE event from libev, who<br>in turn gets it from select/poll/epoll/kqueue/whatever-else-is-provided<br>by the OS.</div></div></div></div></blockquote></div><div>Right. As discussed in the voice call, it is a special error case. If we do it</div><div>the normal way, we won’t be able to send the error. 
I checked the</div><div>case with bad_trigger.test and it clearly shows that the client won’t</div><div>receive the error message if we try to wait for the event here,</div><div>while sio_writev() does its job.</div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> + if (nwr > 0) {<br>> + /* Count statistics. */<br>> + rmean_collect(rmean_net, IPROTO_SENT, nwr);<br>> + } else if (nwr < 0 && ! sio_wouldblock(errno)) {<br>> + diag_log();<br>> + }<br>> + iproto_connection_close(con);<br>> + iproto_msg_delete(msg);<br>> + return true;<br>> +}<br>> +<br>> static void<br>> net_send_msg(struct cmsg *m)<br>> {<br>> @@ -1936,81 +2066,60 @@ net_end_subscribe(struct cmsg *m)<br>> }<br>><br>> /**<br>> - * Handshake a connection: invoke the on-connect trigger<br>> - * and possibly authenticate. Try to send the client an error<br>> - * upon a failure.<br>> + * Handshake a connection: prepare greeting for it.<br>> */<br>> static void<br>> -tx_process_connect(struct cmsg *m)<br>> +iproto_process_connect(struct iproto_msg *msg)<br>> {<br>> - struct iproto_msg *msg = (struct iproto_msg *) m;<br>> struct iproto_connection *con = msg->connection;<br>> struct obuf *out = msg->connection->tx.p_obuf;<br>> - try { /* connect. */<br>> - con->session = session_create(SESSION_TYPE_BINARY);<br>> - if (con->session == NULL)<br>> - diag_raise();<br>> - con->session->meta.connection = con;<br>> - tx_fiber_init(con->session, 0);<br>> - char *greeting = (char *) static_alloc(IPROTO_GREETING_SIZE);<br>> - /* TODO: dirty read from tx thread */<br>> - struct tt_uuid uuid = INSTANCE_UUID;<br>> - random_bytes(con->salt, IPROTO_SALT_SIZE);<br>> - greeting_encode(greeting, tarantool_version_id(), &uuid,<br>> - con->salt, IPROTO_SALT_SIZE);<br>> - obuf_dup_xc(out, greeting, IPROTO_GREETING_SIZE);<br>> - if (! 
rlist_empty(&session_on_connect)) {<br>> - if (session_run_on_connect_triggers(con->session) != 0)<br>> - diag_raise();<br>> - }<br>> + char *greeting = (char *) static_alloc(IPROTO_GREETING_SIZE);<br>> + /* TODO: dirty read from tx thread */<br>> + struct tt_uuid uuid = INSTANCE_UUID;<br>> + random_bytes(con->salt, IPROTO_SALT_SIZE);<br><br>17. Sorry, but wtf?! This data right now may be being written by TX thread,<br>so you literally may read and return *garbage* here. It is not acceptable.<br>It is not even a short term solution, it is just a bug, if you can't prove<br>the value is already initialized and constant by that moment.<br><br>If listen starts before recovery, and UUID is stored in the xlogs, then it<br>is definitely a bug.</div></div></div></div></blockquote></div><div>As discussed in the voice call, everything is actually fine. My fault for leaving the</div><div>wrong comment here (I was not totally sure everything was fine, but the</div><div>comment was also wrong before my patch). In v4 it is replaced with</div><div>the correct one, which explains the situation. Here it is:</div><div><pre style="background-color:#ffffff; color:#080808; font-family:'JetBrains Mono',monospace; font-size:9.8pt"><span style="color:#8c8c8c;font-style:italic;">/*
</span><span style="color:#8c8c8c;font-style:italic;"> * INSTANCE_UUID is guaranteed to be inited before this moment.
</span><span style="color:#8c8c8c;font-style:italic;"> * We start listening either in local_recovery() or bootstrap().
</span><span style="color:#8c8c8c;font-style:italic;"> * The INSTANCE_UUID is ensured to be inited in the beginning of
</span><span style="color:#8c8c8c;font-style:italic;"> * both methods. In case of local_recovery() it is verified that
</span><span style="color:#8c8c8c;font-style:italic;"> * INSTANCE_UUID was read from the snapshot in memtx_engine_new().
</span><span style="color:#8c8c8c;font-style:italic;"> * In bootstrap() INSTANCE_UUID is either taken from the
</span><span style="color:#8c8c8c;font-style:italic;"> * instance_uuid box.cfg{} param or created on the spot.
</span><span style="color:#8c8c8c;font-style:italic;"> */</span></pre></div><div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> + greeting_encode(greeting, tarantool_version_id(), &uuid,<br>> + con->salt, IPROTO_SALT_SIZE);<br>> + if (obuf_dup(out, greeting, IPROTO_GREETING_SIZE)<br>> + != IPROTO_GREETING_SIZE) {<br>> + diag_set(OutOfMemory, IPROTO_GREETING_SIZE,<br>> + "greeting obuf", "dup");<br>> + iproto_reply_error(out, diag_last_error(&fiber()->diag),<br>> + msg->header.sync, ::schema_version);<br>> iproto_wpos_create(&msg->wpos, out);<br>> - } catch (Exception *e) {<br>> - tx_reply_error(msg);<br>> msg->close_connection = true;<br>> - }<br>> -}</div></div></div></div></blockquote><br><div data-signature-widget="container"><div data-signature-widget="content"><div>--<br>Ilya Kosarev</div></div></div></div></BODY></HTML>