[Tarantool-patches] [PATCH v3] iproto: make iproto thread more independent from tx

Wed Oct 28 01:06:54 MSK 2020

Привет. Далее будет ревью на русском, чтобы лучше понимать друг
друга.

>     > There are some other
>     > cases where we might not want to spend precious tx time to process
>     > the connection in case iproto can do it all alone.
> 
>     3. What are these other cases? I can think only of SWIM in iproto thread,
>     but it is not related at all.
> 
> The cases are that we can now add more logic to request
> processing in iproto, if needed, when tx is not really needed to answer.
> For example, see #4646.
> This can also solve some problems with improper tx answers. 

Какие improper tx answers?

>     > This patch allows iproto to accept connection and send greeting by
>     > itself. The connection is initialized in tx thread when the real
>     > request comes through iproto_msg_decode(). In case request type was not
>     > recognized we can also send reply with an error without using tx. It is
>     > planned to add more iproto logic to prevent extra interaction with
>     > tx thread. This patch already to some extent solves descriptors leakage
>     > problem as far as connection establishes and stays in fetch_schema
>     > state while tx thread is unresponsive.
>     > The other user visible change is that on_connect triggers won't run on
>     > connections that don't provide any input, as reflected in
>     > bad_trigger.test.py.
>     >
>     > Part of #3776
>     >
>     > @TarantoolBot document
>     > Title: iproto: on_connect triggers execution
>     > Update the documentation for on_connect triggers to reflect that they
>     > are now being executed only with the first request. Though the triggers
>     > are still the first thing to be executed on a new connection. While it
>     > is quite a synthetic case to establish a connection without making
>     > any requests it is technically possible and now your triggers won't be
>     > executed in this case. Some request is required to start their
>     > execution.
> 
>     4. I am not sure I understand. According to 3776 there was a problem, that
>     the connections were made. So they didn't do any requests? And the
>     case is not that syntetic as it seems?
> 
> They didn’t make any requests just because they weren’t even greeted.
> Instead they were reconnecting all the time. As soon as they are greeted
> they are sending requests.

Не совсем так. Да, они не были greeted, но проблема не в том, что они слали или
не слали реквесты. А в том, что даже если они делали дисконнект, он должен был
пройти через tx поток. После твоего патча вроде как не должен. В итоге они так
же коннектятся, зависают, отключаются, но теперь закрытие сокета ловится как
EV_READ в ипрото потоке, и там же закрывается. Правда объект iproto_connection,
все буферы, и struct iproto_msg не удаляются все равно. Как бы это не вылетело
по памяти еще раньше.

Все-таки неправильно, что ты сразу шлешь запросы в tx поток. Надо чтоб TX явно
говорил, что это можно делать. До тех пор даже не пытаться туда ничего слать.
Если учесть, что инстанс может подниматься часами, у него может кончиться память
при агрессивных реконнектах с кучи клиентов.

Сейчас время на это есть, поэтому кмк лучше сделать это одним нормальным
патчсетом, торопиться некуда.

>     What was the problem to preserve the original behaviour? For example,
>     you could make the on_connect execution delayed. TX thread could tell
>     when recovery is done, and iproto would send the pending connections to it.
> 
> I think we don’t really need to deal with the connections that don’t send requests.

Тут не про то, кто как думает, а как не сломать совместимость. Походу это ломает.
Раньше триггер звался на все коннекты, и юзер мог их например отлуплять, выдав
ошибку в триггере. Или как-то логгировать.

Я все еще не вижу препятствий это не ломать. Пока ТХ тред не дал разрешение передавать
ему запросы, отлупляем всех.

Когда ТХ тред дал разрешение, все еще не закрытые коннекты идут в ТХ, даже если у них
нет запросов. В чем проблема это сделать и не ломать то, что уже есть?

>     > const char *errmsg;
>     > while (con->parse_size != 0 && !stop_input) {
>     > if (iproto_check_msg_max()) {
>     > @@ -1314,13 +1367,24 @@ iproto_msg_decode(struct iproto_msg *msg, const char **pos, const char *reqend,
>     > (uint32_t) type);
>     > goto error;
>     > }
>     > - return;
>     > + return 0;
>     > error:
>     > /** Log and send the error. */
>     > diag_log();
>     > - diag_create(&msg->diag);
>     > - diag_move(&fiber()->diag, &msg->diag);
>     > - cmsg_init(&msg->base, error_route);
>     > + if (msg->connection->session != NULL) {
>     > + diag_create(&msg->diag);
>     > + diag_move(&fiber()->diag, &msg->diag);
>     > + cmsg_init(&msg->base, error_route);
>     > + return 1;
> 
>     10. Why do you need 3 return values? You only check for -1. So you
>     could return 0 here like it was before, and -1 below. In the
>     calling code you would check != 0 like we do everywhere.
> 
> Well, the idea was that we have 2 different error cases. But actually
> it should be: 0 if the route was inited. -1 otherwise. Now it is like that.
> Fixed in v4.

Идея понятна, что там разное может внутри произойти. Но ты использовал только
2 случая. Какой смысл в третьем? Это был просто мертвый код. Добавлять его
"на всякий случай" нет никакого смысла.

>     > + }
>     > + /*
>     > + * In case session was not created we can process error path
>     > + * without tx thread.
>     > + */
>     > + tx_accept_wpos(msg->connection, &msg->wpos);
>     > + tx_reply_error(msg);
> 
>     11. There is a naming convention, that all tx_ functions are called in
>     TX thread always. All net_ and iproto_ functions are called in IProto
>     thread. Lets not violate the rule and obfuscate the iproto.cc code
>     even more that it is. Please, come up with new names reflecting where
>     the functions are used. Or at least not specifying it as tx_.
> 
> Well, this convention is already broken to some extent. For example,
> tx_reply_error() itself calls iproto_reply_error() and
> iproto_wpos_create(). tx_process_sql() calls iproto_reply_sql().

Возможно я напиздел. Скорее всего iproto_ - это общие функции. net_ - для
сетевого потока, и tx_ - для главного потока. С некоторыми исключениями,
которые скорее всего просто неправильны.

>     > +{
>     > + if (!msg->close_connection)
>     > + return false;
>     > + struct iproto_connection *con = msg->connection;
>     > + int64_t nwr = sio_writev(con->output.fd, msg->wpos.obuf->iov,
>     > + obuf_iovcnt(msg->wpos.obuf));
> 
>     16. You can't just write to a socket whenever you want. It may be
>     not writable. Please, use libev events and callbacks to send anything.
>     You can do a write only when you get EV_WRITE event from libev, who
>     in turn gets it from select/poll/epoll/kqueue/whatever-else-is-provided
>     by the OS.
> 
> Right. As discussed in voice, it is special error case. In case we will
> do the normal way, we won’t be able to send the error. I checked the
> case with bad_trigger.test and it clearly shows that the client won’t
> receive  the error message in case we will try to wait for the event here,
> while sio_writev() does its job.

Ясен пень, если ты сокет закроешь, то писать в него будет нельзя, и никакие
события на него больше не придут от libev. Надо его не закрывать, и тогда
все будет.

Голосом мы обсудили, что это не специальный случай, когда записать ошибку нормально
невозможно, а что просто автору оригинального кода было лень сделать это
нормально через события. И он решил записать в сокет напрямую. Теперь это
протаскивается дальше. Еще вероятная причина - все заточено на использование
obuf, в который нельзя было писать в ипрото потоке впринципе. Я пока не могу
сказать, что с этим делать. Зависит от того, что дальше будет с obuf. См
другое письмо.