From: Vladislav Shpilevoy
Date: Mon, 28 Dec 2020 14:21:43 +0300
Subject: Re: [Tarantool-patches] [PATCH v9 3/3] iproto: move greeting from tx thread to iproto
To: Ilya Kosarev, alyapunov@tarantool.org
Cc: tarantool-patches@dev.tarantool.org
In-Reply-To: <52c05de8a1d2ebe2d0e928b470e9a139f846c7ff.1608840673.git.i.kosarev@tarantool.org>
List-Id: Tarantool development patches

Hi! Thanks for the patch!

We are getting closer and closer to the final solution!

> diff --git a/src/box/iproto.cc b/src/box/iproto.cc
> index f7330af21..b92a0433b 100644
> --- a/src/box/iproto.cc
> +++ b/src/box/iproto.cc
> @@ -963,7 +971,7 @@ iproto_connection_on_input(ev_loop *loop, struct ev_io *watcher,
>  	 * otherwise we might deplete the fiber pool in tx
>  	 * thread and deadlock.
>  	 */
> -	if (iproto_check_msg_max()) {
> +	if (iproto_check_msg_max() && con->session != NULL) {

Ok, now I see why it didn't close anything before. It didn't even try to
read, because my tests used up the entire net_msg limit and only then
closed the socket. The server side didn't detect the close because it
never called sio_read().

It is better now, but it leads to another issue - you allow an unlimited
number of network messages to be allocated until the connection is
established. The net_msg max was here for a reason. This will eventually
lead to OOM if the connection count is big and the connections send a lot
of messages, or a few big ones, because you will allocate net_msg objects
and also consume ibuf memory and keep it until TX responds, even if the
connection is already closed.

I tried to find a way to resolve this. For example, don't read anything
until TX acks the connection. But that has two issues: there is no
cross-platform way to find out whether a socket is closed without calling
read() on it and consuming all pending data, and this approach is
basically an ugly version of the next step of what to do with IProto -
make TX control its behaviour and introduce a proper state machine for TX.

With that said, it looks like this patch won't fix the issue with the
current approach. Or rather, it fixes the issue, but simply introduces
another one, no less severe, about memory instead of descriptors.

It seems you need to implement TX control over IProto now. Make TX tell
IProto when it should stop accepting new connections and stop sending
messages to TX. It should not even hurt replication, because if TX does
not dispatch IProto messages, that includes the service messages as well,
so there is no difference between not sending them to TX at all and
sending them only for them to sit undispatched.

After writing the text above, I had a discussion with Mons to sync on what
to do here and in similar places, such as applier fibers running the
entire process out of memory when they read their sockets too aggressively
(which should persuade you that OOM caused by network messages is real).
For single-thread communications that can be solved with a simple queue
limit (see the sketch below).
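To make the queue-limit idea concrete, here is a minimal sketch. The names
and the constant are made up for illustration, this is not the existing
net_msg machinery: the reader checks a counter against a hard cap before
allocating a message and stops pulling data from the socket once the cap
is reached; the consumer decrements the counter when it finishes a
message, which lets the reader resume.

#include <assert.h>
#include <stdbool.h>

/* Hypothetical hard cap on in-flight messages between two threads. */
enum { IN_FLIGHT_MSG_MAX = 768 };

struct msg_queue {
	/* Messages produced but not yet acked by the consumer. */
	int in_flight;
};

/* The producer calls this before reading more data from its socket. */
static inline bool
msg_queue_can_produce(const struct msg_queue *q)
{
	return q->in_flight < IN_FLIGHT_MSG_MAX;
}

/* Account for a new message; the caller has checked the cap above. */
static inline void
msg_queue_produce(struct msg_queue *q)
{
	assert(q->in_flight < IN_FLIGHT_MSG_MAX);
	q->in_flight++;
}

/* The consumer acks a processed message; the producer may resume. */
static inline void
msg_queue_consume(struct msg_queue *q)
{
	assert(q->in_flight > 0);
	q->in_flight--;
}

With a cap like that the reader simply stops reading its socket when the
cap is hit, and TCP backpressure should propagate to the sender.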
For our case we already have such a queue limit - net_msg_max. But
apparently it is not enough. We need a proper state machine for the entire
thread. You need to find all the states the TX thread can be in, create a
diagram, and describe what it does and what it can't do in each state. For
example, during local recovery the TX thread obviously does not yield, and
therefore does not handle messages from other threads (because it does not
return control to the scheduler fiber) - although you'd better validate
this statement, we may have missed something. Once we have the states, we
can decide what to do with the other threads in each state. For example,
before starting local recovery we sync with the IProto thread: we make it
stop generating new messages, handle all pending messages, and then start
recovery. After local recovery we let IProto work normally, like now on
the master branch. A rough sketch of such a state machine is below.

Also you need to find out why listen is even started before local
recovery. It seems that if we have data to recover locally, the recovery
happens regardless of the replication settings, which makes me wonder why
the hell we even start listen so early. Anyway, during local recovery we
can't accept replication messages either, AFAIK (but that can be false).
It is worth checking whether this commit was really needed:
https://github.com/tarantool/tarantool/commit/601bb92a173943185b9050dab4572dc3e6f8e9a1
It could be that Kostja said we need to do that and Vova did it because it
was easier than arguing, but there was no real necessity in it.

The state machine for the TX thread would be good to make global, because
now it is inside the memtx engine, if I remember correctly, and it does
not really belong to the memtx engine. The states are also going to be
used to run triggers installed by users, from what I understood after
talking to Mons.
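Just to show what I mean by a global TX state machine, a rough sketch
follows. The state names and helpers are invented for illustration, not a
proposal for the exact API:

#include <stdbool.h>

/* Hypothetical global state of the TX thread. */
enum tx_state {
	/* Box is not configured yet; IProto must not push requests. */
	TX_STATE_INITIAL,
	/* Local recovery: TX does not yield, no cross-thread messages. */
	TX_STATE_LOCAL_RECOVERY,
	/* Normal operation: IProto works like on master branch now. */
	TX_STATE_RUNNING,
};

static enum tx_state tx_state = TX_STATE_INITIAL;

/*
 * Whether IProto is allowed to generate new messages for TX. In a real
 * implementation this would have to be communicated to the IProto
 * thread, e.g. over cbus, rather than read directly.
 */
static inline bool
tx_state_accepts_requests(void)
{
	return tx_state == TX_STATE_RUNNING;
}

/*
 * Before local recovery TX would tell IProto to stop producing new
 * messages and drain the pending ones; that part is omitted here, only
 * the state switch is shown.
 */
static void
tx_enter_local_recovery(void)
{
	tx_state = TX_STATE_LOCAL_RECOVERY;
}

The point is that every other thread's behaviour (accepting connections,
forwarding requests, running user triggers) becomes a function of this one
explicit state instead of being scattered over ad-hoc checks.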