<HTML><BODY><div>Hello everyone!</div><div> </div><div>It is well known that tarantool processes connections through the iproto<br>subsystem. Due to some problems, roughly described in <a href="https://github.com/tarantool/tarantool/issues/3776">the</a> <a href="https://github.com/tarantool/tarantool/issues/4646">mentioned</a><br><a href="https://github.com/tarantool/tarantool/issues/4910">tickets</a>, it turns out that the behavior of this subsystem should be<br>reconsidered in some aspects.</div><div> </div><div>The proposed changes are meant to solve at least the following<br>problems. The first one is violation of the file descriptor rlimit<br>when clients make enough requests while the tx thread is<br>unresponsive. According to <a href="https://github.com/rosik">Yaroslav</a>, 12 vshard routers reconnecting<br>every 10.5 seconds for 15 minutes are enough to make recovery die with<br>the «can't initialize storage: error reading directory: too many open<br>files» error.</div><div>The second one is dirty reads and other problems that arise when tx<br>can respond although bootstrap is not finished.</div><div> </div><div>The solution is basically to give iproto more freedom, at least in<br>some cases. As far as I can see, it can be implemented with a simple<br>state machine. The alternative is a vtab, which seems like overkill<br>here, since it is less transparent and there are only 2 options for<br>each request: to process it or to reject it. To start with, we can use<br>2 states to solve the first problem, which seems to be the more<br>painful one, and then introduce new states to solve the second problem<br>and possibly some more. These states may be called "solo" and<br>"assist". The "assist" state mostly implies the current iproto<br>behavior and should be the default one, while the "solo" state is<br>intended to be enabled by the tx thread when it is going to become<br>unresponsive for a considerable time (for example, while building<br>secondary keys). 
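</div><div> </div><div>To make the two-state idea more concrete, here is a minimal sketch in C.<br>All names in it (iproto_sketch, IPROTO_SOLO and so on) are hypothetical<br>and are not taken from the tarantool sources. It only illustrates the<br>routing decision and assumes a single thread, so it ignores how the<br>state switch would be passed safely from tx to iproto.</div>
<pre>
```c
/* Hypothetical sketch: none of these names exist in tarantool. */

/* The two proposed iproto states. */
enum iproto_state {
    IPROTO_ASSIST, /* default: requests are handed over to tx */
    IPROTO_SOLO,   /* tx is busy: iproto answers on its own */
};

/* What iproto does with an incoming request. */
enum iproto_action {
    IPROTO_FORWARD_TO_TX,
    IPROTO_REPLY_TX_BUSY,
};

struct iproto_sketch {
    enum iproto_state state;
};

/*
 * Switch the state. In the proposal this is requested by tx before
 * and after a long blocking operation. A real implementation would
 * need a thread-safe way to deliver the switch; this sketch does not
 * model that.
 */
static void
iproto_sketch_set_state(struct iproto_sketch *iproto,
                        enum iproto_state state)
{
    iproto->state = state;
}

/* Decide how to handle a request, based only on the current state. */
static enum iproto_action
iproto_sketch_route(const struct iproto_sketch *iproto)
{
    switch (iproto->state) {
    case IPROTO_SOLO:
        return IPROTO_REPLY_TX_BUSY;
    case IPROTO_ASSIST:
    default:
        return IPROTO_FORWARD_TO_TX;
    }
}
```
</pre>
<div>In this sketch tx would set IPROTO_SOLO before it starts a long<br>operation and IPROTO_ASSIST once it is done. More states could be<br>added later as extra enum values and switch cases.</div><div> </div><div>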
The "solo" state means that iproto won't<br>communicate with tx and will simply answer every request with a<br>message that tx is busy. The alternative is some kind of heartbeat<br>from tx to iproto that lets iproto decide by itself whether it needs<br>to change its state; however, it also seems like overkill. If a user,<br>for example, loads tx so much that it can't communicate with iproto,<br>that is the user's own problem.</div><div> </div><div><div>This approach is needed because currently iproto can only accept<br>connections, and so it keeps consuming sockets when the tx thread<br>can't answer. Currently tx needs to prepare the greeting, and only<br>then can iproto send it. It works the same way for all other requests:<br>tx needs to prepare the answer, and then iproto processes it.</div><div>The proposed approach allows iproto itself to close connections or ask<br>them to wait in the "solo" state. This will solve the descriptor leak<br>problem. Later, more states can be added, where iproto, for example,<br>will itself answer only DML requests (while tx is not ready for them).<br>This idea is partly implemented, and it shows satisfactory behavior in<br>the case of an unresponsive tx.</div><div> </div><div>There is one thing that causes trouble: the output buffers (obufs) use<br>a thread-local slab_cache. Currently the obuf slab_cache belongs to tx,<br>while the proposed changes mean that both tx and iproto have to be<br>able to use them depending on the state and request type. I am<br>currently searching for the best approach here. One option is to use<br>more obufs (4 instead of 2), 2 of them belonging to the tx thread and<br>2 of them belonging to the iproto thread.</div><div> </div><div>It is also debatable whether connections to iproto in the "solo" state<br>should be closed or should retry their requests after some timeout. I<br>propose to close them, although there is an opinion that this is not<br>the right behavior. 
Still,<br>I think it is more transparent and understandable for users to<br>reconnect by themselves, especially since this unresponsive tx state<br>might last for quite a long time.</div></div><div> </div><div data-signature-widget="container"><div data-signature-widget="content"><div>--<br>Ilya Kosarev</div></div></div></BODY></HTML>