From: "Ilya Kosarev" <i.kosarev@tarantool.org>
To: server-dev@tarantool.org, tarantool-discussions@dev.tarantool.org
Subject: [Tarantool-discussions] [server-dev]  [rfc] iproto connections processing improvements
Date: Fri, 29 May 2020 16:47:58 +0300
Message-ID: <1590760078.650797476@f437.i.mail.ru>

Hello everyone!

It is well known that Tarantool processes connections through the iproto
subsystem. Because of several problems, roughly described in the mentioned
tickets, the behavior of this subsystem should be reconsidered in some
respects.

The proposed changes are meant to solve at least the following problems.
The first is violation of the file descriptor rlimit when some clients
keep sending requests while the tx thread is unresponsive. According to
Yaroslav, 12 vshard routers reconnecting every 10.5 seconds for 15
minutes are enough to make recovery die with the «can't initialize storage:
error reading directory: too many open files» error.
The second is dirty reads and similar issues, which arise because tx can
respond even though bootstrap is not finished.

The solution is basically to give iproto more freedom, at least in some
cases. As far as I can see, it can be implemented with a simple state
machine. The alternative is a vtab, which seems like overkill here: it is
less transparent, and there are only 2 options for each request, to process
it or to reject it. To start with, we can use 2 states to solve the first
problem, which seems to be the more painful one, and then introduce new
states to solve the second problem and possibly some more. These states may
be called "solo" and "assist". The "assist" state mostly corresponds to the
current iproto behavior and should be the default one, while the "solo"
state is intended to be enabled by the tx thread when it is going to become
unresponsive for a considerable time (for example, while building secondary
keys). In the "solo" state iproto won't communicate with tx and will simply
answer any request from any client that tx is busy. The alternative is some
kind of heartbeat from tx to iproto that lets iproto decide for itself
whether it needs to change its state; however, that also seems like
overkill. If a user, for example, loads tx so heavily that it can't
communicate with iproto, that is the user's own problem.
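
To make the two-state idea more concrete, here is a minimal sketch in C.
It is only an illustration under my own assumptions: all identifiers
(iproto_state, iproto_set_state, iproto_on_request and so on) are
hypothetical and are not taken from the Tarantool sources. The state is
assumed to be written by the tx thread and read by the iproto thread, so
it is kept in an atomic variable.

```c
#include <stdatomic.h>

/* All names below are hypothetical; none come from the Tarantool sources. */
enum iproto_state {
	IPROTO_STATE_ASSIST, /* requests are handed over to tx, as today */
	IPROTO_STATE_SOLO,   /* tx is busy; iproto answers by itself */
};

enum iproto_action {
	IPROTO_ACTION_FORWARD_TO_TX, /* usual path: tx prepares the answer */
	IPROTO_ACTION_REPLY_TX_BUSY, /* iproto replies "tx is busy" itself */
};

/* Assumption: written by the tx thread, read by the iproto thread. */
static atomic_int iproto_cur_state = IPROTO_STATE_ASSIST;

/* tx would call this before and after a long blocking operation. */
static void
iproto_set_state(enum iproto_state state)
{
	atomic_store(&iproto_cur_state, (int)state);
}

static enum iproto_state
iproto_get_state(void)
{
	return (enum iproto_state)atomic_load(&iproto_cur_state);
}

/* iproto would call this for every incoming request or connection. */
static enum iproto_action
iproto_on_request(void)
{
	switch (iproto_get_state()) {
	case IPROTO_STATE_SOLO:
		return IPROTO_ACTION_REPLY_TX_BUSY;
	case IPROTO_STATE_ASSIST:
	default:
		return IPROTO_ACTION_FORWARD_TO_TX;
	}
}
```

With such a layout, tx would call iproto_set_state(IPROTO_STATE_SOLO)
before, say, building secondary keys and switch back to
IPROTO_STATE_ASSIST afterwards. No heartbeat is involved.
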

This approach is needed because right now iproto can only accept
connections, and so it keeps consuming sockets when the tx thread can't
answer. Currently tx has to prepare the greeting, and only then can iproto
send it. It works the same way for all other requests: tx has to prepare
the answer, and then iproto processes it.

The proposed approach allows iproto itself to close connections or ask
them to wait while in the "solo" state. This will solve the descriptor
leak problem. Later, more states can be added in which iproto, for example,
answers on its own only to DML requests (while tx is not ready for them).
This idea is partly implemented, and it shows satisfactory behavior in the
case of an unresponsive tx.
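
As an illustration of how a later state could depend on the request
type, here is a rough sketch. The third state (called "no DML" here),
the request list and every identifier are assumptions made up for this
example; they are not existing Tarantool names.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum sketch_state {
	SKETCH_STATE_ASSIST, /* everything goes to tx */
	SKETCH_STATE_SOLO,   /* iproto answers everything itself */
	SKETCH_STATE_NO_DML, /* iproto answers only DML requests itself */
};

enum sketch_request {
	SKETCH_REQ_PING,
	SKETCH_REQ_SELECT,
	SKETCH_REQ_INSERT,
	SKETCH_REQ_REPLACE,
	SKETCH_REQ_UPDATE,
	SKETCH_REQ_DELETE,
	SKETCH_REQ_UPSERT,
};

/* True for requests that modify data. */
static bool
sketch_is_dml(enum sketch_request req)
{
	switch (req) {
	case SKETCH_REQ_INSERT:
	case SKETCH_REQ_REPLACE:
	case SKETCH_REQ_UPDATE:
	case SKETCH_REQ_DELETE:
	case SKETCH_REQ_UPSERT:
		return true;
	default:
		return false;
	}
}

/* True if iproto should reply on its own instead of involving tx. */
static bool
sketch_iproto_answers_itself(enum sketch_state state,
			     enum sketch_request req)
{
	switch (state) {
	case SKETCH_STATE_SOLO:
		return true;
	case SKETCH_STATE_NO_DML:
		return sketch_is_dml(req);
	case SKETCH_STATE_ASSIST:
	default:
		return false;
	}
}
```

There are still only 2 outcomes per request (hand it to tx or answer
from iproto), so adding a state only adds one more case to the switch.
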

There is one thing that causes trouble: the output buffers (obufs) use a
thread-local slab_cache. Currently the obuf slab_cache belongs to tx, while
the proposed changes mean that both tx and iproto have to be able to use
obufs, depending on the state and the request type. I am currently looking
for the best approach here. One option is to use more obufs (4 instead of
2), 2 of them belonging to the tx thread and 2 to the iproto thread.
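
The "4 obufs" option could be laid out roughly as follows. This is a
simplified sketch, not the real obuf API: struct obuf_sketch stands in
for a real obuf, and all names are hypothetical. The only point it shows
is that each thread touches only the pair of buffers it owns, so each
pair could be backed by its owner's thread-local slab_cache.

```c
#include <stddef.h>

/* Hypothetical names; a stand-in for the real obuf structures. */
enum obuf_owner {
	OBUF_OWNER_TX,
	OBUF_OWNER_IPROTO,
	OBUF_OWNER_COUNT,
};

enum { OBUFS_PER_OWNER = 2 };

struct obuf_sketch {
	enum obuf_owner owner; /* thread whose slab_cache would back it */
	size_t used;           /* bytes written so far */
};

struct conn_obufs {
	/* 2 buffers per owning thread, 4 in total per connection. */
	struct obuf_sketch bufs[OBUF_OWNER_COUNT][OBUFS_PER_OWNER];
	/* Index of the buffer each owner is currently filling. */
	int current[OBUF_OWNER_COUNT];
};

static void
conn_obufs_init(struct conn_obufs *c)
{
	for (int o = 0; o < OBUF_OWNER_COUNT; o++) {
		c->current[o] = 0;
		for (int i = 0; i < OBUFS_PER_OWNER; i++) {
			c->bufs[o][i].owner = (enum obuf_owner)o;
			c->bufs[o][i].used = 0;
		}
	}
}

/* Buffer that the given thread should write its next reply to. */
static struct obuf_sketch *
conn_obufs_current(struct conn_obufs *c, enum obuf_owner who)
{
	return &c->bufs[who][c->current[who]];
}

/* Switch the given owner to its other buffer. */
static void
conn_obufs_rotate(struct conn_obufs *c, enum obuf_owner who)
{
	c->current[who] = (c->current[who] + 1) % OBUFS_PER_OWNER;
}
```

Whether replies from the two pairs can be interleaved on one connection
without breaking the response order is not addressed by this sketch.
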

It is also debatable whether connections made to iproto in the "solo" state
should be closed or should retry their requests after some timeout. I
propose to close them, although there is an opinion that this is not the
right behavior. I think it is more transparent and understandable for users
to reconnect by themselves, especially since this unresponsive tx state
might last for quite a long time.

Ilya Kosarev

