Tarantool discussions archive
* [Tarantool-discussions] [server-dev]  [rfc] iproto connections processing improvements
From: Ilya Kosarev @ 2020-05-29 13:47 UTC
  To: server-dev, tarantool-discussions



Hello everyone!
 
It is well known that Tarantool processes connections through the
iproto subsystem. Due to some problems, roughly described in the
mentioned tickets, it turns out that the behavior of this subsystem
should be reconsidered in some aspects.
 
The proposed changes are meant to solve at least the following
problems. The first is violation of the file descriptor rlimit when
some clients perform enough requests while the tx thread is
unresponsive. According to Yaroslav, 12 vshard routers reconnecting
every 10.5 seconds for 15 minutes are enough for recovery to die with
the error «can't initialize storage: error reading directory: too many
open files». The second is dirty reads and similar problems that occur
because tx can respond although bootstrap is not finished.
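
A rough back-of-the-envelope check of the first problem is sketched
below. It assumes that every reconnect attempt leaves one socket open
on the server until tx responds, that the first attempt happens at
t = 0, and that the soft descriptor limit is 1024 (a common Linux
default; the report does not state the actual limit).

```python
# Rough estimate of descriptors leaked while tx is unresponsive.
# Assumptions (not from the report): one leaked socket per reconnect
# attempt, first attempt at t = 0, soft RLIMIT_NOFILE of 1024.
ROUTERS = 12
RECONNECT_INTERVAL_S = 10.5
WINDOW_S = 15 * 60
ASSUMED_RLIMIT_NOFILE = 1024

attempts_per_router = int(WINDOW_S // RECONNECT_INTERVAL_S) + 1
leaked = ROUTERS * attempts_per_router

print(attempts_per_router, leaked, leaked > ASSUMED_RLIMIT_NOFILE)
```

Under these assumptions the routers alone come close to or exceed the
limit, before counting the descriptors the server needs for its own
files.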
 
The solution is basically to give iproto more freedom, at least in
some cases. As far as I can see, it can be implemented with a simple
state machine. The alternative is a vtab, which seems like overkill
here, since it is less transparent and there are only 2 options for
each request: to process it or to reject it. To start with, we can use
2 states to solve the first problem, which seems to be the more
painful one, and then introduce new states to solve the second problem
and possibly some others. These states may be called "solo" and
"assist". The "assist" state mostly corresponds to the current iproto
behavior and should be the default one, while the "solo" state is
intended to be enabled by the tx thread when it is going to become
unresponsive for a considerable time (for example, while building
secondary keys). The "solo" state means that iproto won't communicate
with tx and will simply answer any request from anyone with a message
that tx is busy. The alternative is some kind of heartbeat from tx to
iproto that lets iproto decide by itself whether it needs to change
its state; however, that also seems like overkill. If a user, for
example, loads tx so much that it can't communicate with iproto, that
is the user's own problem.
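
The two-state idea can be illustrated with a minimal sketch. This is a
toy model, not Tarantool code: the class and method names (Iproto,
set_state, handle) and the shape of the replies are assumptions made
for the example; only the "solo" and "assist" state names come from
the proposal.

```python
from enum import Enum


class IprotoState(Enum):
    ASSIST = "assist"  # default: requests are forwarded to tx
    SOLO = "solo"      # tx is busy: iproto answers on its own


class Iproto:
    """Toy model of the iproto thread; not Tarantool's real code."""

    def __init__(self, tx_handler):
        self.state = IprotoState.ASSIST
        self.tx_handler = tx_handler  # stands in for the tx thread

    def set_state(self, state):
        # Called by tx before and after a long blocking phase.
        self.state = state

    def handle(self, request):
        if self.state is IprotoState.SOLO:
            # No communication with tx at all in this state.
            return {"ok": False, "error": "tx is busy"}
        return {"ok": True, "result": self.tx_handler(request)}


iproto = Iproto(tx_handler=lambda req: f"processed {req}")
before = iproto.handle("select")
iproto.set_state(IprotoState.SOLO)    # e.g. building secondary keys
during = iproto.handle("select")
iproto.set_state(IprotoState.ASSIST)
after = iproto.handle("select")
```

The state is switched only by tx, so no heartbeat is needed: iproto
never has to guess whether tx is alive.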
 
This approach is needed because currently iproto can only accept
connections, consequently consuming sockets when the tx thread can't
answer. Currently tx needs to prepare the greeting, and only then can
iproto send it. It works the same way for all other requests: tx needs
to prepare the answer and then iproto processes it.
The proposed approach allows iproto itself to close connections or ask
them to wait while in the "solo" state. This will solve the descriptor
leak problem. Later, more states can be added in which iproto, for
example, answers by itself only to DML requests (while tx is not ready
for them). This idea is partly implemented and shows satisfactory
behavior in the case of an unresponsive tx.
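
The effect on descriptors can be shown with a toy accept loop. The
model below is an illustration only: the Server class and the
open_fds counter are hypothetical, and the "pending" branch assumes
that tx stays unresponsive for the whole run.

```python
class Server:
    """Toy accept loop; open_fds counts sockets held by the server."""

    def __init__(self, solo):
        self.solo = solo
        self.open_fds = 0

    def accept(self):
        self.open_fds += 1           # accept() always takes a descriptor
        if self.solo:
            # iproto replies "tx is busy" itself and closes the socket.
            self.open_fds -= 1
            return "rejected"
        # Otherwise the socket waits for a greeting that only tx can
        # prepare; with tx unresponsive it is never released.
        return "pending"


def simulate(solo, attempts):
    server = Server(solo)
    for _ in range(attempts):
        server.accept()
    return server.open_fds


fds_without_solo = simulate(solo=False, attempts=1000)
fds_with_solo = simulate(solo=True, attempts=1000)
print(fds_without_solo, fds_with_solo)
```

Without the "solo" state every attempt keeps a descriptor; with it the
count returns to zero after each rejected connection.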
 
There is one thing that causes trouble: output buffers (obufs) use a
thread-local slab_cache. Currently the obuf slab_cache belongs to tx,
while the proposed changes mean that both tx and iproto have to be
able to use obufs, depending on the state and request type. I am
currently searching for the best approach here. One option is to use
more obufs (4 instead of 2), 2 of them belonging to the tx thread and
2 of them belonging to the iproto thread.
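
The "4 obufs" option might look roughly like the sketch below. It is a
simplified model, not the real obuf or slab_cache API: the Obuf and
Connection classes, the ownership check and the rotate method are
assumptions made to show that each thread only ever touches its own
pair of buffers.

```python
class Obuf:
    """Stand-in for an output buffer tied to one thread's slab_cache."""

    def __init__(self, owner):
        self.owner = owner
        self.data = []

    def write(self, thread, chunk):
        # A buffer may only be filled by the thread whose cache backs it.
        if thread != self.owner:
            raise RuntimeError(f"{thread} cannot use {self.owner}'s obuf")
        self.data.append(chunk)


class Connection:
    def __init__(self):
        # 4 obufs instead of 2: one pair per thread.
        self.obufs = {
            "tx": [Obuf("tx"), Obuf("tx")],
            "iproto": [Obuf("iproto"), Obuf("iproto")],
        }
        self.current = {"tx": 0, "iproto": 0}

    def reply(self, thread, chunk):
        self.obufs[thread][self.current[thread]].write(thread, chunk)

    def rotate(self, thread):
        # Switch to the other buffer of the pair, for example while
        # the first one is being flushed to the socket.
        self.current[thread] ^= 1


conn = Connection()
conn.reply("tx", "select result")     # normal request, answered by tx
conn.reply("iproto", "tx is busy")    # answered by iproto on its own
```

The cost is twice as many buffers per connection; the benefit is that
no slab_cache is ever shared between threads.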
 
It is also debatable whether connections to iproto in the "solo" state
should be closed or should retry their requests after some timeout. I
propose to close them, although there is an opinion that this is not
the right behavior. Still, I think it is more transparent and
understandable for users to reconnect by themselves, especially since
this unresponsive tx state might last for quite a long time.
 
--
Ilya Kosarev



* Re: [Tarantool-discussions] [server-dev]  [rfc] iproto connections processing improvements
From: Ilya Kosarev @ 2020-06-30 11:18 UTC
  To: Konstantin Osipov, server-dev; +Cc: tarantool-discussions



The links were inlined under the words.
I did not take into account that this does not work in all clients.
https://github.com/tarantool/tarantool/issues/3776
https://github.com/tarantool/tarantool/issues/4646
https://github.com/tarantool/tarantool/issues/4910
 
The vtab vs. states question is of course a separate one; I described
how I see it.
 
iproto_msg_decode is exactly where one can be separated from the
other. Or is that not what you mean? Among other things, the proposal
is precisely about running iproto_msg_decode without waiting for tx
(currently tx is too actively involved in network interaction).
 
Besides, since tx ultimately has to take part in the process anyway,
when it is "busy" we want to cut off any connections in iproto, so
that, at the very least, descriptors do not leak.
 
--
Ilya Kosarev
>Friday, 29 May 2020, 17:55 +03:00 from Konstantin Osipov < kostja@scylladb.com >:
> 
>* Ilya Kosarev < i.kosarev@tarantool.org > [20/05/29 16:49]:
>
>Ilya, this would be hard to figure out even in Russian, and in
>English even more so.
>
>There are no links to the "mentioned tickets".
>
>So vtab is overkill: is that a reference to my comment in the
>ticket, apparently?
>You could have discussed it with me directly.
>
>
>Overall, the question here is not rfc vs vtab, but how to separate,
>within the traffic, connections from replicas, which must be accepted
>during bootstrap, from connections from clients.
>
>As of today the protocol makes no such distinction.
>
>The email says nothing about this.
>
> 
>--
>Konstantin Osipov, Moscow, Russia
 
 
 



* Re: [Tarantool-discussions] [server-dev] [rfc] iproto connections processing improvements
From: Konstantin Osipov @ 2020-06-30 11:47 UTC
  To: Ilya Kosarev, server-dev; +Cc: tarantool-discussions

* Ilya Kosarev <i.kosarev@tarantool.org> [20/06/30 14:19]:
> 
> The links were inlined under the words.
> I did not take into account that this does not work in all clients.
> https://github.com/tarantool/tarantool/issues/3776
> https://github.com/tarantool/tarantool/issues/4646
> https://github.com/tarantool/tarantool/issues/4910
>  
> The vtab vs. states question is of course a separate one; I described
> how I see it.
>  
> iproto_msg_decode is exactly where one can be separated from the
> other.

How exactly?


-- 
Konstantin Osipov, Moscow, Russia


* Re: [Tarantool-discussions] [dev]  [rfc] iproto connections processing improvements
From: Ilya Kosarev @ 2020-07-02 14:22 UTC
  To: Konstantin Osipov, dev; +Cc: tarantool-discussions



As a result of a private discussion, here are the steps to be
implemented:
1. The greeting should be done solely by iproto. This means session
creation has to be moved to a later point (after iproto_msg_decode).
Thus iproto has to be able to reach iproto_msg_decode without tx
assistance. iproto should also be able to finish a connection by
itself when possible (a connection rejected by iproto).
2. Introduce a state machine managed from tx. tx should be able to
enable different iproto states depending on its work phase, for
example, to reject all connections during a secondary index build.
3. To be more specific, we need to be able to classify different types
of connections, for example, replica connections vs client
connections. This means we need to add a specific flag for replica
authentication and prioritize it if needed, depending on the iproto
state.
4. The new approach to connection handling means we need to reconsider
client behavior: a specific error for this rejection type and
reconnection on timeout.
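
Steps 2-4 can be sketched together as an admission check on the iproto
side plus a retrying client. Everything named here is hypothetical:
the state names, the is_replica flag, the TxBusyError class and both
functions are assumptions for illustration, and the wait between
reconnect attempts is left out to keep the example short.

```python
from enum import Enum


class IprotoState(Enum):          # hypothetical state names
    NORMAL = 1                    # tx serves everything
    BOOTSTRAP = 2                 # only replicas are let through
    BUSY = 3                      # e.g. secondary index build


class TxBusyError(Exception):
    """Hypothetical dedicated error for rejections made by iproto."""


def admit(state, is_replica):
    """Decide in iproto, without tx, whether to keep a connection."""
    if state is IprotoState.NORMAL:
        return True
    if state is IprotoState.BOOTSTRAP:
        return is_replica
    return False


def connect_with_retry(try_connect, attempts):
    """Client side: retry on the dedicated error (the wait between
    attempts is omitted here); returns the number of attempts used."""
    for attempt in range(1, attempts + 1):
        try:
            try_connect()
            return attempt
        except TxBusyError:
            continue
    raise TxBusyError("server stayed busy")


# The server is busy for two attempts and then returns to normal.
states = iter([IprotoState.BUSY, IprotoState.BUSY, IprotoState.NORMAL])


def try_connect():
    if not admit(next(states), is_replica=False):
        raise TxBusyError("tx is busy")


used = connect_with_retry(try_connect, attempts=5)
```

A dedicated error type lets clients tell "the server is temporarily
busy" apart from real failures and decide when to reconnect.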
 
--
Ilya Kosarev 
>Tuesday, 30 June 2020, 14:47 +03:00 from Konstantin Osipov <kostja.osipov@gmail.com>:
> 
>* Ilya Kosarev < i.kosarev@tarantool.org > [20/06/30 14:19]:
>>
>> The links were inlined under the words.
>> I did not take into account that this does not work in all clients.
>>  https://github.com/tarantool/tarantool/issues/3776
>>  https://github.com/tarantool/tarantool/issues/4646
>>  https://github.com/tarantool/tarantool/issues/4910
>>  
>> The vtab vs. states question is of course a separate one; I
>> described how I see it.
>>  
>> iproto_msg_decode is exactly where one can be separated from the
>> other.
>How exactly?
>
>
>--
>Konstantin Osipov, Moscow, Russia
 



