Re[2]: [patches] Re: [PATCH] Break connection on timeout

Konstantin Belyavskiy k.belyavskiy at tarantool.org
Fri Feb 9 13:07:34 MSK 2018


Guys, let's stop this and choose one option:

Option 1 (without a new parameter, but with error injection to pass the test):
https://github.com/tarantool/tarantool/commit/eeb4ed12938dca972e12cf1ac117641c75ff9ea7
Option 2 (with a new parameter, disconnect_timeout, and without error injection):
https://github.com/tarantool/tarantool/commit/aa6b81c63b03e08981a1272fc31243f04debc230

I personally think the second is better, which is why I sent it for review first.
But there is still one thing to discuss:
for compatibility, I bind replication_disconnect_timeout to disconnect_timeout by default,
but it is probably better not to. Since there is an internal check that the disconnect timeout is greater
than the replication timeout, if the user does not change it, all instances will have the same value, no matter
what replication timeout value is used. So this will not break connections between different instances,
and it makes it easier to connect with older versions.


>Friday, 9 February 2018, 12:10 +03:00 from Vladimir Davydov <vdavydov.dev at gmail.com>:
>
>On Thu, Feb 08, 2018 at 11:29:14PM +0300, Konstantin Osipov wrote:
>> * Konstantin Belyavskiy < k.belyavskiy at tarantool.org > [18/02/08 21:53]:
>> > 
>> > branch: gh-3025-break-connection-timeout-v2
>> 
>> I like the idea of disconnect timeout being separate from other
>> timeouts, since I'm really concerned with the case of bogus
>> disconnects in case of different heartbeat settings at master and
>> slave. 
>
>I beg to disagree, because I don't see how the new configuration option
>would help against configurations being different on replicas of the
>same cluster. Besides, disconnect_timeout and replication_timeout would
>be interconnected - it wouldn't make sense to set disconnect_timeout to
>a value less than replication_timeout.
>
>> 
>> I'm just brainstorming, but how about this:
>> 
>> - we pass the time of the next heartbeat in each heartbeat message.
>>   Or we can measure this time instead of passing it on, e.g. by
>>   keeping applier.prev_heartbeat_time or similar.
>> - unless we have gone (heartbeat time) *
>>   disconnect_heartbeat_count without heartbeats, we don't
>>   disconnect. Initially, until we know prev_heartbeat_time, we
>>   only disconnect after some hard-coded timeout
>>   like 300 seconds. And we never allow setting the heartbeat timeout
>>   above 60 seconds, so that the two options work well together.
>> - we disconnect only after we have missed X heartbeats.
>> 
>> Other options?
>
>To be honest, I think it's overkill. We haven't seen any complaints
>about spurious disconnects. Why should we care about this problem now?
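
For reference, the heartbeat-counting idea quoted above could be sketched roughly as follows. This is only an illustration of the proposal, not code from either commit: the class HeartbeatMonitor, its method names, and the default count of 4 are hypothetical, while the 300-second fallback and the 60-second cap are the numbers mentioned in the proposal. The sketch measures the heartbeat interval locally (the prev_heartbeat_time variant) instead of passing it in the message.

```python
from typing import Optional

# Values mentioned in the proposal; everything else here is hypothetical.
HARD_CODED_TIMEOUT = 300.0      # fallback until the heartbeat interval is known
MAX_HEARTBEAT_INTERVAL = 60.0   # heartbeat interval is never allowed above this


class HeartbeatMonitor:
    """Decide when a replica should drop the connection to its master."""

    def __init__(self, start_time: float, disconnect_heartbeat_count: int = 4):
        self.start_time = start_time
        self.disconnect_heartbeat_count = disconnect_heartbeat_count
        self.prev_heartbeat_time: Optional[float] = None
        self.heartbeat_interval: Optional[float] = None

    def on_heartbeat(self, now: float) -> None:
        # Measure the interval between two consecutive heartbeats,
        # capped so that it works together with the hard-coded fallback.
        if self.prev_heartbeat_time is not None:
            measured = now - self.prev_heartbeat_time
            self.heartbeat_interval = min(measured, MAX_HEARTBEAT_INTERVAL)
        self.prev_heartbeat_time = now

    def should_disconnect(self, now: float) -> bool:
        if self.heartbeat_interval is None:
            # Interval not known yet: use only the hard-coded timeout,
            # counted from the last heartbeat or from connection start.
            reference = self.prev_heartbeat_time
            if reference is None:
                reference = self.start_time
            return now - reference > HARD_CODED_TIMEOUT
        # Disconnect only after the configured number of heartbeats is missed.
        silence = now - self.prev_heartbeat_time
        return silence > self.heartbeat_interval * self.disconnect_heartbeat_count
```

With a count of 3 and heartbeats one second apart, this would disconnect only after more than three seconds of silence, regardless of what replication timeout the local instance has configured.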


Best regards,
Konstantin Belyavskiy
k.belyavskiy at tarantool.org