[patches] Re: [PATCH] Break connection on timeout

Vladimir Davydov vdavydov.dev at gmail.com
Fri Feb 9 12:10:37 MSK 2018


On Thu, Feb 08, 2018 at 11:29:14PM +0300, Konstantin Osipov wrote:
> * Konstantin Belyavskiy <k.belyavskiy at tarantool.org> [18/02/08 21:53]:
> > 
> > branch: gh-3025-break-connection-timeout-v2
> 
> I like the idea of disconnect timeout being separate from other
> timeouts, since I'm really concerned with the case of bogus
> disconnects in case of different heartbeat settings at master and
> slave. 

I beg to disagree, because I don't see how the new configuration option
would help against configurations being different on replicas of the
same cluster. Besides, disconnect_timeout and replication_timeout would
be interconnected - it wouldn't make sense to set disconnect_timeout to
a value less than replication_timeout.

> 
> I'm just brainstorming, but how about this:
> 
> - we pass time of the next heartbeat in each heartbeat message. 
>   We can measure this time, instead of passing it on - e.g. by 
>   keeping applier.prev_heartbeat_time or similar.
> - unless we have reached (heartbeat time) *
>   disconnect_heartbeat_count timeout without heartbeats, we don't
>   disconnect. Initially, until we know prev_heartbeat_time, we
>   only disconnect after some hard-coded timeout
>   like 300 seconds. And we never allow the heartbeat timeout to be
>   set above 60 seconds, so that the two options work well together.
> - we disconnect only after we missed X heartbeats.
> 
> Other options?

To be honest, I think it's overkill. We haven't seen any complaints
about spurious disconnects. Why should we care about this problem now?
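
For reference, here is roughly how I read the heartbeat-counting scheme
quoted above. This is only a sketch under assumptions, not code from the
branch: HeartbeatMonitor, its fields and the default count of 4 are
made-up names and values, and clamping the measured interval to 60
seconds stands in for "never allow the heartbeat timeout above 60
seconds". Only the 300 and 60 second figures come from the proposal.

```python
from dataclasses import dataclass
from typing import Optional

# Figures taken from the quoted proposal; the constant names are hypothetical.
MAX_HEARTBEAT_INTERVAL = 60.0        # upper bound on the heartbeat timeout
FALLBACK_DISCONNECT_TIMEOUT = 300.0  # used until an interval has been measured


@dataclass
class HeartbeatMonitor:
    """Hypothetical applier-side bookkeeping for missed heartbeats."""

    disconnect_heartbeat_count: int = 4  # "X" missed heartbeats (assumed value)
    started_at: float = 0.0              # time the connection was established
    prev_heartbeat_time: Optional[float] = None
    heartbeat_interval: Optional[float] = None

    def on_heartbeat(self, now: float) -> None:
        """Record a heartbeat and measure the interval since the previous one."""
        if self.prev_heartbeat_time is not None:
            measured = now - self.prev_heartbeat_time
            # Assumption: clamp the measured interval to the 60 second cap.
            self.heartbeat_interval = min(measured, MAX_HEARTBEAT_INTERVAL)
        self.prev_heartbeat_time = now

    def should_disconnect(self, now: float) -> bool:
        """Return True once the silence exceeds the allowed window."""
        if self.heartbeat_interval is None:
            # No interval measured yet: fall back to the hard-coded timeout,
            # counted from the last heartbeat or from connection start.
            reference = (
                self.prev_heartbeat_time
                if self.prev_heartbeat_time is not None
                else self.started_at
            )
            return now - reference > FALLBACK_DISCONNECT_TIMEOUT
        # Interval known: disconnect only after X intervals without a heartbeat.
        window = self.heartbeat_interval * self.disconnect_heartbeat_count
        return now - self.prev_heartbeat_time > window
```

Note that the applier would have to keep the extra state
(prev_heartbeat_time plus the measured interval) and handle the
"interval not known yet" case separately.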


