[patches] Re: [PATCH] Break connection on timeout

Konstantin Osipov kostja at tarantool.org
Sat Feb 10 09:42:17 MSK 2018


* Vladimir Davydov <vdavydov.dev at gmail.com> [18/02/09 12:12]:
> > > branch: gh-3025-break-connection-timeout-v2
> > 
> > I like the idea of disconnect timeout being separate from other
> > timeouts, since I'm really concerned with the case of bogus
> > disconnects in case of different heartbeat settings at master and
> > slave. 
> 
> I beg to disagree, because I don't see how the new configuration option
> would help against configurations being different on replicas of the
> same cluster. Besides, disconnect_timeout and replication_timeout would
> be interconnected - it wouldn't make sense to set disconnect_timeout to
> a value less than replication_timeout.

Let's discuss it face to face. The idea is that these options could
be different, but it wouldn't matter: they would work as expected,
since the quantity used would be the number of peer heartbeat
round-trips.
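
To make the idea concrete, here is a rough sketch of the counting
logic, in Python only for brevity. It is not code from the branch:
the class and method names are made up for illustration, the 300 and
60 second constants come from the brainstorm quoted below, and
clamping the measured interval (rather than the configured one) is
my assumption.

```python
from typing import Optional

# Hypothetical constants, taken from the numbers floated in this thread.
FALLBACK_TIMEOUT = 300.0        # used until a heartbeat interval is measured
MAX_HEARTBEAT_INTERVAL = 60.0   # assumed cap on the interval we trust


class HeartbeatMonitor:
    """Decide when to drop a peer based on missed heartbeats."""

    def __init__(self, disconnect_heartbeat_count: int, now: float) -> None:
        if disconnect_heartbeat_count < 1:
            raise ValueError("disconnect_heartbeat_count must be >= 1")
        self.disconnect_heartbeat_count = disconnect_heartbeat_count
        self.last_activity = now                      # connect time or last heartbeat
        self.prev_heartbeat_time: Optional[float] = None
        self.interval: Optional[float] = None         # measured, not configured

    def on_heartbeat(self, now: float) -> None:
        # Measure the peer's heartbeat period instead of trusting local config.
        if self.prev_heartbeat_time is not None:
            measured = now - self.prev_heartbeat_time
            self.interval = min(measured, MAX_HEARTBEAT_INTERVAL)
        self.prev_heartbeat_time = now
        self.last_activity = now

    def timeout(self) -> float:
        # Before two heartbeats have been seen there is no interval yet,
        # so fall back to the hard-coded timeout.
        if self.interval is None:
            return FALLBACK_TIMEOUT
        return self.interval * self.disconnect_heartbeat_count

    def should_disconnect(self, now: float) -> bool:
        return now - self.last_activity > self.timeout()
```

With disconnect_heartbeat_count = 3 and a peer that sends a heartbeat
every second, the monitor would tolerate 3 seconds of silence no
matter what timeout is configured locally, which is the point of
counting in peer heartbeats.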

I'm all for better ideas.

> > 
> > I'm just brainstorming, but how about this:
> > 
> > - we pass time of the next heartbeat in each heartbeat message. 
> >   We can measure this time, instead of passing it on - e.g. by 
> >   keeping applier.prev_heartbeat_time or similar.
> > - unless we have reached (heartbeat time) *
> >   disconnect_heartbeat_count timeout without heartbeats, we don't
> >   disconnect. Initially, until we know prev_heartbeat_time, we
> >   only disconnect after some hard-coded timeout
> >   like 300 seconds. And we never allow setting the heartbeat timeout
> >   above 60 seconds, so that the two options work well together.
> > - we disconnect only after we missed X heartbeats.
> > 
> > Other options?
> 
> To be honest, I think it's overkill. We haven't seen any complaints
> about spurious disconnects. Why should we care about this problem now?

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.org - www.twitter.com/kostja_osipov


