<HTML><BODY>Guys, let's stop this and choose one option:<br><br>Option 1 (without a new parameter, but with error injection to make the test pass):<br><a href="https://github.com/tarantool/tarantool/commit/eeb4ed12938dca972e12cf1ac117641c75ff9ea7">https://github.com/tarantool/tarantool/commit/eeb4ed12938dca972e12cf1ac117641c75ff9ea7</a><br>Option 2 (with a new parameter, disconnect_timeout, and without error injection):<br><a href="https://github.com/tarantool/tarantool/commit/aa6b81c63b03e08981a1272fc31243f04debc230">https://github.com/tarantool/tarantool/commit/aa6b81c63b03e08981a1272fc31243f04debc230</a><br><br>I personally think the second one is better, which is why I sent it for review first.<br>But it still has one thing to discuss:<br>For compatibility, I bind replication_disconnect_timeout to disconnect_timeout by default,<br>but it is probably better not to do that. Because there is an internal check that the disconnect timeout is greater<br>than the replication timeout, if the user does not change it, all instances will have the same value, no matter<br>what replication timeout value is used. So this will not break connections between different instances,<br>and it makes it easier to connect with older versions.<br><br><br><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;">

        Friday, 9 February 2018, 12:10 +03:00, from Vladimir Davydov <vdavydov.dev@gmail.com>:<br>

        <br>

        <div id="">


<div class="js-helper js-readmsg-msg">


        <div>


                
            <div id="style_15181674440000000231_BODY">On Thu, Feb 08, 2018 at 11:29:14PM +0300, Konstantin Osipov wrote:<br>

                                 > * Konstantin Belyavskiy <<a href="mailto:k.belyavskiy@tarantool.org">k.belyavskiy@tarantool.org</a>> [18/02/08 21:53]:<br>

> > <br>

> > branch: gh-3025-break-connection-timeout-v2<br>

> <br>

> I like the idea of disconnect timeout being separate from other<br>

> timeouts, since I'm really concerned with the case of bogus<br>

> disconnects in case of different heartbeat settings at master and<br>

> slave. <br>

      <br>

I beg to disagree, because I don't see how the new configuration option<br>

would help against configurations being different on replicas of the<br>

same cluster. Besides, disconnect_timeout and replication_timeout would<br>

be interconnected - it wouldn't make sense to set disconnect_timeout to<br>

a value less than replication_timeout.<br>

<br>

> <br>

> I'm just brainstorming, but how about this:<br>

> <br>

> - we pass time of the next heartbeat in each heartbeat message. <br>

>   We can measure this time, instead of passing it on - e.g. by <br>

>   keeping applier.prev_heartbeat_time or similar.<br>

> - unless we have reached (heartbeat time) *<br>

>   disconnect_heartbeat_count timeout without heartbeats, we don't<br>

>   disconnect. Initially, until we know prev_heartbeat_time, we<br>

>   only disconnect after some hard-coded timeout<br>

>   like 300 seconds. And we never allow the heartbeat timeout to be set<br>

>   above 60 seconds, so that the two options work well together.<br>

> - we disconnect only after we missed X heartbeats.<br>

> <br>

> Other options?<br>

<br>

To be honest, I think it's an overkill. We haven't seen any complaints<br>

about spurious disconnects. Why should we care about this problem now?<br>

</div>

            

        </div>


</div>


</div>

</blockquote>

<br>

<br>Best regards,<br>Konstantin Belyavskiy<br>k.belyavskiy@tarantool.org<br></BODY></HTML>