Re: [patches] Re: [PATCH] Break connection on timeout

Konstantin Belyavskiy k.belyavskiy at tarantool.org
Thu Feb 8 20:26:21 MSK 2018


branch: gh-3025-break-connection-timeout-v2

>Четверг,  8 февраля 2018, 20:05 +03:00 от Vladimir Davydov <vdavydov.dev at gmail.com>:
>
>On Thu, Feb 08, 2018 at 07:42:46PM +0300, Konstantin Belyavskiy wrote:
>> In replication schema if one of the instances was powered off, it doesn't detected
>> by others and the connection hangs. Alive machines show 'follow' state.
>> Add timeout to solve this issue. It's safe since applier and relay both send
>> messages every replication_timeout so we can assume that if we read nothing we
>> have problem with connection.
>> Use replication_disconnect_timeout which is replication_timeout * 4 as for now.
>
>Personally, I don't think we need the new option for now, hardcocded
>replication_timeout * 4 (see replication_disconnect_timeout() helper)
>will do just fine IMHO. If anybody asks for it (nobody has yet), we can
>introduce it anytime later. To simulate the server freeze in the test,
>you can use the existing ERRINJ_RELAY_REPORT_INTERVAL.
>
>Anyway, if you do think the option is absolutely necessary, let's do one
>step at a time - first fix the actual bug, only then add the new option
>in a separate patch.
>
>> 
>> Closes #3025
>> ---
>>  src/box/applier.cc               |  7 ++++++-
>>  src/box/box.cc                   | 23 ++++++++++++++++++++-
>>  src/box/box.h                    |  1 +
>>  src/box/lua/cfg.cc               | 13 ++++++++++++
>>  src/box/lua/load_cfg.lua         |  4 ++++
>>  src/box/relay.cc                 |  2 +-
>>  src/box/replication.cc           |  1 +
>>  src/box/replication.h            |  7 ++-----
>>  test/app-tap/init_script.result  | 43 ++++++++++++++++++++--------------------
>>  test/box/admin.result            |  2 ++
>>  test/box/cfg.result              |  4 ++++
>>  test/replication/errinj.result   | 13 ++++++++++++
>>  test/replication/errinj.test.lua |  7 +++++++
>>  13 files changed, 98 insertions(+), 29 deletions(-)
>> 
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index f0073bada..bc1f8fc5e 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -418,7 +418,12 @@ applier_subscribe(struct applier *applier)
>>  			applier_set_state(applier, APPLIER_FOLLOW);
>>  		}
>> 
>> -		coio_read_xrow(coio, ibuf, &row);
>> +		if (applier->version_id < version_id(1, 7, 7))
>
>A comment why you need to check a version here is a must. 
Agree, fixed
>
>
>> +			coio_read_xrow(coio, ibuf, &row);
>> +		else {
>> +			coio_read_xrow_timeout_xc(coio, ibuf, &row,
>> +						  replication_disconnect_timeout);
>> +		}
>
>> diff --git a/test/replication/errinj.result b/test/replication/errinj.result
>> index d1f1dbe91..d0121e209 100644
>> --- a/test/replication/errinj.result
>> +++ b/test/replication/errinj.result
>> @@ -426,6 +426,19 @@ test_run:cmd("switch replica_ack")
>>  ---
>>  - true
>>  ...
>> +-- applier connection timeout is by default proportional to replication_timeout, so since it's very low
>> +-- for this test case, we should manually set it to new value to allow connection. It should be higher
>> +-- than master's replication_timeout (0.1 in this case)
>> +box.cfg{replication_disconnect_timeout = 1.}
>> +---
>> +...
>> +fiber = require('fiber')
>> +---
>> +...
>> +-- wait master's replication_timeout (0.1) couple times to make sure connection is done (follow)
>> +fiber.sleep(0.5)
>> +---
>> +...
>
>This timeout is too long. Please try to make it shorter so that the
>tests run faster.
Reduce it to 0.15s. Still even original test was quite long to complete..

С уважением,
Konstantin Belyavskiy
k.belyavskiy at tarantool.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.tarantool.org/pipermail/tarantool-patches/attachments/20180208/fd1cedb0/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Allow-http-client-connection-to-use-unix-socket.patch
Type: application/x-patch
Size: 18204 bytes
Desc: not available
URL: <https://lists.tarantool.org/pipermail/tarantool-patches/attachments/20180208/fd1cedb0/attachment.bin>


More information about the Tarantool-patches mailing list