Return-Path: <vdavydov.dev@gmail.com>
Date: Thu, 14 Mar 2019 19:43:51 +0300
From: Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: [tarantool-patches] [PATCH] evio: fix timeout calculations
Message-ID: <20190314164351.rjdljd5p2wg4f36i@esperanza>
References: <20190314151641.26876-1-sergepetrenko@tarantool.org>
 <D50D4CB9-4CDC-4483-85D7-CC0C9C9D3261@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <D50D4CB9-4CDC-4483-85D7-CC0C9C9D3261@tarantool.org>
To: Serge Petrenko <sergepetrenko@tarantool.org>
Cc: tarantool-patches@freelists.org
List-ID: <tarantool-patches.dev.tarantool.org>

On Thu, Mar 14, 2019 at 06:57:12PM +0300, Serge Petrenko wrote:
> 
> 
> > On 14 March 2019, at 18:16, Serge Petrenko <sergepetrenko@tarantool.org> wrote:
> > 
> > The function evio_timeout_update() failed to update the starting time
> > point, which led to timeouts firing much sooner than they should when
> > the function was called several times in a row.
> > This led, for example, to the applier timing out within 0.2 seconds while
> > reading a row several megabytes in size, even though replication_timeout
> > was set to 15 seconds.
> > 
> > Closes #4042
> > ---
> > https://github.com/tarantool/tarantool/tree/sp/gh-4042-applier-timeout
> > https://github.com/tarantool/tarantool/issues/4042
> > 
> > src/box/xrow_io.cc                         |  4 +-
> > src/lib/core/coio.cc                       | 18 ++--
> > src/lib/core/coio.h                        |  2 +-
> > src/lib/core/evio.h                        |  5 +-
> > test/replication/long_row_timeout.result   | 98 ++++++++++++++++++++++
> > test/replication/long_row_timeout.test.lua | 43 ++++++++++
> > test/replication/replica_big.lua           | 12 +++
> > test/replication/suite.cfg                 |  1 +
> > 8 files changed, 169 insertions(+), 14 deletions(-)
> > create mode 100644 test/replication/long_row_timeout.result
> > create mode 100644 test/replication/long_row_timeout.test.lua
> > create mode 100644 test/replication/replica_big.lua

> diff --git a/test/replication/long_row_timeout.test.lua b/test/replication/long_row_timeout.test.lua
> index 21a522018..3993f1657 100644
> --- a/test/replication/long_row_timeout.test.lua
> +++ b/test/replication/long_row_timeout.test.lua
> @@ -10,13 +10,13 @@ test_run:cmd('create server replica with rpl_master=default, script="replication
>  test_run:cmd('start server replica')
>  box.info.replication[2].downstream.status
>  tup_sz = box.cfg.memtx_max_tuple_size
> -box.cfg{memtx_max_tuple_size = 21 * 1024 * 1024, memtx_memory = 1024 * 1024 * 1024}
> +box.cfg{memtx_max_tuple_size = 21 * 1024 * 1024}
>  
>  -- insert some big rows which cannot be read in one go, so applier yields
>  -- on read a couple of times.
>  s = box.schema.space.create('test')
>  _ = s:create_index('pk')
> -for i = 1,5 do box.space.test:insert{i, require('digest').urandom(20 * 1024 * 1024)} end
> +for i = 1,5 do box.space.test:replace{1, digest.urandom(20 * 1024 * 1024)} collectgarbage('collect') end

After this change, replica_big.lua is no longer needed, so I removed it.

Also, we need to call `box.snapshot()` at the end of this test to rotate
xlogs; otherwise the next test may fail trying to apply the huge rows
on a newly deployed replica during the final join stage. Added it.

Pushed to 2.1 and 1.10.