[PATCH] replication: fix disconnect due to race condition

Vladimir Davydov vdavydov.dev at gmail.com
Fri Feb 16 14:44:30 MSK 2018


On Fri, Feb 16, 2018 at 02:39:11PM +0300, Konstantin Belyavskiy wrote:
> Incoming ACKs lead to a race condition that prevents heartbeat
> messages from being sent, which ends up in a disconnect on timeout.
> This fix is based on @locker's proposal to send the vclock only in
> reply to the master (since the master itself sends heartbeat messages).
> 
> Fix #3160
> ---
> branch: gh-3160-disconnect-race-condition
>  src/box/applier.cc | 8 ++++++--

Please add a test case.

>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 91769ae00..d9656c870 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -98,6 +98,11 @@ applier_log_error(struct applier *applier, struct error *e)
>  
>  /*
>   * Fiber function to write vclock to replication master.
> + * To track connection status, the replica answers the master
> + * with its encoded vclock. In addition to update requests,
> + * master also sends heartbeat messages every
> + * replication_timeout (introduced in 1.7.7).

What happens if the master is older than 1.7.7 and so doesn't
send heartbeats?

> + * The replica responds to these messages with its vclock as well.
>   */
>  static int
>  applier_writer_f(va_list ap)
> @@ -106,10 +111,9 @@ applier_writer_f(va_list ap)
>  	struct ev_io io;
>  	coio_create(&io, applier->io.fd);
>  
> -	/* Re-connect loop */
>  	while (!fiber_is_cancelled()) {
>  		fiber_cond_wait_timeout(&applier->writer_cond,
> -					replication_timeout);
> +					TIMEOUT_INFINITY);
>  		/* Send ACKs only when in FOLLOW mode ,*/
>  		if (applier->state != APPLIER_SYNC &&
>  		    applier->state != APPLIER_FOLLOW)



More information about the Tarantool-patches mailing list