[PATCH] replication: fix disconnect due to race condition
Vladimir Davydov
vdavydov.dev at gmail.com
Fri Feb 16 14:44:30 MSK 2018
On Fri, Feb 16, 2018 at 02:39:11PM +0300, Konstantin Belyavskiy wrote:
> Incoming ACKs lead to a race condition that prevents heartbeat
> messages from being sent. It ends up with a disconnect on timeout.
> This fix is based on @locker's proposal to send the vclock only in
> reply to the master (since the master itself sends heartbeat messages).
>
> Fix #3160
> ---
> branch: gh-3160-disconnect-race-condition
> src/box/applier.cc | 8 ++++++--
Please add a test case.
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 91769ae00..d9656c870 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -98,6 +98,11 @@ applier_log_error(struct applier *applier, struct error *e)
>
> /*
> * Fiber function to write vclock to replication master.
> + * To track connection status, the replica answers the master
> + * with an encoded vclock. In addition to update requests,
> + * the master also sends heartbeat messages every
> + * replication_timeout (introduced in 1.7.7).
What happens if the master is older than 1.7.7 and so doesn't
send heartbeats?
> + * On such requests, the replica also responds with its vclock.
> */
> static int
> applier_writer_f(va_list ap)
> @@ -106,10 +111,9 @@ applier_writer_f(va_list ap)
> struct ev_io io;
> coio_create(&io, applier->io.fd);
>
> - /* Re-connect loop */
> while (!fiber_is_cancelled()) {
> fiber_cond_wait_timeout(&applier->writer_cond,
> - replication_timeout);
> + TIMEOUT_INFINITY);
> /* Send ACKs only when in FOLLOW mode ,*/
> if (applier->state != APPLIER_SYNC &&
> applier->state != APPLIER_FOLLOW)
More information about the Tarantool-patches mailing list