[PATCH] replication: fix disconnect due to race condition

Konstantin Belyavskiy k.belyavskiy at tarantool.org
Fri Feb 16 14:39:11 MSK 2018


Incoming ACKs lead to a race condition that suppresses heartbeat
messages. This ends in a disconnect on timeout.
This fix is based on @locker's proposal to send the vclock only in
reply to the master (since the master itself sends heartbeat messages).

Fix #3160
---
branch: gh-3160-disconnect-race-condition
 src/box/applier.cc | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 91769ae00..d9656c870 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -98,6 +98,11 @@ applier_log_error(struct applier *applier, struct error *e)
 
 /*
  * Fiber function to write vclock to replication master.
+ * To track connection status, the replica answers the master
+ * with its encoded vclock. In addition to update requests,
+ * the master also sends heartbeat messages every
+ * replication_timeout seconds (heartbeats appeared in 1.7.7).
+ * The replica responds to these messages with its vclock too.
  */
 static int
 applier_writer_f(va_list ap)
@@ -106,10 +111,9 @@ applier_writer_f(va_list ap)
 	struct ev_io io;
 	coio_create(&io, applier->io.fd);
 
-	/* Re-connect loop */
 	while (!fiber_is_cancelled()) {
 		fiber_cond_wait_timeout(&applier->writer_cond,
-					replication_timeout);
+					TIMEOUT_INFINITY);
 		/* Send ACKs only when in FOLLOW mode ,*/
 		if (applier->state != APPLIER_SYNC &&
 		    applier->state != APPLIER_FOLLOW)
-- 
2.14.3 (Apple Git-98)
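
A minimal sketch of the reply-only ACK scheme described in the commit
message, written as a standalone pthreads model rather than Tarantool
code. `struct replica`, `replica_init`, `writer_f`, `reader_on_message`
and `replica_stop` are hypothetical names; threads stand in for fibers,
and `pthread_cond_wait` with no timeout stands in for
`fiber_cond_wait_timeout(&applier->writer_cond, TIMEOUT_INFINITY)`.
Modelling a heartbeat as a message that repeats the last LSN is also an
assumption of this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical model of the replica side of the ACK protocol.
 * None of these names are Tarantool APIs; writer_cond only mirrors
 * the role of applier->writer_cond in the patch.
 */
struct replica {
	pthread_mutex_t mutex;
	pthread_cond_t writer_cond;
	long received_lsn;  /* last LSN seen from the master */
	long acked_lsn;     /* last LSN reported back to the master */
	bool pending;       /* a row or heartbeat is waiting to be acked */
	bool stopped;       /* set once to shut the writer down */
	int acks_sent;
};

static void
replica_init(struct replica *r)
{
	pthread_mutex_init(&r->mutex, NULL);
	pthread_cond_init(&r->writer_cond, NULL);
	r->received_lsn = 0;
	r->acked_lsn = 0;
	r->pending = false;
	r->stopped = false;
	r->acks_sent = 0;
}

/*
 * Writer thread: waits with no timeout and sends an ACK only after
 * the reader has received something from the master. Messages that
 * arrive before the writer wakes up are covered by a single ACK,
 * which carries the latest position.
 */
static void *
writer_f(void *arg)
{
	struct replica *r = arg;
	pthread_mutex_lock(&r->mutex);
	for (;;) {
		while (!r->pending && !r->stopped)
			pthread_cond_wait(&r->writer_cond, &r->mutex);
		if (!r->pending)
			break;  /* stopped and nothing left to ack */
		r->pending = false;
		r->acked_lsn = r->received_lsn;  /* "send" the ACK */
		r->acks_sent++;
	}
	pthread_mutex_unlock(&r->mutex);
	return NULL;
}

/* Reader side: called for every row or heartbeat from the master. */
static void
reader_on_message(struct replica *r, long lsn)
{
	pthread_mutex_lock(&r->mutex);
	assert(lsn >= r->received_lsn);  /* heartbeats repeat the last LSN */
	r->received_lsn = lsn;
	r->pending = true;
	pthread_cond_signal(&r->writer_cond);
	pthread_mutex_unlock(&r->mutex);
}

/* Ask the writer to exit once everything pending has been acked. */
static void
replica_stop(struct replica *r)
{
	pthread_mutex_lock(&r->mutex);
	r->stopped = true;
	pthread_cond_signal(&r->writer_cond);
	pthread_mutex_unlock(&r->mutex);
}
```

Usage: start `writer_f` with `pthread_create`, feed rows with LSNs 1, 2
and 3 plus a heartbeat carrying 3, call `replica_stop` and join the
thread. `acked_lsn` always ends at 3, while `acks_sent` falls anywhere
between 1 and 4 depending on scheduling. Because the writer in this
model never wakes up on its own, every ACK is a reply to something the
master sent, which is the property the patch obtains by replacing
`replication_timeout` with `TIMEOUT_INFINITY`.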

More information about the Tarantool-patches mailing list