[PATCH] replication: fix disconnect due to race condition
Konstantin Belyavskiy
k.belyavskiy at tarantool.org
Fri Feb 16 14:39:11 MSK 2018
An incoming ACK leads to a race condition and prevents
heartbeat messages, which ends in a disconnect on timeout.
This fix is based on @locker's proposal to send the vclock
only in reply to the master (since the master itself sends
heartbeat messages).
Fix #3160
---
branch: gh-3160-disconnect-race-condition
src/box/applier.cc | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
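
A minimal sketch of the behaviour change, for illustration only. It is a
simplified model, not Tarantool code: `Event` and `count_acks` are
hypothetical names, and the model assumes the writer is woken either by a
row arriving from the master (update or heartbeat) or by the wait timing
out. With the old `replication_timeout` wait, both wake-ups produced an
ACK; with `TIMEOUT_INFINITY`, only a row from the master does.

```cpp
#include <cassert>
#include <vector>

// Hypothetical wake-up reasons for the replica's writer fiber.
enum class Event {
    RowReceived,    // master sent an update or a heartbeat
    TimeoutExpired  // the condition wait timed out with no row
};

// Count the ACKs (encoded vclocks) the writer would send.
// ack_on_timeout == true  models waiting with replication_timeout;
// ack_on_timeout == false models waiting with TIMEOUT_INFINITY.
int count_acks(const std::vector<Event> &events, bool ack_on_timeout)
{
    int acks = 0;
    for (Event e : events) {
        if (e == Event::RowReceived)
            ++acks;             // always reply to the master
        else if (ack_on_timeout)
            ++acks;             // unsolicited ACK, old behaviour only
    }
    return acks;
}
```

In this model, timeouts between two rows add extra ACKs before the patch
and none after it, so the replica only ever answers the master.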
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 91769ae00..d9656c870 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -98,6 +98,11 @@ applier_log_error(struct applier *applier, struct error *e)
/*
* Fiber function to write vclock to replication master.
+ * To track connection status, the replica answers the master
+ * with an encoded vclock. In addition to update requests,
+ * the master also sends heartbeat messages every
+ * replication_timeout (introduced in 1.7.7).
+ * The replica responds to such requests with a vclock as well.
*/
static int
applier_writer_f(va_list ap)
@@ -106,10 +111,9 @@ applier_writer_f(va_list ap)
struct ev_io io;
coio_create(&io, applier->io.fd);
- /* Re-connect loop */
while (!fiber_is_cancelled()) {
fiber_cond_wait_timeout(&applier->writer_cond,
- replication_timeout);
+ TIMEOUT_INFINITY);
/* Send ACKs only when in FOLLOW mode ,*/
if (applier->state != APPLIER_SYNC &&
applier->state != APPLIER_FOLLOW)
--
2.14.3 (Apple Git-98)