[PATCH] applier: stop sending ACKs if master closed socket

Vladimir Davydov vdavydov.dev at gmail.com
Tue Jan 30 13:53:42 MSK 2018


On Mon, Jan 29, 2018 at 09:22:03PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev at gmail.com> [18/01/29 20:21]:
> > > > 
> > > > To avoid that, let's make the applier writer fiber (the one that sends
> > > > ACKs) exit immediately if it receives EPIPE error while trying to send
> > > > an ACK.
> > > > 
> > > > Closes #2945
> > > > ---
> > > > Branch: gh-2945-applier-dont-send-acks-if-master-closed-socket
> > > > 
> > > >  src/box/applier.cc | 7 +++++++
> > > >  1 file changed, 7 insertions(+)
> > > > 
> > > > diff --git a/src/box/applier.cc b/src/box/applier.cc
> > > > index f0073bad..e8907975 100644
> > > > --- a/src/box/applier.cc
> > > > +++ b/src/box/applier.cc
> > > > @@ -118,6 +118,13 @@ applier_writer_f(va_list ap)
> > > >  			coio_write_xrow(&io, &xrow);
> > > >  		} catch (SocketError *e) {
> > > >  			/*
> > > > +			 * There is no point trying to send ACKs if
> > > > +			 * the master closed its end - we would only
> > > > +			 * spam the log - so exit immediately.
> > > > +			 */
> > > > +			if (e->get_errno() == EPIPE)
> > > > +				break;
> > > > +			/*
> > > 
> > > This fix it contradicts the comment below. 
> > > If the applier ACK fiber got an error, it waits for reconnect.
> > > It should not send "noisy" acks while this hasn't happened,
> > > regardless of the type of error. Why is EPIPE special?
> > 
> > When it happens (EPIPE), the applier is still connected to the master,
> > because there are rows left in the socket. For every row applied, the
> > applier wakes up the writer (via writer_cond), which makes it send an
> > ack and consequently log an error as the receiving end (at the master's
> > side) has been closed.
> 
> Why are we reading and acking rows one by one in the first place?
> We should be acking the entier batch with a single event, no?

Hmm, how do we define "batch" in applier?

Generally, I agree that it is an overkill to send an ACK per each
applied request, but I don't know how to fix it right. I see the
following options:

 - Rate limit ACKs, say only send an ACK once per
   replication_timeout tops.

 - Send an ACK once per N messages/bytes. What should N be?
   If N is not big enough, we will still get quite a lot of
   error messages from the writer fiber though.

 - Do not send an ACK until the socket is empty
   (use ioctl(FIONREAD) to detect it?)

What do you think?

Thanks.



More information about the Tarantool-patches mailing list