[PATCH] applier: stop sending ACKs if master closed socket
Vladimir Davydov
vdavydov.dev at gmail.com
Tue Jan 30 13:53:42 MSK 2018
On Mon, Jan 29, 2018 at 09:22:03PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev at gmail.com> [18/01/29 20:21]:
> > > >
> > > > To avoid that, let's make the applier writer fiber (the one that sends
> > > > ACKs) exit immediately if it receives EPIPE error while trying to send
> > > > an ACK.
> > > >
> > > > Closes #2945
> > > > ---
> > > > Branch: gh-2945-applier-dont-send-acks-if-master-closed-socket
> > > >
> > > > src/box/applier.cc | 7 +++++++
> > > > 1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/src/box/applier.cc b/src/box/applier.cc
> > > > index f0073bad..e8907975 100644
> > > > --- a/src/box/applier.cc
> > > > +++ b/src/box/applier.cc
> > > > @@ -118,6 +118,13 @@ applier_writer_f(va_list ap)
> > > > coio_write_xrow(&io, &xrow);
> > > > } catch (SocketError *e) {
> > > > /*
> > > > + * There is no point trying to send ACKs if
> > > > + * the master closed its end - we would only
> > > > + * spam the log - so exit immediately.
> > > > + */
> > > > + if (e->get_errno() == EPIPE)
> > > > + break;
> > > > + /*
> > >
> > > This fix it contradicts the comment below.
> > > If the applier ACK fiber got an error, it waits for reconnect.
> > > It should not send "noisy" acks while this hasn't happened,
> > > regardless of the type of error. Why is EPIPE special?
> >
> > When it happens (EPIPE), the applier is still connected to the master,
> > because there are rows left in the socket. For every row applied, the
> > applier wakes up the writer (via writer_cond), which makes it send an
> > ack and consequently log an error as the receiving end (at the master's
> > side) has been closed.
>
> Why are we reading and acking rows one by one in the first place?
> We should be acking the entier batch with a single event, no?
Hmm, how do we define "batch" in applier?
Generally, I agree that it is an overkill to send an ACK per each
applied request, but I don't know how to fix it right. I see the
following options:
- Rate limit ACKs, say only send an ACK once per
replication_timeout tops.
- Send an ACK once per N messages/bytes. What should N be?
If N is not big enough, we will still get quite a lot of
error messages from the writer fiber though.
- Do not send an ACK until the socket is empty
(use ioctl(FIONREAD) to detect it?)
What do you think?
Thanks.
More information about the Tarantool-patches
mailing list