From: Konstantin Osipov <kostja.osipov@gmail.com>
To: sergepetrenko <sergepetrenko@tarantool.org>,
alexander.turenko@tarantool.org, v.shpilevoy@tarantool.org,
tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 1/2] replication: correctly check for rows to skip in applier
Date: Thu, 13 Feb 2020 09:58:49 +0300 [thread overview]
Message-ID: <20200213065849.GA18311@atlas> (raw)
In-Reply-To: <20200213064730.GA17713@atlas>
* Konstantin Osipov <kostja.osipov@gmail.com> [20/02/13 09:47]:
From relay.cc:
/*
* We're feeding a WAL, thus responding to FINAL JOIN or SUBSCRIBE
* request. If this is FINAL JOIN (i.e. relay->replica is NULL),
* we must relay all rows, even those originating from the replica
* itself (there may be such rows if this is rebootstrap). If this
* SUBSCRIBE, only send a row if it is not from the same replica
* (i.e. don't send replica's own rows back) or if this row is
* missing on the other side (i.e. in case of sudden power-loss,
* data was not written to WAL, so remote master can't recover
* it). In the latter case packet's LSN is less than or equal to
* local master's LSN at the moment it received 'SUBSCRIBE' request.
*/
if (relay->replica == NULL ||
packet->replica_id != relay->replica->id ||
packet->lsn <= vclock_get(&relay->local_vclock_at_subscribe,
packet->replica_id)) {
struct errinj *inj = errinj(ERRINJ_RELAY_BREAK_LSN,
ERRINJ_INT);
if (inj != NULL && packet->lsn == inj->iparam) {
packet->lsn = inj->iparam - 1;
say_warn("injected broken lsn: %lld",
(long long) packet->lsn);
}
relay_send(relay, packet);
}
}
As you can see we never send our own rows back, as long as
they are greater than relay->local_vclock_at_subscribe.
So what exactly does go wrong here?
Is the bug triggered during initial replication configuration, or
during a reconfiguration?
I suspect the issue is that at reconfiguration we send
local_vclock_at_subscribe, but keep changing it.
The fix then would be to make sure the local component in
local_vclock_at_subscribe is set to infinity during
reconfiguration.
> * sergepetrenko <sergepetrenko@tarantool.org> [20/02/13 09:34]:
> > Fix replicaset.applier.vclock initialization issues: it wasn't
> > initialized at all previously.
>
> In the next line you say that you remove the initialization. What
> do you mean here?
>
> > Moreover, there is no valid point in code
> > to initialize it, since it may get stale right away if new entries are
> > written to WAL.
>
> Well, it reflects the state of the wal *as seen by* the set of
> appliers. This is stated in the comment. So it doesn't have to
> reflect local changes.
>
> > So, check for both applier and replicaset vclocks.
> > The greater one protects the instance from applying the rows it has
> > already applied or has already scheduled to write.
> > Also remove an unnecessary aplier vclock initialization from
> > replication_init().
>
> First of all, the race you describe applies to
> local changes only. Yet you add the check for all replica ids.
> This further obliterates this piece of code.
>
> Second, the core of the issue is a "hole" in vclock protection
> enforced by latch_lock/latch_unlock. Basically the assumption that
> latch_lock/latch_unlock has is that while a latch is locked, no
> source can apply a transaction under this replica id. This, is
> violated by the local WAL.
>
> We used to skip all changes by local vclock id before in applier.
>
> Later it was changed to be able to get-your-own logs on recovery,
> e.g. if some replica has them , and the local node lost a piece of
> wal.
>
> It will take me a while to find this commit and ticket, but this
> is the commit and ticket which introduced the regression.
>
> The proper fix is to only apply local changes received from
> remotes in orphan mode, and begin skipping them when entering
> read-write mode.
>
> > Closes #4739
> > ---
> > src/box/applier.cc | 14 ++++++++++++--
> > src/box/replication.cc | 1 -
> > 2 files changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/box/applier.cc b/src/box/applier.cc
> > index ae3d281a5..acb26b7e2 100644
> > --- a/src/box/applier.cc
> > +++ b/src/box/applier.cc
> > @@ -731,8 +731,18 @@ applier_apply_tx(struct stailq *rows)
> > struct latch *latch = (replica ? &replica->order_latch :
> > &replicaset.applier.order_latch);
> > latch_lock(latch);
> > - if (vclock_get(&replicaset.applier.vclock,
> > - first_row->replica_id) >= first_row->lsn) {
> > + /*
> > + * We cannot tell which vclock is greater. There is no
> > + * proper place to initialize applier vclock, since it
> > + * may get stale right away if we write something to WAL
> > + * and it gets replicated and then arrives back from the
> > + * replica. So check against both vclocks. Replicaset
> > + * vclock will guard us from corner cases like the one
> > + * above.
> > + */
> > + if (MAX(vclock_get(&replicaset.applier.vclock, first_row->replica_id),
> > + vclock_get(&replicaset.vclock, first_row->replica_id)) >=
> > + first_row->lsn) {
> > latch_unlock(latch);
> > return 0;
> > }
> > diff --git a/src/box/replication.cc b/src/box/replication.cc
> > index e7bfa22ab..7b04573a4 100644
> > --- a/src/box/replication.cc
> > +++ b/src/box/replication.cc
> > @@ -93,7 +93,6 @@ replication_init(void)
> > latch_create(&replicaset.applier.order_latch);
> >
> > vclock_create(&replicaset.applier.vclock);
> > - vclock_copy(&replicaset.applier.vclock, &replicaset.vclock);
> > rlist_create(&replicaset.applier.on_rollback);
> > rlist_create(&replicaset.applier.on_commit);
> >
> > --
> > 2.20.1 (Apple Git-117)
>
> --
> Konstantin Osipov, Moscow, Russia
--
Konstantin Osipov, Moscow, Russia
next prev parent reply other threads:[~2020-02-13 6:58 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-12 23:50 [Tarantool-patches] [PATCH 0/2] replication: fix applying of rows originating from local instance sergepetrenko
2020-02-12 23:51 ` [Tarantool-patches] [PATCH 1/2] replication: correctly check for rows to skip in applier sergepetrenko
2020-02-13 6:47 ` Konstantin Osipov
2020-02-13 6:58 ` Konstantin Osipov [this message]
2020-02-13 22:03 ` Sergey Petrenko
2020-02-13 22:10 ` Sergey Petrenko
2020-02-12 23:51 ` [Tarantool-patches] [PATCH 2/2] wal: panic when trying to write a record with a broken lsn sergepetrenko
2020-02-13 6:48 ` Konstantin Osipov
2020-02-13 22:05 ` Sergey Petrenko
2020-02-14 7:26 ` Konstantin Osipov
2020-02-13 6:48 ` [Tarantool-patches] [PATCH 0/2] replication: fix applying of rows originating from local instance Konstantin Osipov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200213065849.GA18311@atlas \
--to=kostja.osipov@gmail.com \
--cc=alexander.turenko@tarantool.org \
--cc=sergepetrenko@tarantool.org \
--cc=tarantool-patches@dev.tarantool.org \
--cc=v.shpilevoy@tarantool.org \
--subject='Re: [Tarantool-patches] [PATCH 1/2] replication: correctly check for rows to skip in applier' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox