From: "Sergey Petrenko" <sergepetrenko@tarantool.org> To: "Konstantin Osipov" <kostja.osipov@gmail.com> Cc: tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org Subject: Re: [Tarantool-patches] [PATCH 1/2] replication: correctly check for rows to skip in applier Date: Fri, 14 Feb 2020 01:10:20 +0300 [thread overview] Message-ID: <1581631820.174189421@f334.i.mail.ru> (raw) In-Reply-To: <20200213064730.GA17713@atlas> [-- Attachment #1: Type: text/plain, Size: 3853 bytes --] Hi! Thanks for your detailed answer! I pushed v2 regarding all the comments. >Четверг, 13 февраля 2020, 9:47 +03:00 от Konstantin Osipov <kostja.osipov@gmail.com>: > >* sergepetrenko < sergepetrenko@tarantool.org > [20/02/13 09:34]: >> Fix replicaset.applier.vclock initialization issues: it wasn't >> initialized at all previously. > >In the next line you say that you remove the initialization. What >do you mean here? I changed both the commit and the commit message. > > >> Moreover, there is no valid point in code >> to initialize it, since it may get stale right away if new entries are >> written to WAL. > >Well, it reflects the state of the wal *as seen by* the set of >appliers. This is stated in the comment. So it doesn't have to >reflect local changes. I see. > > >> So, check for both applier and replicaset vclocks. >> The greater one protects the instance from applying the rows it has >> already applied or has already scheduled to write. >> Also remove an unnecessary aplier vclock initialization from >> replication_init(). > >First of all, the race you describe applies to >local changes only. Yet you add the check for all replica ids. >This further obliterates this piece of code. Ok, fixed. > > >Second, the core of the issue is a "hole" in vclock protection >enforced by latch_lock/latch_unlock. Basically the assumption that >latch_lock/latch_unlock has is that while a latch is locked, no >source can apply a transaction under this replica id. This, is >violated by the local WAL. > >We used to skip all changes by local vclock id before in applier. > >Later it was changed to be able to get-your-own logs on recovery, >e.g. if some replica has them , and the local node lost a piece of >wal. > >It will take me a while to find this commit and ticket, but this >is the commit and ticket which introduced the regression. > >The proper fix is to only apply local changes received from >remotes in orphan mode, and begin skipping them when entering >read-write mode. Thanks for the clarification. > > >> Closes #4739 >> --- >> src/box/applier.cc | 14 ++++++++++++-- >> src/box/replication.cc | 1 - >> 2 files changed, 12 insertions(+), 3 deletions(-) >> >> diff --git a/src/box/applier.cc b/src/box/applier.cc >> index ae3d281a5..acb26b7e2 100644 >> --- a/src/box/applier.cc >> +++ b/src/box/applier.cc >> @@ -731,8 +731,18 @@ applier_apply_tx(struct stailq *rows) >> struct latch *latch = (replica ? &replica->order_latch : >> &replicaset.applier.order_latch); >> latch_lock(latch); >> - if (vclock_get(&replicaset.applier.vclock, >> - first_row->replica_id) >= first_row->lsn) { >> + /* >> + * We cannot tell which vclock is greater. There is no >> + * proper place to initialize applier vclock, since it >> + * may get stale right away if we write something to WAL >> + * and it gets replicated and then arrives back from the >> + * replica. So check against both vclocks. Replicaset >> + * vclock will guard us from corner cases like the one >> + * above. >> + */ >> + if (MAX(vclock_get(&replicaset.applier.vclock, first_row->replica_id), >> + vclock_get(&replicaset.vclock, first_row->replica_id)) >= >> + first_row->lsn) { >> latch_unlock(latch); >> return 0; >> } >> diff --git a/src/box/replication.cc b/src/box/replication.cc >> index e7bfa22ab..7b04573a4 100644 >> --- a/src/box/replication.cc >> +++ b/src/box/replication.cc >> @@ -93,7 +93,6 @@ replication_init(void) >> latch_create(&replicaset.applier.order_latch); >> >> vclock_create(&replicaset.applier.vclock); >> - vclock_copy(&replicaset.applier.vclock, &replicaset.vclock); >> rlist_create(&replicaset.applier.on_rollback); >> rlist_create(&replicaset.applier.on_commit); >> >> -- >> 2.20.1 (Apple Git-117) > >-- >Konstantin Osipov, Moscow, Russia -- Sergey Petrenko [-- Attachment #2: Type: text/html, Size: 5614 bytes --]
next prev parent reply other threads:[~2020-02-13 22:10 UTC|newest] Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-02-12 23:50 [Tarantool-patches] [PATCH 0/2] replication: fix applying of rows originating from local instance sergepetrenko 2020-02-12 23:51 ` [Tarantool-patches] [PATCH 1/2] replication: correctly check for rows to skip in applier sergepetrenko 2020-02-13 6:47 ` Konstantin Osipov 2020-02-13 6:58 ` Konstantin Osipov 2020-02-13 22:03 ` Sergey Petrenko 2020-02-13 22:10 ` Sergey Petrenko [this message] 2020-02-12 23:51 ` [Tarantool-patches] [PATCH 2/2] wal: panic when trying to write a record with a broken lsn sergepetrenko 2020-02-13 6:48 ` Konstantin Osipov 2020-02-13 22:05 ` Sergey Petrenko 2020-02-14 7:26 ` Konstantin Osipov 2020-02-13 6:48 ` [Tarantool-patches] [PATCH 0/2] replication: fix applying of rows originating from local instance Konstantin Osipov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=1581631820.174189421@f334.i.mail.ru \ --to=sergepetrenko@tarantool.org \ --cc=kostja.osipov@gmail.com \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH 1/2] replication: correctly check for rows to skip in applier' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox