Hi! Thanks for your detailed answer! I pushed v2 addressing all the comments.

> Thursday, February 13, 2020, 9:47 +03:00, from Konstantin Osipov:
>
> * sergepetrenko <sergepetrenko@tarantool.org> [20/02/13 09:34]:
>> Fix replicaset.applier.vclock initialization issues: it wasn't
>> initialized at all previously.
>
> In the next line you say that you remove the initialization. What
> do you mean here?

I changed both the commit and the commit message.

>> Moreover, there is no valid point in the code at which to
>> initialize it, since it may get stale right away if new entries
>> are written to WAL.
>
> Well, it reflects the state of the WAL *as seen by* the set of
> appliers. This is stated in the comment. So it doesn't have to
> reflect local changes.

I see.

>> So, check both the applier and the replicaset vclocks.
>> The greater one protects the instance from applying rows it has
>> already applied or has already scheduled to write.
>> Also remove an unnecessary applier vclock initialization from
>> replication_init().
>
> First of all, the race you describe applies to local changes
> only. Yet you add the check for all replica ids. This further
> obfuscates this piece of code.

OK, fixed.

> Second, the core of the issue is a "hole" in the vclock
> protection enforced by latch_lock/latch_unlock. The basic
> assumption behind latch_lock/latch_unlock is that while a latch
> is locked, no source can apply a transaction under this replica
> id. This is violated by the local WAL.
>
> We used to skip all changes bearing the local vclock id in the
> applier.
>
> Later this was changed so that a node could get its own logs back
> on recovery, e.g. if some replica still has them and the local
> node lost a piece of WAL.
>
> It will take me a while to find them, but that is the commit and
> ticket which introduced the regression.
>
> The proper fix is to only apply local changes received from
> remotes in orphan mode, and to begin skipping them when entering
> read-write mode.

Thanks for the clarification.
>> Closes #4739
>> ---
>>  src/box/applier.cc     | 14 ++++++++++++--
>>  src/box/replication.cc |  1 -
>>  2 files changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index ae3d281a5..acb26b7e2 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -731,8 +731,18 @@ applier_apply_tx(struct stailq *rows)
>>  	struct latch *latch = (replica ? &replica->order_latch :
>>  			       &replicaset.applier.order_latch);
>>  	latch_lock(latch);
>> -	if (vclock_get(&replicaset.applier.vclock,
>> -		       first_row->replica_id) >= first_row->lsn) {
>> +	/*
>> +	 * We cannot tell which vclock is greater. There is no
>> +	 * proper place to initialize applier vclock, since it
>> +	 * may get stale right away if we write something to WAL
>> +	 * and it gets replicated and then arrives back from the
>> +	 * replica. So check against both vclocks. Replicaset
>> +	 * vclock will guard us from corner cases like the one
>> +	 * above.
>> +	 */
>> +	if (MAX(vclock_get(&replicaset.applier.vclock, first_row->replica_id),
>> +		vclock_get(&replicaset.vclock, first_row->replica_id)) >=
>> +	    first_row->lsn) {
>>  		latch_unlock(latch);
>>  		return 0;
>>  	}
>> diff --git a/src/box/replication.cc b/src/box/replication.cc
>> index e7bfa22ab..7b04573a4 100644
>> --- a/src/box/replication.cc
>> +++ b/src/box/replication.cc
>> @@ -93,7 +93,6 @@ replication_init(void)
>>  	latch_create(&replicaset.applier.order_latch);
>>
>>  	vclock_create(&replicaset.applier.vclock);
>> -	vclock_copy(&replicaset.applier.vclock, &replicaset.vclock);
>>  	rlist_create(&replicaset.applier.on_rollback);
>>  	rlist_create(&replicaset.applier.on_commit);
>>
>> --
>> 2.20.1 (Apple Git-117)
>
> --
> Konstantin Osipov, Moscow, Russia

--
Sergey Petrenko