From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-f65.google.com (mail-lf1-f65.google.com [209.85.167.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 61B31469719 for ; Thu, 13 Feb 2020 09:47:33 +0300 (MSK) Received: by mail-lf1-f65.google.com with SMTP id m30so3414709lfp.8 for ; Wed, 12 Feb 2020 22:47:33 -0800 (PST) Date: Thu, 13 Feb 2020 09:47:30 +0300 From: Konstantin Osipov Message-ID: <20200213064730.GA17713@atlas> References: <85cfed8fd0cfa23c5a8504b6516fad18028edbaf.1581551227.git.sergepetrenko@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <85cfed8fd0cfa23c5a8504b6516fad18028edbaf.1581551227.git.sergepetrenko@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH 1/2] replication: correctly check for rows to skip in applier List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: sergepetrenko Cc: tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org * sergepetrenko [20/02/13 09:34]: > Fix replicaset.applier.vclock initialization issues: it wasn't > initialized at all previously. In the next line you say that you remove the initialization. What do you mean here? > Moreover, there is no valid point in code > to initialize it, since it may get stale right away if new entries are > written to WAL. Well, it reflects the state of the wal *as seen by* the set of appliers. This is stated in the comment. So it doesn't have to reflect local changes. > So, check for both applier and replicaset vclocks. > The greater one protects the instance from applying the rows it has > already applied or has already scheduled to write. > Also remove an unnecessary aplier vclock initialization from > replication_init(). First of all, the race you describe applies to local changes only. Yet you add the check for all replica ids. This further obliterates this piece of code. Second, the core of the issue is a "hole" in vclock protection enforced by latch_lock/latch_unlock. Basically the assumption that latch_lock/latch_unlock has is that while a latch is locked, no source can apply a transaction under this replica id. This, is violated by the local WAL. We used to skip all changes by local vclock id before in applier. Later it was changed to be able to get-your-own logs on recovery, e.g. if some replica has them , and the local node lost a piece of wal. It will take me a while to find this commit and ticket, but this is the commit and ticket which introduced the regression. The proper fix is to only apply local changes received from remotes in orphan mode, and begin skipping them when entering read-write mode. > Closes #4739 > --- > src/box/applier.cc | 14 ++++++++++++-- > src/box/replication.cc | 1 - > 2 files changed, 12 insertions(+), 3 deletions(-) > > diff --git a/src/box/applier.cc b/src/box/applier.cc > index ae3d281a5..acb26b7e2 100644 > --- a/src/box/applier.cc > +++ b/src/box/applier.cc > @@ -731,8 +731,18 @@ applier_apply_tx(struct stailq *rows) > struct latch *latch = (replica ? &replica->order_latch : > &replicaset.applier.order_latch); > latch_lock(latch); > - if (vclock_get(&replicaset.applier.vclock, > - first_row->replica_id) >= first_row->lsn) { > + /* > + * We cannot tell which vclock is greater. There is no > + * proper place to initialize applier vclock, since it > + * may get stale right away if we write something to WAL > + * and it gets replicated and then arrives back from the > + * replica. So check against both vclocks. Replicaset > + * vclock will guard us from corner cases like the one > + * above. > + */ > + if (MAX(vclock_get(&replicaset.applier.vclock, first_row->replica_id), > + vclock_get(&replicaset.vclock, first_row->replica_id)) >= > + first_row->lsn) { > latch_unlock(latch); > return 0; > } > diff --git a/src/box/replication.cc b/src/box/replication.cc > index e7bfa22ab..7b04573a4 100644 > --- a/src/box/replication.cc > +++ b/src/box/replication.cc > @@ -93,7 +93,6 @@ replication_init(void) > latch_create(&replicaset.applier.order_latch); > > vclock_create(&replicaset.applier.vclock); > - vclock_copy(&replicaset.applier.vclock, &replicaset.vclock); > rlist_create(&replicaset.applier.on_rollback); > rlist_create(&replicaset.applier.on_commit); > > -- > 2.20.1 (Apple Git-117) -- Konstantin Osipov, Moscow, Russia