[Tarantool-patches] [PATCH 1/2] replication: correctly check for rows to skip in applier

Konstantin Osipov kostja.osipov at gmail.com
Thu Feb 13 09:47:30 MSK 2020


* sergepetrenko <sergepetrenko at tarantool.org> [20/02/13 09:34]:
> Fix replicaset.applier.vclock initialization issues: it wasn't
> initialized at all previously.

In the next line you say that you remove the initialization. What
do you mean here?

> Moreover, there is no valid point in code
> to initialize it, since it may get stale right away if new entries are
> written to WAL.

Well, it reflects the state of the WAL *as seen by* the set of
appliers. This is stated in the comment. So it doesn't have to
reflect local changes.

> So, check for both applier and replicaset vclocks.
> The greater one protects the instance from applying the rows it has
> already applied or has already scheduled to write.
> Also remove an unnecessary applier vclock initialization from
> replication_init().

First of all, the race you describe applies to
local changes only. Yet you add the check for all replica ids.
This further obfuscates this piece of code.

Second, the core of the issue is a "hole" in the vclock protection
enforced by latch_lock/latch_unlock. The assumption behind
latch_lock/latch_unlock is that while a latch is locked, no
other source can apply a transaction under this replica id. This
assumption is violated by the local WAL.
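
To make the hole concrete, here is a toy model. All names are
hypothetical stand-ins, not the real Tarantool structures, and it
assumes a local WAL write advances only the replicaset vclock:

```cpp
#include <cstdint>

// Hypothetical model: the two vclock components for OUR OWN replica id.
struct SelfLsn {
    int64_t applier;     // what the appliers have seen (guarded by the latch)
    int64_t replicaset;  // what is actually in the local WAL
};

// A local write goes straight to the WAL. In this model it neither takes
// the applier latch nor touches the applier's view.
void toy_local_wal_write(SelfLsn &v)
{
    v.replicaset++;
}

// The pre-patch check: skip a row only if the appliers have already seen it.
bool toy_applier_skips(const SelfLsn &v, int64_t row_lsn)
{
    return v.applier >= row_lsn;
}
```

After one toy_local_wal_write(), a row with lsn 1 that comes back
from a remote is not skipped by toy_applier_skips(), although it is
already in the local WAL.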

The applier used to skip all changes under the local replica id.

Later this was changed so that a node can get its own logs back on
recovery, e.g. if some replica has them and the local node lost a
piece of its WAL.

It will take me a while to find that commit and ticket, but that
change is what introduced the regression.

The proper fix is to only apply local changes received from
remotes in orphan mode, and begin skipping them when entering
read-write mode.
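
A rough sketch of what I mean, as a standalone model rather than a
patch. The types, the is_orphan flag and toy_should_skip() are
made up for illustration and do not match the real applier code:

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for a replicated row header.
struct ToyRow {
    uint32_t replica_id;
    int64_t lsn;
};

// Hypothetical stand-in for the instance state the applier consults.
struct ToyInstance {
    uint32_t id;                                 // our own replica id
    bool is_orphan;                              // true until we go read-write
    std::map<uint32_t, int64_t> applier_vclock;  // state as seen by appliers
};

// Returns true when the applier should drop the row.
bool toy_should_skip(const ToyInstance &inst, const ToyRow &row)
{
    // Our own rows coming back from a remote are only useful while we
    // are still catching up; once read-write, always skip them.
    if (row.replica_id == inst.id && !inst.is_orphan)
        return true;
    // Otherwise keep the usual check against the applier vclock.
    auto it = inst.applier_vclock.find(row.replica_id);
    int64_t seen = (it == inst.applier_vclock.end()) ? 0 : it->second;
    return seen >= row.lsn;
}
```

This keeps the check on a single vclock and makes the special
case for the local replica id explicit.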

> Closes #4739
> ---
>  src/box/applier.cc     | 14 ++++++++++++--
>  src/box/replication.cc |  1 -
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index ae3d281a5..acb26b7e2 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -731,8 +731,18 @@ applier_apply_tx(struct stailq *rows)
>  	struct latch *latch = (replica ? &replica->order_latch :
>  			       &replicaset.applier.order_latch);
>  	latch_lock(latch);
> -	if (vclock_get(&replicaset.applier.vclock,
> -		       first_row->replica_id) >= first_row->lsn) {
> +	/*
> +	 * We cannot tell which vclock is greater. There is no
> +	 * proper place to initialize applier vclock, since it
> +	 * may get stale right away if we write something to WAL
> +	 * and it gets replicated and then arrives back from the
> +	 * replica. So check against both vclocks. Replicaset
> +	 * vclock will guard us from corner cases like the one
> +	 * above.
> +	 */
> +	if (MAX(vclock_get(&replicaset.applier.vclock, first_row->replica_id),
> +		vclock_get(&replicaset.vclock, first_row->replica_id)) >=
> +	    first_row->lsn) {
>  		latch_unlock(latch);
>  		return 0;
>  	}
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index e7bfa22ab..7b04573a4 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -93,7 +93,6 @@ replication_init(void)
>  	latch_create(&replicaset.applier.order_latch);
>  
>  	vclock_create(&replicaset.applier.vclock);
> -	vclock_copy(&replicaset.applier.vclock, &replicaset.vclock);
>  	rlist_create(&replicaset.applier.on_rollback);
>  	rlist_create(&replicaset.applier.on_commit);
>  
> -- 
> 2.20.1 (Apple Git-117)

-- 
Konstantin Osipov, Moscow, Russia

