Tarantool development patches archive
 help / color / mirror / Atom feed
From: Konstantin Osipov <kostja.osipov@gmail.com>
To: sergepetrenko <sergepetrenko@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 1/2] replication: correctly check for rows to skip in applier
Date: Thu, 13 Feb 2020 09:47:30 +0300	[thread overview]
Message-ID: <20200213064730.GA17713@atlas> (raw)
In-Reply-To: <85cfed8fd0cfa23c5a8504b6516fad18028edbaf.1581551227.git.sergepetrenko@tarantool.org>

* sergepetrenko <sergepetrenko@tarantool.org> [20/02/13 09:34]:
> Fix replicaset.applier.vclock initialization issues: it wasn't
> initialized at all previously.

In the next line you say that you remove the initialization. What
do you mean here?

> Moreover, there is no valid point in code
> to initialize it, since it may get stale right away if new entries are
> written to WAL.

Well, it reflects the state of the wal *as seen by* the set of
appliers. This is stated in the comment. So it doesn't have to
reflect local changes. 

> So, check for both applier and replicaset vclocks.
> The greater one protects the instance from applying the rows it has
> already applied or has already scheduled to write.
> Also remove an unnecessary aplier vclock initialization from
> replication_init().

First of all, the race you describe applies to
local changes only. Yet you add the check for all replica ids. 
This further obliterates this piece of code.

Second, the core of the issue is a "hole" in vclock protection
enforced by latch_lock/latch_unlock. Basically the assumption that
latch_lock/latch_unlock has is that while a latch is locked, no
source can apply a transaction under this replica id. This, is
violated by the local WAL.

We used to skip all changes by local vclock id before in applier.

Later it was changed to be able to get-your-own logs on recovery,
e.g. if some replica has them , and the local node lost a piece of
wal.

It will take me a while to find this commit and ticket, but this
is the commit and ticket which introduced the regression.

The proper fix is to only apply local changes received from
remotes in orphan mode, and begin skipping them when entering
read-write mode.

> Closes #4739
> ---
>  src/box/applier.cc     | 14 ++++++++++++--
>  src/box/replication.cc |  1 -
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index ae3d281a5..acb26b7e2 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -731,8 +731,18 @@ applier_apply_tx(struct stailq *rows)
>  	struct latch *latch = (replica ? &replica->order_latch :
>  			       &replicaset.applier.order_latch);
>  	latch_lock(latch);
> -	if (vclock_get(&replicaset.applier.vclock,
> -		       first_row->replica_id) >= first_row->lsn) {
> +	/*
> +	 * We cannot tell which vclock is greater. There is no
> +	 * proper place to initialize applier vclock, since it
> +	 * may get stale right away if we write something to WAL
> +	 * and it gets replicated and then arrives back from the
> +	 * replica. So check against both vclocks. Replicaset
> +	 * vclock will guard us from corner cases like the one
> +	 * above.
> +	 */
> +	if (MAX(vclock_get(&replicaset.applier.vclock, first_row->replica_id),
> +		vclock_get(&replicaset.vclock, first_row->replica_id)) >=
> +	    first_row->lsn) {
>  		latch_unlock(latch);
>  		return 0;
>  	}
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index e7bfa22ab..7b04573a4 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -93,7 +93,6 @@ replication_init(void)
>  	latch_create(&replicaset.applier.order_latch);
>  
>  	vclock_create(&replicaset.applier.vclock);
> -	vclock_copy(&replicaset.applier.vclock, &replicaset.vclock);
>  	rlist_create(&replicaset.applier.on_rollback);
>  	rlist_create(&replicaset.applier.on_commit);
>  
> -- 
> 2.20.1 (Apple Git-117)

-- 
Konstantin Osipov, Moscow, Russia

  reply	other threads:[~2020-02-13  6:47 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-12 23:50 [Tarantool-patches] [PATCH 0/2] replication: fix applying of rows originating from local instance sergepetrenko
2020-02-12 23:51 ` [Tarantool-patches] [PATCH 1/2] replication: correctly check for rows to skip in applier sergepetrenko
2020-02-13  6:47   ` Konstantin Osipov [this message]
2020-02-13  6:58     ` Konstantin Osipov
2020-02-13 22:03       ` Sergey Petrenko
2020-02-13 22:10     ` Sergey Petrenko
2020-02-12 23:51 ` [Tarantool-patches] [PATCH 2/2] wal: panic when trying to write a record with a broken lsn sergepetrenko
2020-02-13  6:48   ` Konstantin Osipov
2020-02-13 22:05     ` Sergey Petrenko
2020-02-14  7:26       ` Konstantin Osipov
2020-02-13  6:48 ` [Tarantool-patches] [PATCH 0/2] replication: fix applying of rows originating from local instance Konstantin Osipov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200213064730.GA17713@atlas \
    --to=kostja.osipov@gmail.com \
    --cc=sergepetrenko@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 1/2] replication: correctly check for rows to skip in applier' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox