From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: sergepetrenko <sergepetrenko@tarantool.org>, kirichenkoga@gmail.com, kostja.osipov@gmail.com, alexander.turenko@tarantool.org
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v4 4/4] replication: do not relay rows coming from a remote instance back to it
Date: Thu, 27 Feb 2020 00:54:54 +0100
Message-ID: <1bb673a1-8acb-d13a-edc4-4ad13a93a13a@tarantool.org>
In-Reply-To: <d35a224112e2d32014cf973dc662cc890eca0323.1582709303.git.sergepetrenko@tarantool.org>

Thanks for the patch!

See 4 comments below.

> replication: do not relay rows coming from a remote instance back to it
>
> We have a mechanism for restoring rows originating from an instance that
> suffered a sudden power loss: remote masters resend the instance's rows
> received before a certain point in time, defined by the remote master's
> vclock at the moment of subscribe.
> However, this is useful only on initial replication configuration, when
> an instance has just recovered, so that it can receive what it has
> relayed but hasn't yet synced to disk.
> In other cases, when an instance is operating normally and master-master
> replication is configured, the mechanism described above may lead to the
> instance re-applying its own rows, coming from a master it has just
> subscribed to.
> To fix the problem, do not relay rows coming from a remote instance if
> the instance has already recovered.
>
> Closes #4739
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 911353425..73ffc0d68 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -866,8 +866,13 @@ applier_subscribe(struct applier *applier)
>  	struct vclock vclock;
>  	vclock_create(&vclock);
>  	vclock_copy(&vclock, &replicaset.vclock);
> +	/*
> +	 * Stop accepting local rows coming from a remote
> +	 * instance as soon as local WAL starts accepting writes.
> +	 */
> +	unsigned int id_filter = box_is_orphan() ? 0 : 1 << instance_id;

1.
I was always wondering, what if the instance got orphaned after it
started accepting writes? WAL is fully functional, it syncs whatever is
needed, and then a resubscribe happens. Can this break anything?

>  	xrow_encode_subscribe_xc(&row, &REPLICASET_UUID, &INSTANCE_UUID,
> -				 &vclock, replication_anon, 0);
> +				 &vclock, replication_anon, id_filter);
>  	coio_write_xrow(coio, &row);
>
>  	/* Read SUBSCRIBE response */
> diff --git a/src/box/wal.c b/src/box/wal.c
> index 27bff662a..35ba7b072 100644
> --- a/src/box/wal.c
> +++ b/src/box/wal.c
> @@ -278,8 +278,13 @@ tx_schedule_commit(struct cmsg *msg)
>  		/* Closes the input valve. */
>  		stailq_concat(&writer->rollback, &batch->rollback);
>  	}
> +
> +	ERROR_INJECT(ERRINJ_REPLICASET_VCLOCK_UPDATE, { goto skip_update; });
>  	/* Update the tx vclock to the latest written by wal. */
>  	vclock_copy(&replicaset.vclock, &batch->vclock);
> +#ifndef NDEBUG
> +skip_update:
> +#endif

2. Consider this hack which I just invented. In that way you won't
depend on the ERRINJ and NDEBUG interconnection.

====================
@@ -282,9 +282,7 @@ tx_schedule_commit(struct cmsg *msg)
 	ERROR_INJECT(ERRINJ_REPLICASET_VCLOCK_UPDATE, { goto skip_update; });
 	/* Update the tx vclock to the latest written by wal. */
 	vclock_copy(&replicaset.vclock, &batch->vclock);
-#ifndef NDEBUG
-skip_update:
-#endif
+	ERROR_INJECT(ERRINJ_REPLICASET_VCLOCK_UPDATE, {skip_update:;});
 	tx_schedule_queue(&batch->commit);
 	mempool_free(&writer->msg_pool, container_of(msg, struct wal_msg, base));
 }
====================

Talking of the injection itself - don't know really. Perhaps it would
be better to add a delay to the wal_write_to_disk() function, to its
very end, after wal_notify_watchers(). In that case relay will wake up,
send whatever it wants, and TX won't update the vclock until you let
wal_write_to_disk() finish. Seems more natural this way.
>  	tx_schedule_queue(&batch->commit);
>  	mempool_free(&writer->msg_pool,
>  		     container_of(msg, struct wal_msg, base));
>  }
> diff --git a/test/replication/gh-4739-vclock-assert.result b/test/replication/gh-4739-vclock-assert.result
> new file mode 100644
> index 000000000..7dc2f7118
> --- /dev/null
> +++ b/test/replication/gh-4739-vclock-assert.result
> @@ -0,0 +1,82 @@
> +-- test-run result file version 2
> +env = require('test_run')
> + | ---
> + | ...
> +test_run = env.new()
> + | ---
> + | ...
> +
> +SERVERS = {'rebootstrap1', 'rebootstrap2'}
> + | ---
> + | ...
> +test_run:create_cluster(SERVERS, "replication")
> + | ---
> + | ...
> +test_run:wait_fullmesh(SERVERS)
> + | ---
> + | ...
> +
> +test_run:cmd('switch rebootstrap1')
> + | ---
> + | - true
> + | ...
> +fiber = require('fiber')
> + | ---
> + | ...
> +-- Stop updating replicaset vclock to simulate a situation, when
> +-- a row is already relayed to the remote master, but the local
> +-- vclock update hasn't happened yet.
> +box.error.injection.set('ERRINJ_REPLICASET_VCLOCK_UPDATE', true)
> + | ---
> + | - ok
> + | ...
> +lsn = box.info.lsn
> + | ---
> + | ...
> +box.space._schema:replace{'something'}
> + | ---
> + | - ['something']
> + | ...
> +-- Vclock isn't updated.
> +box.info.lsn == lsn
> + | ---
> + | - true
> + | ...
> +
> +-- Wait until the remote instance gets the row.
> +while test_run:get_vclock('rebootstrap2')[box.info.id] == lsn do\
> +    fiber.sleep(0.01)\
> +end

3. There is a cool thing which I discovered relatively recently:
test_run:wait_cond(). It runs the fiber sleep and while cycle for you,
and has a finite timeout, so such a test won't hang for 10 minutes in
Travis in case of a problem.

> + | ---
> + | ...
> +
> +-- Restart the remote instance. This will make the first instance
> +-- resubscribe without entering orphan mode.
> +test_run:cmd('restart server rebootstrap2')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('switch rebootstrap1')
> + | ---
> + | - true
> + | ...
> +-- Wait until resubscribe is sent
> +fiber.sleep(2 * box.cfg.replication_timeout)

4. Don't we collect any statistics on replication requests, just like
we do in box.stat()? Perhaps box.stat.net() can help to wait properly.
Maybe just do test_run:wait_cond() for status 'sync'?

> + | ---
> + | ...
> +box.info.replication[2].upstream.status
> + | ---
> + | - sync
> + | ...
> +
> +box.error.injection.set('ERRINJ_REPLICASET_VCLOCK_UPDATE', false)
> + | ---
> + | - ok
> + | ...
> +test_run:cmd('switch default')
> + | ---
> + | - true
> + | ...
> +test_run:drop_cluster(SERVERS)
> + | ---
> + | ...
Thread overview (25+ messages, ~2020-02-26 23:54 UTC):
2020-02-26 10:00 [Tarantool-patches] [PATCH v4 0/4] replication: fix applying of rows originating from local instance  sergepetrenko
2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 1/4] box: expose box_is_orphan method  sergepetrenko
2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 2/4] wal: warn when trying to write a record with a broken lsn  sergepetrenko
2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 3/4] replication: implement an instance id filter for relay  sergepetrenko
2020-02-26 10:18   ` Konstantin Osipov
2020-02-26 11:16     ` Serge Petrenko
2020-02-26 23:54       ` Vladislav Shpilevoy
2020-02-27  6:48         ` Konstantin Osipov
2020-02-27 13:15         ` Serge Petrenko
2020-02-27 23:33           ` Vladislav Shpilevoy
2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 4/4] replication: do not relay rows coming from a remote instance back to it  sergepetrenko
2020-02-26 10:23   ` Konstantin Osipov
2020-02-26 11:21     ` Serge Petrenko
2020-02-26 11:58       ` Konstantin Osipov
2020-02-26 15:58         ` Serge Petrenko
2020-02-26 16:40           ` Konstantin Osipov
2020-02-26 23:54   ` Vladislav Shpilevoy [this message]
2020-02-27  6:52     ` Konstantin Osipov
2020-02-27 14:13     ` Serge Petrenko
2020-02-27 21:17       ` Serge Petrenko
2020-02-27 23:22         ` Vladislav Shpilevoy
2020-02-28  8:03           ` Serge Petrenko
2020-02-26 23:54 ` [Tarantool-patches] [PATCH v4 0/4] replication: fix applying of rows originating from local instance  Vladislav Shpilevoy
2020-02-27 21:24   ` Serge Petrenko
2020-02-27 23:24     ` Vladislav Shpilevoy