From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: sergepetrenko <sergepetrenko@tarantool.org>, kirichenkoga@gmail.com, kostja.osipov@gmail.com, alexander.turenko@tarantool.org
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v4 4/4] replication: do not relay rows coming from a remote instance back to it
Date: Thu, 27 Feb 2020 00:54:54 +0100
Message-ID: <1bb673a1-8acb-d13a-edc4-4ad13a93a13a@tarantool.org>
In-Reply-To: <d35a224112e2d32014cf973dc662cc890eca0323.1582709303.git.sergepetrenko@tarantool.org>

Thanks for the patch!

See 4 comments below.

> replication: do not relay rows coming from a remote instance back to it
>
> We have a mechanism for restoring rows originating from an instance that
> suffered a sudden power loss: remote masters resend the instance's rows
> received before a certain point in time, defined by the remote master's
> vclock at the moment of subscribe.
> However, this is useful only on initial replication configuration, when
> an instance has just recovered, so that it can receive what it has
> relayed but hasn't yet synced to disk.
> In other cases, when an instance is operating normally and master-master
> replication is configured, the mechanism described above may lead to the
> instance re-applying its own rows, coming from a master it has just
> subscribed to.
> To fix the problem, do not relay rows coming from a remote instance if
> the instance has already recovered.
>
> Closes #4739
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 911353425..73ffc0d68 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -866,8 +866,13 @@ applier_subscribe(struct applier *applier)
>  	struct vclock vclock;
>  	vclock_create(&vclock);
>  	vclock_copy(&vclock, &replicaset.vclock);
> +	/*
> +	 * Stop accepting local rows coming from a remote
> +	 * instance as soon as local WAL starts accepting writes.
> +	 */
> +	unsigned int id_filter = box_is_orphan() ? 0 : 1 << instance_id;

1.
I was always wondering, what if the instance got orphaned after it
started accepting writes? WAL is fully functional, it syncs whatever is
needed, and then a resubscribe happens. Can this break anything?

>  	xrow_encode_subscribe_xc(&row, &REPLICASET_UUID, &INSTANCE_UUID,
> -				 &vclock, replication_anon, 0);
> +				 &vclock, replication_anon, id_filter);
>  	coio_write_xrow(coio, &row);
>
>  	/* Read SUBSCRIBE response */
> diff --git a/src/box/wal.c b/src/box/wal.c
> index 27bff662a..35ba7b072 100644
> --- a/src/box/wal.c
> +++ b/src/box/wal.c
> @@ -278,8 +278,13 @@ tx_schedule_commit(struct cmsg *msg)
>  		/* Closes the input valve. */
>  		stailq_concat(&writer->rollback, &batch->rollback);
>  	}
> +
> +	ERROR_INJECT(ERRINJ_REPLICASET_VCLOCK_UPDATE, { goto skip_update; });
>  	/* Update the tx vclock to the latest written by wal. */
>  	vclock_copy(&replicaset.vclock, &batch->vclock);
> +#ifndef NDEBUG
> +skip_update:
> +#endif

2. Consider this hack which I just invented. In that way you won't
depend on the ERRINJ and NDEBUG interconnection.

====================
@@ -282,9 +282,7 @@ tx_schedule_commit(struct cmsg *msg)
 	ERROR_INJECT(ERRINJ_REPLICASET_VCLOCK_UPDATE, { goto skip_update; });
 	/* Update the tx vclock to the latest written by wal. */
 	vclock_copy(&replicaset.vclock, &batch->vclock);
-#ifndef NDEBUG
-skip_update:
-#endif
+	ERROR_INJECT(ERRINJ_REPLICASET_VCLOCK_UPDATE, {skip_update:;});
 	tx_schedule_queue(&batch->commit);
 	mempool_free(&writer->msg_pool, container_of(msg, struct wal_msg, base));
 }
====================

Talking of the injection itself - don't know really. Perhaps it would
be better to add a delay to the wal_write_to_disk() function, to its
very end, after wal_notify_watchers(). In that case relay will wake up,
send whatever it wants, and TX won't update the vclock until you let
wal_write_to_disk() finish. Seems more natural this way.
>  	tx_schedule_queue(&batch->commit);
>  	mempool_free(&writer->msg_pool,
>  		     container_of(msg, struct wal_msg, base));
>  }
> diff --git a/test/replication/gh-4739-vclock-assert.result b/test/replication/gh-4739-vclock-assert.result
> new file mode 100644
> index 000000000..7dc2f7118
> --- /dev/null
> +++ b/test/replication/gh-4739-vclock-assert.result
> @@ -0,0 +1,82 @@
> +-- test-run result file version 2
> +env = require('test_run')
> + | ---
> + | ...
> +test_run = env.new()
> + | ---
> + | ...
> +
> +SERVERS = {'rebootstrap1', 'rebootstrap2'}
> + | ---
> + | ...
> +test_run:create_cluster(SERVERS, "replication")
> + | ---
> + | ...
> +test_run:wait_fullmesh(SERVERS)
> + | ---
> + | ...
> +
> +test_run:cmd('switch rebootstrap1')
> + | ---
> + | - true
> + | ...
> +fiber = require('fiber')
> + | ---
> + | ...
> +-- Stop updating replicaset vclock to simulate a situation, when
> +-- a row is already relayed to the remote master, but the local
> +-- vclock update hasn't happened yet.
> +box.error.injection.set('ERRINJ_REPLICASET_VCLOCK_UPDATE', true)
> + | ---
> + | - ok
> + | ...
> +lsn = box.info.lsn
> + | ---
> + | ...
> +box.space._schema:replace{'something'}
> + | ---
> + | - ['something']
> + | ...
> +-- Vclock isn't updated.
> +box.info.lsn == lsn
> + | ---
> + | - true
> + | ...
> +
> +-- Wait until the remote instance gets the row.
> +while test_run:get_vclock('rebootstrap2')[box.info.id] == lsn do\
> +    fiber.sleep(0.01)\
> +end

3. There is a cool thing which I discovered relatively recently:
test_run:wait_cond(). It runs the fiber sleep and while cycle for you,
and has a finite timeout, so such a test won't hang for 10 minutes in
Travis in case of a problem.

> + | ---
> + | ...
> +
> +-- Restart the remote instance. This will make the first instance
> +-- resubscribe without entering orphan mode.
> +test_run:cmd('restart server rebootstrap2')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('switch rebootstrap1')
> + | ---
> + | - true
> + | ...
> +-- Wait until resubscribe is sent
> +fiber.sleep(2 * box.cfg.replication_timeout)

4. Don't we collect any statistics on replication requests, just like
we do in box.stat()? Perhaps box.stat.net() can help to wait properly.
Maybe just do test_run:wait_cond() for status 'sync'?

> + | ---
> + | ...
> +box.info.replication[2].upstream.status
> + | ---
> + | - sync
> + | ...
> +
> +box.error.injection.set('ERRINJ_REPLICASET_VCLOCK_UPDATE', false)
> + | ---
> + | - ok
> + | ...
> +test_run:cmd('switch default')
> + | ---
> + | - true
> + | ...
> +test_run:drop_cluster(SERVERS)
> + | ---
> + | ...
Thread overview (25+ messages, ~2020-02-26 23:54 UTC):
2020-02-26 10:00 [Tarantool-patches] [PATCH v4 0/4] replication: fix applying of rows originating from local instance  sergepetrenko
2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 1/4] box: expose box_is_orphan method  sergepetrenko
2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 2/4] wal: warn when trying to write a record with a broken lsn  sergepetrenko
2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 3/4] replication: implement an instance id filter for relay  sergepetrenko
2020-02-26 10:18   ` Konstantin Osipov
2020-02-26 11:16     ` Serge Petrenko
2020-02-26 23:54       ` Vladislav Shpilevoy
2020-02-27  6:48         ` Konstantin Osipov
2020-02-27 13:15         ` Serge Petrenko
2020-02-27 23:33           ` Vladislav Shpilevoy
2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 4/4] replication: do not relay rows coming from a remote instance back to it  sergepetrenko
2020-02-26 10:23   ` Konstantin Osipov
2020-02-26 11:21     ` Serge Petrenko
2020-02-26 11:58       ` Konstantin Osipov
2020-02-26 15:58         ` Serge Petrenko
2020-02-26 16:40           ` Konstantin Osipov
2020-02-26 23:54   ` Vladislav Shpilevoy [this message]
2020-02-27  6:52     ` Konstantin Osipov
2020-02-27 14:13     ` Serge Petrenko
2020-02-27 21:17       ` Serge Petrenko
2020-02-27 23:22         ` Vladislav Shpilevoy
2020-02-28  8:03           ` Serge Petrenko
2020-02-26 23:54 ` [Tarantool-patches] [PATCH v4 0/4] replication: fix applying of rows originating from local instance  Vladislav Shpilevoy
2020-02-27 21:24   ` Serge Petrenko
2020-02-27 23:24     ` Vladislav Shpilevoy