From: Serge Petrenko <sergepetrenko@tarantool.org> To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Cc: kirichenkoga@gmail.com, tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH v4 4/4] replication: do not relay rows coming from a remote instance back to it Date: Fri, 28 Feb 2020 11:03:51 +0300 [thread overview] Message-ID: <6FE40C03-CBF9-4B32-A281-8622F1AD3E44@tarantool.org> (raw) In-Reply-To: <3212b507-4d3a-fef5-525c-45f6fb642c55@tarantool.org> [-- Attachment #1: Type: text/plain, Size: 10788 bytes --] > 28 февр. 2020 г., в 02:22, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> написал(а): > >> Ok. I tried to add either ERROR_INJECT_YIELD or ERROR_INJECT_SLEEP at the end >> of wal_write_to_disk(). >> It looks like you cannot yield in wal_write_to_disk(). (is it possible to yield in WAL thread at all?) >> Firing this injection with ERROR_INJECT_YIELD and then resetting it leads to wal thread stopping >> processing messages. This leads to tarantool hanging infinitely on shutdown when tx waits for wal >> thread to exit, but wal never gets the shutdown signal. >> >> Using ERROR_INJECT_SLEEP leads to wal watchers not being notified until the injection is reset. This >> probably happens because of wal_notify_watchers’ use of cpipe_flush_input(), which doesn’t flush the input until >> the end of event loop iteration, if there are not enough messages (only one message in our case). >> The event loop iteration never ends, because we sleep right after wal_notify_watchers() call. >> >> So, I see skipping vclock assignment in tx_schedule_commit() as the only possible alternative. >> Hope my explanation was clear enough and, more imortantly, correct. If not, lest discuss. > > Yeah, I stumbled into the same problems. And realized that the current test, > after all, is not valid. So we need to change it anyway. > > First I tried to solve them by trying to block TX thread totally on one > instance after it tried to commit something. Since that would block the > test too, I tried to introduce a new thread - errinj thread, which would > listen on an ip/port or a unix socket, and will receive requests to set > error injections from another instance. So rebootstrap1's TX thread would > freeze, and I could control that instance via interaction with its errinj > thread from rebootstrap2 instance or from default instance. > > Despite the idea would work for sure, it appeared to be hard to implement > for a short time, so I postponed that, and probably will open a ticket to > implement such thing. It could be useful for any test, which needs to test > behaviour of other threads, when TX is not scheduled for a long time. Also > we can implement that logic as a part of iproto thread. > > For our case I found a simpler solution - sleep in wal_write_to_disk, but > deliver all watcher events immediately. Then the test works. And still > crashes without your patch. Thanks for your amendments! The new test LGTM. Do you mean the old test is incorrect because the injection simulates an impossible situation? If yes, then I agree. I applied your diff with a tiny fix. See below. I’ll send v5 with a changelog in the cover letter shortly. > > Here is my diff, which I pushed on top of your branch. If you don't agree - > lets discuss. Otherwise squash and the patchset LGTM. > > ================================================================================ > > commit 7054ed8ffc5cff690858261073cdfb1822e241b7 > Author: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> > Date: Fri Feb 28 00:02:10 2020 +0100 > > Review fixes > > diff --git a/src/box/wal.c b/src/box/wal.c > index bf127b259..1668c9348 100644 > --- a/src/box/wal.c > +++ b/src/box/wal.c > @@ -278,11 +278,8 @@ tx_schedule_commit(struct cmsg *msg) > /* Closes the input valve. */ > stailq_concat(&writer->rollback, &batch->rollback); > } > - > - ERROR_INJECT(ERRINJ_REPLICASET_VCLOCK_UPDATE, { goto skip_update; }); > /* Update the tx vclock to the latest written by wal. */ > vclock_copy(&replicaset.vclock, &batch->vclock); > - ERROR_INJECT(ERRINJ_REPLICASET_VCLOCK_UPDATE, {skip_update:;}); > tx_schedule_queue(&batch->commit); > mempool_free(&writer->msg_pool, container_of(msg, struct wal_msg, base)); > } > @@ -1117,6 +1114,7 @@ done: > } > fiber_gc(); > wal_notify_watchers(writer, WAL_EVENT_WRITE); > + ERROR_INJECT_SLEEP(ERRINJ_RELAY_FASTER_THAN_TX); > } > > /** WAL writer main loop. */ > @@ -1328,6 +1326,8 @@ wal_watcher_notify(struct wal_watcher *watcher, unsigned events) > msg->events = events; > cmsg_init(&msg->cmsg, watcher->route); > cpipe_push(&watcher->watcher_pipe, &msg->cmsg); > + ERROR_INJECT(ERRINJ_RELAY_FASTER_THAN_TX, > + cpipe_deliver_now(&watcher->watcher_pipe)); > } > > static void > diff --git a/src/lib/core/cbus.h b/src/lib/core/cbus.h > index 16d122779..f0101cb8b 100644 > --- a/src/lib/core/cbus.h > +++ b/src/lib/core/cbus.h > @@ -176,6 +176,13 @@ cpipe_set_max_input(struct cpipe *pipe, int max_input) > pipe->max_input = max_input; > } > > +static inline void > +cpipe_deliver_now(struct cpipe *pipe) > +{ > + if (pipe->n_input > 0) > + ev_invoke(pipe->producer, &pipe->flush_input, EV_CUSTOM); > +} > + > /** > * Flush all staged messages into the pipe and eventually to the > * consumer. > diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h > index 58fe158fd..d8cdf3f27 100644 > --- a/src/lib/core/errinj.h > +++ b/src/lib/core/errinj.h > @@ -137,7 +137,7 @@ struct errinj { > _(ERRINJ_DYN_MODULE_COUNT, ERRINJ_INT, {.iparam = 0}) \ > _(ERRINJ_FIBER_MADVISE, ERRINJ_BOOL, {.bparam = false}) \ > _(ERRINJ_FIBER_MPROTECT, ERRINJ_INT, {.iparam = -1}) \ > - _(ERRINJ_REPLICASET_VCLOCK_UPDATE, ERRINJ_BOOL, {.bparam = false}) \ > + _(ERRINJ_RELAY_FASTER_THAN_TX, ERRINJ_BOOL, {.bparam = false}) \ > > ENUM0(errinj_id, ERRINJ_LIST); > extern struct errinj errinjs[]; > diff --git a/test/box/errinj.result b/test/box/errinj.result > index eb0905238..4ad24d0c1 100644 > --- a/test/box/errinj.result > +++ b/test/box/errinj.result > @@ -59,12 +59,12 @@ evals > - ERRINJ_PORT_DUMP: false > - ERRINJ_RELAY_BREAK_LSN: -1 > - ERRINJ_RELAY_EXIT_DELAY: 0 > + - ERRINJ_RELAY_FASTER_THAN_TX: false > - ERRINJ_RELAY_FINAL_JOIN: false > - ERRINJ_RELAY_FINAL_SLEEP: false > - ERRINJ_RELAY_REPORT_INTERVAL: 0 > - ERRINJ_RELAY_SEND_DELAY: false > - ERRINJ_RELAY_TIMEOUT: 0 > - - ERRINJ_REPLICASET_VCLOCK_UPDATE: false > - ERRINJ_REPLICA_JOIN_DELAY: false > - ERRINJ_SIO_READ_MAX: -1 > - ERRINJ_SNAP_COMMIT_DELAY: false > diff --git a/test/replication/gh-4739-vclock-assert.result b/test/replication/gh-4739-vclock-assert.result > index a612826a0..43d3f27f3 100644 > --- a/test/replication/gh-4739-vclock-assert.result > +++ b/test/replication/gh-4739-vclock-assert.result > @@ -26,16 +26,19 @@ fiber = require('fiber') > -- Stop updating replicaset vclock to simulate a situation, when > -- a row is already relayed to the remote master, but the local > -- vclock update hasn't happened yet. > -box.error.injection.set('ERRINJ_REPLICASET_VCLOCK_UPDATE', true) > +box.error.injection.set('ERRINJ_RELAY_FASTER_THAN_TX', true) > | --- > | - ok > | ... > lsn = box.info.lsn > | --- > | ... > -box.space._schema:replace{'something'} > +f = fiber.create(function() box.space._schema:replace{'something'} end) > | --- > - | - ['something'] > + | ... > +test_run:wait_cond(function() return f:status() == 'suspended' end) > + | --- > + | - true > | ... No need to call wait_cond() here. The fiber is suspended as soon as control is returned to console, where we’re trying to call wait_cond. diff --git a/test/replication/gh-4739-vclock-assert.result b/test/replication/gh-4739-vclock-assert.result index 43d3f27f3..83896c4e1 100644 --- a/test/replication/gh-4739-vclock-assert.result +++ b/test/replication/gh-4739-vclock-assert.result @@ -36,9 +36,9 @@ lsn = box.info.lsn f = fiber.create(function() box.space._schema:replace{'something'} end) | --- | ... -test_run:wait_cond(function() return f:status() == 'suspended' end) +f:status() | --- - | - true + | - suspended | ... -- Vclock isn't updated. box.info.lsn == lsn > -- Vclock isn't updated. > box.info.lsn == lsn > @@ -53,7 +56,7 @@ end, 10) > > -- Restart the remote instance. This will make the first instance > -- resubscribe without entering orphan mode. > -test_run:cmd('restart server rebootstrap2') > +test_run:cmd('restart server rebootstrap2 with wait=False') > | --- > | - true > | ... > @@ -68,10 +71,14 @@ end, 10) > | --- > | - true > | ... > -box.error.injection.set('ERRINJ_REPLICASET_VCLOCK_UPDATE', false) > +box.error.injection.set('ERRINJ_RELAY_FASTER_THAN_TX', false) > | --- > | - ok > | ... > +box.space._schema:get{'something'} > + | --- > + | - ['something'] > + | ... > test_run:cmd('switch default') > | --- > | - true > diff --git a/test/replication/gh-4739-vclock-assert.test.lua b/test/replication/gh-4739-vclock-assert.test.lua > index b6a7caf3b..f8dd86688 100644 > --- a/test/replication/gh-4739-vclock-assert.test.lua > +++ b/test/replication/gh-4739-vclock-assert.test.lua > @@ -10,9 +10,10 @@ fiber = require('fiber') > -- Stop updating replicaset vclock to simulate a situation, when > -- a row is already relayed to the remote master, but the local > -- vclock update hasn't happened yet. > -box.error.injection.set('ERRINJ_REPLICASET_VCLOCK_UPDATE', true) > +box.error.injection.set('ERRINJ_RELAY_FASTER_THAN_TX', true) > lsn = box.info.lsn > -box.space._schema:replace{'something'} > +f = fiber.create(function() box.space._schema:replace{'something'} end) > +test_run:wait_cond(function() return f:status() == 'suspended' end) diff --git a/test/replication/gh-4739-vclock-assert.test.lua b/test/replication/gh-4739-vclock-assert.test.lua index f8dd86688..5755ad752 100644 --- a/test/replication/gh-4739-vclock-assert.test.lua +++ b/test/replication/gh-4739-vclock-assert.test.lua @@ -13,7 +13,7 @@ fiber = require('fiber') box.error.injection.set('ERRINJ_RELAY_FASTER_THAN_TX', true) lsn = box.info.lsn f = fiber.create(function() box.space._schema:replace{'something'} end) -test_run:wait_cond(function() return f:status() == 'suspended' end) +f:status() -- Vclock isn't updated. box.info.lsn == lsn > -- Vclock isn't updated. > box.info.lsn == lsn > > @@ -23,12 +24,13 @@ end, 10) > > -- Restart the remote instance. This will make the first instance > -- resubscribe without entering orphan mode. > -test_run:cmd('restart server rebootstrap2') > +test_run:cmd('restart server rebootstrap2 with wait=False') > test_run:cmd('switch rebootstrap1') > -- Wait until resubscribe is sent > test_run:wait_cond(function()\ > return box.info.replication[2].upstream.status == 'sync'\ > end, 10) > -box.error.injection.set('ERRINJ_REPLICASET_VCLOCK_UPDATE', false) > +box.error.injection.set('ERRINJ_RELAY_FASTER_THAN_TX', false) > +box.space._schema:get{'something'} > test_run:cmd('switch default') > test_run:drop_cluster(SERVERS) -- Serge Petrenko sergepetrenko@tarantool.org [-- Attachment #2: Type: text/html, Size: 17090 bytes --]
next prev parent reply other threads:[~2020-02-28 8:03 UTC|newest] Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-02-26 10:00 [Tarantool-patches] [PATCH v4 0/4] replication: fix applying of rows originating from local instance sergepetrenko 2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 1/4] box: expose box_is_orphan method sergepetrenko 2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 2/4] wal: warn when trying to write a record with a broken lsn sergepetrenko 2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 3/4] replication: implement an instance id filter for relay sergepetrenko 2020-02-26 10:18 ` Konstantin Osipov 2020-02-26 11:16 ` Serge Petrenko 2020-02-26 23:54 ` Vladislav Shpilevoy 2020-02-27 6:48 ` Konstantin Osipov 2020-02-27 13:15 ` Serge Petrenko 2020-02-27 23:33 ` Vladislav Shpilevoy 2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 4/4] replication: do not relay rows coming from a remote instance back to it sergepetrenko 2020-02-26 10:23 ` Konstantin Osipov 2020-02-26 11:21 ` Serge Petrenko 2020-02-26 11:58 ` Konstantin Osipov 2020-02-26 15:58 ` Serge Petrenko 2020-02-26 16:40 ` Konstantin Osipov 2020-02-26 23:54 ` Vladislav Shpilevoy 2020-02-27 6:52 ` Konstantin Osipov 2020-02-27 14:13 ` Serge Petrenko 2020-02-27 21:17 ` Serge Petrenko 2020-02-27 23:22 ` Vladislav Shpilevoy 2020-02-28 8:03 ` Serge Petrenko [this message] 2020-02-26 23:54 ` [Tarantool-patches] [PATCH v4 0/4] replication: fix applying of rows originating from local instance Vladislav Shpilevoy 2020-02-27 21:24 ` Serge Petrenko 2020-02-27 23:24 ` Vladislav Shpilevoy
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=6FE40C03-CBF9-4B32-A281-8622F1AD3E44@tarantool.org \ --to=sergepetrenko@tarantool.org \ --cc=kirichenkoga@gmail.com \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH v4 4/4] replication: do not relay rows coming from a remote instance back to it' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox