From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: kostja@tarantool.org
Cc: tarantool-patches@freelists.org
Subject: Re: [PATCH] xlog: fix fallocate vs read race
Date: Fri, 14 Dec 2018 15:04:42 +0300
Message-ID: <20181214120442.vdcg7nt5kzwvx3cn@esperanza>
In-Reply-To: <8548a4bd8439a1e4a7f78ff37216c170c61a33c3.1544783335.git.vdavydov.dev@gmail.com>

On Fri, Dec 14, 2018 at 01:29:44PM +0300, Vladimir Davydov wrote:
> posix_fallocate(), which is used for preallocating disk space for WAL
> files, increases the file size and fills the allocated space with zeros.
> The problem is a WAL file may be read by a relay thread at the same time
> it is written to. We try to handle the zeroed space in xlog_cursor (see
> xlog_cursor_next_tx()), however this turns out to be not enough, because
> transactions are written not atomically so it may occur that a writer
> writes half a transaction when a reader reads it. Without fallocate, the
> reader would stop at EOF until the rest of the transaction is written,
> but with fallocate it reads zeroes instead and thinks that the xlog file
> is corrupted while actually it is not.
>
> Fix this issue by using fallocate() with FALLOC_FL_KEEP_SIZE flag
> instead of posix_fallocate(). With the flag fallocate() won't increase
> the file size, it will only allocate disk space beyond EOF.
>
> The test will be added shortly.
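
For illustration only, here is a minimal Linux-only sketch of the behaviour described in the quoted commit message. It is not Tarantool code: the helper prealloc_demo(), the payload string and the /tmp path are made up for the example. It writes a partial "transaction" to a temporary file, preallocates space after it with either posix_fallocate() or fallocate(FALLOC_FL_KEEP_SIZE), and then reports the file size and what a reader gets when it reads past the written data.

```c
#define _GNU_SOURCE /* exposes fallocate() and FALLOC_FL_KEEP_SIZE in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Stand-in for a partially written transaction (hypothetical data). */
static const char payload[] = "half a transaction";
#define PAYLOAD_LEN (sizeof(payload) - 1)

/*
 * Hypothetical helper, not part of Tarantool.
 *
 * Write the payload to a fresh temporary file, reserve `extra` bytes
 * right after it, then report:
 *   *size      - st_size of the file after preallocation
 *   *tail_read - result of reading 8 bytes just past the payload
 *                (0 means the reader hit EOF)
 * Returns 0 on success or an errno value on failure.
 */
static int
prealloc_demo(int keep_size, off_t extra, long long *size, ssize_t *tail_read)
{
	char path[] = "/tmp/prealloc-demo-XXXXXX";
	char tail[8];
	struct stat st;
	int rc = 0;

	assert(extra > 0 && size != NULL && tail_read != NULL);

	int fd = mkstemp(path);
	if (fd < 0)
		return errno;
	unlink(path); /* the file goes away once fd is closed */

	if (write(fd, payload, PAYLOAD_LEN) != (ssize_t)PAYLOAD_LEN) {
		rc = EIO;
		goto out;
	}
	if (keep_size) {
		/* Allocates blocks beyond EOF, st_size stays the same. */
		if (fallocate(fd, FALLOC_FL_KEEP_SIZE,
			      (off_t)PAYLOAD_LEN, extra) != 0)
			rc = errno;
	} else {
		/* Extends the file; returns the error number directly. */
		rc = posix_fallocate(fd, (off_t)PAYLOAD_LEN, extra);
	}
	if (rc != 0)
		goto out;

	if (fstat(fd, &st) != 0) {
		rc = errno;
		goto out;
	}
	*size = (long long)st.st_size;

	/* What a concurrent reader sees right after the written data. */
	*tail_read = pread(fd, tail, sizeof(tail), (off_t)PAYLOAD_LEN);
	if (*tail_read < 0)
		rc = errno;
out:
	close(fd);
	return rc;
}
```

Calling prealloc_demo(0, 4096, &size, &tail) should report a file that grew by 4096 bytes and a non-empty read past the payload, while prealloc_demo(1, 4096, &size, &tail) should leave the size at the payload length and make the read return 0 (EOF), provided the filesystem supports fallocate(); otherwise the second call returns EOPNOTSUPP.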

Here goes the test:

diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 95e94e5a..984d2e81 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -6,6 +6,7 @@
     "wal_off.test.lua": {},
     "hot_standby.test.lua": {},
     "rebootstrap.test.lua": {},
+    "wal_rw_stress.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
         "vinyl": {"engine": "vinyl"}
diff --git a/test/replication/wal_rw_stress.result b/test/replication/wal_rw_stress.result
new file mode 100644
index 00000000..cc68877b
--- /dev/null
+++ b/test/replication/wal_rw_stress.result
@@ -0,0 +1,106 @@
+test_run = require('test_run').new()
+---
+...
+--
+-- gh-3893: Replication failure: relay may report that an xlog
+-- is corrupted if it is currently being written to.
+--
+s = box.schema.space.create('test')
+---
+...
+_ = s:create_index('primary')
+---
+...
+-- Deploy a replica.
+box.schema.user.grant('guest', 'replication')
+---
+...
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+---
+- true
+...
+test_run:cmd("start server replica")
+---
+- true
+...
+-- Setup replica => master channel.
+box.cfg{replication = test_run:cmd("eval replica 'return box.cfg.listen'")}
+---
+...
+-- Disable master => replica channel.
+test_run:cmd("switch replica")
+---
+- true
+...
+replication = box.cfg.replication
+---
+...
+box.cfg{replication = {}}
+---
+...
+test_run:cmd("switch default")
+---
+- true
+...
+-- Write some xlogs on the master.
+test_run:cmd("setopt delimiter ';'")
+---
+- true
+...
+for i = 1, 100 do
+    box.begin()
+    for j = 1, 100 do
+        s:replace{1, require('digest').urandom(1000)}
+    end
+    box.commit()
+end;
+---
+...
+test_run:cmd("setopt delimiter ''");
+---
+- true
+...
+-- Enable master => replica channel and wait for the replica to catch up.
+-- The relay handling replica => master channel on the replica will read
+-- an xlog while the applier is writing to it. Although applier and relay
+-- are running in different threads, there shouldn't be any rw errors.
+test_run:cmd("switch replica")
+---
+- true
+...
+box.cfg{replication = replication}
+---
+...
+box.info.replication[1].downstream.status ~= 'stopped' or box.info
+---
+- true
+...
+test_run:cmd("switch default")
+---
+- true
+...
+-- Cleanup.
+box.cfg{replication = {}}
+---
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
+test_run:cmd("cleanup server replica")
+---
+- true
+...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
+box.schema.user.revoke('guest', 'replication')
+---
+...
+s:drop()
+---
+...
diff --git a/test/replication/wal_rw_stress.test.lua b/test/replication/wal_rw_stress.test.lua
new file mode 100644
index 00000000..08570b28
--- /dev/null
+++ b/test/replication/wal_rw_stress.test.lua
@@ -0,0 +1,51 @@
+test_run = require('test_run').new()
+
+--
+-- gh-3893: Replication failure: relay may report that an xlog
+-- is corrupted if it is currently being written to.
+--
+s = box.schema.space.create('test')
+_ = s:create_index('primary')
+
+-- Deploy a replica.
+box.schema.user.grant('guest', 'replication')
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+test_run:cmd("start server replica")
+
+-- Setup replica => master channel.
+box.cfg{replication = test_run:cmd("eval replica 'return box.cfg.listen'")}
+
+-- Disable master => replica channel.
+test_run:cmd("switch replica")
+replication = box.cfg.replication
+box.cfg{replication = {}}
+test_run:cmd("switch default")
+
+-- Write some xlogs on the master.
+test_run:cmd("setopt delimiter ';'")
+for i = 1, 100 do
+    box.begin()
+    for j = 1, 100 do
+        s:replace{1, require('digest').urandom(1000)}
+    end
+    box.commit()
+end;
+test_run:cmd("setopt delimiter ''");
+
+-- Enable master => replica channel and wait for the replica to catch up.
+-- The relay handling replica => master channel on the replica will read
+-- an xlog while the applier is writing to it. Although applier and relay
+-- are running in different threads, there shouldn't be any rw errors.
+test_run:cmd("switch replica")
+box.cfg{replication = replication}
+box.info.replication[1].downstream.status ~= 'stopped' or box.info
+test_run:cmd("switch default")
+
+-- Cleanup.
+box.cfg{replication = {}}
+test_run:cmd("stop server replica")
+test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
+box.schema.user.revoke('guest', 'replication')
+s:drop()