[tarantool-patches] [PATCH 2/2] [replication] [recovery] recover missing data
Vladimir Davydov
vdavydov.dev at gmail.com
Fri Mar 30 14:33:56 MSK 2018
On Thu, Mar 29, 2018 at 07:15:16PM +0300, Konstantin Belyavskiy wrote:
> Part 2 of 2.
> Recover missing local data from a replica.
> In case of a sudden power loss, if data was not written to the WAL but
> was already sent to a remote replica, the local instance can't recover
> properly and the datasets diverge.
> Fix it by comparing LSNs and fetching the missing data from the remote
> replica.
> Based on @GeorgyKirichenko's proposal and @locker's race-free check.
>
> Closes #3210
> ---
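For readers following the thread, the LSN comparison described in the commit message can be sketched roughly as below. This is only an illustration in plain Lua, not the code from relay.cc or wal.cc (which is not quoted here). It assumes a vclock can be modelled as a table mapping replica id to the last LSN seen from that replica; the helper names missing_own_rows and must_stay_read_only and the sample LSN values are hypothetical.

```lua
-- Hypothetical sketch, not taken from the patch: a vclock is modelled as
-- a plain Lua table mapping replica id -> last LSN seen from that replica.

-- Return how many rows originating from `self_id` a remote replica has
-- that the local instance lost (0 if nothing is missing).
local function missing_own_rows(local_vclock, remote_vclock, self_id)
    local local_lsn = local_vclock[self_id] or 0
    local remote_lsn = remote_vclock[self_id] or 0
    if remote_lsn > local_lsn then
        return remote_lsn - local_lsn
    end
    return 0
end

-- Return true if any remote vclock is ahead of the local one in the
-- component owned by `self_id`, i.e. the instance should stay read-only
-- until it has re-fetched its own rows from a replica.
local function must_stay_read_only(local_vclock, remote_vclocks, self_id)
    for _, remote_vclock in ipairs(remote_vclocks) do
        if missing_own_rows(local_vclock, remote_vclock, self_id) > 0 then
            return true
        end
    end
    return false
end

-- Made-up values: instance 1 recovered up to LSN 2 from its own WAL,
-- while a replica already received LSN 10 from it before the crash.
local local_vclock = {[1] = 2}
local remote_vclock = {[1] = 10}
print(missing_own_rows(local_vclock, remote_vclock, 1))
print(must_stay_read_only(local_vclock, {remote_vclock}, 1))
```

Only the vclock component owned by the restarting instance matters for this check: rows originating from other instances arrive through normal replication anyway.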
> branch: gh-3210-recover-missing-local-data-master-master
> src/box/relay.cc | 16 ++++-
> src/box/wal.cc | 15 +++-
> test/replication/recover_missing.result | 116 ++++++++++++++++++++++++++++++
> test/replication/recover_missing.test.lua | 41 +++++++++++
> test/replication/suite.ini | 2 +-
> 5 files changed, 185 insertions(+), 5 deletions(-)
> create mode 100644 test/replication/recover_missing.result
> create mode 100644 test/replication/recover_missing.test.lua
Nit: please rename the test to recover_missing_xlog.test.lua
> diff --git a/test/replication/recover_missing.test.lua b/test/replication/recover_missing.test.lua
> new file mode 100644
> index 000000000..775d23a0b
> --- /dev/null
> +++ b/test/replication/recover_missing.test.lua
> @@ -0,0 +1,41 @@
> +env = require('test_run')
> +test_run = env.new()
> +
> +SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
> +-- Start servers
> +test_run:create_cluster(SERVERS)
> +-- Wait for full mesh
> +test_run:wait_fullmesh(SERVERS)
> +
> +test_run:cmd("switch autobootstrap1")
> +for i = 0, 9 do box.space.test:insert{i, 'test' .. i} end
> +box.space.test:count()
> +
> +test_run:cmd('switch default')
> +vclock1 = test_run:get_vclock('autobootstrap1')
> +vclock2 = test_run:wait_cluster_vclock(SERVERS, vclock1)
> +
> +test_run:cmd("switch autobootstrap2")
> +box.space.test:count()
> +box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.01)
> +test_run:cmd("stop server autobootstrap1")
> +fio = require('fio')
> +-- This test checks the ability to recover missing local data
> +-- from a remote replica. See #3210.
> +-- Delete data on the first master and check that after restart,
> +-- thanks to the difference in vclock, it is able to recover
> +-- all missing data from the replica.
> +-- Also check that there are no concurrent writes, i.e. the master
> +-- stays in 'read-only' mode until it has received all data.
> +fio.unlink(fio.pathjoin(fio.abspath("."), string.format('autobootstrap1/%020d.xlog', 8)))
> +test_run:cmd("start server autobootstrap1")
> +
> +test_run:cmd("switch autobootstrap1")
> +for i = 10, 19 do box.space.test:insert{i, 'test' .. i} end
> +fiber = require('fiber')
> +fiber.sleep(0.1)
I don't think you still need this 'sleep', not after patch 1.
> +box.space.test:select()
> +
> +-- Cleanup.
> +test_run:cmd('switch default')
> +test_run:drop_cluster(SERVERS)