[tarantool-patches] [PATCH 2/2] [replication] [recovery] recover missing data

Vladimir Davydov vdavydov.dev at gmail.com
Fri Mar 30 14:33:56 MSK 2018


On Thu, Mar 29, 2018 at 07:15:16PM +0300, Konstantin Belyavskiy wrote:
> Part 2 of 2.
> Recover missing local data from replica.
> In case of sudden power-loss, if data was not written to WAL but
> already sent to remote replica, local can't recover properly and
> we have different datasets.
> Fix it by using remote replica's data and LSN comparison.
> Based on @GeorgyKirichenko proposal and @locker race free check.
> 
> Closes #3210
> ---
>  branch: gh-3210-recover-missing-local-data-master-master
>  src/box/relay.cc                          |  16 ++++-
>  src/box/wal.cc                            |  15 +++-
>  test/replication/recover_missing.result   | 116 ++++++++++++++++++++++++++++++
>  test/replication/recover_missing.test.lua |  41 +++++++++++
>  test/replication/suite.ini                |   2 +-
>  5 files changed, 185 insertions(+), 5 deletions(-)
>  create mode 100644 test/replication/recover_missing.result
>  create mode 100644 test/replication/recover_missing.test.lua

Nit: please rename the test to recover_missing_xlog.test.lua

> diff --git a/test/replication/recover_missing.test.lua b/test/replication/recover_missing.test.lua
> new file mode 100644
> index 000000000..775d23a0b
> --- /dev/null
> +++ b/test/replication/recover_missing.test.lua
> @@ -0,0 +1,41 @@
> +env = require('test_run')
> +test_run = env.new()
> +
> +SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
> +-- Start servers
> +test_run:create_cluster(SERVERS)
> +-- Wait for full mesh
> +test_run:wait_fullmesh(SERVERS)
> +
> +test_run:cmd("switch autobootstrap1")
> +for i = 0, 9 do box.space.test:insert{i, 'test' .. i} end
> +box.space.test:count()
> +
> +test_run:cmd('switch default')
> +vclock1 = test_run:get_vclock('autobootstrap1')
> +vclock2 = test_run:wait_cluster_vclock(SERVERS, vclock1)
> +
> +test_run:cmd("switch autobootstrap2")
> +box.space.test:count()
> +box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.01)
> +test_run:cmd("stop server autobootstrap1")
> +fio = require('fio')
> +-- This test checks ability to recover missing local data
> +-- from remote replica. See #3210.
> +-- Delete data on first master and test that after restart,
> +-- due to difference in vclock it will be able to recover
> +-- all missing data from replica.
> +-- Also check that there is no concurrency, i.e. master is
> +-- in 'read-only' mode unless it receives all data.
> +fio.unlink(fio.pathjoin(fio.abspath("."), string.format('autobootstrap1/%020d.xlog', 8)))
> +test_run:cmd("start server autobootstrap1")
> +
> +test_run:cmd("switch autobootstrap1")
> +for i = 10, 19 do box.space.test:insert{i, 'test' .. i} end

> +fiber = require('fiber')
> +fiber.sleep(0.1)

I don't think you need this 'sleep' any more, not after patch 1.

> +box.space.test:select()
> +
> +-- Cleanup.
> +test_run:cmd('switch default')
> +test_run:drop_cluster(SERVERS)
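
For reference, the check this test relies on (LSN comparison, with the
master staying read-only until it has received all of its own missing
rows) can be sketched roughly as below. This is a plain-Lua illustration
only: vclocks are modelled as tables mapping replica id to LSN, and the
names missing_rows and must_stay_read_only are hypothetical, not taken
from relay.cc or wal.cc.

```lua
-- Illustrative sketch only: plain Lua tables stand in for vclocks
-- (replica id -> LSN). None of these names come from the patch.

-- Return how many of this instance's own rows a peer has seen but the
-- local WAL no longer has; 0 means there is nothing to recover.
local function missing_rows(local_vclock, peer_vclock, self_id)
    local local_lsn = local_vclock[self_id] or 0
    local peer_lsn = peer_vclock[self_id] or 0
    return math.max(peer_lsn - local_lsn, 0)
end

-- The instance should stay read-only while any peer is ahead of it
-- in the instance's own vclock component.
local function must_stay_read_only(local_vclock, peer_vclocks, self_id)
    for _, peer_vclock in ipairs(peer_vclocks) do
        if missing_rows(local_vclock, peer_vclock, self_id) > 0 then
            return true
        end
    end
    return false
end
```

In the test above, deleting an xlog on autobootstrap1 corresponds to the
case where a peer's LSN for that instance is higher than the local one,
so the instance would be expected to stay read-only until the gap closes.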


