From: Konstantin Belyavskiy <k.belyavskiy@tarantool.org>
To: tarantool-patches@freelists.org, vdavydov@tarantool.org, georgy@tarantool.org
Subject: [PATCH] [replication] [recovery] recover missing data
Date: Mon, 26 Mar 2018 18:22:23 +0300
Message-ID: <20180326152223.93432-1-k.belyavskiy@tarantool.org>

Version for 1.6 without test (to add or not is under discussion)

Recover missing local data from replica.
In case of sudden power-loss, if data was not written to WAL but
already sent to remote replica, local can't recover properly and
we have different datasets.
Fix it by using remote replica's data and LSN comparison.
Based on @GeorgyKirichenko proposal

Closes #3210
---
branch: gh-3210-recover-missing-local-data-master-master-16
 src/box/relay.cc                          |  12 +++-
 src/box/relay.h                           |   5 ++
 test/replication/master1.lua              |  30 +++++++++
 test/replication/master2.lua              |   1 +
 test/replication/recover_missing.result   | 103 ++++++++++++++++++++++++++++++
 test/replication/recover_missing.test.lua |  38 +++++++++++
 6 files changed, 186 insertions(+), 3 deletions(-)
 create mode 100644 test/replication/master1.lua
 create mode 120000 test/replication/master2.lua
 create mode 100644 test/replication/recover_missing.result
 create mode 100644 test/replication/recover_missing.test.lua

diff --git a/src/box/relay.cc b/src/box/relay.cc
index 8e53ca022..092ea0627 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -250,6 +250,7 @@ relay_subscribe(int fd, struct xrow_header *packet,
 	 * and identify ourselves with our own server id.
 	 */
 	struct xrow_header row;
+	relay.masters_lsn_at_subscribe = vclock_get(master_vclock, r->server_id);
 	xrow_encode_vclock(&row, master_vclock);
 	/*
 	 * Identify the message with the server id of this
@@ -286,10 +287,15 @@ relay_send_row(struct recovery *r, void *param, struct xrow_header *packet)
 	 * (JOIN request). In this case, send every row.
 	 * Otherwise, we're feeding a WAL, thus responding to
 	 * SUBSCRIBE request. In that case, only send a row if
-	 * it is not from the same server (i.e. don't send
-	 * replica's own rows back).
+	 * it is not from the same server (i.e. don't send replica's
+	 * own rows back). There is a special case if this row is
+	 * missing on the other side (i.e. in case of sudden power-loss,
+	 * data was not written to WAL, so remote master can't recover
+	 * it). In this case packet's LSN is lower than or equal to local
+	 * master's LSN at the moment it has issued 'SUBSCRIBE' request.
 	 */
-	if (packet->server_id == 0 || packet->server_id != r->server_id) {
+	if (packet->server_id == 0 || packet->server_id != r->server_id ||
+	    packet->lsn <= relay->masters_lsn_at_subscribe) {
 		relay_send(relay, packet);
 		ERROR_INJECT(ERRINJ_RELAY,
 		{
diff --git a/src/box/relay.h b/src/box/relay.h
index c13267134..9cf7cbfac 100644
--- a/src/box/relay.h
+++ b/src/box/relay.h
@@ -45,6 +45,11 @@ struct relay {
 	uint64_t sync;
 	struct recovery *r;
 	ev_tstamp wal_dir_rescan_delay;
+	/**
+	 * Local master's LSN at the moment of subscribe, used to check
+	 * dataset on the other side and send missing data rows if any.
+	 */
+	int64_t masters_lsn_at_subscribe;
 };
 
 /**
diff --git a/test/replication/master1.lua b/test/replication/master1.lua
new file mode 100644
index 000000000..758874ee7
--- /dev/null
+++ b/test/replication/master1.lua
@@ -0,0 +1,30 @@
+#!/usr/bin/env tarantool
+
+-- get instance name from filename (master1.lua => master1)
+local INSTANCE_ID = string.match(arg[0], "%d")
+local USER = 'cluster'
+local PASSWORD = 'somepassword'
+local SOCKET_DIR = require('fio').cwd()
+local function instance_uri(instance_id)
+    --return 'localhost:'..(3310 + instance_id)
+    return SOCKET_DIR..'/master'..instance_id..'.sock';
+end
+
+-- start console first
+require('console').listen(os.getenv('ADMIN'))
+
+box.cfg({
+    listen = instance_uri(INSTANCE_ID);
+    replication_source = {
+        USER..':'..PASSWORD..'@'..instance_uri(1);
+        USER..':'..PASSWORD..'@'..instance_uri(2);
+    };
+})
+
+box.once("bootstrap", function()
+    local test_run = require('test_run').new()
+    box.schema.user.create(USER, { password = PASSWORD })
+    box.schema.user.grant(USER, 'replication')
+    box.schema.space.create('test', {engine = test_run:get_cfg('engine')})
+    box.space.test:create_index('primary')
+end)
diff --git a/test/replication/master2.lua b/test/replication/master2.lua
new file mode 120000
index 000000000..f6ea42dd7
--- /dev/null
+++ b/test/replication/master2.lua
@@ -0,0 +1 @@
+master1.lua
\ No newline at end of file
diff --git a/test/replication/recover_missing.result b/test/replication/recover_missing.result
new file mode 100644
index 000000000..f49d2dc49
--- /dev/null
+++ b/test/replication/recover_missing.result
@@ -0,0 +1,103 @@
+env = require('test_run')
+---
+...
+test_run = env.new()
+---
+...
+SERVERS = { 'master1', 'master2' }
+---
+...
+-- Start servers
+test_run:create_cluster(SERVERS)
+---
+...
+-- Check connection status
+-- first on master 1
+test_run:cmd("switch master1")
+---
+- true
+...
+fiber = require('fiber')
+---
+...
+while box.info.replication.status ~= 'follow' do fiber.sleep(0.001) end
+---
+...
+box.info.replication.status
+---
+- follow
+...
+-- and then on master 2
+test_run:cmd("switch master2")
+---
+- true
+...
+fiber = require('fiber')
+---
+...
+while box.info.replication.status ~= 'follow' do fiber.sleep(0.001) end
+---
+...
+box.info.replication.status
+---
+- follow
+...
+test_run:cmd("switch master1")
+---
+- true
+...
+box.snapshot()
+---
+- ok
+...
+box.space.test:insert({1})
+---
+- [1]
+...
+box.space.test:count()
+---
+- 1
+...
+test_run:cmd("switch master2")
+---
+- true
+...
+box.space.test:count()
+---
+- 1
+...
+test_run:cmd("stop server master1")
+---
+- true
+...
+fio = require('fio')
+---
+...
+fio.unlink(fio.pathjoin(fio.abspath("."), string.format('master1/%020d.xlog', 0)))
+---
+- true
+...
+test_run:cmd("start server master1")
+---
+- true
+...
+test_run:cmd("switch master1")
+---
+- true
+...
+box.space.test:count()
+---
+- 1
+...
+test_run:cmd("switch default")
+---
+- true
+...
+test_run:cmd("stop server master1")
+---
+- true
+...
+test_run:cmd("stop server master2")
+---
+- true
+...
diff --git a/test/replication/recover_missing.test.lua b/test/replication/recover_missing.test.lua
new file mode 100644
index 000000000..964056786
--- /dev/null
+++ b/test/replication/recover_missing.test.lua
@@ -0,0 +1,38 @@
+env = require('test_run')
+test_run = env.new()
+
+SERVERS = { 'master1', 'master2' }
+-- Start servers
+test_run:create_cluster(SERVERS)
+
+-- Check connection status
+-- first on master 1
+test_run:cmd("switch master1")
+fiber = require('fiber')
+while box.info.replication.status ~= 'follow' do fiber.sleep(0.001) end
+box.info.replication.status
+-- and then on master 2
+test_run:cmd("switch master2")
+fiber = require('fiber')
+while box.info.replication.status ~= 'follow' do fiber.sleep(0.001) end
+box.info.replication.status
+
+test_run:cmd("switch master1")
+box.snapshot()
+box.space.test:insert({1})
+box.space.test:count()
+
+test_run:cmd("switch master2")
+box.space.test:count()
+test_run:cmd("stop server master1")
+fio = require('fio')
+fio.unlink(fio.pathjoin(fio.abspath("."), string.format('master1/%020d.xlog', 0)))
+test_run:cmd("start server master1")
+
+test_run:cmd("switch master1")
+box.space.test:count()
+
+test_run:cmd("switch default")
+test_run:cmd("stop server master1")
+test_run:cmd("stop server master2")
+
-- 
2.14.3 (Apple Git-98)
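
For readers following the relay_send_row() change, the new send/skip decision can be looked at in isolation. What follows is a minimal sketch under stated assumptions, not code from the patch: struct row_sketch, struct relay_sketch and relay_should_send_sketch() are hypothetical stand-ins for struct xrow_header, struct relay and the condition inside relay_send_row(), and peer_server_id plays the role that r->server_id appears to play there (the id of the subscribed peer).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xrow_header. */
struct row_sketch {
	uint32_t server_id;	/* 0 for snapshot rows sent on JOIN */
	int64_t lsn;
};

/* Hypothetical, simplified stand-in for struct relay. */
struct relay_sketch {
	/* Assumed to correspond to r->server_id in the patch. */
	uint32_t peer_server_id;
	/* Peer's LSN as known locally when SUBSCRIBE arrived. */
	int64_t masters_lsn_at_subscribe;
};

/*
 * Return true if the row should be sent to the peer:
 *  - snapshot rows (server_id == 0) are always sent;
 *  - rows that originate from another server are always sent;
 *  - the peer's own rows are sent back only if their LSN does not
 *    exceed the value recorded at subscribe time, i.e. the peer may
 *    have lost them.
 */
static bool
relay_should_send_sketch(const struct relay_sketch *relay,
			 const struct row_sketch *row)
{
	assert(relay != NULL && row != NULL);
	if (row->server_id == 0)
		return true;
	if (row->server_id != relay->peer_server_id)
		return true;
	return row->lsn <= relay->masters_lsn_at_subscribe;
}
```

With peer_server_id = 1 and masters_lsn_at_subscribe = 10, a row {server_id = 1, lsn = 10} is sent back to the peer, while {server_id = 1, lsn = 11} is still filtered out as the peer's own new row.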