Please look at the new version.

>Wednesday, March 28, 2018, 12:25 +03:00 from Vladimir Davydov:
>
>On Mon, Mar 26, 2018 at 06:24:36PM +0300, Konstantin Belyavskiy wrote:
>> Please check most recent version.
>> branch: gh-3210-recover-missing-local-data-master-master
>
>Please don't send or submit a patch for 1.6 until we commit it to
>the trunk.

OK, let's finish with 1.9 first, then I will send you the patch for 1.6.

>
>
>I failed to find the patch in the mailing list. Pasting it here for
>review.
>
>> From 391448a496fd769ff6724cadab4d333d37a9088e Mon Sep 17 00:00:00 2001
>> From: Konstantin Belyavskiy < k.belyavskiy@tarantool.org >
>> Date: Tue, 13 Mar 2018 17:51:52 +0300
>> Subject: [PATCH] [replication] [recovery] recover missing data
>>
>> Recover missing local data from replica.
>> In case of sudden power-loss, if data was not written to WAL but
>> already sent to remote replica, local can't recover properly and
>> we have different datasets.
>> Fix it by using remote replica's data and LSN comparison.
>> Based on @GeorgyKirichenko proposal and @locker race free check.
>
>> Switch off replication/catch.test.lua
>
>Why? If the test is broken, please fix it. If you find the test
>pointless, delete it with a proper explanation.

Actually this test was broken. According to its description:

-- Check that replica doesn't enter read-write mode before
-- catching up with the master: to check that we inject sleep into
-- the master relay_send function and attempt a data modifying
-- statement in replica while it's still fetching data from the
-- master.

the replica should be in read-only mode, but only now does it behave as expected. Also, I don't understand the second part:

-- case #2: delete tuple by net.box

My new test checks this behaviour as well (remember, you mentioned the concurrency issue and suggested using read-only mode to fix it), so it is rather a duplicate. For now I have decided to update the test and enable it back. Let's discuss this test separately.

>
>
>>
>> Closes #3210
>>
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index 6bfe5a99..5f0b3069 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -453,7 +453,8 @@ applier_subscribe(struct applier *applier)
>>         }
>>
>>         if (applier->state == APPLIER_SYNC &&
>> -           applier->lag <= replication_sync_lag) {
>> +           applier->lag <= replication_sync_lag &&
>> +           vclock_compare(&applier->vclock, &replicaset.vclock) <= 0) {
>
>First, you use a wrong vclock - applier->vclock is the vclock at connect
>(the name is rather misleading though, true, we should probably rename
>it to remote_vclock_at_connect). You should use the vclock received in
>the SUBSCRIBE request.

Initially I thought applier->vclock was the same, but under certain conditions it can have a different value, thanks for finding this out. It also makes the new code easier to understand.

>
>
>Second, this new condition could use a comment.

Added a comment.

>
>
>Third, this is a worthwhile change as is so I think it should be
>submitted in a separate patch.

Maybe. So should I submit the new sync condition first, and then the other parts?
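To make the intent clearer, here is a rough sketch of how the check could look once the applier stores the master's vclock received in reply to SUBSCRIBE (the field name remote_vclock_at_subscribe below is only my working assumption, not a final name):

        if (applier->state == APPLIER_SYNC &&
            applier->lag <= replication_sync_lag &&
            vclock_compare(&applier->remote_vclock_at_subscribe,
                           &replicaset.vclock) <= 0) {
                /*
                 * The master's vclock taken at subscribe time is
                 * fully covered by the local vclock, so the applier
                 * is synced - switch to "follow".
                 */
                applier_set_state(applier, APPLIER_FOLLOW);
        }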
>
>
>>                 /* Applier is synced, switch to "follow". */
>>                 applier_set_state(applier, APPLIER_FOLLOW);
>>         }
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index 2bd05ad5..344a8e01 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -110,6 +110,11 @@ struct relay {
>>         struct vclock recv_vclock;
>>         /** Replicatoin slave version. */
>>         uint32_t version_id;
>> +       /**
>> +        * Local master's LSN at the moment of subscribe, used to check
>> +        * dataset on the other side and send missing data rows if any.
>> +        */
>> +       int64_t masters_lsn_at_subscribe;
>
>Why did you change the member name? I only asked to update the comment.
>'local_lsn_at_subscribe' is not perfect, but still a better name IMO.
>Actually, I'm thinking about storing a whole vclock here instead of just
>one LSN - that would help avoid confusion:
>
>  /** Local vclock at the time of subscribe. */
>  struct vclock local_vclock_at_subscribe;

Done.

>
>
>>
>>         /** Relay endpoint */
>>         struct cbus_endpoint endpoint;
>> diff --git a/src/box/wal.cc b/src/box/wal.cc
>> index 4576cfe0..4a43775d 100644
>> --- a/src/box/wal.cc
>> +++ b/src/box/wal.cc
>> @@ -768,8 +768,15 @@ wal_write(struct journal *journal, struct journal_entry *entry)
>>                 /*
>>                  * Find last row from local instance id
>>                  * and promote vclock.
>> +                * In master-master configuration, during sudden
>> +                * power-loss if data was not written to WAL but
>> +                * already sent to others they will send back.
>> +                * In this case we should update only local
>> +                * vclock but not the replicaset one. Could be
>> +                * checked by simple lsn comparison.
>
>I still don't understand this comment.
>
>>                  */
>> -               if ((*last)->replica_id == instance_id) {
>> +               if ((*last)->replica_id == instance_id &&
>> +                   replicaset.vclock.lsn[instance_id] < (*last)->lsn) {
>
>Use vclock_get() for this.
>
>Also, we agreed to move this to applier AFAIR.

It's not so simple. An attempt to check instance_id != row.replica_id in order to avoid double promotion of the replicaset vclock (just before xstream_write_xc) fails, since this is not the only case of vclock promotion, and it leads to leaving read-only mode at the wrong time. I will try to explain it better, but this does not work:

--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -503,8 +503,9 @@ applier_subscribe(struct applier *applier)
-                       vclock_follow(&replicaset.vclock, row.replica_id,
-                                     row.lsn);
+                       if (row.replica_id != instance_id)
+                               vclock_follow(&replicaset.vclock, row.replica_id,
+                                             row.lsn);
                        xstream_write_xc(applier->subscribe_stream, &row);
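As for vclock_get(), here is a minimal sketch of the same comparison rewritten with it, keeping the check in wal_write() for now (only this fragment is shown):

        if ((*last)->replica_id == instance_id &&
            vclock_get(&replicaset.vclock, instance_id) < (*last)->lsn) {
                /* Promote the local component only if the row is newer. */
                vclock_follow(&replicaset.vclock, instance_id,
                              (*last)->lsn);
                break;
        }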
>>                         vclock_follow(&replicaset.vclock, instance_id,
>>                                       (*last)->lsn);
>>                         break;
>> diff --git a/test/replication/on_replace.lua b/test/replication/on_replace.lua
>> index 7e49efe1..c5855892 100644
>> --- a/test/replication/on_replace.lua
>> +++ b/test/replication/on_replace.lua
>> @@ -22,13 +22,10 @@ box.cfg({
>>     };
>> })
>>
>> -env = require('test_run')
>> -test_run = env.new()
>> -engine = test_run:get_cfg('engine')
>> -
>> box.once("bootstrap", function()
>> +    local test_run = require('test_run').new()
>>     box.schema.user.create(USER, { password = PASSWORD })
>>     box.schema.user.grant(USER, 'replication')
>> -    box.schema.space.create('test', {engine = engine})
>> +    box.schema.space.create('test', {engine = test_run:get_cfg('engine')})
>>     box.space.test:create_index('primary')
>> end)
>> diff --git a/test/replication/recover_missing.test.lua b/test/replication/recover_missing.test.lua
>> new file mode 100644
>> index 00000000..d5b0e0ad
>> --- /dev/null
>> +++ b/test/replication/recover_missing.test.lua
>> @@ -0,0 +1,42 @@
>> +env = require('test_run')
>> +test_run = env.new()
>> +
>> +SERVERS = { 'on_replace1', 'on_replace2' }
>
>Please use autobootstrap.lua - I want to see how it works with multiple
>masters.

OK, now three instances are used.

>> +-- Start servers
>> +test_run:create_cluster(SERVERS)
>> +-- Wait for full mesh
>> +test_run:wait_fullmesh(SERVERS)
>> +
>> +test_run:cmd("switch on_replace1")
>> +for i = 0, 9 do box.space.test:insert{i, 'test' .. i} end
>
>Nit: please start counting from 1, as this is common in Lua.

I hope this is not necessary; the loop just inserts some data to be checked later.

>> +box.space.test:count()
>> +
>> +test_run:cmd('switch default')
>> +vclock1 = test_run:get_vclock('on_replace1')
>> +vclock2 = test_run:wait_cluster_vclock(SERVERS, vclock1)
>> +
>> +test_run:cmd("switch on_replace2")
>> +box.space.test:count()
>> +box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.1)
>
>Decrease the timeout to speed up test execution time.

OK.

>> +test_run:cmd("stop server on_replace1")
>> +fio = require('fio')
>> +-- This test checks ability to recover missing local data
>> +-- from remote replica. See #3210.
>> +-- Delete data on first master and test that after restart,
>> +-- due to difference in vclock it will be able to recover
>> +-- all missing data from replica.
>> +-- Also check that there is no concurrency, i.e. master is
>> +-- in 'read-only' mode unless it receives all data.
>> +fio.unlink(fio.pathjoin(fio.abspath("."), string.format('on_replace1/%020d.xlog', 8)))
>> +test_run:cmd("start server on_replace1")
>> +
>> +test_run:cmd("switch on_replace1")
>> +for i = 10, 19 do box.space.test:insert{i, 'test' .. i} end
>> +fiber = require('fiber')
>> +fiber.sleep(1)
>> +box.space.test:count()
>
>Use 'select' here to make sure the data received are correct.

OK.

>
>> +
>> +-- Cleanup.
>> +test_run:cmd('switch default')
>> +test_run:drop_cluster(SERVERS)
>> +
>
>Nit: extra line at the end of the file.

Best regards,
Konstantin Belyavskiy
k.belyavskiy@tarantool.org