Tarantool development patches archive
From: "Konstantin Belyavskiy" <k.belyavskiy@tarantool.org>
To: "Vladimir Davydov" <vdavydov.dev@gmail.com>
Cc: georgy <georgy@tarantool.org>, tarantool-patches@freelists.org
Subject: Re[2]: Fwd: Re[2]: [patches] [PATCH] [replication] [recovery] recover missing data
Date: Wed, 28 Mar 2018 20:01:00 +0300
Message-ID: <1522256460.31326583@f221.i.mail.ru>
In-Reply-To: <20180328092518.ladp4mvyiufc6uto@esperanza>


Please look at the new version.

>Wednesday, March 28, 2018, 12:25 +03:00 from Vladimir Davydov <vdavydov.dev@gmail.com>:
>
>On Mon, Mar 26, 2018 at 06:24:36PM +0300, Konstantin Belyavskiy wrote:
>> Please check most recent version.
>> branch: gh-3210-recover-missing-local-data-master-master
>
>Please don't send or submit a patch for 1.6 until we commit it to
>the trunk. 
Ok, let's finish with 1.9 first; then I will send you the patch for 1.6.
>
>
>I failed to find the patch in the mailing list. Pasting it here for
>review.
>
>> From 391448a496fd769ff6724cadab4d333d37a9088e Mon Sep 17 00:00:00 2001
>> From: Konstantin Belyavskiy <k.belyavskiy@tarantool.org>
>> Date: Tue, 13 Mar 2018 17:51:52 +0300
>> Subject: [PATCH] [replication] [recovery] recover missing data
>> 
>> Recover missing local data from a replica.
>> In case of a sudden power loss, if data was not yet written to the
>> WAL but was already sent to a remote replica, the local instance
>> can't recover properly and the two instances end up with different
>> datasets. Fix it by using the remote replica's data and an LSN
>> comparison.
>> Based on @GeorgyKirichenko's proposal and @locker's race-free check.
>
>> Switch off replication/catch.test.lua
>
>Why? If the test is broken, please fix it. If you find the test
>pointless, delete it with a proper explanation. 
Actually, this test was broken. From its description:
-- Check that replica doesn't enter read-write mode before
-- catching up with the master: to check that we inject sleep into
-- the master relay_send function and attempt a data modifying
-- statement in replica while it's still fetching data from the
-- master.
the replica should be in read-only mode, but only now does it work as
expected. And I don't understand the second part:
-- case #2: delete tuple by net.box
But my new test also checks this behaviour (remember, you mentioned the
concurrency issue and suggested using read-only mode to fix it), so it's
rather duplicated.
Ok, for now I decided to update the test and enable it back.
Let's discuss this test separately.
>
>
>> 
>> Closes #3210
>> 
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index 6bfe5a99..5f0b3069 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -453,7 +453,8 @@ applier_subscribe(struct applier *applier)
>>  		}
>> 
>>  		if (applier->state == APPLIER_SYNC &&
>> -		    applier->lag <= replication_sync_lag) {
>> +		    applier->lag <= replication_sync_lag &&
>> +		    vclock_compare(&applier->vclock, &replicaset.vclock) <= 0) {
>
>First, you use a wrong vclock - applier->vclock is the vclock at connect
>(the name is rather misleading though, true, we should probably rename
>it to remote_vclock_at_connect). You should use the vclock received in
>the SUBSCRIBE request. 
Initially I thought that applier->vclock was the same, but under certain
conditions it can have a different value, thanks for finding that out.
It also makes the new code easier to understand.
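
So the sync condition would look roughly like this (a sketch; the field
name remote_vclock_at_subscribe is my assumption for where the vclock
taken from the SUBSCRIBE response would be stored):

	if (applier->state == APPLIER_SYNC &&
	    applier->lag <= replication_sync_lag &&
	    vclock_compare(&applier->remote_vclock_at_subscribe,
			   &replicaset.vclock) <= 0) {
		/*
		 * Applier is synced and the remote instance holds
		 * no rows this instance is missing: switch to
		 * "follow".
		 */
		applier_set_state(applier, APPLIER_FOLLOW);
	}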
>
>
>Second, this new condition could use a comment. 
Added a comment.
>
>
>Third, this is a worthwhile change as is so I think it should be
>submitted in a separate patch. 
Maybe. So, first submit the new sync condition as a separate patch, then
the other parts?
>
>
>>  			/* Applier is synced, switch to "follow". */
>>  			applier_set_state(applier, APPLIER_FOLLOW);
>>  		}
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index 2bd05ad5..344a8e01 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -110,6 +110,11 @@ struct relay {
>>  	struct vclock recv_vclock;
>>  	/** Replication slave version. */
>>  	uint32_t version_id;
>> +	/**
>> +	 * Local master's LSN at the moment of subscribe, used to check
>> +	 * dataset on the other side and send missing data rows if any.
>> +	 */
>> +	int64_t masters_lsn_at_subscribe;
>
>Why did you change the member name? I only asked to update the comment.
>'local_lsn_at_subscribe' is not perfect, but still a better name IMO.
>Actually, I'm thinking about storing a whole vclock here instead of just
>one LSN - that would help avoid confusion:
>
>  /** Local vclock at the time of subscribe. */
>  struct vclock local_vclock_at_subscribe; 
Done.
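
A sketch of how the field is filled at subscribe time (the exact spot
inside relay_subscribe() is from my branch and may still change):

	/* Remember the local vclock at the moment of subscribe. */
	vclock_copy(&relay->local_vclock_at_subscribe,
		    &replicaset.vclock);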
>
>
>> 
>>  	/** Relay endpoint */
>>  	struct cbus_endpoint endpoint;
>> diff --git a/src/box/wal.cc b/src/box/wal.cc
>> index 4576cfe0..4a43775d 100644
>> --- a/src/box/wal.cc
>> +++ b/src/box/wal.cc
>> @@ -768,8 +768,15 @@ wal_write(struct journal *journal, struct journal_entry *entry)
>>  			/*
>>  			 * Find last row from local instance id
>>  			 * and promote vclock.
>> +			 * In master-master configuration, during sudden
>> +			 * power-loss if data was not written to WAL but
>> +			 * already sent to others they will send back.
>> +			 * In this case we should update only local
>> +			 * vclock but not the replicaset one. Could be
>> +			 * checked by simple lsn comparison.
>
>I still don't understand this comment.
>
>>  			 */
>> -			if ((*last)->replica_id == instance_id) {
>> +			if ((*last)->replica_id == instance_id &&
>> +			    replicaset.vclock.lsn[instance_id] < (*last)->lsn) {
>
>Use vclock_get() for this.
>
>Also, we agreed to move this to applier AFAIR. 
It's not so simple: an attempt to check instance_id != row.replica_id to
avoid double promotion of the replicaset vclock (just before
xstream_write_xc) fails, since this is not the only case of vclock
promotion, and it leads to leaving read-only mode at the wrong time.
I will try to explain it better, but this does not work:
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -503,8 +503,9 @@ applier_subscribe(struct applier *applier)
-                       vclock_follow(&replicaset.vclock, row.replica_id,
-                                     row.lsn);
+                        if (row.replica_id != instance_id)
+                                vclock_follow(&replicaset.vclock, row.replica_id,
+                                              row.lsn);
                        xstream_write_xc(applier->subscribe_stream, &row);
>>  				vclock_follow(&replicaset.vclock, instance_id,
>>  					      (*last)->lsn);
>>  				break;
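
So for now the check stays in wal_write(), rewritten with vclock_get()
as you suggested; roughly (a sketch of the updated hunk):

	if ((*last)->replica_id == instance_id &&
	    vclock_get(&replicaset.vclock, instance_id) < (*last)->lsn) {
		vclock_follow(&replicaset.vclock, instance_id,
			      (*last)->lsn);
		break;
	}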
>> diff --git a/test/replication/on_replace.lua b/test/replication/on_replace.lua
>> index 7e49efe1..c5855892 100644
>> --- a/test/replication/on_replace.lua
>> +++ b/test/replication/on_replace.lua
>> @@ -22,13 +22,10 @@ box.cfg({
>>      };
>>  })
>> 
>> -env = require('test_run')
>> -test_run = env.new()
>> -engine = test_run:get_cfg('engine')
>> -
>>  box.once("bootstrap", function()
>> +    local test_run = require('test_run').new()
>>      box.schema.user.create(USER, { password = PASSWORD })
>>      box.schema.user.grant(USER, 'replication')
>> -    box.schema.space.create('test', {engine = engine})
>> +    box.schema.space.create('test', {engine = test_run:get_cfg('engine')})
>>      box.space.test:create_index('primary')
>>  end)
>> diff --git a/test/replication/recover_missing.test.lua b/test/replication/recover_missing.test.lua
>> new file mode 100644
>> index 00000000..d5b0e0ad
>> --- /dev/null
>> +++ b/test/replication/recover_missing.test.lua
>> @@ -0,0 +1,42 @@
>> +env = require('test_run')
>> +test_run = env.new()
>> +
>> +SERVERS = { 'on_replace1', 'on_replace2' }
>
>Please use autobootstrap.lua - I want to see how it works with multiple
>masters. 
Ok, the test now uses 3 instances.
>> +-- Start servers
>> +test_run:create_cluster(SERVERS)
>> +-- Wait for full mesh
>> +test_run:wait_fullmesh(SERVERS)
>> +
>> +test_run:cmd("switch on_replace1")
>> +for i = 0, 9 do box.space.test:insert{i, 'test' .. i} end
>
>Nit: please start counting from 1, as this is common in Lua. 
I hope this is not necessary; I just insert some data here to check it
later.
>> +box.space.test:count()
>> +
>> +test_run:cmd('switch default')
>> +vclock1 = test_run:get_vclock('on_replace1')
>> +vclock2 = test_run:wait_cluster_vclock(SERVERS, vclock1)
>> +
>> +test_run:cmd("switch on_replace2")
>> +box.space.test:count()
>> +box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.1)
>
>Decrease the timeout to speed up test execution time. 
Ok
>> +test_run:cmd("stop server on_replace1")
>> +fio = require('fio')
>> +-- This test checks ability to recover missing local data
>> +-- from remote replica. See #3210.
>> +-- Delete data on first master and test that after restart,
>> +-- due to difference in vclock it will be able to recover
>> +-- all missing data from replica.
>> +-- Also check that there is no concurrency, i.e. master is
>> +-- in 'read-only' mode unless it receives all data.
>> +fio.unlink(fio.pathjoin(fio.abspath("."), string.format('on_replace1/%020d.xlog', 8)))
>> +test_run:cmd("start server on_replace1")
>> +
>> +test_run:cmd("switch on_replace1")
>> +for i = 10, 19 do box.space.test:insert{i, 'test' .. i} end
>> +fiber = require('fiber')
>> +fiber.sleep(1)
>> +box.space.test:count()
>
>Use 'select' here to make sure the data received are correct.
Ok
>
>> +
>> +-- Cleanup.
>> +test_run:cmd('switch default')
>> +test_run:drop_cluster(SERVERS)
>> +
>
>Nit: extra line at the end of the file.


Best regards,
Konstantin Belyavskiy
k.belyavskiy@tarantool.org
