Fwd: Re[2]: [patches] [PATCH] [replication] [recovery] recover missing data

Konstantin Belyavskiy k.belyavskiy at tarantool.org
Mon Mar 26 18:24:36 MSK 2018


Vladimir,

Please check the most recent version.
branch: gh-3210-recover-missing-local-data-master-master

Please note that only the branch for Tarantool 1.9 is discussed here; please check the other letter for the v1.6 branch.

>Среда, 21 марта 2018, 19:34 +03:00 от Vladimir Davydov < vdavydov.dev at gmail.com >:
>
>On Wed, Mar 21, 2018 at 04:47:46PM +0300, Konstantin Belyavskiy wrote:
>> Please take a look at the new version.
>
>I don't see the new patch in the mailing list nor in this email so I
>have to check out the branch, which is inconvenient for me. Please
>always make sure that your patches can be found in the mailing list.
Separated
>
>> 
>> >Четверг, 15 марта 2018, 17:10 +03:00 от Konstantin Osipov < kostja at tarantool.org >:
>> >
>> >* Konstantin Belyavskiy <  k.belyavskiy at tarantool.org > [18/03/15 11:39]:
>> >
>> >The patch has to go to 1.6, since that's where the customer
>> >complained about it.
>> >
>> >Is it feasible to backport? 
>> for 1.9 branch: gh-3210-recover-missing-local-data-master-master
>> for 1.6 branch: gh-3210-recover-missing-local-data-master-master-16 *
>
>$ git log --oneline origin/gh-3210-recover-missing-local-data-master-master-16 | head -n3
>2aa60f15 remove test
>4e2fa3b9 [replication] [recovery] recover missing data
>b80868f2 [replication] [recovery] recover missing data
>
>Squash them, please.
>
>Pasting the patch from the branch here for review:
>
>> From 20e1f76c9ec829f48a0899fa71f805a01ef5be42 Mon Sep 17 00:00:00 2001
>> From: Konstantin Belyavskiy < k.belyavskiy at tarantool.org >
>> Date: Tue, 13 Mar 2018 17:51:52 +0300
>> Subject: [PATCH] [replication] [recovery] recover missing data
>> 
>> Recover missing local data from replica.
>> In case of a sudden power loss, if data was not written to the WAL
>> but had already been sent to a remote replica, the local instance
>> can't recover properly and the two nodes end up with different
>> datasets. Fix it by using the remote replica's data and an LSN
>> comparison.
>> Based on a proposal by @GeorgyKirichenko.
>> 
>> Closes #3210
>> 
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index 2bd05ad5..86c18ecb 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -110,6 +110,11 @@ struct relay {
>>  	struct vclock recv_vclock;
>>  	/** Replicatoin slave version. */
>>  	uint32_t version_id;
>> +	/**
>> +	 * LSN at the moment of subscribe, used to check dataset
>> +	 * on the other side and send missing data rows if any.
>> +	 */
>> +	int64_t local_lsn_before_subscribe;
>
>It isn't clear from the comment what LSN it is, namely that this is LSN
>of the replica as the master sees it. 
Updated
>
>
>> 
>>  	/** Relay endpoint */
>>  	struct cbus_endpoint endpoint;
>> @@ -541,6 +546,7 @@ relay_subscribe(int fd, uint64_t sync, struct replica *replica,
>>  	relay.version_id = replica_version_id;
>>  	relay.replica = replica;
>>  	replica_set_relay(replica, &relay);
>> +	relay.local_lsn_before_subscribe = vclock_get(&replicaset.vclock, replica->id);
>> 
>>  	int rc = cord_costart(&relay.cord, tt_sprintf("relay_%p", &relay),
>>  			      relay_subscribe_f, &relay);
>> @@ -583,10 +589,15 @@ relay_send_row(struct xstream *stream, struct xrow_header *packet)
>>  	/*
>>  	 * We're feeding a WAL, thus responding to SUBSCRIBE request.
>>  	 * In that case, only send a row if it is not from the same replica
>> -	 * (i.e. don't send replica's own rows back).
>> +	 * (i.e. don't send replica's own rows back) or if this row is
>> +	 * missing on the other side (i.e. in case of sudden power-loss,
>> +	 * data was not written to WAL, so remote replica can't recover
>> +	 * it). In this case the packet's LSN is lower than or equal
>> +	 * to the local replica's LSN at the moment it issued the
>> +	 * 'SUBSCRIBE' request.
>>  	 */
>
>To be honest, I don't understand why we should fix this at all. If WAL
>is written without syncing, there's no guarantee that there will be no
>data loss in case of power outage.
>
>Anyway, the fix seems to be inherently racy, because the master that
>experienced a data loss might start writing new records to WAL right
>after local recovery, even before it receives its own lost records from
>the replica. What will happen then?
Fixed with @locker proposal based on 'read-only' mode.
>
>>  	if (relay->replica == NULL ||
>> -	    packet->replica_id != relay->replica->id) {
>> +	    packet->replica_id != relay->replica->id ||
>> +	    packet->lsn <= relay->local_lsn_before_subscribe) {
>>  		relay_send(relay, packet);
>>  	}
>>  }
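The send condition above can be modeled in isolation. A minimal Python sketch (illustrative names, not Tarantool code; `subscriber_lsn_at_subscribe` stands in for `local_lsn_before_subscribe`):

```python
def should_send(packet_replica_id, packet_lsn,
                subscriber_id, subscriber_lsn_at_subscribe):
    """Model of the relay_send_row() filter."""
    if subscriber_id is None:
        # Not a SUBSCRIBE relay: send everything.
        return True
    if packet_replica_id != subscriber_id:
        # Row originated on another instance: always relay it.
        return True
    # The subscriber's own row: resend it only if its LSN does not
    # exceed the subscriber's LSN recorded at SUBSCRIBE time, i.e.
    # the row may have been lost from the subscriber's WAL.
    return packet_lsn <= subscriber_lsn_at_subscribe

# A row the subscriber wrote but lost (LSN 5 <= 10) is sent back,
# while a newer own row (LSN 11 > 10) is not echoed back:
assert should_send(1, 5, 1, 10)
assert not should_send(1, 11, 1, 10)
```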
>> diff --git a/src/box/wal.cc b/src/box/wal.cc
>> index 4576cfe0..c321b2de 100644
>> --- a/src/box/wal.cc
>> +++ b/src/box/wal.cc
>> @@ -768,8 +768,16 @@ wal_write(struct journal *journal, struct journal_entry *entry)
>>  			/*
>>  			 * Find last row from local instance id
>>  			 * and promote vclock.
>> +			 * In a master-master configuration, if after a
>> +			 * sudden power loss some data was not written
>> +			 * to the WAL but had already been sent to other
>> +			 * replicas, they will send it back. In this
>> +			 * case we should update only the local vclock
>> +			 * but not the replicaset one. This can be
>> +			 * checked by a simple LSN comparison.
>
>I'm afraid I don't understand the comment. Is this a counter-measure
>against the race condition I wrote about above? 
I'm trying to explain here that when the master receives back its own row, one that was lost, we
should update only the local vclock, not the replicaset one.
So we should skip vclock_follow here.
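That guard can be sketched as a tiny model (plain Python with hypothetical names; a vclock is modeled as a dict from replica id to LSN):

```python
def maybe_follow(replicaset_vclock, instance_id, row_replica_id, row_lsn):
    """Model of the guard around vclock_follow() in wal_write().

    Promote the replicaset vclock for our own row only when the row's
    LSN is strictly ahead of what the vclock already records; for a
    row the vclock already covers, the promotion is skipped
    (vclock_follow() requires monotonic growth).
    """
    if (row_replica_id == instance_id and
            replicaset_vclock.get(instance_id, 0) < row_lsn):
        replicaset_vclock[instance_id] = row_lsn  # vclock_follow()
        return True
    return False

vc = {1: 10}
assert maybe_follow(vc, 1, 1, 11)      # normal local write: promoted
assert not maybe_follow(vc, 1, 1, 5)   # already-covered row: skipped
assert vc[1] == 11
```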
>
>
>> +			 * See #3210 for details.
>
>Please don't refer to GitHub tickets in the code - comments alone should
>be enough. 
Fixed.
>
>>  			 */
>> -			if ((*last)->replica_id == instance_id) {
>> +			if ((*last)->replica_id == instance_id &&
>> +			    replicaset.vclock.lsn[instance_id] < (*last)->lsn) {
>
>There's vclock_get() helper for this.
>
>>  				vclock_follow(&replicaset.vclock, instance_id,
>>  					      (*last)->lsn);
>>  				break;
>
>> diff --git a/test/replication/master1.lua b/test/replication/master1.lua
>> new file mode 120000
>> diff --git a/test/replication/master2.lua b/test/replication/master2.lua
>> new file mode 120000
>> diff --git a/test/replication/master_master.lua b/test/replication/master_master.lua
>> new file mode 100644
>
>Do you really need all these new files? Can't you reuse
>autobootstrap.lua? 
Fixed. It is now based on 'on_replace.lua', since that is closer than autobootstrap.lua (2 instances vs. 3).
>
>
>> diff --git a/test/replication/recover_missing.test.lua b/test/replication/recover_missing.test.lua
>> new file mode 100644
>> index 00000000..2f36fc32
>> --- /dev/null
>> +++ b/test/replication/recover_missing.test.lua
>> @@ -0,0 +1,30 @@
>
>Please add a comment describing what you're testing here and the GitHub
>ticket number, as we usually do.
>
>> +env = require('test_run')
>> +test_run = env.new()
>> +
>> +SERVERS = { 'master1', 'master2' }
>> +-- Start servers
>> +test_run:create_cluster(SERVERS)
>> +-- Wait for full mesh
>> +test_run:wait_fullmesh(SERVERS)
>> +
>> +test_run:cmd("switch master1")
>> +box.space._schema:insert({'1'})
>> +box.space._schema:select('1')
>
>Please don't use system spaces for tests. 
Fixed.
>
>
>> +
>> +fiber = require('fiber')
>> +fiber.sleep(0.1)
>
>Waiting for a change to be relayed like this is racy. Either use
>wait_cluster_vclock() helper or loop on the replica until it sees the
>change. 
Fixed.
>
>
>> +
>> +test_run:cmd("switch master2")
>> +box.space._schema:select('1')
>> +test_run:cmd("stop server master1")
>> +fio = require('fio')
>> +fio.unlink(fio.pathjoin(fio.abspath("."), string.format('master1/%020d.xlog', 8)))
>> +test_run:cmd("start server master1")
>> +
>> +test_run:cmd("switch master1")
>> +box.space._schema:select('1')
>> +
>> +test_run:cmd("switch default")
>> +test_run:cmd("stop server master1")
>> +test_run:cmd("stop server master2")
>
>Use drop_cluster() for cleanup.
Fixed.
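For the record, the racy `fiber.sleep(0.1)` pattern flagged above is generally replaced by a bounded poll loop. A generic sketch (plain Python, illustrating what helpers like `wait_cluster_vclock()` do conceptually, not their actual implementation):

```python
import time

def wait_until(cond, timeout=10.0, step=0.01):
    """Poll cond() until it returns true or the timeout expires.

    Unlike a fixed sleep, this cannot pass while the change is still
    in flight, and it does not fail merely because relaying took a
    little longer than expected.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if cond():
            return True
        time.sleep(step)
    return False
```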

Best regards,
Konstantin Belyavskiy
k.belyavskiy at tarantool.org
