Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH 1/1] replication: prevent boot when rs uuid mismatches
@ 2021-06-03 17:15 Vladislav Shpilevoy via Tarantool-patches
  2021-06-04 11:20 ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 3+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-03 17:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, gorcunov

When an instance was being bootstrapped, it didn't check if its
replication sources had the same replicaset UUID.

As a result, if they didn't match, it used to boot from any of
them almost randomly (using selection among non-read-only nodes,
and min uuid among these) and raise an error about the mismatching
ones later.

Obviously, there is something wrong, such replication config is
not valid and the node should not boot at all.

The patch tries to prevent such instance's bootstrap if it sees at
least 2 replicas with different replicaset UUID.

It does not solve the problem for all the cases because still one
of the replicas simply could be not available. Then the new node
would give up and boot from the other node successfully, and
notice replicaset UUID mismatch only when the connection to the
first node is restored.

But it should help for most of the boot attempts.

Closes #5613

@TarantoolBot document
Title: New field in IPROTO_BALLOT

`IPROTO_BALLOT(0x29)` is a response for `IPROTO_VOTE(0x44)`. It
used to contain 5 fields (is_ro, vclock, gc_vclock, etc). Now it
also contains `IPROTO_BALLOT_REPLICASET_UUID(0x06)`. It is a
UUID string showing replicaset UUID of the sender. It can be nil
UUID (when all digits are 0) when not known. It is optional. When
omitted, it is assumed to be not known.
---
Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5613-cross-boot-uuid
Issue: https://github.com/tarantool/tarantool/issues/5613

I am not sure this bug is worth fixing really. Can't even say it
is a bug. Sending it on a review to get more opinions.

I am 50/50 for closing it as wontfix and for pushing this patch.

It would be good if there was a way to fix it without changing
the protocol. Then I wouldn't doubt. But I couldn't find a way
except patch the ballot.

I also thought about adding replicaset UUID in the greeting, but
it has plain text format, which means it is quite fragile when it
comes to any changes. In terms of backward compatibility.

I am open to any ideas how to fix it alternatively. And to
opinions that we don't want it "fixed".

 .../unreleased/gh-5613-cross-boot-uuid.md     |  6 +++
 src/box/box.cc                                |  1 +
 src/box/errcode.h                             |  1 +
 src/box/iproto_constants.h                    |  1 +
 src/box/replication.cc                        | 14 +++++
 src/box/xrow.c                                | 16 ++++--
 src/box/xrow.h                                |  2 +
 test/box/error.result                         |  1 +
 .../gh-5613-cross-bootstrap.result            | 54 +++++++++++++++++++
 .../gh-5613-cross-bootstrap.test.lua          | 16 ++++++
 test/replication/gh-5613-master1.lua          |  4 ++
 test/replication/gh-5613-master2.lua          |  4 ++
 test/replication/gh-5613-replica.lua          |  4 ++
 test/replication/suite.cfg                    |  1 +
 14 files changed, 122 insertions(+), 3 deletions(-)
 create mode 100644 changelogs/unreleased/gh-5613-cross-boot-uuid.md
 create mode 100644 test/replication/gh-5613-cross-bootstrap.result
 create mode 100644 test/replication/gh-5613-cross-bootstrap.test.lua
 create mode 100644 test/replication/gh-5613-master1.lua
 create mode 100644 test/replication/gh-5613-master2.lua
 create mode 100644 test/replication/gh-5613-replica.lua

diff --git a/changelogs/unreleased/gh-5613-cross-boot-uuid.md b/changelogs/unreleased/gh-5613-cross-boot-uuid.md
new file mode 100644
index 000000000..99b9e4428
--- /dev/null
+++ b/changelogs/unreleased/gh-5613-cross-boot-uuid.md
@@ -0,0 +1,6 @@
+## bugfix/replication
+
+* Fixed an error when a replica at attempt to boot from instances of different
+  replicasets (with not the same replicaset UUID) used to boot from one of the
+  instances and raise an error about the others. Now it won't boot at all if
+  detects this situation (gh-5613).
diff --git a/src/box/box.cc b/src/box/box.cc
index 6dc991dc8..13747cb7b 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -2872,6 +2872,7 @@ box_process_vote(struct ballot *ballot)
 	ballot->is_loading = is_ro;
 	vclock_copy(&ballot->vclock, &replicaset.vclock);
 	vclock_copy(&ballot->gc_vclock, &gc.vclock);
+	ballot->replicaset_uuid = REPLICASET_UUID;
 }
 
 /** Insert a new cluster into _schema */
diff --git a/src/box/errcode.h b/src/box/errcode.h
index d93820e96..4002d27ee 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -277,6 +277,7 @@ struct errcode_record {
 	/*222 */_(ER_QUORUM_WAIT,		"Couldn't wait for quorum %d: %s") \
 	/*223 */_(ER_INTERFERING_PROMOTE,	"Instance with replica id %u was promoted first") \
 	/*224 */_(ER_RAFT_DISABLED,		"Elections were turned off while running box.ctl.promote()")\
+	/*225 */_(ER_REPLICASET_UUID_CONFLICT,	"Conflicting replicaset UUIDs %s and %s") \
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
index 7362ddaf1..7ea7e4a14 100644
--- a/src/box/iproto_constants.h
+++ b/src/box/iproto_constants.h
@@ -167,6 +167,7 @@ enum iproto_ballot_key {
 	IPROTO_BALLOT_GC_VCLOCK = 0x03,
 	IPROTO_BALLOT_IS_LOADING = 0x04,
 	IPROTO_BALLOT_IS_ANON = 0x05,
+	IPROTO_BALLOT_REPLICASET_UUID = 0x06,
 };
 
 #define bit(c) (1ULL<<IPROTO_##c)
diff --git a/src/box/replication.cc b/src/box/replication.cc
index aefb812b3..43b2d3de8 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -702,6 +702,8 @@ replicaset_connect(struct applier **appliers, int count,
 	/* Memory for on_state triggers registered in appliers */
 	struct applier_on_connect triggers[VCLOCK_MAX];
 
+	struct tt_uuid replicaset_uuid = uuid_nil;
+
 	struct replicaset_connect_state state;
 	state.connected = state.failed = 0;
 	fiber_cond_create(&state.wakeup);
@@ -767,6 +769,18 @@ replicaset_connect(struct applier **appliers, int count,
 		struct applier *applier = appliers[i];
 		if (applier->state != APPLIER_CONNECTED)
 			applier_stop(applier);
+		struct tt_uuid *uuid = &applier->ballot.replicaset_uuid;
+		if (tt_uuid_is_nil(uuid))
+			continue;
+		if (tt_uuid_is_nil(&replicaset_uuid)) {
+			replicaset_uuid = *uuid;
+			continue;
+		}
+		if (tt_uuid_is_equal(&replicaset_uuid, uuid))
+			continue;
+		diag_set(ClientError, ER_REPLICASET_UUID_CONFLICT,
+			 tt_uuid_str(&replicaset_uuid), tt_uuid_str(uuid));
+		goto error;
 	}
 
 	/* Now all the appliers are connected, update the replica set. */
diff --git a/src/box/xrow.c b/src/box/xrow.c
index 2e364cea5..286a6323f 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -449,7 +449,7 @@ iproto_reply_vote(struct obuf *out, const struct ballot *ballot,
 		  uint64_t sync, uint32_t schema_version)
 {
 	size_t max_size = IPROTO_HEADER_LEN + mp_sizeof_map(1) +
-		mp_sizeof_uint(UINT32_MAX) + mp_sizeof_map(5) +
+		mp_sizeof_uint(UINT32_MAX) + mp_sizeof_map(6) +
 		mp_sizeof_uint(UINT32_MAX) + mp_sizeof_bool(ballot->is_ro) +
 		mp_sizeof_uint(UINT32_MAX) + mp_sizeof_bool(ballot->is_loading) +
 		mp_sizeof_uint(IPROTO_BALLOT_IS_ANON) +
@@ -457,7 +457,9 @@ iproto_reply_vote(struct obuf *out, const struct ballot *ballot,
 		mp_sizeof_uint(UINT32_MAX) +
 		mp_sizeof_vclock_ignore0(&ballot->vclock) +
 		mp_sizeof_uint(UINT32_MAX) +
-		mp_sizeof_vclock_ignore0(&ballot->gc_vclock);
+		mp_sizeof_vclock_ignore0(&ballot->gc_vclock) +
+		mp_sizeof_uint(UINT32_MAX) +
+		mp_sizeof_str(UUID_STR_LEN);
 
 	char *buf = obuf_reserve(out, max_size);
 	if (buf == NULL) {
@@ -469,7 +471,7 @@ iproto_reply_vote(struct obuf *out, const struct ballot *ballot,
 	char *data = buf + IPROTO_HEADER_LEN;
 	data = mp_encode_map(data, 1);
 	data = mp_encode_uint(data, IPROTO_BALLOT);
-	data = mp_encode_map(data, 5);
+	data = mp_encode_map(data, 6);
 	data = mp_encode_uint(data, IPROTO_BALLOT_IS_RO);
 	data = mp_encode_bool(data, ballot->is_ro);
 	data = mp_encode_uint(data, IPROTO_BALLOT_IS_LOADING);
@@ -480,6 +482,8 @@ iproto_reply_vote(struct obuf *out, const struct ballot *ballot,
 	data = mp_encode_vclock_ignore0(data, &ballot->vclock);
 	data = mp_encode_uint(data, IPROTO_BALLOT_GC_VCLOCK);
 	data = mp_encode_vclock_ignore0(data, &ballot->gc_vclock);
+	data = mp_encode_uint(data, IPROTO_BALLOT_REPLICASET_UUID);
+	data = xrow_encode_uuid(data, &ballot->replicaset_uuid);
 	size_t size = data - buf;
 	assert(size <= max_size);
 
@@ -1361,6 +1365,7 @@ xrow_decode_ballot(struct xrow_header *row, struct ballot *ballot)
 	ballot->is_loading = false;
 	ballot->is_anon = false;
 	vclock_create(&ballot->vclock);
+	ballot->replicaset_uuid = uuid_nil;
 
 	const char *start = NULL;
 	const char *end = NULL;
@@ -1424,6 +1429,11 @@ xrow_decode_ballot(struct xrow_header *row, struct ballot *ballot)
 						     &ballot->gc_vclock) != 0)
 				goto err;
 			break;
+		case IPROTO_BALLOT_REPLICASET_UUID:
+			if (xrow_decode_uuid(&data,
+					     &ballot->replicaset_uuid) != 0)
+				goto err;
+			break;
 		default:
 			mp_next(&data);
 		}
diff --git a/src/box/xrow.h b/src/box/xrow.h
index b3c664be2..4148a2314 100644
--- a/src/box/xrow.h
+++ b/src/box/xrow.h
@@ -382,6 +382,8 @@ struct ballot {
 	struct vclock vclock;
 	/** Oldest vclock available on the instance. */
 	struct vclock gc_vclock;
+	/** Replicaset UUID of the sender. Nil when unknown. */
+	struct tt_uuid replicaset_uuid;
 };
 
 /**
diff --git a/test/box/error.result b/test/box/error.result
index cc8cbaaa9..56e1bf8ab 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -443,6 +443,7 @@ t;
  |   222: box.error.QUORUM_WAIT
  |   223: box.error.INTERFERING_PROMOTE
  |   224: box.error.RAFT_DISABLED
+ |   225: box.error.REPLICASET_UUID_CONFLICT
  | ...
 
 test_run:cmd("setopt delimiter ''");
diff --git a/test/replication/gh-5613-cross-bootstrap.result b/test/replication/gh-5613-cross-bootstrap.result
new file mode 100644
index 000000000..e45422f62
--- /dev/null
+++ b/test/replication/gh-5613-cross-bootstrap.result
@@ -0,0 +1,54 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+test_run:cmd('create server master1 with script="replication/gh-5613-master1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server master1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server master2 with script="replication/gh-5613-master2.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server master2')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('create server replica with script="replication/gh-5613-replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with crash_expected=True')
+ | ---
+ | - false
+ | ...
+opts = {filename = 'gh-5613-replica.log'}
+ | ---
+ | ...
+assert(test_run:grep_log(nil, 'ER_REPLICASET_UUID_CONFLICT', nil, opts) ~= nil)
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('stop server master2')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server master2')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server master1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server master1')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-5613-cross-bootstrap.test.lua b/test/replication/gh-5613-cross-bootstrap.test.lua
new file mode 100644
index 000000000..1af9c49fd
--- /dev/null
+++ b/test/replication/gh-5613-cross-bootstrap.test.lua
@@ -0,0 +1,16 @@
+test_run = require('test_run').new()
+
+test_run:cmd('create server master1 with script="replication/gh-5613-master1.lua"')
+test_run:cmd('start server master1')
+test_run:cmd('create server master2 with script="replication/gh-5613-master2.lua"')
+test_run:cmd('start server master2')
+
+test_run:cmd('create server replica with script="replication/gh-5613-replica.lua"')
+test_run:cmd('start server replica with crash_expected=True')
+opts = {filename = 'gh-5613-replica.log'}
+assert(test_run:grep_log(nil, 'ER_REPLICASET_UUID_CONFLICT', nil, opts) ~= nil)
+
+test_run:cmd('stop server master2')
+test_run:cmd('delete server master2')
+test_run:cmd('stop server master1')
+test_run:cmd('delete server master1')
diff --git a/test/replication/gh-5613-master1.lua b/test/replication/gh-5613-master1.lua
new file mode 100644
index 000000000..3fb77dd0c
--- /dev/null
+++ b/test/replication/gh-5613-master1.lua
@@ -0,0 +1,4 @@
+#!/usr/bin/env tarantool
+
+require('console').listen(os.getenv('ADMIN'))
+box.cfg({listen = 'unix/:./master1.sock'})
diff --git a/test/replication/gh-5613-master2.lua b/test/replication/gh-5613-master2.lua
new file mode 100644
index 000000000..8a9bd4ea9
--- /dev/null
+++ b/test/replication/gh-5613-master2.lua
@@ -0,0 +1,4 @@
+#!/usr/bin/env tarantool
+
+require('console').listen(os.getenv('ADMIN'))
+box.cfg({listen = 'unix/:./master2.sock'})
diff --git a/test/replication/gh-5613-replica.lua b/test/replication/gh-5613-replica.lua
new file mode 100644
index 000000000..ec9edf0cf
--- /dev/null
+++ b/test/replication/gh-5613-replica.lua
@@ -0,0 +1,4 @@
+#!/usr/bin/env tarantool
+
+require('console').listen(os.getenv('ADMIN'))
+box.cfg({replication = {'unix/:./master1.sock', 'unix/:./master2.sock'}})
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 27eab20c2..bcaedc7e1 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -44,6 +44,7 @@
     "gh-5435-qsync-clear-synchro-queue-commit-all.test.lua": {},
     "gh-5536-wal-limit.test.lua": {},
     "gh-5566-final-join-synchro.test.lua": {},
+    "gh-5613-cross-bootstrap.test.lua": {},
     "gh-6032-promote-wal-write.test.lua": {},
     "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
     "gh-6094-rs-uuid-mismatch.test.lua": {},
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/1] replication: prevent boot when rs uuid mismatches
  2021-06-03 17:15 [Tarantool-patches] [PATCH 1/1] replication: prevent boot when rs uuid mismatches Vladislav Shpilevoy via Tarantool-patches
@ 2021-06-04 11:20 ` Serge Petrenko via Tarantool-patches
  2021-06-04 23:48   ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 3+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2021-06-04 11:20 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, gorcunov



03.06.2021 20:15, Vladislav Shpilevoy пишет:
> When an instance was being bootstrapped, it didn't check if its
> replication sources had the same replicaset UUID.
>
> As a result, if they didn't match, it used to boot from any of
> them almost randomly (using selection among non-read-only nodes,
> and min uuid among these) and raise an error about the mismatching
> ones later.
>
> Obviously, there is something wrong, such replication config is
> not valid and the node should not boot at all.
>
> The patch tries to prevent such instance's bootstrap if it sees at
> least 2 replicas with different replicaset UUID.
>
> It does not solve the problem for all the cases because still one
> of the replicas simply could be not available. Then the new node
> would give up and boot from the other node successfully, and
> notice replicaset UUID mismatch only when the connection to the
> first node is restored.
>
> But it should help for most of the boot attempts.
>
> Closes #5613
>
> @TarantoolBot document
> Title: New field in IPROTO_BALLOT
>
> `IPROTO_BALLOT(0x29)` is a response for `IPROTO_VOTE(0x44)`. It
> used to contain 5 fields (is_ro, vclock, gc_vclock, etc). Now it
> also contains `IPROTO_BALLOT_REPLICASET_UUID(0x06)`. It is a
> UUID string showing replicaset UUID of the sender. It can be nil
> UUID (when all digits are 0) when not known. It is optional. When
> omitted, it is assumed to be not known.
> ---
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5613-cross-boot-uuid
> Issue: https://github.com/tarantool/tarantool/issues/5613
>
> I am not sure this bug is worth fixing really. Can't even say it
> is a bug. Sending it on a review to get more opinions.
>
> I am 50/50 for closing it as wontfix and for pushing this patch.
>
> It would be good if there was a way to fix it without changing
> the protocol. Then I wouldn't doubt. But I couldn't find a way
> except patch the ballot.
>
> I also thought about adding replicaset UUID in the greeting, but
> it has plain text format, which means it is quite fragile when it
> comes to any changes. In terms of backward compatibility.
>
> I am open to any ideas how to fix it alternatively. And to
> opinions that we don't want it "fixed".


Hi! Thanks for working on this!

IMO we may ignore the issue you're trying to fix. Or maybe fix it as well,
but as a follow-up to 5613.

First of all, 5613 is about 3rd replica bootstrapping a separate cluster,
even when it sees that the 2 other nodes have already bootstrapped.

This patch doesn't actually fix 5613. The 3rd node shows a different 
error now,
but it still bootstraps its own cluster with a separate uuid.

I propose to change replicaset_round() somehow, so that it never chooses
non-bootstrapped instances over bootstrapped ones. Even when bootstrapped
instances are read-only.
Looks like you don't even have to change ballot for this purpose. There's
already the 'is_loading' field. We just have to assign higher priority to
`is_loading = false` rather than `read_only = false`.

Once the issue with choosing a non-bootstrapped instance as replicaset 
leader
over a bootstrapped one is fixed, we may want to start checking that every
bootstrapped peer has the same replicaset uuid.

P.S. I've checked, and looks like is_loading is not that useful now.
It's equal to instance's is_ro flag (not the one passed in ballot, but 
actual is_ro).
Still, it's easier to change is_loading encoding than introduce a whole 
new field.

>
>   .../unreleased/gh-5613-cross-boot-uuid.md     |  6 +++
>   src/box/box.cc                                |  1 +
>   src/box/errcode.h                             |  1 +
>   src/box/iproto_constants.h                    |  1 +
>   src/box/replication.cc                        | 14 +++++
>   src/box/xrow.c                                | 16 ++++--
>   src/box/xrow.h                                |  2 +
>   test/box/error.result                         |  1 +
>   .../gh-5613-cross-bootstrap.result            | 54 +++++++++++++++++++
>   .../gh-5613-cross-bootstrap.test.lua          | 16 ++++++
>   test/replication/gh-5613-master1.lua          |  4 ++
>   test/replication/gh-5613-master2.lua          |  4 ++
>   test/replication/gh-5613-replica.lua          |  4 ++
>   test/replication/suite.cfg                    |  1 +
>   14 files changed, 122 insertions(+), 3 deletions(-)
>   create mode 100644 changelogs/unreleased/gh-5613-cross-boot-uuid.md
>   create mode 100644 test/replication/gh-5613-cross-bootstrap.result
>   create mode 100644 test/replication/gh-5613-cross-bootstrap.test.lua
>   create mode 100644 test/replication/gh-5613-master1.lua
>   create mode 100644 test/replication/gh-5613-master2.lua
>   create mode 100644 test/replication/gh-5613-replica.lua
>
> diff --git a/changelogs/unreleased/gh-5613-cross-boot-uuid.md b/changelogs/unreleased/gh-5613-cross-boot-uuid.md
> new file mode 100644
> index 000000000..99b9e4428
> --- /dev/null
> +++ b/changelogs/unreleased/gh-5613-cross-boot-uuid.md
> @@ -0,0 +1,6 @@
> +## bugfix/replication
> +
> +* Fixed an error when a replica at attempt to boot from instances of different
> +  replicasets (with not the same replicaset UUID) used to boot from one of the
> +  instances and raise an error about the others. Now it won't boot at all if
> +  detects this situation (gh-5613).
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 6dc991dc8..13747cb7b 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -2872,6 +2872,7 @@ box_process_vote(struct ballot *ballot)
>   	ballot->is_loading = is_ro;
>   	vclock_copy(&ballot->vclock, &replicaset.vclock);
>   	vclock_copy(&ballot->gc_vclock, &gc.vclock);
> +	ballot->replicaset_uuid = REPLICASET_UUID;
>   }
>   
>   /** Insert a new cluster into _schema */
> diff --git a/src/box/errcode.h b/src/box/errcode.h
> index d93820e96..4002d27ee 100644
> --- a/src/box/errcode.h
> +++ b/src/box/errcode.h
> @@ -277,6 +277,7 @@ struct errcode_record {
>   	/*222 */_(ER_QUORUM_WAIT,		"Couldn't wait for quorum %d: %s") \
>   	/*223 */_(ER_INTERFERING_PROMOTE,	"Instance with replica id %u was promoted first") \
>   	/*224 */_(ER_RAFT_DISABLED,		"Elections were turned off while running box.ctl.promote()")\
> +	/*225 */_(ER_REPLICASET_UUID_CONFLICT,	"Conflicting replicaset UUIDs %s and %s") \
>   
>   /*
>    * !IMPORTANT! Please follow instructions at start of the file
> diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
> index 7362ddaf1..7ea7e4a14 100644
> --- a/src/box/iproto_constants.h
> +++ b/src/box/iproto_constants.h
> @@ -167,6 +167,7 @@ enum iproto_ballot_key {
>   	IPROTO_BALLOT_GC_VCLOCK = 0x03,
>   	IPROTO_BALLOT_IS_LOADING = 0x04,
>   	IPROTO_BALLOT_IS_ANON = 0x05,
> +	IPROTO_BALLOT_REPLICASET_UUID = 0x06,
>   };
>   
>   #define bit(c) (1ULL<<IPROTO_##c)
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index aefb812b3..43b2d3de8 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -702,6 +702,8 @@ replicaset_connect(struct applier **appliers, int count,
>   	/* Memory for on_state triggers registered in appliers */
>   	struct applier_on_connect triggers[VCLOCK_MAX];
>   
> +	struct tt_uuid replicaset_uuid = uuid_nil;
> +
>   	struct replicaset_connect_state state;
>   	state.connected = state.failed = 0;
>   	fiber_cond_create(&state.wakeup);
> @@ -767,6 +769,18 @@ replicaset_connect(struct applier **appliers, int count,
>   		struct applier *applier = appliers[i];
>   		if (applier->state != APPLIER_CONNECTED)
>   			applier_stop(applier);
> +		struct tt_uuid *uuid = &applier->ballot.replicaset_uuid;
> +		if (tt_uuid_is_nil(uuid))
> +			continue;
> +		if (tt_uuid_is_nil(&replicaset_uuid)) {
> +			replicaset_uuid = *uuid;
> +			continue;
> +		}
> +		if (tt_uuid_is_equal(&replicaset_uuid, uuid))
> +			continue;
> +		diag_set(ClientError, ER_REPLICASET_UUID_CONFLICT,
> +			 tt_uuid_str(&replicaset_uuid), tt_uuid_str(uuid));
> +		goto error;
>   	}
>   
>   	/* Now all the appliers are connected, update the replica set. */
> diff --git a/src/box/xrow.c b/src/box/xrow.c
> index 2e364cea5..286a6323f 100644
> --- a/src/box/xrow.c
> +++ b/src/box/xrow.c
> @@ -449,7 +449,7 @@ iproto_reply_vote(struct obuf *out, const struct ballot *ballot,
>   		  uint64_t sync, uint32_t schema_version)
>   {
>   	size_t max_size = IPROTO_HEADER_LEN + mp_sizeof_map(1) +
> -		mp_sizeof_uint(UINT32_MAX) + mp_sizeof_map(5) +
> +		mp_sizeof_uint(UINT32_MAX) + mp_sizeof_map(6) +
>   		mp_sizeof_uint(UINT32_MAX) + mp_sizeof_bool(ballot->is_ro) +
>   		mp_sizeof_uint(UINT32_MAX) + mp_sizeof_bool(ballot->is_loading) +
>   		mp_sizeof_uint(IPROTO_BALLOT_IS_ANON) +
> @@ -457,7 +457,9 @@ iproto_reply_vote(struct obuf *out, const struct ballot *ballot,
>   		mp_sizeof_uint(UINT32_MAX) +
>   		mp_sizeof_vclock_ignore0(&ballot->vclock) +
>   		mp_sizeof_uint(UINT32_MAX) +
> -		mp_sizeof_vclock_ignore0(&ballot->gc_vclock);
> +		mp_sizeof_vclock_ignore0(&ballot->gc_vclock) +
> +		mp_sizeof_uint(UINT32_MAX) +
> +		mp_sizeof_str(UUID_STR_LEN);
>   
>   	char *buf = obuf_reserve(out, max_size);
>   	if (buf == NULL) {
> @@ -469,7 +471,7 @@ iproto_reply_vote(struct obuf *out, const struct ballot *ballot,
>   	char *data = buf + IPROTO_HEADER_LEN;
>   	data = mp_encode_map(data, 1);
>   	data = mp_encode_uint(data, IPROTO_BALLOT);
> -	data = mp_encode_map(data, 5);
> +	data = mp_encode_map(data, 6);
>   	data = mp_encode_uint(data, IPROTO_BALLOT_IS_RO);
>   	data = mp_encode_bool(data, ballot->is_ro);
>   	data = mp_encode_uint(data, IPROTO_BALLOT_IS_LOADING);
> @@ -480,6 +482,8 @@ iproto_reply_vote(struct obuf *out, const struct ballot *ballot,
>   	data = mp_encode_vclock_ignore0(data, &ballot->vclock);
>   	data = mp_encode_uint(data, IPROTO_BALLOT_GC_VCLOCK);
>   	data = mp_encode_vclock_ignore0(data, &ballot->gc_vclock);
> +	data = mp_encode_uint(data, IPROTO_BALLOT_REPLICASET_UUID);
> +	data = xrow_encode_uuid(data, &ballot->replicaset_uuid);
>   	size_t size = data - buf;
>   	assert(size <= max_size);
>   
> @@ -1361,6 +1365,7 @@ xrow_decode_ballot(struct xrow_header *row, struct ballot *ballot)
>   	ballot->is_loading = false;
>   	ballot->is_anon = false;
>   	vclock_create(&ballot->vclock);
> +	ballot->replicaset_uuid = uuid_nil;
>   
>   	const char *start = NULL;
>   	const char *end = NULL;
> @@ -1424,6 +1429,11 @@ xrow_decode_ballot(struct xrow_header *row, struct ballot *ballot)
>   						     &ballot->gc_vclock) != 0)
>   				goto err;
>   			break;
> +		case IPROTO_BALLOT_REPLICASET_UUID:
> +			if (xrow_decode_uuid(&data,
> +					     &ballot->replicaset_uuid) != 0)
> +				goto err;
> +			break;
>   		default:
>   			mp_next(&data);
>   		}
> diff --git a/src/box/xrow.h b/src/box/xrow.h
> index b3c664be2..4148a2314 100644
> --- a/src/box/xrow.h
> +++ b/src/box/xrow.h
> @@ -382,6 +382,8 @@ struct ballot {
>   	struct vclock vclock;
>   	/** Oldest vclock available on the instance. */
>   	struct vclock gc_vclock;
> +	/** Replicaset UUID of the sender. Nil when unknown. */
> +	struct tt_uuid replicaset_uuid;
>   };
>   
>   /**
> diff --git a/test/box/error.result b/test/box/error.result
> index cc8cbaaa9..56e1bf8ab 100644
> --- a/test/box/error.result
> +++ b/test/box/error.result
> @@ -443,6 +443,7 @@ t;
>    |   222: box.error.QUORUM_WAIT
>    |   223: box.error.INTERFERING_PROMOTE
>    |   224: box.error.RAFT_DISABLED
> + |   225: box.error.REPLICASET_UUID_CONFLICT
>    | ...
>   
>   test_run:cmd("setopt delimiter ''");
> diff --git a/test/replication/gh-5613-cross-bootstrap.result b/test/replication/gh-5613-cross-bootstrap.result
> new file mode 100644
> index 000000000..e45422f62
> --- /dev/null
> +++ b/test/replication/gh-5613-cross-bootstrap.result
> @@ -0,0 +1,54 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +
> +test_run:cmd('create server master1 with script="replication/gh-5613-master1.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server master1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server master2 with script="replication/gh-5613-master2.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server master2')
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('create server replica with script="replication/gh-5613-replica.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica with crash_expected=True')
> + | ---
> + | - false
> + | ...
> +opts = {filename = 'gh-5613-replica.log'}
> + | ---
> + | ...
> +assert(test_run:grep_log(nil, 'ER_REPLICASET_UUID_CONFLICT', nil, opts) ~= nil)
> + | ---
> + | - true
> + | ...
> +
> +test_run:cmd('stop server master2')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server master2')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server master1')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server master1')
> + | ---
> + | - true
> + | ...
> diff --git a/test/replication/gh-5613-cross-bootstrap.test.lua b/test/replication/gh-5613-cross-bootstrap.test.lua
> new file mode 100644
> index 000000000..1af9c49fd
> --- /dev/null
> +++ b/test/replication/gh-5613-cross-bootstrap.test.lua
> @@ -0,0 +1,16 @@
> +test_run = require('test_run').new()
> +
> +test_run:cmd('create server master1 with script="replication/gh-5613-master1.lua"')
> +test_run:cmd('start server master1')
> +test_run:cmd('create server master2 with script="replication/gh-5613-master2.lua"')
> +test_run:cmd('start server master2')
> +
> +test_run:cmd('create server replica with script="replication/gh-5613-replica.lua"')
> +test_run:cmd('start server replica with crash_expected=True')
> +opts = {filename = 'gh-5613-replica.log'}
> +assert(test_run:grep_log(nil, 'ER_REPLICASET_UUID_CONFLICT', nil, opts) ~= nil)
> +
> +test_run:cmd('stop server master2')
> +test_run:cmd('delete server master2')
> +test_run:cmd('stop server master1')
> +test_run:cmd('delete server master1')
> diff --git a/test/replication/gh-5613-master1.lua b/test/replication/gh-5613-master1.lua
> new file mode 100644
> index 000000000..3fb77dd0c
> --- /dev/null
> +++ b/test/replication/gh-5613-master1.lua
> @@ -0,0 +1,4 @@
> +#!/usr/bin/env tarantool
> +
> +require('console').listen(os.getenv('ADMIN'))
> +box.cfg({listen = 'unix/:./master1.sock'})
> diff --git a/test/replication/gh-5613-master2.lua b/test/replication/gh-5613-master2.lua
> new file mode 100644
> index 000000000..8a9bd4ea9
> --- /dev/null
> +++ b/test/replication/gh-5613-master2.lua
> @@ -0,0 +1,4 @@
> +#!/usr/bin/env tarantool
> +
> +require('console').listen(os.getenv('ADMIN'))
> +box.cfg({listen = 'unix/:./master2.sock'})
> diff --git a/test/replication/gh-5613-replica.lua b/test/replication/gh-5613-replica.lua
> new file mode 100644
> index 000000000..ec9edf0cf
> --- /dev/null
> +++ b/test/replication/gh-5613-replica.lua
> @@ -0,0 +1,4 @@
> +#!/usr/bin/env tarantool
> +
> +require('console').listen(os.getenv('ADMIN'))
> +box.cfg({replication = {'unix/:./master1.sock', 'unix/:./master2.sock'}})
> diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
> index 27eab20c2..bcaedc7e1 100644
> --- a/test/replication/suite.cfg
> +++ b/test/replication/suite.cfg
> @@ -44,6 +44,7 @@
>       "gh-5435-qsync-clear-synchro-queue-commit-all.test.lua": {},
>       "gh-5536-wal-limit.test.lua": {},
>       "gh-5566-final-join-synchro.test.lua": {},
> +    "gh-5613-cross-bootstrap.test.lua": {},
>       "gh-6032-promote-wal-write.test.lua": {},
>       "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
>       "gh-6094-rs-uuid-mismatch.test.lua": {},

-- 
Serge Petrenko


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/1] replication: prevent boot when rs uuid mismatches
  2021-06-04 11:20 ` Serge Petrenko via Tarantool-patches
@ 2021-06-04 23:48   ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 0 replies; 3+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-06-04 23:48 UTC (permalink / raw)
  To: Serge Petrenko, tarantool-patches, gorcunov

Hi! Thanks for the review!

> First of all, 5613 is about 3rd replica bootstrapping a separate cluster,
> even when it sees that the 2 other nodes have already bootstrapped.
> 
> This patch doesn't actually fix 5613. The 3rd node shows a different error now,
> but it still bootstraps its own cluster with a separate uuid.

I see now. I thought that in the issue description the first 2 nodes were
bootstrapped separately.

> I propose to change replicaset_round() somehow, so that it never chooses
> non-bootstrapped instances over bootstrapped ones. Even when bootstrapped
> instances are read-only.

I did it in the new version, see another email thread.

> Looks like you don't even have to change ballot for this purpose. There's
> already the 'is_loading' field. We just have to assign higher priority to
> `is_loading = false` rather than `read_only = false`.
> 
> P.S. I've checked, and looks like is_loading is not that useful now.
> It's equal to instance's is_ro flag (not the one passed in ballot, but actual is_ro).
> Still, it's easier to change is_loading encoding than introduce a whole new field.

Yes, indeed, is_loading has little to do with actual loading.
It is more like "box.cfg() is finished and box.cfg{read_only=true} was set".

I did several changes to the ballot to make it work. Renamed
field is_ro, renamed + slightly changed behaviour of is_loading,
and added a new field.

Only is_loading is not enough, because I still need to know who is really
read-only. Not just by read_only=false, but who is actually writable. There
can be orphans who has finished bootstrap/recovery, but are not writable yet.

Some replication tests starts failing if we only look at read_only=false
and finished bootstrap/recovery. For instance, assume 1 node is started
and booted fine, it is writable. Then 2 other nodes are started: node2
and node3. They connect to node1 first, get its ballot, vclock. Then
node2 registers on node1. Node3 connects to node2 now and gets its ballot.
It sees node3 has higher vclock than node1 (because node2 connected to
node3 later). If it does not look at it being read-only (because it is
an orphan), it tries to boot from node3 (because its vclock looks like > node1)
and fails because node3 can't write to _cluster yet.

That error I got on replication/bootstrap_leader.test.lua until I decided
to keep both properties of being booted and of being read-only.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-06-04 23:49 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-03 17:15 [Tarantool-patches] [PATCH 1/1] replication: prevent boot when rs uuid mismatches Vladislav Shpilevoy via Tarantool-patches
2021-06-04 11:20 ` Serge Petrenko via Tarantool-patches
2021-06-04 23:48   ` Vladislav Shpilevoy via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox