Tarantool development patches archive
* [Tarantool-patches] [PATCH v2 0/4] replication: fix applying of rows originating from local instance
@ 2020-02-13 21:52 sergepetrenko
  2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 1/4] box: expose box_is_orphan method sergepetrenko
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: sergepetrenko @ 2020-02-13 21:52 UTC (permalink / raw)
  To: alexander.turenko, v.shpilevoy, kostja.osipov; +Cc: tarantool-patches

This patchset fixes problems with applying self-issued rows arriving via replication.

https://github.com/tarantool/tarantool/issues/4739
https://github.com/tarantool/tarantool/tree/sp/gh-4739-vclock-assert

Changes in v2:
  - fixes as per Kostja's review

Serge Petrenko (4):
  box: expose box_is_orphan method
  replication: check for rows to skip in applier correctly
  wal: wart when trying to write a record with a broken lsn
  replication: do not promote local_vclock_at_subscribe unnecessarily

 src/box/applier.cc     | 17 +++++++++++++++--
 src/box/box.cc         | 14 +++++++++++++-
 src/box/box.h          |  3 +++
 src/box/relay.cc       | 13 +++++++++++--
 src/box/relay.h        |  3 ++-
 src/box/replication.cc |  1 -
 src/box/wal.c          | 15 ++++++++++++---
 7 files changed, 56 insertions(+), 10 deletions(-)

-- 
2.20.1 (Apple Git-117)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Tarantool-patches] [PATCH v2 1/4] box: expose box_is_orphan method
  2020-02-13 21:52 [Tarantool-patches] [PATCH v2 0/4] replication: fix applying of rows originating from local instance sergepetrenko
@ 2020-02-13 21:52 ` sergepetrenko
  2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 2/4] replication: check for rows to skip in applier correctly sergepetrenko
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: sergepetrenko @ 2020-02-13 21:52 UTC (permalink / raw)
  To: alexander.turenko, v.shpilevoy, kostja.osipov; +Cc: tarantool-patches

From: Serge Petrenko <sergepetrenko@tarantool.org>

The is_orphan status check is needed by the applier in order not to
re-apply local instance rows coming from the replica after replication
has synced.

Prerequisite #4739
---
 src/box/box.cc | 6 ++++++
 src/box/box.h  | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/src/box/box.cc b/src/box/box.cc
index 1b2b27d61..0290578b2 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -247,6 +247,12 @@ box_is_ro(void)
 	return is_ro || is_orphan;
 }
 
+bool
+box_is_orphan(void)
+{
+	return is_orphan;
+}
+
 int
 box_wait_ro(bool ro, double timeout)
 {
diff --git a/src/box/box.h b/src/box/box.h
index a212e6510..f37a945eb 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -105,6 +105,9 @@ box_set_ro();
 bool
 box_is_ro(void);
 
+bool
+box_is_orphan(void);
+
 /**
  * Wait until the instance switches to a desired mode.
  * \param ro wait read-only if set or read-write if unset
-- 
2.20.1 (Apple Git-117)


* [Tarantool-patches] [PATCH v2 2/4] replication: check for rows to skip in applier correctly
  2020-02-13 21:52 [Tarantool-patches] [PATCH v2 0/4] replication: fix applying of rows originating from local instance sergepetrenko
  2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 1/4] box: expose box_is_orphan method sergepetrenko
@ 2020-02-13 21:52 ` sergepetrenko
  2020-02-14  7:19   ` Konstantin Osipov
  2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 3/4] wal: wart when trying to write a record with a broken lsn sergepetrenko
  2020-02-13 21:53 ` [Tarantool-patches] [PATCH v2 4/4] replication: do not promote local_vclock_at_subscribe unnecessarily sergepetrenko
  3 siblings, 1 reply; 16+ messages in thread
From: sergepetrenko @ 2020-02-13 21:52 UTC (permalink / raw)
  To: alexander.turenko, v.shpilevoy, kostja.osipov; +Cc: tarantool-patches

From: Serge Petrenko <sergepetrenko@tarantool.org>

Remove applier vclock initialization from replication_init(), where it
is zeroed out, and place it at the end of box_cfg_xc(), where the
replicaset vclock already has a meaningful value.
Do not apply rows originating from the current instance if replication
sync has ended.

Closes #4739
---
 src/box/applier.cc     | 17 +++++++++++++++--
 src/box/box.cc         |  6 ++++++
 src/box/replication.cc |  1 -
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index ae3d281a5..e931e1595 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -731,8 +731,21 @@ applier_apply_tx(struct stailq *rows)
 	struct latch *latch = (replica ? &replica->order_latch :
 			       &replicaset.applier.order_latch);
 	latch_lock(latch);
-	if (vclock_get(&replicaset.applier.vclock,
-		       first_row->replica_id) >= first_row->lsn) {
+	/*
+	 * Skip remote rows either if one of the appliers has
+	 * sent them to write or if the rows originate from the
+	 * local instance and we've already synced with the
+	 * replica. The latter is important because relay gets
+	 * notified about WAL write before tx does, so it is
+	 * possible that a remote instance receives our rows
+	 * via replication before we update replicaset vclock and
+	 * even sends these rows back to us. An attempt to apply
+	 * such rows will lead to having entries with duplicate
+	 * LSNs in WAL.
+	 */
+	if (vclock_get(&replicaset.applier.vclock, first_row->replica_id) >=
+	    first_row->lsn || (first_row->replica_id == instance_id &&
+	    !box_is_orphan())) {
 		latch_unlock(latch);
 		return 0;
 	}
diff --git a/src/box/box.cc b/src/box/box.cc
index 0290578b2..952d60ad1 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -2425,6 +2425,12 @@ box_cfg_xc(void)
 
 	rmean_cleanup(rmean_box);
 
+	/*
+	 * Local recovery is over so it's fine to update applier
+	 * vclock now.
+	 */
+	vclock_copy(&replicaset.applier.vclock, &replicaset.vclock);
+
 	/* Follow replica */
 	replicaset_follow();
 
diff --git a/src/box/replication.cc b/src/box/replication.cc
index e7bfa22ab..7b04573a4 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -93,7 +93,6 @@ replication_init(void)
 	latch_create(&replicaset.applier.order_latch);
 
 	vclock_create(&replicaset.applier.vclock);
-	vclock_copy(&replicaset.applier.vclock, &replicaset.vclock);
 	rlist_create(&replicaset.applier.on_rollback);
 	rlist_create(&replicaset.applier.on_commit);
 
-- 
2.20.1 (Apple Git-117)


* [Tarantool-patches] [PATCH v2 3/4] wal: wart when trying to write a record with a broken lsn
  2020-02-13 21:52 [Tarantool-patches] [PATCH v2 0/4] replication: fix applying of rows originating from local instance sergepetrenko
  2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 1/4] box: expose box_is_orphan method sergepetrenko
  2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 2/4] replication: check for rows to skip in applier correctly sergepetrenko
@ 2020-02-13 21:52 ` sergepetrenko
  2020-02-14  7:20   ` Konstantin Osipov
  2020-02-16 16:15   ` Vladislav Shpilevoy
  2020-02-13 21:53 ` [Tarantool-patches] [PATCH v2 4/4] replication: do not promote local_vclock_at_subscribe unnecessarily sergepetrenko
  3 siblings, 2 replies; 16+ messages in thread
From: sergepetrenko @ 2020-02-13 21:52 UTC (permalink / raw)
  To: alexander.turenko, v.shpilevoy, kostja.osipov; +Cc: tarantool-patches

From: Serge Petrenko <sergepetrenko@tarantool.org>

There is an assertion in vclock_follow, `lsn > prev_lsn`, which doesn't
fire in release builds, of course. Let's at least warn the user on an
attempt to write a record with a duplicate or otherwise broken lsn.

Follow-up #4739
---
 src/box/wal.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/box/wal.c b/src/box/wal.c
index 0ae66ff32..f8ee2b7d8 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -951,9 +951,18 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 			(*row)->tsn = tsn;
 			(*row)->is_commit = row == end - 1;
 		} else {
-			vclock_follow(vclock_diff, (*row)->replica_id,
-				      (*row)->lsn - vclock_get(base,
-							       (*row)->replica_id));
+			int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id);
+			if (diff <= vclock_get(vclock_diff,
+					       (*row)->replica_id)) {
+				say_crit("Attempt to write a broken LSN to WAL:"
+					 " replica id: %u, committed lsn: %lld,"
+					 " new lsn %lld", (*row)->replica_id,
+					 (long long) (vclock_get(base, (*row)->replica_id) +
+						      vclock_get(vclock_diff,
+								 (*row)->replica_id)),
+					 (long long) (*row)->lsn);
+			}
+			vclock_follow(vclock_diff, (*row)->replica_id, diff);
 		}
 	}
 }
-- 
2.20.1 (Apple Git-117)


* [Tarantool-patches] [PATCH v2 4/4] replication: do not promote local_vclock_at_subscribe unnecessarily
  2020-02-13 21:52 [Tarantool-patches] [PATCH v2 0/4] replication: fix applying of rows originating from local instance sergepetrenko
                   ` (2 preceding siblings ...)
  2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 3/4] wal: wart when trying to write a record with a broken lsn sergepetrenko
@ 2020-02-13 21:53 ` sergepetrenko
  2020-02-14  7:25   ` Konstantin Osipov
  3 siblings, 1 reply; 16+ messages in thread
From: sergepetrenko @ 2020-02-13 21:53 UTC (permalink / raw)
  To: alexander.turenko, v.shpilevoy, kostja.osipov; +Cc: tarantool-patches

From: Serge Petrenko <sergepetrenko@tarantool.org>

When master processes a SUBSCRIBE request, it responds with its vclock
at the moment of receiving the request. However, the fiber processing
the request may yield in coio_write_xrow when sending the response to
the replica. In the meantime, master may apply additional rows coming
from the replica after it has issued SUBSCRIBE.
Then, in relay_subscribe, master sets its local vclock_at_subscribe to
a possibly updated value of replicaset.vclock.
So, set local_vclock_at_subscribe to the remembered value, rather than
the updated one.

Follow-up #4739
---
 src/box/box.cc   |  2 +-
 src/box/relay.cc | 13 +++++++++++--
 src/box/relay.h  |  3 ++-
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 952d60ad1..7dec1ae6b 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1871,7 +1871,7 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
 	 * indefinitely).
 	 */
 	relay_subscribe(replica, io->fd, header->sync, &replica_clock,
-			replica_version_id);
+			replica_version_id, &vclock);
 }
 
 void
diff --git a/src/box/relay.cc b/src/box/relay.cc
index b89632273..b69646446 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -676,7 +676,8 @@ relay_subscribe_f(va_list ap)
 /** Replication acceptor fiber handler. */
 void
 relay_subscribe(struct replica *replica, int fd, uint64_t sync,
-		struct vclock *replica_clock, uint32_t replica_version_id)
+		struct vclock *replica_clock, uint32_t replica_version_id,
+		struct vclock *clock_at_subscribe)
 {
 	assert(replica->anon || replica->id != REPLICA_ID_NIL);
 	struct relay *relay = replica->relay;
@@ -699,7 +700,15 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
 		replica_on_relay_stop(replica);
 	});
 
-	vclock_copy(&relay->local_vclock_at_subscribe, &replicaset.vclock);
+	/*
+	 * It's too late to remember replicaset.vclock as local
+	 * vclock at subscribe. It might have been incremented
+	 * while we were writing a subscribe response, and we don't
+	 * want to replicate back rows which originated on the
+	 * replica and arrived after the replica issued
+	 * SUBSCRIBE.
+	 */
+	vclock_copy(&relay->local_vclock_at_subscribe, clock_at_subscribe);
 	relay->r = recovery_new(cfg_gets("wal_dir"), false,
 			        replica_clock);
 	vclock_copy(&relay->tx.vclock, replica_clock);
diff --git a/src/box/relay.h b/src/box/relay.h
index e1782d78f..54ebd6731 100644
--- a/src/box/relay.h
+++ b/src/box/relay.h
@@ -124,6 +124,7 @@ relay_final_join(int fd, uint64_t sync, struct vclock *start_vclock,
  */
 void
 relay_subscribe(struct replica *replica, int fd, uint64_t sync,
-		struct vclock *replica_vclock, uint32_t replica_version_id);
+		struct vclock *replica_vclock, uint32_t replica_version_id,
+		struct vclock *clock_at_subscribe);
 
 #endif /* TARANTOOL_REPLICATION_RELAY_H_INCLUDED */
-- 
2.20.1 (Apple Git-117)


* Re: [Tarantool-patches] [PATCH v2 2/4] replication: check for rows to skip in applier correctly
  2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 2/4] replication: check for rows to skip in applier correctly sergepetrenko
@ 2020-02-14  7:19   ` Konstantin Osipov
  2020-02-14  7:29     ` Konstantin Osipov
  0 siblings, 1 reply; 16+ messages in thread
From: Konstantin Osipov @ 2020-02-14  7:19 UTC (permalink / raw)
  To: sergepetrenko; +Cc: tarantool-patches, v.shpilevoy

* sergepetrenko <sergepetrenko@tarantool.org> [20/02/14 09:46]:
> From: Serge Petrenko <sergepetrenko@tarantool.org>
> 
> Remove applier vclock initialization from replication_init(), where it
> is zeroed-out, and place it in the end of box_cfg_xc(), where replicaset
> vclock already has a meaningful value.
> Do not apply rows originating from the current instance if replication
> sync has ended.


> 
> Closes #4739
> ---
>  src/box/applier.cc     | 17 +++++++++++++++--
>  src/box/box.cc         |  6 ++++++
>  src/box/replication.cc |  1 -
>  3 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index ae3d281a5..e931e1595 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -731,8 +731,21 @@ applier_apply_tx(struct stailq *rows)
>  	struct latch *latch = (replica ? &replica->order_latch :
>  			       &replicaset.applier.order_latch);
>  	latch_lock(latch);
> -	if (vclock_get(&replicaset.applier.vclock,
> -		       first_row->replica_id) >= first_row->lsn) {
> +	/*
> +	 * Skip remote rows either if one of the appliers has
> +	 * sent them to write or if the rows originate from the
> +	 * local instance and we've already synced with the
> +	 * replica. The latter is important because relay gets
> +	 * notified about WAL write before tx does, so it is
> +	 * possible that a remote instance receives our rows
> +	 * via replication before we update replicaset vclock and
> +	 * even sends these rows back to us. An attempt to apply
> +	 * such rows will lead to having entries with duplicate
> +	 * LSNs in WAL.
> +	 */
> +	if (vclock_get(&replicaset.applier.vclock, first_row->replica_id) >=
> +	    first_row->lsn || (first_row->replica_id == instance_id &&
> +	    !box_is_orphan())) {
>  		latch_unlock(latch);
>  		return 0;

I think you should patch SUBSCRIBE iproto command, not the filter
itself.

Basically, if it's *re*configuration, not first replication
configuration, SUBSCRIBE should set local VCLOCK component to
infinity (check out variable local_vclock_at_subscribe, how it is
assigned and how it used by the relay).

In other words, I think the filter should be on the relay side.


-- 
Konstantin Osipov, Moscow, Russia
https://scylladb.com


* Re: [Tarantool-patches] [PATCH v2 3/4] wal: wart when trying to write a record with a broken lsn
  2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 3/4] wal: wart when trying to write a record with a broken lsn sergepetrenko
@ 2020-02-14  7:20   ` Konstantin Osipov
  2020-02-14 10:46     ` Serge Petrenko
  2020-02-16 16:15   ` Vladislav Shpilevoy
  1 sibling, 1 reply; 16+ messages in thread
From: Konstantin Osipov @ 2020-02-14  7:20 UTC (permalink / raw)
  To: sergepetrenko; +Cc: tarantool-patches, v.shpilevoy

* sergepetrenko <sergepetrenko@tarantool.org> [20/02/14 09:46]:
> From: Serge Petrenko <sergepetrenko@tarantool.org>
> 
> There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
> fire in release builds, of course. Let's at least warn the user on an
> attempt to write a record with a duplicate or otherwise broken lsn.

wart is бородавка, i.e. a skin blemish.

I think you meant warn :)

otherwise lgtm


-- 
Konstantin Osipov, Moscow, Russia
https://scylladb.com


* Re: [Tarantool-patches] [PATCH v2 4/4] replication: do not promote local_vclock_at_subscribe unnecessarily
  2020-02-13 21:53 ` [Tarantool-patches] [PATCH v2 4/4] replication: do not promote local_vclock_at_subscribe unnecessarily sergepetrenko
@ 2020-02-14  7:25   ` Konstantin Osipov
  2020-02-14 10:46     ` Serge Petrenko
  0 siblings, 1 reply; 16+ messages in thread
From: Konstantin Osipov @ 2020-02-14  7:25 UTC (permalink / raw)
  To: sergepetrenko; +Cc: tarantool-patches, v.shpilevoy

* sergepetrenko <sergepetrenko@tarantool.org> [20/02/14 09:46]:
> From: Serge Petrenko <sergepetrenko@tarantool.org>
> 
> When master processes a SUBSCRIBE request, it responds with its vclock
> at the moment of receiving the request. However, the fiber processing
> the request may yield on coio_write_xrow, when sending the response to
> the replica. In the meantime, master may apply additional rows coming
> from the replica after it has issued SUBSCRIBE.
> Then in relay_subscribe master sets its local vclock_at_subscribe to
> a possibly updated value of replicaset.vclock
> So, set local_vclock_at_subscribe to a remembered value, rather than an
> updated one.

I don't fully understand the explanation and what this fix
achieves.

local_vclock_at_subscribe is used to leave orphan mode. It
basically tells the applier when it has more or less fully caught
up with the relay.

It should not impact replication correctness in any other way.

I.e. it shouldn't matter whether or not it's accurate - while the
applier is reading rows from the master, the master can get new
rows anyway.

If local_vclock_at_subscribe has any other meaning/impact, it's a bug. 


> 
> Follow-up #4739
> ---
>  src/box/box.cc   |  2 +-
>  src/box/relay.cc | 13 +++++++++++--
>  src/box/relay.h  |  3 ++-
>  3 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 952d60ad1..7dec1ae6b 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1871,7 +1871,7 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
>  	 * indefinitely).
>  	 */
>  	relay_subscribe(replica, io->fd, header->sync, &replica_clock,
> -			replica_version_id);
> +			replica_version_id, &vclock);
>  }
>  
>  void
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index b89632273..b69646446 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -676,7 +676,8 @@ relay_subscribe_f(va_list ap)
>  /** Replication acceptor fiber handler. */
>  void
>  relay_subscribe(struct replica *replica, int fd, uint64_t sync,
> -		struct vclock *replica_clock, uint32_t replica_version_id)
> +		struct vclock *replica_clock, uint32_t replica_version_id,
> +		struct vclock *clock_at_subscribe)
>  {
>  	assert(replica->anon || replica->id != REPLICA_ID_NIL);
>  	struct relay *relay = replica->relay;
> @@ -699,7 +700,15 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>  		replica_on_relay_stop(replica);
>  	});
>  
> -	vclock_copy(&relay->local_vclock_at_subscribe, &replicaset.vclock);
> +	/*
> +	 * It's too late to remember replicaset.vclock as local
> +	 * vclock at subscribe. It might have incremented while we
> +	 * were writing a subscribe response, and we don't want to
> +	 * replicate back rows originating from the replica and
> +	 * having arrived later than replica has issued
> +	 * SUBSCRIBE.


I still don't understand.
> +	 */
> +	vclock_copy(&relay->local_vclock_at_subscribe, clock_at_subscribe);
>  	relay->r = recovery_new(cfg_gets("wal_dir"), false,
>  			        replica_clock);
>  	vclock_copy(&relay->tx.vclock, replica_clock);
> diff --git a/src/box/relay.h b/src/box/relay.h
> index e1782d78f..54ebd6731 100644
> --- a/src/box/relay.h
> +++ b/src/box/relay.h
> @@ -124,6 +124,7 @@ relay_final_join(int fd, uint64_t sync, struct vclock *start_vclock,
>   */
>  void
>  relay_subscribe(struct replica *replica, int fd, uint64_t sync,
> -		struct vclock *replica_vclock, uint32_t replica_version_id);
> +		struct vclock *replica_vclock, uint32_t replica_version_id,
> +		struct vclock *clock_at_subscribe);
>  
>  #endif /* TARANTOOL_REPLICATION_RELAY_H_INCLUDED */
> -- 
> 2.20.1 (Apple Git-117)
> 

-- 
Konstantin Osipov, Moscow, Russia
https://scylladb.com


* Re: [Tarantool-patches] [PATCH v2 2/4] replication: check for rows to skip in applier correctly
  2020-02-14  7:19   ` Konstantin Osipov
@ 2020-02-14  7:29     ` Konstantin Osipov
  0 siblings, 0 replies; 16+ messages in thread
From: Konstantin Osipov @ 2020-02-14  7:29 UTC (permalink / raw)
  To: sergepetrenko, alexander.turenko, v.shpilevoy, tarantool-patches

* Konstantin Osipov <kostja.osipov@gmail.com> [20/02/14 10:19]:
> I think you should patch SUBSCRIBE iproto command, not the filter
> itself.
> 
> Basically, if it's *re*configuration, not first replication
> configuration, SUBSCRIBE should set local VCLOCK component to
> infinity (check out variable local_vclock_at_subscribe, how it is
> assigned and how it used by the relay).

should be: check replica_vclock, how it's assigned and how it's
used by the relay.

What I mean is:

Set replica_vclock[self] = infinity before sending at resubscribe.

then you don't need any filters.

-- 
Konstantin Osipov, Moscow, Russia
https://scylladb.com


* Re: [Tarantool-patches] [PATCH v2 4/4] replication: do not promote local_vclock_at_subscribe unnecessarily
  2020-02-14  7:25   ` Konstantin Osipov
@ 2020-02-14 10:46     ` Serge Petrenko
  2020-02-14 10:52       ` Konstantin Osipov
  0 siblings, 1 reply; 16+ messages in thread
From: Serge Petrenko @ 2020-02-14 10:46 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches, v.shpilevoy



> On 14 Feb 2020, at 10:25, Konstantin Osipov <kostja.osipov@gmail.com> wrote:
> 
> * sergepetrenko <sergepetrenko@tarantool.org <mailto:sergepetrenko@tarantool.org>> [20/02/14 09:46]:
>> From: Serge Petrenko <sergepetrenko@tarantool.org <mailto:sergepetrenko@tarantool.org>>
>> 
>> When master processes a SUBSCRIBE request, it responds with its vclock
>> at the moment of receiving the request. However, the fiber processing
>> the request may yield on coio_write_xrow, when sending the response to
>> the replica. In the meantime, master may apply additional rows coming
>> from the replica after it has issued SUBSCRIBE.
>> Then in relay_subscribe master sets its local vclock_at_subscribe to
>> a possibly updated value of replicaset.vclock
>> So, set local_vclock_at_subscribe to a remembered value, rather than an
>> updated one.
> 
> I don't fully understand the explanation and what this fix
> achieves.

The fix stops relaying rows that have just come from the replica back to it.
It is not necessary, since we’ve fixed the applier in a different way, but I think there’s no
need to resend the replica’s rows back and forth. It just looks more correct, or consistent,
if you wish. If master responds that its vclock is such and such, it should use the same
vclock to judge whether to send the replica its own rows or not, in my opinion.
Before the patch, master judges by the replicaset vclock, which may get updated while master
responds to subscribe (coio_write_xrow yields).

> 
> local_vclock_at_subscribe is used to leave orphan mode. It
> basically tells the applier when it more or less has fully caught
> up with the relay. 

Yes, master sends it to the replica, and later master uses its own replicaset vclock
(previously) or the same vclock it sent to the replica (after my patch) to filter which of
the replica’s rows to send back to it (see the piece of code you showed me yesterday).
Imagine a situation: you have a master-master configuration with instances 1 and 2.
2 is subscribed to 1, and 1 resubscribes to 2 (maybe 2 just restarted and was the first one
to successfully subscribe).
2 yields on writing the subscribe response. In the meantime, 1 writes something new to WAL
and relays it to 2. 2 writes it to WAL and increments its replicaset vclock. Later it’ll resend
these rows back to 1, because `relay->local_vclock_at_subscribe` holds an updated vclock
value.
Hope I made it clearer this time.

> 
> It should not impact replication correctness in any other way.
> 
> I.e. it shouldn't matter whether or not it's accurate - while the
> applier is reading rows from the master, the master can get new
> rows anyway.
> 
> If local_vclock_at_subscribe has any other meaning/impact, it's a bug. 
> 
> 
>> 
>> Follow-up #4739
>> ---
>> src/box/box.cc   |  2 +-
>> src/box/relay.cc | 13 +++++++++++--
>> src/box/relay.h  |  3 ++-
>> 3 files changed, 14 insertions(+), 4 deletions(-)
>> 
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 952d60ad1..7dec1ae6b 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -1871,7 +1871,7 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
>> 	 * indefinitely).
>> 	 */
>> 	relay_subscribe(replica, io->fd, header->sync, &replica_clock,
>> -			replica_version_id);
>> +			replica_version_id, &vclock);
>> }
>> 
>> void
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index b89632273..b69646446 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -676,7 +676,8 @@ relay_subscribe_f(va_list ap)
>> /** Replication acceptor fiber handler. */
>> void
>> relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>> -		struct vclock *replica_clock, uint32_t replica_version_id)
>> +		struct vclock *replica_clock, uint32_t replica_version_id,
>> +		struct vclock *clock_at_subscribe)
>> {
>> 	assert(replica->anon || replica->id != REPLICA_ID_NIL);
>> 	struct relay *relay = replica->relay;
>> @@ -699,7 +700,15 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>> 		replica_on_relay_stop(replica);
>> 	});
>> 
>> -	vclock_copy(&relay->local_vclock_at_subscribe, &replicaset.vclock);
>> +	/*
>> +	 * It's too late to remember replicaset.vclock as local
>> +	 * vclock at subscribe. It might have incremented while we
>> +	 * were writing a subscribe response, and we don't want to
>> +	 * replicate back rows originating from the replica and
>> +	 * having arrived later than replica has issued
>> +	 * SUBSCRIBE.
> 
> 
> I still don't understand.
>> +	 */
>> +	vclock_copy(&relay->local_vclock_at_subscribe, clock_at_subscribe);
>> 	relay->r = recovery_new(cfg_gets("wal_dir"), false,
>> 			        replica_clock);
>> 	vclock_copy(&relay->tx.vclock, replica_clock);
>> diff --git a/src/box/relay.h b/src/box/relay.h
>> index e1782d78f..54ebd6731 100644
>> --- a/src/box/relay.h
>> +++ b/src/box/relay.h
>> @@ -124,6 +124,7 @@ relay_final_join(int fd, uint64_t sync, struct vclock *start_vclock,
>>  */
>> void
>> relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>> -		struct vclock *replica_vclock, uint32_t replica_version_id);
>> +		struct vclock *replica_vclock, uint32_t replica_version_id,
>> +		struct vclock *clock_at_subscribe);
>> 
>> #endif /* TARANTOOL_REPLICATION_RELAY_H_INCLUDED */
>> -- 
>> 2.20.1 (Apple Git-117)
>> 
> 
> -- 
> Konstantin Osipov, Moscow, Russia
> https://scylladb.com <https://scylladb.com/>



* Re: [Tarantool-patches] [PATCH v2 3/4] wal: wart when trying to write a record with a broken lsn
  2020-02-14  7:20   ` Konstantin Osipov
@ 2020-02-14 10:46     ` Serge Petrenko
  0 siblings, 0 replies; 16+ messages in thread
From: Serge Petrenko @ 2020-02-14 10:46 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches, Vladislav Shpilevoy


> On 14 Feb 2020, at 10:20, Konstantin Osipov <kostja.osipov@gmail.com> wrote:
> 
> * sergepetrenko <sergepetrenko@tarantool.org> [20/02/14 09:46]:
>> From: Serge Petrenko <sergepetrenko@tarantool.org>
>> 
>> There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
>> fire in release builds, of course. Let's at least warn the user on an
>> attempt to write a record with a duplicate or otherwise broken lsn.
> 
> wart is бородавка.
> 
> I think you meant warn :)
> 
> otherwise lgtm
> 
> 
> -- 
> Konstantin Osipov, Moscow, Russia
> https://scylladb.com


Thanks!
--
Serge Petrenko
sergepetrenko@tarantool.org


* Re: [Tarantool-patches] [PATCH v2 4/4] replication: do not promote local_vclock_at_subscribe unnecessarily
  2020-02-14 10:46     ` Serge Petrenko
@ 2020-02-14 10:52       ` Konstantin Osipov
  0 siblings, 0 replies; 16+ messages in thread
From: Konstantin Osipov @ 2020-02-14 10:52 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, v.shpilevoy

* Serge Petrenko <sergepetrenko@tarantool.org> [20/02/14 13:47]:

> The fix allows to stop relaying rows that have just came from the replica back to it.
> It is not necessary, since we’ve fixed applier in a different way, but I think there’s no
> need to resend the replica’s rows back and forth. It just looks more correct, or consistent,
> if you wish. If master responds that its vclock is such and such, it should use the same
> vclock to judge whether to send the replica its own rows, or not, in my opinion.
> Before the patch master judges by replicaset vclock, which may get updated while master
> responds to subscribe (coio_write_xrow yields).

These rows are few. The check is there in all cases. There is no
such thing as partially rotten egg. The old code was correct in
that regard. The new code can't be "more correct", it's either
also correct or not. 

If you want to clarify the old code, you could add a comment.

> Yes, master sends it to replica, and later master uses its own replicaset vclock
> (previously) or the same vclock it sent to replica (after my patch) to filter replica’s
> rows to send back to it (see the piece of code you’ve shown me yesterday).

No, it uses a different vclock for filtering - replica vclock.

> Imagine a situation: you have master-master configuration with instances 1 and 2
> 2 is subscribed to 1, and 1 resubscribes to 2 (maybe 2 just restarted and was the first one
> to successfully subscribe).
> 2 yields on writing subscribe responce. In the meantime 1 writes something new to WAL
> and relays it to 2. 2 writes it to WAL and increments its replicaset vclock. Later it’ll resend
> these rows back to 1, because `relay->local_vclock_at_subscribe` holds an updated vclock
> value.
> Hope I made it more clear this time.

This is what you should fix by setting replica_vclock[self] =
infinity at subscribe.

Do you see the difference between a single assignment instruction
and a condition evaluated on every row?

-- 
Konstantin Osipov, Moscow, Russia


* Re: [Tarantool-patches] [PATCH v2 3/4] wal: wart when trying to write a record with a broken lsn
  2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 3/4] wal: wart when trying to write a record with a broken lsn sergepetrenko
  2020-02-14  7:20   ` Konstantin Osipov
@ 2020-02-16 16:15   ` Vladislav Shpilevoy
  2020-02-18 17:28     ` Serge Petrenko
  1 sibling, 1 reply; 16+ messages in thread
From: Vladislav Shpilevoy @ 2020-02-16 16:15 UTC (permalink / raw)
  To: sergepetrenko, alexander.turenko, kostja.osipov; +Cc: tarantool-patches

Hi! Thanks for the patch! I will review other commits when
Kostja is fine with them.

Since he has finished with this one, here is my nit: let's keep
the assertion for the debug build. If not for this assertion, we
probably wouldn't have noticed this bug, and we may miss future
bugs without it. While running the tests we won't notice a
warning.

Also, we probably should not call vclock_follow() at all if the
lsn is broken. Just keep it as is. It does not look right to
decrease it. And make it panic() in vclock_follow() to catch
other bugs related to it.

On 13/02/2020 22:52, sergepetrenko wrote:
> From: Serge Petrenko <sergepetrenko@tarantool.org>
> 
> There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
> fire in release builds, of course. Let's at least warn the user on an
> attempt to write a record with a duplicate or otherwise broken lsn.
> 
> Follow-up #4739
> ---
>  src/box/wal.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/src/box/wal.c b/src/box/wal.c
> index 0ae66ff32..f8ee2b7d8 100644
> --- a/src/box/wal.c
> +++ b/src/box/wal.c
> @@ -951,9 +951,18 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>  			(*row)->tsn = tsn;
>  			(*row)->is_commit = row == end - 1;
>  		} else {
> -			vclock_follow(vclock_diff, (*row)->replica_id,
> -				      (*row)->lsn - vclock_get(base,
> -							       (*row)->replica_id));
> +			int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id);
> +			if (diff <= vclock_get(vclock_diff,
> +					       (*row)->replica_id)) {
> +				say_crit("Attempt to write a broken LSN to WAL:"
> +					 " replica id: %d, committed lsn: %d,"
> +					 " new lsn %d", (*row)->replica_id,
> +					 vclock_get(base, (*row)->replica_id) +
> +					 vclock_get(vclock_diff,
> +						    (*row)->replica_id),
> +						    (*row)->lsn);
> +			}
> +			vclock_follow(vclock_diff, (*row)->replica_id, diff);

In summary, let's call follow() in the 'else' branch, and add unreachable()
after the crit log.


* Re: [Tarantool-patches] [PATCH v2 3/4] wal: warn when trying to write a record with a broken lsn
  2020-02-16 16:15   ` Vladislav Shpilevoy
@ 2020-02-18 17:28     ` Serge Petrenko
  2020-02-18 21:15       ` Vladislav Shpilevoy
  0 siblings, 1 reply; 16+ messages in thread
From: Serge Petrenko @ 2020-02-18 17:28 UTC (permalink / raw)
  To: Vladislav Shpilevoy, Alexander Turenko, Konstantin Osipov
  Cc: tarantool-patches


Hi! Thanks for your review!

Please find my answers below, together with an incremental diff.
I’ll send v3 shortly.

> On 16 Feb 2020, at 19:15, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
> Hi! Thanks for the patch! I will review other commits when
> Kostja is fine with them.
> 
> Since he finished with this one, here is my nit: let's keep
> the assertion for debug builds. Without this assertion, we
> probably wouldn't have noticed this bug, and we may miss future
> bugs without it. While running the tests we won't notice a
> warning.
> 
> Also we probably should not call vclock_follow() at all if
> the lsn is broken. Just keep it as is. It does not look right to

Ok

> decrease it. And make it panic() in vclock_follow() to catch
> other bugs related to it.

Kostja was against panicking on the previous review iteration;
the assertion is still in `vclock_follow()`.

> 
> On 13/02/2020 22:52, sergepetrenko wrote:
>> From: Serge Petrenko <sergepetrenko@tarantool.org>
>> 
>> There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
>> fire in release builds, of course. Let's at least warn the user on an
>> attempt to write a record with a duplicate or otherwise broken lsn.
>> 
>> Follow-up #4739
>> ---
>> src/box/wal.c | 15 ++++++++++++---
>> 1 file changed, 12 insertions(+), 3 deletions(-)
>> 
>> diff --git a/src/box/wal.c b/src/box/wal.c
>> index 0ae66ff32..f8ee2b7d8 100644
>> --- a/src/box/wal.c
>> +++ b/src/box/wal.c
>> @@ -951,9 +951,18 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>> 			(*row)->tsn = tsn;
>> 			(*row)->is_commit = row == end - 1;
>> 		} else {
>> -			vclock_follow(vclock_diff, (*row)->replica_id,
>> -				      (*row)->lsn - vclock_get(base,
>> -							       (*row)->replica_id));
>> +			int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id);
>> +			if (diff <= vclock_get(vclock_diff,
>> +					       (*row)->replica_id)) {
>> +				say_crit("Attempt to write a broken LSN to WAL:"
>> +					 " replica id: %d, committed lsn: %d,"
>> +					 " new lsn %d", (*row)->replica_id,
>> +					 vclock_get(base, (*row)->replica_id) +
>> +					 vclock_get(vclock_diff,
>> +						    (*row)->replica_id),
>> +						    (*row)->lsn);
>> +			}
>> +			vclock_follow(vclock_diff, (*row)->replica_id, diff);
> 
> In summary, let's call follow() in the 'else' branch, and add unreachable()
> after the crit log.


I believe `unreachable` doesn't fit here, since it implies that the code is
truly unreachable, while we are trying to catch something that "shouldn't happen".
I've even seen a ticket in our repo regarding this misuse.
Let's just leave an `assert(0)` in this branch; `unreachable` is defined like that anyway.

Here are my changes:

diff --git a/src/box/wal.c b/src/box/wal.c
index f8ee2b7d8..a87aedf1d 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -955,14 +955,16 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 			if (diff <= vclock_get(vclock_diff,
 					       (*row)->replica_id)) {
 				say_crit("Attempt to write a broken LSN to WAL:"
-					 " replica id: %d, committed lsn: %d,"
+					 " replica id: %d, confirmed lsn: %d,"
 					 " new lsn %d", (*row)->replica_id,
 					 vclock_get(base, (*row)->replica_id) +
 					 vclock_get(vclock_diff,
 						    (*row)->replica_id),
 						    (*row)->lsn);
+				assert(0);
+			} else {
+				vclock_follow(vclock_diff, (*row)->replica_id, diff);
 			}
-			vclock_follow(vclock_diff, (*row)->replica_id, diff);
 		}
 	}
 }
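The resulting branch can also be read as a standalone sketch. The vclock stand-ins below are simplified, not the real Tarantool structures; the message is printed with `%lld` and explicit casts here, since the LSN values are 64-bit (the in-tree diff passes them to `%d`, which is a mismatched format for varargs), and the debug-only assert is left out so the release-build path can be exercised:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define VCLOCK_MAX 32

/* Simplified vclock stand-in: one LSN per replica id. */
struct vclock {
	int64_t lsn[VCLOCK_MAX];
};

static int64_t
vclock_get(const struct vclock *v, uint32_t id)
{
	return v->lsn[id];
}

static void
vclock_follow(struct vclock *v, uint32_t id, int64_t lsn)
{
	assert(lsn > v->lsn[id]); /* the invariant discussed in the thread */
	v->lsn[id] = lsn;
}

/* Mirrors the patched branch of wal_assign_lsn(): on a duplicate or
 * otherwise broken LSN, log and do NOT follow; otherwise follow the diff. */
static void
follow_remote_lsn(struct vclock *vclock_diff, const struct vclock *base,
		  uint32_t replica_id, int64_t lsn)
{
	int64_t diff = lsn - vclock_get(base, replica_id);
	if (diff <= vclock_get(vclock_diff, replica_id)) {
		fprintf(stderr, "Attempt to write a broken LSN to WAL:"
			" replica id: %u, confirmed lsn: %lld, new lsn %lld\n",
			(unsigned)replica_id,
			(long long)(vclock_get(base, replica_id) +
				    vclock_get(vclock_diff, replica_id)),
			(long long)lsn);
		/* assert(false) in the actual patch (debug builds only). */
	} else {
		vclock_follow(vclock_diff, replica_id, diff);
	}
}
```

The point of the restructuring: a broken LSN is reported and ignored instead of being fed to vclock_follow(), whose precondition it would violate.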

--
Serge Petrenko
sergepetrenko@tarantool.org




* Re: [Tarantool-patches] [PATCH v2 3/4] wal: warn when trying to write a record with a broken lsn
  2020-02-18 17:28     ` Serge Petrenko
@ 2020-02-18 21:15       ` Vladislav Shpilevoy
  2020-02-19  8:46         ` Serge Petrenko
  0 siblings, 1 reply; 16+ messages in thread
From: Vladislav Shpilevoy @ 2020-02-18 21:15 UTC (permalink / raw)
  To: Serge Petrenko, Alexander Turenko, Konstantin Osipov; +Cc: tarantool-patches

Hi! Thanks for the fixes!

> diff --git a/src/box/wal.c b/src/box/wal.c
> index f8ee2b7d8..a87aedf1d 100644
> --- a/src/box/wal.c
> +++ b/src/box/wal.c
> @@ -955,14 +955,16 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>  			if (diff <= vclock_get(vclock_diff,
>  					       (*row)->replica_id)) {
>  				say_crit("Attempt to write a broken LSN to WAL:"
> -					 " replica id: %d, committed lsn: %d,"
> +					 " replica id: %d, confirmed lsn: %d,"
>  					 " new lsn %d", (*row)->replica_id,
>  					 vclock_get(base, (*row)->replica_id) +
>  					 vclock_get(vclock_diff,
>  						    (*row)->replica_id),
>  						    (*row)->lsn);
> +				assert(0);

Ok. Let's then use assert(false) at least. We don't normally use
integers as booleans.


* Re: [Tarantool-patches] [PATCH v2 3/4] wal: warn when trying to write a record with a broken lsn
  2020-02-18 21:15       ` Vladislav Shpilevoy
@ 2020-02-19  8:46         ` Serge Petrenko
  0 siblings, 0 replies; 16+ messages in thread
From: Serge Petrenko @ 2020-02-19  8:46 UTC (permalink / raw)
  To: Vladislav Shpilevoy, Alexander Turenko, Konstantin Osipov
  Cc: tarantool-patches





> On 19 Feb 2020, at 00:15, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
> 
> Hi! Thanks for the fixes!
> 
>> diff --git a/src/box/wal.c b/src/box/wal.c
>> index f8ee2b7d8..a87aedf1d 100644
>> --- a/src/box/wal.c
>> +++ b/src/box/wal.c
>> @@ -955,14 +955,16 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>>  			if (diff <= vclock_get(vclock_diff,
>>  					       (*row)->replica_id)) {
>>  				say_crit("Attempt to write a broken LSN to WAL:"
>> -					 " replica id: %d, committed lsn: %d,"
>> +					 " replica id: %d, confirmed lsn: %d,"
>>  					 " new lsn %d", (*row)->replica_id,
>>  					 vclock_get(base, (*row)->replica_id) +
>>  					 vclock_get(vclock_diff,
>>  						    (*row)->replica_id),
>>  						    (*row)->lsn);
>> +				assert(0);
> 
> Ok. Let's then use assert(false) at least. We don't normally use
> integers as booleans.

Thanks for review! Fixed.

diff --git a/src/box/wal.c b/src/box/wal.c
index a87aedf1d..b08d62b29 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -961,7 +961,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
                                         vclock_get(vclock_diff,
                                                    (*row)->replica_id),
                                                    (*row)->lsn);
-                               assert(0);
+                               assert(false);
                        } else {
                                vclock_follow(vclock_diff, (*row)->replica_id, diff);
                        }



--
Serge Petrenko
sergepetrenko@tarantool.org


end of thread, other threads:[~2020-02-19  8:46 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-13 21:52 [Tarantool-patches] [PATCH v2 0/4] replication: fix applying of rows originating from local instance sergepetrenko
2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 1/4] box: expose box_is_orphan method sergepetrenko
2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 2/4] replication: check for rows to skip in applier correctly sergepetrenko
2020-02-14  7:19   ` Konstantin Osipov
2020-02-14  7:29     ` Konstantin Osipov
2020-02-13 21:52 ` [Tarantool-patches] [PATCH v2 3/4] wal: warn when trying to write a record with a broken lsn sergepetrenko
2020-02-14  7:20   ` Konstantin Osipov
2020-02-14 10:46     ` Serge Petrenko
2020-02-16 16:15   ` Vladislav Shpilevoy
2020-02-18 17:28     ` Serge Petrenko
2020-02-18 21:15       ` Vladislav Shpilevoy
2020-02-19  8:46         ` Serge Petrenko
2020-02-13 21:53 ` [Tarantool-patches] [PATCH v2 4/4] replication: do not promote local_vclock_at_subscribe unnecessarily sergepetrenko
2020-02-14  7:25   ` Konstantin Osipov
2020-02-14 10:46     ` Serge Petrenko
2020-02-14 10:52       ` Konstantin Osipov
