Wednesday, 26 February 2020, 13:23 +03:00 from Konstantin Osipov <kostja.osipov@gmail.com>:
 
* sergepetrenko <sergepetrenko@tarantool.org> [20/02/26 13:00]:
> From: Serge Petrenko <sergepetrenko@tarantool.org>
>
> We have a mechanism for restoring rows originating from an instance that
> suffered a sudden power loss: remote masters resend the instance's rows
> received before a certain point in time, defined by the remote master's vclock
> at the moment of subscribe.
> However, this is useful only on initial replication configuration, when
> an instance has just recovered, so that it can receive what it has
> relayed but hasn't synced to disk.
> In other cases, when an instance is operating normally and master-master
> replication is configured, the mechanism described above may lead to
> the instance re-applying its own rows, coming from a master it has just
> subscribed to.
> To fix the problem, do not relay rows coming from a remote instance if
> the instance has already recovered.
>

A comment like this also belongs in the code. Usually a patch
that fixes a bug comes along with a test case for the bug. Are you
sure you can't submit one?
 
I don’t think I can. The test that comes with an issue is a stress test,
relying on running it with multiple workers simultaneously.
It reproduces the problem when run with 4 workers on one of my PCs,
and with 20 workers on the other.
I think we don’t have the appropriate testing infrastructure to run the same
test with multiple workers at the same time, and I couldn’t come up with a
single test which would reproduce the same problem.
 

> vclock_copy(&vclock, &replicaset.vclock);
> + unsigned int id_mask = box_is_orphan() ? 0 : 1 << instance_id;

box_is_orphan() fits the bill, so it's good enough.

I would explain, however, that what we are really looking for
here is whether or not the local WAL accepts writes. As soon as we
start allowing writes to the local WAL, we don't want to get
these writes from elsewhere.
 
Ok.
 

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 1a07d71a9..73ffc0d68 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -866,9 +866,13 @@ applier_subscribe(struct applier *applier)
  struct vclock vclock;
  vclock_create(&vclock);
  vclock_copy(&vclock, &replicaset.vclock);
- unsigned int id_mask = box_is_orphan() ? 0 : 1 << instance_id;
+ /*
+ * Stop accepting local rows coming from a remote
+ * instance as soon as local WAL starts accepting writes.
+ */
+ unsigned int id_filter = box_is_orphan() ? 0 : 1 << instance_id;
  xrow_encode_subscribe_xc(&row, &REPLICASET_UUID, &INSTANCE_UUID,
- &vclock, replication_anon, id_filter);
+ &vclock, replication_anon, id_filter);
  coio_write_xrow(coio, &row);
 
  /* Read SUBSCRIBE response */
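
For readers following the thread, here is a minimal standalone sketch of how I understand the filter to behave. It is not the actual applier or relay code: make_id_filter() and row_is_filtered() are hypothetical helpers made up for illustration, the relay-side check is my assumption about how the mask is consumed, and instance ids are assumed to be below 32 so that they fit in one 32-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the expression in the patch:
 * while the instance is an orphan (local WAL not yet accepting
 * writes) the filter is empty, so the remote master may resend
 * our own rows. Afterwards the bit for our own id is set.
 * Assumption: instance_id < 32.
 */
static uint32_t
make_id_filter(bool is_orphan, uint32_t instance_id)
{
	return is_orphan ? 0 : (uint32_t)1 << instance_id;
}

/*
 * Hypothetical relay-side check (an assumption, not taken from
 * the patch): a row is skipped when the bit matching the id of
 * the instance that produced it is set in the subscriber's filter.
 */
static bool
row_is_filtered(uint32_t id_filter, uint32_t row_replica_id)
{
	return (id_filter & ((uint32_t)1 << row_replica_id)) != 0;
}
```

With instance id 3, the filter is 0 while the instance is an orphan, so nothing is skipped. Once it leaves orphan mode the filter becomes 1 << 3, and only rows stamped with id 3 are skipped, while rows from other instances still pass.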

 

--
Konstantin Osipov, Moscow, Russia
 
--
Serge Petrenko