>Wednesday, 26 February 2020, 13:23 +03:00 from Konstantin Osipov:
>
>* sergepetrenko <sergepetrenko@tarantool.org> [20/02/26 13:00]:
>> From: Serge Petrenko <sergepetrenko@tarantool.org>
>>
>> We have a mechanism for restoring rows originating from an instance that
>> suffered a sudden power loss: remote masters resend the instance's rows
>> received before a certain point in time, defined by remote master vclock
>> at the moment of subscribe.
>> However, this is useful only on initial replication configuration, when
>> an instance has just recovered, so that it can receive what it has
>> relayed but hasn't synced to disk.
>> In other cases, when an instance is operating normally and master-master
>> replication is configured, the mechanism described above may lead to
>> instance re-applying instance's own rows, coming from a master it has just
>> subscribed to.
>> To fix the problem do not relay rows coming from a remote instance, if
>> the instance has already recovered.
>>
>
>A comment like this also belongs to the code. Usually the patch
>that fixes a bug comes along with a test case for a bug, are you
>sure you can't submit one?

I don't think I can. The test that comes with the issue is a stress test
that relies on running with multiple workers simultaneously. It reproduces
the problem when run with 4 workers on one of my PCs, and with 20 workers
on the other. I think we don't have the appropriate testing infrastructure
to run the same test with multiple workers at the same time, and I couldn't
come up with a single test which would reproduce the same problem.

>
>> vclock_copy(&vclock, &replicaset.vclock);
>> + unsigned int id_mask = box_is_orphan() ? 0 : 1 << instance_id;
>
>box_is_orphan() fits the bill, so it's good enough.
>
>I would explain, however, that what we are really looking for
>here is whether or not the local WAL accepts writes. As soon as we
>started allowing writes to the local WAL, we don't want to get
>these writes from elsewhere.

Ok.

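For reference, here is a minimal sketch of how I read the filter semantics. It assumes the relay skips every row whose replica id has its bit set in the mask sent with SUBSCRIBE. The helper names make_id_filter() and row_is_filtered() are hypothetical and exist only for this illustration; they are not functions from the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the filter a subscriber sends.
 * While the instance is an orphan (still recovering) the filter is
 * empty, so the remote master may resend the instance's own rows.
 * Once the local WAL accepts writes, the instance's own id is set.
 */
static uint32_t
make_id_filter(bool is_orphan, uint32_t instance_id)
{
	/* A 32-bit mask can only address ids 0..31. */
	assert(instance_id < 32);
	return is_orphan ? 0 : (uint32_t)1 << instance_id;
}

/*
 * Hypothetical relay-side check (an assumption, not the actual
 * relay code): a row is skipped if its origin id is in the filter.
 */
static bool
row_is_filtered(uint32_t id_filter, uint32_t replica_id)
{
	assert(replica_id < 32);
	return (id_filter & ((uint32_t)1 << replica_id)) != 0;
}
```

With instance id 3, an orphan sends an empty filter and still receives its own rows, while a recovered instance sends a filter with only bit 3 set, so rows that originated from other instances pass through unchanged.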
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 1a07d71a9..73ffc0d68 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -866,9 +866,13 @@ applier_subscribe(struct applier *applier)
 	struct vclock vclock;
 	vclock_create(&vclock);
 	vclock_copy(&vclock, &replicaset.vclock);
-	unsigned int id_mask = box_is_orphan() ? 0 : 1 << instance_id;
+	/*
+	 * Stop accepting local rows coming from a remote
+	 * instance as soon as local WAL starts accepting writes.
+	 */
+	unsigned int id_filter = box_is_orphan() ? 0 : 1 << instance_id;
 	xrow_encode_subscribe_xc(&row, &REPLICASET_UUID, &INSTANCE_UUID,
-				 &vclock, replication_anon, id_mask);
+				 &vclock, replication_anon, id_filter);
 	coio_write_xrow(coio, &row);
 
 	/* Read SUBSCRIBE response */

>
>--
>Konstantin Osipov, Moscow, Russia

--
Serge Petrenko