[Tarantool-patches] [PATCH v4 4/4] replication: do not relay rows coming from a remote instance back to it

Serge Petrenko sergepetrenko at tarantool.org
Wed Feb 26 14:21:01 MSK 2020


  
>Wednesday, 26 February 2020, 13:23 +03:00 from Konstantin Osipov <kostja.osipov at gmail.com>:
> 
>* sergepetrenko < sergepetrenko at tarantool.org > [20/02/26 13:00]:
>> From: Serge Petrenko < sergepetrenko at tarantool.org >
>>
>> We have a mechanism for restoring rows originating from an instance that
>> suffered a sudden power loss: remote masters resend the instance's rows
>> received before a certain point in time, defined by the remote master's
>> vclock at the moment of subscribe.
>> However, this is useful only on initial replication configuration, when
>> an instance has just recovered, so that it can receive what it has
>> relayed but hasn't synced to disk.
>> In other cases, when an instance is operating normally and master-master
>> replication is configured, the mechanism described above may lead to
>> the instance re-applying its own rows, coming from a master it has just
>> subscribed to.
>> To fix the problem, do not relay rows coming from a remote instance if
>> the instance has already recovered.
>>
>
>A comment like this also belongs in the code. Usually a patch
>that fixes a bug comes along with a test case for the bug. Are you
>sure you can't submit one?
 
I don’t think I can. The test that comes with the issue is a stress test
that relies on being run with multiple workers simultaneously.
It reproduces the problem when run with 4 workers on one of my PCs,
and with 20 workers on the other.
I think we don’t have the appropriate testing infrastructure to run the same
test with multiple workers at the same time, and I couldn’t come up with a
single test that would reproduce the same problem.
 
>
>> vclock_copy(&vclock, &replicaset.vclock);
>> + unsigned int id_mask = box_is_orphan() ? 0 : 1 << instance_id;
>
>box_is_orphan() fits the bill, so it's good enough.
>
>I would explain, however, that what we are really looking for
>here is whether or not the local WAL accepts writes. As soon as we
>start allowing writes to the local WAL, we don't want to get
>these writes from elsewhere.
 
Ok.
 
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 1a07d71a9..73ffc0d68 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -866,9 +866,13 @@ applier_subscribe(struct applier *applier)
  struct vclock vclock;
  vclock_create(&vclock);
  vclock_copy(&vclock, &replicaset.vclock);
- unsigned int id_mask = box_is_orphan() ? 0 : 1 << instance_id;
+ /*
+ * Stop accepting local rows coming from a remote
+ * instance as soon as local WAL starts accepting writes.
+ */
+ unsigned int id_filter = box_is_orphan() ? 0 : 1 << instance_id;
  xrow_encode_subscribe_xc(&row, &REPLICASET_UUID, &INSTANCE_UUID,
- &vclock, replication_anon, id_mask);
+ &vclock, replication_anon, id_filter);
  coio_write_xrow(coio, &row);
 
  /* Read SUBSCRIBE response */
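
To illustrate the idea behind id_filter, here is a minimal standalone
sketch, not the actual relay code. It assumes the filter is a 32-bit mask
with one bit per replica id. The names make_id_filter, relay_should_skip
and ID_FILTER_BITS are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit: the mask holds one bit per replica id. */
enum { ID_FILTER_BITS = 32 };

/*
 * Subscriber side: build the mask sent along with SUBSCRIBE.
 * While the instance is an orphan (the local WAL does not accept
 * writes yet) the mask is empty, so the master still resends the
 * instance's own rows. Afterwards the instance's own bit is set.
 */
static uint32_t
make_id_filter(bool is_orphan, uint32_t instance_id)
{
	assert(instance_id < ID_FILTER_BITS);
	return is_orphan ? 0 : UINT32_C(1) << instance_id;
}

/*
 * Master side (assumed behaviour): skip a row when the bit of the
 * replica it originated from is set in the subscriber's mask.
 */
static bool
relay_should_skip(uint32_t id_filter, uint32_t row_replica_id)
{
	assert(row_replica_id < ID_FILTER_BITS);
	return (id_filter & (UINT32_C(1) << row_replica_id)) != 0;
}
```

With instance_id == 3, an orphan sends an empty mask and gets its own rows
back, while a recovered instance sends a mask with bit 3 set and only rows
with replica id 3 are filtered out. The sketch shifts an unsigned constant
so that the expression stays well-defined for id 31 as well.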
 
>
>--
>Konstantin Osipov, Moscow, Russia
 
--
Serge Petrenko
 