<HTML><BODY><div> </div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;">Wednesday, 26 February 2020, 13:23 +03:00, from Konstantin Osipov <kostja.osipov@gmail.com>:<br> <div id=""><div class="js-helper js-readmsg-msg"><style type="text/css"></style><div><div id="style_15827126010946780806_BODY">* sergepetrenko <<a href="/compose?To=sergepetrenko@tarantool.org">sergepetrenko@tarantool.org</a>> [20/02/26 13:00]:<br>> From: Serge Petrenko <<a href="/compose?To=sergepetrenko@tarantool.org">sergepetrenko@tarantool.org</a>><br>><br>> We have a mechanism for restoring rows originating from an instance that<br>> suffered a sudden power loss: remote masters resend the instance's rows<br>> received before a certain point in time, defined by the remote master's vclock<br>> at the moment of subscribe.<br>> However, this is useful only during initial replication configuration, when<br>> an instance has just recovered, so that it can receive what it has<br>> relayed but hasn't synced to disk.<br>> In other cases, when an instance is operating normally and master-master<br>> replication is configured, the mechanism described above may lead to<br>> the instance re-applying its own rows, coming from a master it has just<br>> subscribed to.<br>> To fix the problem, do not relay rows coming from a remote instance if<br>> the instance has already recovered.<br>><br><br>A comment like this also belongs in the code. Usually the patch<br>that fixes a bug comes along with a test case for the bug; are you<br>sure you can't submit one?</div></div></div></div></blockquote><div> </div><div>I don’t think I can. 
The test that comes with the issue is a stress test,</div><div>which relies on running it with multiple workers simultaneously.</div><div>It reproduces the problem when run with 4 workers on one of my PCs,</div><div>and with 20 workers on the other.</div><div>I don’t think we have the testing infrastructure needed to run the same</div><div>test with multiple workers at the same time, and I couldn’t come up with a</div><div>single test that would reproduce the problem.</div><div> </div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>> vclock_copy(&vclock, &replicaset.vclock);<br>> + unsigned int id_mask = box_is_orphan() ? 0 : 1 << instance_id;<br><br>box_is_orphan() fits the bill, so it's good enough.<br><br>I would explain, however, that what we are really looking for<br>here is whether or not the local WAL accepts writes. As soon as we<br>start allowing writes to the local WAL, we don't want to get<br>these writes from elsewhere.</div></div></div></div></blockquote><div> </div><div>Ok.</div><div> </div><div><p>diff --git a/src/box/applier.cc b/src/box/applier.cc</p><p>index 1a07d71a9..73ffc0d68 100644</p><p>--- a/src/box/applier.cc</p><p>+++ b/src/box/applier.cc</p><p>@@ -866,9 +866,13 @@ applier_subscribe(struct applier *applier)</p><p> struct vclock vclock;</p><p> vclock_create(&vclock);</p><p> vclock_copy(&vclock, &replicaset.vclock);</p><p>- unsigned int id_mask = box_is_orphan() ? 0 : 1 << instance_id;</p><p>+ /*</p><p>+ * Stop accepting local rows coming from a remote</p><p>+ * instance as soon as local WAL starts accepting writes.</p><p>+ */</p><p>+ unsigned int id_filter = box_is_orphan() ? 
0 : 1 << instance_id;</p><p> xrow_encode_subscribe_xc(&row, &REPLICASET_UUID, &INSTANCE_UUID,</p><p>- &vclock, replication_anon, id_mask);</p><p>+ &vclock, replication_anon, id_filter);</p><p> coio_write_xrow(coio, &row);</p><p> </p><p> /* Read SUBSCRIBE response */</p><div> </div></div><blockquote style="border-left:1px solid #0857A6; margin:10px; padding:0 0 0 10px;"><div><div class="js-helper js-readmsg-msg"><div><div><br>--<br>Konstantin Osipov, Moscow, Russia</div></div></div></div></blockquote><div> </div><div>--<br>Serge Petrenko</div><div> </div></BODY></HTML>