Thursday, February 13, 2020, 9:58 +03:00 from Konstantin Osipov <kostja.osipov@gmail.com>:

> * Konstantin Osipov <kostja.osipov@gmail.com> [20/02/13 09:47]:
>
> From relay.cc:
>
>         /*
>          * We're feeding a WAL, thus responding to FINAL JOIN or SUBSCRIBE
>          * request. If this is FINAL JOIN (i.e. relay->replica is NULL),
>          * we must relay all rows, even those originating from the replica
>          * itself (there may be such rows if this is rebootstrap). If this
>          * is SUBSCRIBE, only send a row if it is not from the same replica
>          * (i.e. don't send the replica's own rows back) or if this row is
>          * missing on the other side (i.e. in case of sudden power loss,
>          * data was not written to WAL, so the remote master can't recover
>          * it). In the latter case the packet's LSN is less than or equal
>          * to the local master's LSN at the moment it received the
>          * 'SUBSCRIBE' request.
>          */
>         if (relay->replica == NULL ||
>             packet->replica_id != relay->replica->id ||
>             packet->lsn <= vclock_get(&relay->local_vclock_at_subscribe,
>                                       packet->replica_id)) {
>                 struct errinj *inj = errinj(ERRINJ_RELAY_BREAK_LSN,
>                                             ERRINJ_INT);
>                 if (inj != NULL && packet->lsn == inj->iparam) {
>                         packet->lsn = inj->iparam - 1;
>                         say_warn("injected broken lsn: %lld",
>                                  (long long) packet->lsn);
>                 }
>                 relay_send(relay, packet);
>         }
> }
>
> As you can see, we never send our own rows back, as long as
> they are greater than relay->local_vclock_at_subscribe.

True. First of all, local_vclock_at_subscribe was updated later than
it should have been; I describe this in more detail in one of the
commits.
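For illustration, a minimal sketch of what updating it earlier could
look like, assuming the snapshot is taken right before the SUBSCRIBE
response is sent. Only vclock_copy() mirrors the real API here;
relay_send_subscribe_response() and relay_subscribe_start() are
hypothetical names:

        /*
         * Sketch only: freeze local_vclock_at_subscribe before the
         * SUBSCRIBE response leaves, so the vclock the replica
         * receives and the vclock the relay filters by cannot
         * diverge.
         */
        static void
        subscribe_sketch(struct relay *relay)
        {
                /* Take the snapshot first... */
                vclock_copy(&relay->local_vclock_at_subscribe,
                            &replicaset.vclock);
                /* ...report exactly this vclock to the replica... */
                relay_send_subscribe_response(relay,
                                &relay->local_vclock_at_subscribe);
                /* ...and only then start feeding the WAL from it. */
                relay_subscribe_start(relay);
        }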
> So what exactly does go wrong here?
>
> Is the bug triggered during initial replication configuration, or
> during a reconfiguration?

Both: during reconfiguration, and when resubscribing after a remote
instance dies and restarts.
> I suspect the issue is that at reconfiguration we send
> local_vclock_at_subscribe, but keep changing it.

Yep.
> The fix then would be to make sure the local component in
> local_vclock_at_subscribe is set to infinity during
> reconfiguration.
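If I read that right, something along these lines (a sketch, not a
final patch; vclock_reset() stands for any helper that overwrites a
single vclock component, and instance_id is the id of the component
called "local" above):

        /*
         * Sketch only: pin the local component of the snapshot to
         * "infinity" for the duration of reconfiguration, so that no
         * row under this id can ever look newer than the snapshot.
         * INT64_MAX plays the role of infinity here.
         */
        static void
        pin_local_component(struct relay *relay, uint32_t instance_id)
        {
                vclock_reset(&relay->local_vclock_at_subscribe,
                             instance_id, INT64_MAX);
        }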
> > * sergepetrenko <sergepetrenko@tarantool.org> [20/02/13 09:34]:
> > > Fix replicaset.applier.vclock initialization issues: it wasn't
> > > initialized at all previously.
> >
> > In the next line you say that you remove the initialization. What
> > do you mean here?
> >
> > > Moreover, there is no valid point in code
> > > to initialize it, since it may get stale right away if new entries are
> > > written to WAL.
> >
> > Well, it reflects the state of the WAL *as seen by* the set of
> > appliers. This is stated in the comment. So it doesn't have to
> > reflect local changes.
> >
> > > So, check for both applier and replicaset vclocks.
> > > The greater one protects the instance from applying the rows it has
> > > already applied or has already scheduled to write.
> > > Also remove an unnecessary applier vclock initialization from
> > > replication_init().
> >
> > First of all, the race you describe applies to local changes only,
> > yet you add the check for all replica ids. This further obscures
> > this piece of code.
> >
> > Second, the core of the issue is a "hole" in the vclock protection
> > enforced by latch_lock/latch_unlock. The basic assumption behind
> > latch_lock/latch_unlock is that while a latch is locked, no other
> > source can apply a transaction under this replica id. This is
> > violated by the local WAL.
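To spell the assumption out, here is a simplified sketch of the
applier path; apply_rows_and_follow() is a stand-in for the real
application code:

        /*
         * The invariant the latch is supposed to provide: while the
         * latch for a replica id is held, nothing else advances that
         * vclock component, so the check-then-apply below is atomic.
         */
        latch_lock(latch);
        if (vclock_get(&replicaset.applier.vclock,
                       first_row->replica_id) >= first_row->lsn) {
                latch_unlock(latch);    /* duplicate row, skip it */
                return 0;
        }
        /*
         * The hole: when first_row->replica_id == instance_id, local
         * transactions advance replicaset.vclock concurrently,
         * without taking this latch, so the check above can miss
         * rows already written or scheduled locally.
         */
        apply_rows_and_follow(first_row);
        latch_unlock(latch);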
> >
> > We used to skip all changes by the local vclock id in the applier.
> >
> > Later this was changed to make it possible to get your own logs
> > back on recovery, e.g. if some replica has them and the local node
> > lost a piece of WAL.
> >
> > It will take me a while to find the commit and ticket, but that is
> > the commit and ticket which introduced this regression.
> >
> > The proper fix is to only apply local changes received from
> > remotes in orphan mode, and to begin skipping them when entering
> > read-write mode.
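In applier terms the skip could look like this (sketch only;
is_orphan stands in for whatever flag tracks orphan mode, and
instance_id is the node's own replica id):

        /*
         * Sketch only: once the node is read-write, its own rows
         * arriving from a remote master are already in the local
         * WAL, so drop them before the vclock check.
         */
        if (first_row->replica_id == instance_id && !is_orphan) {
                latch_unlock(latch);
                return 0;
        }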
> >
> > > Closes #4739
> > > ---
> > >  src/box/applier.cc     | 14 ++++++++++++--
> > >  src/box/replication.cc |  1 -
> > >  2 files changed, 12 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/src/box/applier.cc b/src/box/applier.cc
> > > index ae3d281a5..acb26b7e2 100644
> > > --- a/src/box/applier.cc
> > > +++ b/src/box/applier.cc
> > > @@ -731,8 +731,18 @@ applier_apply_tx(struct stailq *rows)
> > >          struct latch *latch = (replica ? &replica->order_latch :
> > >                                 &replicaset.applier.order_latch);
> > >          latch_lock(latch);
> > > -        if (vclock_get(&replicaset.applier.vclock,
> > > -                       first_row->replica_id) >= first_row->lsn) {
> > > +        /*
> > > +         * We cannot tell which vclock is greater. There is no
> > > +         * proper place to initialize applier vclock, since it
> > > +         * may get stale right away if we write something to WAL
> > > +         * and it gets replicated and then arrives back from the
> > > +         * replica. So check against both vclocks. Replicaset
> > > +         * vclock will guard us from corner cases like the one
> > > +         * above.
> > > +         */
> > > +        if (MAX(vclock_get(&replicaset.applier.vclock, first_row->replica_id),
> > > +                vclock_get(&replicaset.vclock, first_row->replica_id)) >=
> > > +            first_row->lsn) {
> > >                  latch_unlock(latch);
> > >                  return 0;
> > >          }
> > > diff --git a/src/box/replication.cc b/src/box/replication.cc
> > > index e7bfa22ab..7b04573a4 100644
> > > --- a/src/box/replication.cc
> > > +++ b/src/box/replication.cc
> > > @@ -93,7 +93,6 @@ replication_init(void)
> > >          latch_create(&replicaset.applier.order_latch);
> > >
> > >          vclock_create(&replicaset.applier.vclock);
> > > -        vclock_copy(&replicaset.applier.vclock, &replicaset.vclock);
> > >          rlist_create(&replicaset.applier.on_rollback);
> > >          rlist_create(&replicaset.applier.on_commit);
> > >
> > > --
> > > 2.20.1 (Apple Git-117)
> >
> > --
> > Konstantin Osipov, Moscow, Russia
>
> --
> Konstantin Osipov, Moscow, Russia
--
Sergey Petrenko