<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi! Thank you for review!<div class="">I pushed the new version of the patch on the branch and answered your comments below..</div><div class="">Incremental diff is below.</div><div class=""><br class=""><blockquote type="cite" class="">11 дек. 2018 г., в 13:42, Vladimir Davydov <<a href="mailto:vdavydov.dev@gmail.com" class="">vdavydov.dev@gmail.com</a>> написал(а):<br class=""><br class="">On Mon, Dec 10, 2018 at 06:36:00PM +0300, Serge Petrenko wrote:<br class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">Applier used to promote vclock prior to applying the row. This lead to a<br class="">situation when master's row would be skipped forever in case there is an<br class="">error trying to apply it. However, some errors are transient, and we<br class="">might be able to successfully apply the same row later.<br class=""><br class="">So only advance vclock if a corresponding row is successfully written to<br class="">WAL. This makes sure that rows are not skipped.<br class="">replication_skip_conflict works as before: if it is set an a conflict<br class="">occurs while applying the row, it is silently ignored and the vclock is<br class="">advanced, so the row is never applied.<br class=""><br class="">While we're at it, make wal writer the only one responsible for<br class="">advancing replicaset vclock. It was already doing it for rows coming from<br class="">the local instance, besides, it makes the code cleaner since now we want<br class="">to advance vclock only after a successful write and lets us get rid of<br class="">unnecessary checks whether applier or wal has already advanced the<br class="">vclock.<br class=""><br class="">Closes #2283<br class="">---<br class=""><a href="https://github.com/tarantool/tarantool/issues/2283" class="">https://github.com/tarantool/tarantool/issues/2283</a><br class="">https://github.com/tarantool/tarantool/tree/sp/gh-2283-dont-skip-rows<br class=""><br class="">src/box/applier.cc                          |  49 +++++-----<br class="">src/box/wal.c                               |  59 ++++++-----<br class="">src/errinj.h                                |   1 +<br class="">test/box/errinj.result                      |   2 +<br class="">test/replication/skip_conflict_row.result   |  64 ++++++++++++<br class="">test/replication/skip_conflict_row.test.lua |  19 ++++<br class="">test/xlog/errinj.result                     |  17 +++-<br class="">test/xlog/errinj.test.lua                   |   4 +<br class="">test/xlog/panic_on_lsn_gap.result           | 103 +++++++++++---------<br class="">test/xlog/panic_on_lsn_gap.test.lua         |  46 +++++----<br class="">10 files changed, 241 insertions(+), 123 deletions(-)<br class=""><br class="">diff --git a/src/box/applier.cc b/src/box/applier.cc<br class="">index ff4af95e5..bdc303a7d 100644<br class="">--- a/src/box/applier.cc<br class="">+++ b/src/box/applier.cc<br class="">@@ -355,6 +355,10 @@ applier_join(struct applier *applier)<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>coio_read_xrow(coio, ibuf, &row);<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span>applier->last_row_time = ev_monotonic_now(loop());<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (iproto_type_is_dml(row.type)) {<br class="">+<span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>/*<br class="">+<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * Manually advance vclock, since no WAL writes<br class="">+<span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * are done during this stage of recovery.<br class="">+<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> */<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>vclock_follow_xrow(&replicaset.vclock, &row);<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>xstream_write_xc(applier->subscribe_stream, &row);<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (++row_count % 100000 == 0)<br class="">@@ -513,31 +517,21 @@ applier_subscribe(struct applier *applier)<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>applier->lag = ev_now(loop()) - row.tm;<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span>applier->last_row_time = ev_monotonic_now(loop());<br class=""><br class="">+<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>/*<br class="">+<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * In a full mesh topology, the same set<br class="">+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span> * of changes may arrive via two<br class="">+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span> * concurrently running appliers.<br class="">+<span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span> * Thus the rows may execute out of order,<br class="">+<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span> * when the following xstream_write()<br class="">+<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * yields on WAL. Hence we need a latch to<br class="">+<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span> * strictly order all changes which belong<br class="">+<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span> * to the same server id.<br class="">+<span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span> */<br class="">+<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>struct replica *replica = replica_by_id(row.replica_id);<br class="">+<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>struct latch *latch = (replica ? &replica->order_latch :<br class="">+<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>       &replicaset.applier.order_latch);<br class="">+<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>latch_lock(latch);<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (vclock_get(&replicaset.vclock, row.replica_id) < row.lsn) {<br class="">-<span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>/**<br class="">-<span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * Promote the replica set vclock before<br class="">-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * applying the row. If there is an<br class="">-<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * exception (conflict) applying the row,<br class="">-<span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * the row is skipped when the replication<br class="">-<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * is resumed.<br class="">-<span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> */<br class="">-<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>vclock_follow_xrow(&replicaset.vclock, &row);<br class="">-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>struct replica *replica = replica_by_id(row.replica_id);<br class="">-<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>struct latch *latch = (replica ? &replica->order_latch :<br class="">-<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>       &replicaset.applier.order_latch);<br class="">-<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>/*<br class="">-<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * In a full mesh topology, the same set<br class="">-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * of changes may arrive via two<br class="">-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * concurrently running appliers. Thanks<br class="">-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * to vclock_follow() above, the first row<br class="">-<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * in the set will be skipped - but the<br class="">-<span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * remaining may execute out of order,<br class="">-<span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * when the following xstream_write()<br class="">-<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * yields on WAL. Hence we need a latch to<br class="">-<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * strictly order all changes which belong<br class="">-<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * to the same server id.<br class="">-<span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> */<br class="">-<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>latch_lock(latch);<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>int res = xstream_write(applier->subscribe_stream, &row);<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>latch_unlock(latch);<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (res != 0) {<br class="">@@ -548,12 +542,13 @@ applier_subscribe(struct applier *applier)<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> */<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (e->type == &type_ClientError &&<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>    box_error_code(e) == ER_TUPLE_FOUND &&<br class="">-<span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>    replication_skip_conflict)<br class="">+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>    replication_skip_conflict) {<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>diag_clear(diag_get());<br class="">-<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>else<br class="">+<span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>vclock_follow_xrow(&replicaset.vclock, &row);<br class=""></blockquote><br class="">Hmm, do we really need to advance the vclock here?<br class=""></blockquote><div class=""><br class=""></div><div class="">if xstream_write above fails, vclock is not advanced anywhere, so we have to advance it manually</div><div class="">if we want to skip the row, as directed by replication_skip_conflict option.</div><br class=""><blockquote type="cite" class=""><br class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">+<span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>} else<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>diag_raise();<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>}<br class="">-<span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span>}<br class="">+<span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span>} else latch_unlock(latch);<br class=""></blockquote><br class="">Coding style: latch_unlock should be on the next line.<br class=""><br class="">Anyway, could you please rewrite this part so that latch_lock and<br class="">latch_unlock stay on the same level?<br class=""></blockquote><div class=""><br class=""></div><div class="">Rewrote this part, incremental diff is below.</div><br class=""><blockquote type="cite" class=""><br class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (applier->state == APPLIER_SYNC ||<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span>    applier->state == APPLIER_FOLLOW)<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>fiber_cond_signal(&applier->writer_cond);<br class="">diff --git a/src/box/wal.c b/src/box/wal.c<br class="">index 3b50d3629..c6c3c4ee5 100644<br class="">--- a/src/box/wal.c<br class="">+++ b/src/box/wal.c<br class="">@@ -121,6 +121,11 @@ struct wal_writer<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span> * with this LSN and LSN becomes "real".<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span> */<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span>struct vclock vclock;<br class="">+<span class="Apple-tab-span" style="white-space:pre"> </span>/*<br class="">+<span class="Apple-tab-span" style="white-space:pre">    </span> * Uncommitted vclock of the latest entry to be written,<br class="">+<span class="Apple-tab-span" style="white-space:pre"> </span> * which will become "real" on a successful WAL write.<br class="">+<span class="Apple-tab-span" style="white-space:pre"> </span> */<br class="">+<span class="Apple-tab-span" style="white-space:pre">      </span>struct vclock req_vclock;<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span>/**<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span> * VClock of the most recent successfully created checkpoint.<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span> * The WAL writer must not delete WAL files that are needed to<br class="">@@ -171,6 +176,11 @@ struct wal_msg {<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span> * be rolled back.<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span> */<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span>struct stailq rollback;<br class="">+<span class="Apple-tab-span" style="white-space:pre">       </span>/**<br class="">+<span class="Apple-tab-span" style="white-space:pre">   </span> * Vclock of latest successfully written request in batch<br class="">+<span class="Apple-tab-span" style="white-space:pre">        </span> * used to advance replicaset vclock.<br class="">+<span class="Apple-tab-span" style="white-space:pre">    </span> */<br class="">+<span class="Apple-tab-span" style="white-space:pre">      </span>struct vclock vclock;<br class=""></blockquote><br class="">I don't understand this part. Why do you need to introduce these<br class="">wal_msg::vclock and wal_writer::req_vclock. Can't you simply use<br class="">the vclock of the last written row in wal_write()?<br class=""></blockquote><div class=""><br class=""></div><div class="">A batch can have rows from different instances. Previously WAL writer</div><div class="">only advanced replicaset vclock for the rows from the local instance, so we</div><div class="">could use the last written row from an instance to advance vclock.</div><div class="">Now WAL writer advances replicaset vclock for all the rows coming from</div><div class="">every cluster member, so we would have to search the batch for the last</div><div class="">written row from every instance, which is rather cumbersome.</div><div class="">So we better get a separate vclock for that to advance it on every successful WAL write</div><div class="">and then copy it to replicaset vclock.</div><br class=""><blockquote type="cite" class=""><br class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">};<br class=""><br class="">/**<br class="">@@ -284,6 +294,7 @@ tx_schedule_commit(struct cmsg *msg)<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span>/* Closes the input valve. */<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span>stailq_concat(&writer->rollback, &batch->rollback);<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span>}<br class="">+<span class="Apple-tab-span" style="white-space:pre">     </span>vclock_copy(&replicaset.vclock, &batch->vclock);<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>tx_schedule_queue(&batch->commit);<br class="">}<br class=""><br class="">@@ -368,6 +379,7 @@ wal_writer_create(struct wal_writer *writer, enum wal_mode wal_mode,<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span>writer->checkpoint_triggered = false;<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>vclock_copy(&writer->vclock, vclock);<br class="">+<span class="Apple-tab-span" style="white-space:pre">  </span>vclock_copy(&writer->req_vclock, vclock);<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span>vclock_copy(&writer->checkpoint_vclock, checkpoint_vclock);<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span>rlist_create(&writer->watchers);<br class=""><br class="">@@ -907,10 +919,15 @@ wal_assign_lsn(struct wal_writer *writer, struct xrow_header **row,<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span>/** Assign LSN to all local rows. */<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span>for ( ; row < end; row++) {<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span>if ((*row)->replica_id == 0) {<br class="">-<span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>(*row)->lsn = vclock_inc(&writer->vclock, instance_id);<br class="">+<span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>(*row)->lsn = vclock_inc(&writer->req_vclock, instance_id);<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>(*row)->replica_id = instance_id;<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span>} else {<br class="">-<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>vclock_follow_xrow(&writer->vclock, *row);<br class="">+<span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>vclock_follow_xrow(&writer->req_vclock, *row);<br class="">+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span>}<br class="">+<span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span>struct errinj *inj = errinj(ERRINJ_WAL_LSN_GAP, ERRINJ_INT);<br class=""></blockquote><br class="">This error injection looks rather artifical to me. Can such a thing<br class="">happen in the real world?<br class=""><br class="">Anyway, why do you need it at all? AFAICS you don't use it in the<br class="">replication test, which is supposed to test the feature. I don't<br class="">want to dive into the xlog tests so could you please elaborate here<br class="">or, even better, in the commit message?<br class=""></blockquote><div class=""><br class=""></div><div class="">In a test xlog/panic_on_lsn_gap.test.lua we created an lsn gap in a file by</div><div class="">simulating a failed WAL write, which caused the writer to advance its vclock</div><div class="">The way to create an lsn gap in a WAL doesn’t work anymore, because this</div><div class="">patch alters WAL writer to only advance its vclock on successful journal writes.</div><div class="">This error injection was added to simulate the old behaviour.</div><div class=""><br class=""></div><div class="">I guess we can drop it together with xlog/panic_on_lsn_gap.test.lua.</div><div class="">AFAICS the bug this test is for is not reproducible anymore.</div><div class="">What do you think?</div><br class=""><blockquote type="cite" class=""><br class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (inj != NULL && inj->iparam > 0) {<br class="">+<span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>(*row)->lsn += inj->iparam;<br class="">+<span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>vclock_follow_xrow(&writer->req_vclock, *row);<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span>}<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span>}<br class="">}<br class="">@@ -975,13 +992,14 @@ wal_write_to_disk(struct cmsg *msg)<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>struct stailq_entry *last_committed = NULL;<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>stailq_foreach_entry(entry, &wal_msg->commit, fifo) {<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span>wal_assign_lsn(writer, entry->rows, entry->rows + entry->n_rows);<br class="">-<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>entry->res = vclock_sum(&writer->vclock);<br class="">-<span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span>rc = xlog_write_entry(l, entry);<br class="">+<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>entry->res = vclock_sum(&writer->req_vclock);<br class="">+<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span>int rc = xlog_write_entry(l, entry);<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (rc < 0)<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>goto done;<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (rc > 0) {<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>writer->checkpoint_wal_size += rc;<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>last_committed = &entry->fifo;<br class="">+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>vclock_copy(&writer->vclock, &writer->req_vclock);<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span>}<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>/* rc == 0: the write is buffered in xlog_tx */<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>}<br class="">@@ -991,6 +1009,7 @@ wal_write_to_disk(struct cmsg *msg)<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>writer->checkpoint_wal_size += rc;<br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>last_committed = stailq_last(&wal_msg->commit);<br class="">+<span class="Apple-tab-span" style="white-space:pre">        </span>vclock_copy(&writer->vclock, &writer->req_vclock);<br class=""><br class=""><span class="Apple-tab-span" style="white-space:pre">  </span>/*<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span> * Notify TX if the checkpoint threshold has been exceeded.<br class="">@@ -1021,6 +1040,10 @@ done:<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>error_log(error);<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>diag_clear(diag_get());<br class=""><span class="Apple-tab-span" style="white-space:pre">        </span>}<br class="">+<span class="Apple-tab-span" style="white-space:pre">     </span>/* Latest successfully written row vclock. */<br class="">+<span class="Apple-tab-span" style="white-space:pre"> </span>vclock_copy(&wal_msg->vclock, &writer->vclock);<br class="">+<span class="Apple-tab-span" style="white-space:pre"> </span>/* Rollback request vclock to the latest committed. */<br class="">+<span class="Apple-tab-span" style="white-space:pre">        </span>vclock_copy(&writer->req_vclock, &writer->vclock);<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span>/*<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span> * We need to start rollback from the first request<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span> * following the last committed request. If<br class="">@@ -1152,31 +1175,6 @@ wal_write(struct journal *journal, struct journal_entry *entry)<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span>bool cancellable = fiber_set_cancellable(false);<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span>fiber_yield(); /* Request was inserted. */<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span>fiber_set_cancellable(cancellable);<br class="">-<span class="Apple-tab-span" style="white-space:pre">   </span>if (entry->res > 0) {<br class="">-<span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span>struct xrow_header **last = entry->rows + entry->n_rows - 1;<br class="">-<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>while (last >= entry->rows) {<br class="">-<span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>/*<br class="">-<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * Find last row from local instance id<br class="">-<span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * and promote vclock.<br class="">-<span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> */<br class="">-<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>if ((*last)->replica_id == instance_id) {<br class="">-<span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>/*<br class="">-<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * In master-master configuration, during sudden<br class="">-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * power-loss, if the data have not been written<br class="">-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * to WAL but have already been sent to others,<br class="">-<span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * they will send the data back. In this case<br class="">-<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> * vclock has already been promoted by applier.<br class="">-<span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> */<br class=""></blockquote><br class="">It'd be nice to leave this comment.<br class=""></blockquote><div class=""><br class=""></div><div class="">Applier doesn’t advance vclock anymore. We can leave the </div><div class=""><blockquote type="cite" class=""><blockquote type="cite" class="">-<span class="Apple-tab-span" style="white-space: pre;">    </span><span class="Apple-tab-span" style="white-space: pre;">  </span><span class="Apple-tab-span" style="white-space: pre;">  </span><span class="Apple-tab-span" style="white-space: pre;">  </span> * In master-master configuration, during sudden<br class="">-<span class="Apple-tab-span" style="white-space: pre;">       </span><span class="Apple-tab-span" style="white-space: pre;">  </span><span class="Apple-tab-span" style="white-space: pre;">  </span><span class="Apple-tab-span" style="white-space: pre;">  </span> * power-loss, if the data have not been written<br class="">-<span class="Apple-tab-span" style="white-space: pre;">       </span><span class="Apple-tab-span" style="white-space: pre;">  </span><span class="Apple-tab-span" style="white-space: pre;">  </span><span class="Apple-tab-span" style="white-space: pre;">  </span> * to WAL but have already been sent to others,<br class="">-<span class="Apple-tab-span" style="white-space: pre;">        </span><span class="Apple-tab-span" style="white-space: pre;">  </span><span class="Apple-tab-span" style="white-space: pre;">  </span><span class="Apple-tab-span" style="white-space: pre;">  </span> * they will send the data back.</blockquote></blockquote></div><div class="">part, but I don’t know, where to put it now.</div><br class=""><blockquote type="cite" class=""><br class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">-<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (vclock_get(&replicaset.vclock,<br class="">-<span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>       instance_id) < (*last)->lsn) {<br class="">-<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>vclock_follow_xrow(&replicaset.vclock,<br class="">-<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>   *last);<br class="">-<span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>}<br class="">-<span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>break;<br class="">-<span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>}<br class="">-<span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>--last;<br class="">-<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span>}<br class="">-<span class="Apple-tab-span" style="white-space:pre">     </span>}<br class=""><span class="Apple-tab-span" style="white-space:pre">      </span>return entry->res;<br class="">}<br class=""><br class="">@@ -1187,11 +1185,12 @@ wal_write_in_wal_mode_none(struct journal *journal,<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span>struct wal_writer *writer = (struct wal_writer *) journal;<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span>wal_assign_lsn(writer, entry->rows, entry->rows + entry->n_rows);<br class=""><span class="Apple-tab-span" style="white-space:pre">     </span>int64_t old_lsn = vclock_get(&replicaset.vclock, instance_id);<br class="">-<span class="Apple-tab-span" style="white-space:pre">    </span>int64_t new_lsn = vclock_get(&writer->vclock, instance_id);<br class="">+<span class="Apple-tab-span" style="white-space:pre">    </span>int64_t new_lsn = vclock_get(&writer->req_vclock, instance_id);<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>if (new_lsn > old_lsn) {<br class=""><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>/* There were local writes, promote vclock. */<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span>vclock_follow(&replicaset.vclock, instance_id, new_lsn);<br class=""><span class="Apple-tab-span" style="white-space:pre">   </span>}<br class="">+<span class="Apple-tab-span" style="white-space:pre">     </span>vclock_copy(&writer->vclock, &writer->req_vclock);<br class=""><span class="Apple-tab-span" style="white-space:pre">       </span>return vclock_sum(&writer->vclock);<br class="">}<br class=""></blockquote></blockquote><br class="">Incremental diff:<div class=""><br class="">diff --git a/src/box/<a href="http://applier.cc" class="">applier.cc</a> b/src/box/<a href="http://applier.cc" class="">applier.cc</a><br class="">index bdc303a7d..6fac2561a 100644<br class="">--- a/src/box/<a href="http://applier.cc" class="">applier.cc</a><br class="">+++ b/src/box/<a href="http://applier.cc" class="">applier.cc</a><br class="">@@ -530,25 +530,32 @@ applier_subscribe(struct applier *applier)<br class="">                struct replica *replica = replica_by_id(row.replica_id);<br class="">                struct latch *latch = (replica ? &replica->order_latch :<br class="">                                       &replicaset.applier.order_latch);<br class="">+               int res = 0;<br class="">+<br class="">                latch_lock(latch);<br class="">-               if (vclock_get(&replicaset.vclock, row.replica_id) < row.lsn) {<br class="">-                       int res = xstream_write(applier->subscribe_stream, &row);<br class="">-                       latch_unlock(latch);<br class="">-                       if (res != 0) {<br class="">-                               struct error *e = diag_last_error(diag_get());<br class="">-                               /**<br class="">-                                * Silently skip ER_TUPLE_FOUND error if such<br class="">-                                * option is set in config.<br class="">-                                */<br class="">-                               if (e->type == &type_ClientError &&<br class="">-                                   box_error_code(e) == ER_TUPLE_FOUND &&<br class="">-                                   replication_skip_conflict) {<br class="">-                                       diag_clear(diag_get());<br class="">-                                       vclock_follow_xrow(&replicaset.vclock, &row);<br class="">-                               } else<br class="">-                                       diag_raise();<br class="">-                       }<br class="">-               } else latch_unlock(latch);<br class="">+               if (vclock_get(&replicaset.vclock, row.replica_id) < row.lsn)<br class="">+                       res = xstream_write(applier->subscribe_stream, &row);<br class="">+               latch_unlock(latch);<br class="">+<br class="">+               if (res != 0) {<br class="">+                       struct error *e = diag_last_error(diag_get());<br class="">+                       /**<br class="">+                        * Silently skip ER_TUPLE_FOUND error if such<br class="">+                        * option is set in config.<br class="">+                        * Also advance the vclock manually, since<br class="">+                        * WAL writer only advances it on successful<br class="">+                        * writes, which is not the case.<br class="">+                        * ER_TUPLE_FOUND occurs even before WAL write.<br class="">+                        */<br class="">+                       if (e->type == &type_ClientError &&<br class="">+                           box_error_code(e) == ER_TUPLE_FOUND &&<br class="">+                           replication_skip_conflict) {<br class="">+                               diag_clear(diag_get());<br class="">+                               vclock_follow_xrow(&replicaset.vclock, &row);<br class="">+                       } else<br class="">+                               diag_raise();<br class="">+               }<br class="">+<br class="">                if (applier->state == APPLIER_SYNC ||<br class="">                    applier->state == APPLIER_FOLLOW)<br class="">                        fiber_cond_signal(&applier->writer_cond);<br class=""><br class=""></div></div></body></html>