Date: Mon, 28 Jan 2019 15:09:01 +0300
From: Vladimir Davydov
To: Georgy Kirichenko
Cc: tarantool-patches@freelists.org
Subject: Re: [tarantool-patches] [PATCH v2 3/5] Enforce applier out of order protection
Message-ID: <20190128120901.spkitg7kyrfjp6xz@esperanza>
References: <4c39bbbfcd12c47b9b14fc1a0a0484331939ed63.1548152776.git.georgy@tarantool.org>
In-Reply-To: <4c39bbbfcd12c47b9b14fc1a0a0484331939ed63.1548152776.git.georgy@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Jan 22, 2019 at 01:31:11PM +0300, Georgy Kirichenko wrote:
> Do not skip a row until it has been processed by other appliers.

Looks like a fix for
https://github.com/tarantool/tarantool/issues/3568

Worth adding a test?

> 
> Prerequisite #980
> ---
>  src/box/applier.cc | 35 ++++++++++++++++++-----------------
>  1 file changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 87873e970..148c8ce5a 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -504,6 +504,22 @@ applier_subscribe(struct applier *applier)
> 
>  		applier->lag = ev_now(loop()) - row.tm;
>  		applier->last_row_time = ev_monotonic_now(loop());
> +		struct replica *replica = replica_by_id(row.replica_id);
> +		struct latch *latch = (replica ? &replica->order_latch :
> +				       &replicaset.applier.order_latch);
> +		/*
> +		 * In a full mesh topology, the same set
> +		 * of changes may arrive via two
> +		 * concurrently running appliers. Thanks
> +		 * to vclock_follow() above, the first row

I don't see any vclock_follow() above. Please fix the comment.

> +		 * in the set will be skipped - but the
> +		 * remaining may execute out of order,
> +		 * when the following xstream_write()
> +		 * yields on WAL. Hence we need a latch to
> +		 * strictly order all changes which belong
> +		 * to the same server id.
> +		 */
> +		latch_lock(latch);
>  		if (vclock_get(&replicaset.applier.vclock,
>  			       row.replica_id) < row.lsn) {
>  			if (row.replica_id == instance_id &&

AFAIU this patch makes replicaset.applier.vclock, introduced by the
previous patch, useless.

> @@ -516,24 +532,7 @@ applier_subscribe(struct applier *applier)
>  			int64_t old_lsn = vclock_get(&replicaset.applier.vclock,
>  						     row.replica_id);
>  			vclock_follow_xrow(&replicaset.applier.vclock, &row);
> -			struct replica *replica = replica_by_id(row.replica_id);
> -			struct latch *latch = (replica ? &replica->order_latch :
> -					       &replicaset.applier.order_latch);
> -			/*
> -			 * In a full mesh topology, the same set
> -			 * of changes may arrive via two
> -			 * concurrently running appliers. Thanks
> -			 * to vclock_follow() above, the first row
> -			 * in the set will be skipped - but the
> -			 * remaining may execute out of order,
> -			 * when the following xstream_write()
> -			 * yields on WAL. Hence we need a latch to
> -			 * strictly order all changes which belong
> -			 * to the same server id.
> -			 */
> -			latch_lock(latch);
>  			int res = xstream_write(applier->subscribe_stream, &row);
> -			latch_unlock(latch);
>  			if (res != 0) {
>  				struct error *e = diag_last_error(diag_get());
>  				/**
> @@ -548,11 +547,13 @@ applier_subscribe(struct applier *applier)
>  				/* Rollback lsn to have a chance for a retry. */
>  				vclock_set(&replicaset.applier.vclock,
>  					   row.replica_id, old_lsn);
> +				latch_unlock(latch);
>  				diag_raise();
>  			}
>  		}
>  	}
> done:
> +	latch_unlock(latch);
>  	/*
>  	 * Stay 'orphan' until appliers catch up with
>  	 * the remote vclock at the time of SUBSCRIBE
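
To make the race concrete, here is a minimal standalone model of what
the latch is protecting against. Caveat: it uses POSIX threads and a
pthread mutex in place of Tarantool's fibers and struct latch, and a
single int64 counter in place of the applier vclock, so every name in
it is illustrative rather than Tarantool's real API.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t order_latch = PTHREAD_MUTEX_INITIALIZER;
static int64_t applied_lsn = 0; /* models vclock[replica_id] */

static void
apply_row(int64_t lsn)
{
	usleep(1000); /* models xstream_write() yielding on WAL I/O */
	printf("applied lsn %lld\n", (long long)lsn);
}

static void *
applier(void *arg)
{
	(void)arg;
	/* Both "appliers" receive the same rows, lsn 1..5. */
	for (int64_t lsn = 1; lsn <= 5; lsn++) {
		pthread_mutex_lock(&order_latch);
		/*
		 * The duplicate check runs under the latch, as in
		 * this patch. If it ran before taking the latch,
		 * both appliers could pass it for the same row and
		 * apply it twice, or apply rows out of order after
		 * the yield in apply_row().
		 */
		if (lsn > applied_lsn) {
			apply_row(lsn);
			applied_lsn = lsn;
		}
		pthread_mutex_unlock(&order_latch);
	}
	return NULL;
}

int
main(void)
{
	pthread_t a, b;
	pthread_create(&a, NULL, applier, NULL);
	pthread_create(&b, NULL, applier, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

With this layout each lsn is printed exactly once and in order; move
the check above the lock and the model can apply a row twice or out
of order, which is the applier race this patch closes.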