From mboxrd@z Thu Jan 1 00:00:00 1970
From: Serge Petrenko
Date: Sun, 5 Jul 2020 14:57:49 +0300
Message-Id: <20200705115749.45407-1-sergepetrenko@tarantool.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH] applier: fix tx boundary check for half-applied txns
List-Id: Tarantool development patches
To: v.shpilevoy@tarantool.org, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org

Consider a full-mesh cluster with 2 "new" instances running tarantool
2.2+, a master and a replica, and one "old" instance running an earlier
tarantool version. The "new" replica may receive part of a tx from the
"old" instance and the remaining part from a "new" instance.

Since "new" instances preserve tx boundaries, the "new" replica checks
only the first tx row: if that row is already applied, it assumes the
full tx is applied and skips the rest. This leaves gaps in the "new"
replica's WAL, and the remaining part of the tx is skipped forever.

Fix this behaviour in mixed clusters: apply the remaining part of the
tx even if its beginning is already applied.

Closes #5125
---
https://github.com/tarantool/tarantool/issues/5125
https://github.com/tarantool/tarantool/tree/sp/gh-5125-applier-tx-boundaries

 src/box/applier.cc | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index df48b4796..6ca4cca94 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -737,6 +737,7 @@ applier_apply_tx(struct stailq *rows)
 	struct xrow_header *first_row = &stailq_first_entry(rows,
 					struct applier_tx_row, next)->row;
 	struct xrow_header *last_row;
+	last_row = &stailq_last_entry(rows, struct applier_tx_row, next)->row;
 	struct replica *replica = replica_by_id(first_row->replica_id);
 	/*
 	 * In a full mesh topology, the same set of changes
@@ -748,9 +749,28 @@ applier_apply_tx(struct stailq *rows)
 						  &replicaset.applier.order_latch);
 	latch_lock(latch);
 	if (vclock_get(&replicaset.applier.vclock,
-		       first_row->replica_id) >= first_row->lsn) {
+		       last_row->replica_id) >= last_row->lsn) {
 		latch_unlock(latch);
 		return 0;
+	} else if (vclock_get(&replicaset.applier.vclock,
+			      first_row->replica_id) >= first_row->lsn) {
+		/*
+		 * We've received part of the tx from an old
+		 * instance not knowing of tx boundaries.
+		 * Skip the already applied part.
+		 */
+		struct xrow_header *tmp;
+		while (true) {
+			tmp = &stailq_first_entry(rows,
+						  struct applier_tx_row,
+						  next)->row;
+			if (tmp->lsn <= vclock_get(&replicaset.applier.vclock,
+						   tmp->replica_id)) {
+				stailq_shift(rows);
+			} else {
+				break;
+			}
+		}
 	}
 
 	/**
@@ -835,7 +855,6 @@ applier_apply_tx(struct stailq *rows)
 	 * instances, which send every single tx row as a separate
 	 * transaction.
 	 */
-	last_row = &stailq_last_entry(rows, struct applier_tx_row, next)->row;
 	vclock_follow(&replicaset.applier.vclock,
 		      last_row->replica_id, last_row->lsn);
 	latch_unlock(latch);
-- 
2.24.3 (Apple Git-128)
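
For readers who want to try the new boundary check outside Tarantool, here is a
minimal standalone sketch of the three cases the patched condition distinguishes.
It is a model only, not the applier code: Row, Vclock, vclock_get_model(),
TxAction and trim_applied_prefix() are hypothetical stand-ins invented for this
illustration. A std::deque replaces the stailq, a std::map replaces the vclock,
and the latch is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>

// Hypothetical stand-in for xrow_header: only the fields the check reads.
struct Row {
	uint32_t replica_id;
	int64_t lsn;
};

// Hypothetical stand-in for the applier vclock: replica id -> last applied LSN.
using Vclock = std::map<uint32_t, int64_t>;

// Models vclock_get(): a component that was never set reads as 0 here.
static int64_t
vclock_get_model(const Vclock &vclock, uint32_t replica_id)
{
	auto it = vclock.find(replica_id);
	return it == vclock.end() ? 0 : it->second;
}

enum class TxAction { SkipAll, ApplyTail, ApplyAll };

// Decide what to do with a received tx and drop its already applied prefix.
TxAction
trim_applied_prefix(std::deque<Row> &rows, const Vclock &vclock)
{
	assert(!rows.empty());
	const Row &last = rows.back();
	// The last row is already applied: the whole tx is a duplicate.
	if (vclock_get_model(vclock, last.replica_id) >= last.lsn)
		return TxAction::SkipAll;
	// The first row is not applied yet: nothing to trim.
	if (vclock_get_model(vclock, rows.front().replica_id) <
	    rows.front().lsn)
		return TxAction::ApplyAll;
	// Half-applied tx: shift rows until the first unapplied one.
	// The last row is known to be unapplied, so the queue never empties.
	while (rows.front().lsn <=
	       vclock_get_model(vclock, rows.front().replica_id))
		rows.pop_front();
	return TxAction::ApplyTail;
}
```

With a vclock of {1: 5} and a tx carrying LSNs 4..7 from replica 1, the sketch
drops rows 4 and 5 and leaves 6 and 7 to apply. The pre-patch check, which
looked only at the first row, would have skipped all four.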