From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail-lf1-f68.google.com (mail-lf1-f68.google.com [209.85.167.68])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id CD8D74696C3
	for ; Thu, 23 Apr 2020 12:41:14 +0300 (MSK)
Received: by mail-lf1-f68.google.com with SMTP id t11so4216070lfe.4
	for ; Thu, 23 Apr 2020 02:41:14 -0700 (PDT)
Date: Thu, 23 Apr 2020 12:41:12 +0300
From: Cyrill Gorcunov
Message-ID: <20200423094112.GD3072@uranus>
References: <20200422182810.79257-1-sergepetrenko@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20200422182810.79257-1-sergepetrenko@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH] applier: follow vclock to the last tx row
List-Id: Tarantool development patches
To: Serge Petrenko
Cc: tarantool-patches@dev.tarantool.org, v.shpilevoy@tarantool.org

On Wed, Apr 22, 2020 at 09:28:10PM +0300, Serge Petrenko wrote:
> Since the introduction of transaction boundaries in replication
> protocol, appliers follow replicaset.applier.vclock to the lsn of the
> first row in an arrived batch. This is enough and doesn't lead to errors
> when replicating from other instances, respecting transaction boundaries
> (instances with version 2.1.2 and up). However, if there's a 1.10
> instance in 2.1.2+ cluster, it sends every single tx row as a separate
> transaction, breaking the comparison with replicaset.applier.vclock and
> making the applier apply part of the changes, it has already applied
> when processing a full transaction coming from another 2.x instance.
> Such behaviour leads to ER_TUPLE_FOUND errors in the scenario described
> above.
> In order to guard from such cases, follow replicaset.applier.vclock to
> the lsn of the last row in tx.
>
> Closes #4924

Serge, can we please put this into the code comment itself? Say, like this
(please check that I didn't miss something)
---
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 68de3c08c..495bc7393 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -736,6 +736,7 @@ applier_apply_tx(struct stailq *rows)
 {
 	struct xrow_header *first_row = &stailq_first_entry(rows,
 					struct applier_tx_row, next)->row;
+	struct xrow_header *last_row;
 	struct replica *replica = replica_by_id(first_row->replica_id);
 	/*
 	 * In a full mesh topology, the same set of changes
@@ -826,9 +827,16 @@ applier_apply_tx(struct stailq *rows)
 	if (txn_commit_async(txn) < 0)
 		goto fail;
 
-	/* Transaction was sent to journal so promote vclock. */
-	vclock_follow(&replicaset.applier.vclock,
-		      first_row->replica_id, first_row->lsn);
+	/*
+	 * The transaction was sent to the journal so promote vclock.
+	 *
+	 * Use the lsn of the last row here for backward compatibility
+	 * with the 1.10 series, which sends every single row of a tx
+	 * as a separate transaction.
+	 */
+	last_row = &stailq_last_entry(rows, struct applier_tx_row, next)->row;
+	vclock_follow(&replicaset.applier.vclock, last_row->replica_id,
+		      last_row->lsn);
 	latch_unlock(latch);
 	return 0;
 rollback: