* [Tarantool-patches] [PATCH] applier: follow vclock to the last tx row
@ 2020-04-22 18:28 Serge Petrenko
2020-04-23 9:41 ` Cyrill Gorcunov
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Serge Petrenko @ 2020-04-22 18:28 UTC (permalink / raw)
To: v.shpilevoy, gorcunov; +Cc: tarantool-patches
Since the introduction of transaction boundaries in replication
protocol, appliers follow replicaset.applier.vclock to the lsn of the
first row in an arrived batch. This is enough and doesn't lead to errors
when replicating from other instances, respecting transaction boundaries
(instances with version 2.1.2 and up). However, if there's a 1.10
instance in 2.1.2+ cluster, it sends every single tx row as a separate
transaction, breaking the comparison with replicaset.applier.vclock and
making the applier apply part of the changes, it has already applied
when processing a full transaction coming from another 2.x instance.
Such behaviour leads to ER_TUPLE_FOUND errors in the scenario described
above.
In order to guard from such cases, follow replicaset.applier.vclock to
the lsn of the last row in tx.
Closes #4924
---
https://github.com/tarantool/tarantool/issues/4924
https://github.com/tarantool/tarantool/tree/sp/gh-4924-applier-duplicate-key
src/box/applier.cc | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 68de3c08c..eb0297f73 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -736,6 +736,7 @@ applier_apply_tx(struct stailq *rows)
{
struct xrow_header *first_row = &stailq_first_entry(rows,
struct applier_tx_row, next)->row;
+ struct xrow_header *last_row;
struct replica *replica = replica_by_id(first_row->replica_id);
/*
* In a full mesh topology, the same set of changes
@@ -827,8 +828,9 @@ applier_apply_tx(struct stailq *rows)
goto fail;
/* Transaction was sent to journal so promote vclock. */
- vclock_follow(&replicaset.applier.vclock,
- first_row->replica_id, first_row->lsn);
+ last_row = &stailq_last_entry(rows, struct applier_tx_row, next)->row;
+ vclock_follow(&replicaset.applier.vclock, last_row->replica_id,
+ last_row->lsn);
latch_unlock(latch);
return 0;
rollback:
--
2.21.1 (Apple Git-122.3)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH] applier: follow vclock to the last tx row
2020-04-22 18:28 [Tarantool-patches] [PATCH] applier: follow vclock to the last tx row Serge Petrenko
@ 2020-04-23 9:41 ` Cyrill Gorcunov
2020-04-23 9:53 ` Serge Petrenko
2020-04-26 18:55 ` Vladislav Shpilevoy
2020-04-27 10:22 ` Kirill Yukhin
2 siblings, 1 reply; 8+ messages in thread
From: Cyrill Gorcunov @ 2020-04-23 9:41 UTC (permalink / raw)
To: Serge Petrenko; +Cc: tarantool-patches, v.shpilevoy
On Wed, Apr 22, 2020 at 09:28:10PM +0300, Serge Petrenko wrote:
> Since the introduction of transaction boundaries in replication
> protocol, appliers follow replicaset.applier.vclock to the lsn of the
> first row in an arrived batch. This is enough and doesn't lead to errors
> when replicating from other instances, respecting transaction boundaries
> (instances with version 2.1.2 and up). However, if there's a 1.10
> instance in 2.1.2+ cluster, it sends every single tx row as a separate
> transaction, breaking the comparison with replicaset.applier.vclock and
> making the applier apply part of the changes, it has already applied
> when processing a full transaction coming from another 2.x instance.
> Such behaviour leads to ER_TUPLE_FOUND errors in the scenario described
> above.
> In order to guard from such cases, follow replicaset.applier.vclock to
> the lsn of the last row in tx.
>
> Closes #4924
Serge, can we please put this into code comment itself? Say like
(please check that I didn't miss somthing)
---
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 68de3c08c..495bc7393 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -736,6 +736,7 @@ applier_apply_tx(struct stailq *rows)
{
struct xrow_header *first_row = &stailq_first_entry(rows,
struct applier_tx_row, next)->row;
+ struct xrow_header *last_row;
struct replica *replica = replica_by_id(first_row->replica_id);
/*
* In a full mesh topology, the same set of changes
@@ -826,9 +827,16 @@ applier_apply_tx(struct stailq *rows)
if (txn_commit_async(txn) < 0)
goto fail;
- /* Transaction was sent to journal so promote vclock. */
- vclock_follow(&replicaset.applier.vclock,
- first_row->replica_id, first_row->lsn);
+ /*
+ * The transaction was sent to the journal so promote vclock.
+ *
+ * Use the lsn of the last row here for backward compatibility
+ * with 1.10 series where we sent every single tx in a row as
+ * a separate transaction.
+ */
+ last_row = &stailq_last_entry(rows, struct applier_tx_row, next)->row;
+ vclock_follow(&replicaset.applier.vclock, last_row->replica_id,
+ last_row->lsn);
latch_unlock(latch);
return 0;
rollback:
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH] applier: follow vclock to the last tx row
2020-04-23 9:41 ` Cyrill Gorcunov
@ 2020-04-23 9:53 ` Serge Petrenko
2020-04-23 9:54 ` Cyrill Gorcunov
0 siblings, 1 reply; 8+ messages in thread
From: Serge Petrenko @ 2020-04-23 9:53 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tarantool-patches, Vladislav Shpilevoy
> 23 апр. 2020 г., в 12:41, Cyrill Gorcunov <gorcunov@gmail.com> написал(а):
>
> On Wed, Apr 22, 2020 at 09:28:10PM +0300, Serge Petrenko wrote:
>> Since the introduction of transaction boundaries in replication
>> protocol, appliers follow replicaset.applier.vclock to the lsn of the
>> first row in an arrived batch. This is enough and doesn't lead to errors
>> when replicating from other instances, respecting transaction boundaries
>> (instances with version 2.1.2 and up). However, if there's a 1.10
>> instance in 2.1.2+ cluster, it sends every single tx row as a separate
>> transaction, breaking the comparison with replicaset.applier.vclock and
>> making the applier apply part of the changes, it has already applied
>> when processing a full transaction coming from another 2.x instance.
>> Such behaviour leads to ER_TUPLE_FOUND errors in the scenario described
>> above.
>> In order to guard from such cases, follow replicaset.applier.vclock to
>> the lsn of the last row in tx.
>>
>> Closes #4924
>
> Serge, can we please put this into code comment itself? Say like
> (please check that I didn't miss somthing)
> ---
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 68de3c08c..495bc7393 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -736,6 +736,7 @@ applier_apply_tx(struct stailq *rows)
> {
> struct xrow_header *first_row = &stailq_first_entry(rows,
> struct applier_tx_row, next)->row;
> + struct xrow_header *last_row;
> struct replica *replica = replica_by_id(first_row->replica_id);
> /*
> * In a full mesh topology, the same set of changes
> @@ -826,9 +827,16 @@ applier_apply_tx(struct stailq *rows)
> if (txn_commit_async(txn) < 0)
> goto fail;
>
> - /* Transaction was sent to journal so promote vclock. */
> - vclock_follow(&replicaset.applier.vclock,
> - first_row->replica_id, first_row->lsn);
> + /*
> + * The transaction was sent to the journal so promote vclock.
> + *
> + * Use the lsn of the last row here for backward compatibility
> + * with 1.10 series where we sent every single tx in a row as
> + * a separate transaction.
> + */
> + last_row = &stailq_last_entry(rows, struct applier_tx_row, next)->row;
> + vclock_follow(&replicaset.applier.vclock, last_row->replica_id,
> + last_row->lsn);
> latch_unlock(latch);
> return 0;
> rollback:
Hi! Thanks for the review! I’ve added a slightly different comment:
diff --git a/src/box/applier.cc b/src/box/applier.cc
index eb0297f73..42a154a33 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -827,7 +827,13 @@ applier_apply_tx(struct stailq *rows)
if (txn_commit_async(txn) < 0)
goto fail;
- /* Transaction was sent to journal so promote vclock. */
+ /*
+ * The transaction was sent to journal so promote vclock.
+ *
+ * Use the lsn of the last row to guard from 1.10
+ * instances, which send every single tx row as a separate
+ * transaction.
+ */
last_row = &stailq_last_entry(rows, struct applier_tx_row, next)->row;
vclock_follow(&replicaset.applier.vclock, last_row->replica_id,
last_row->lsn);
--
Serge Petrenko
sergepetrenko@tarantool.org
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH] applier: follow vclock to the last tx row
2020-04-23 9:53 ` Serge Petrenko
@ 2020-04-23 9:54 ` Cyrill Gorcunov
2020-04-23 11:19 ` Serge Petrenko
0 siblings, 1 reply; 8+ messages in thread
From: Cyrill Gorcunov @ 2020-04-23 9:54 UTC (permalink / raw)
To: Serge Petrenko; +Cc: tarantool-patches, Vladislav Shpilevoy
On Thu, Apr 23, 2020 at 12:53:30PM +0300, Serge Petrenko wrote:
>
> > 23 апр. 2020 г., в 12:41, Cyrill Gorcunov <gorcunov@gmail.com> написал(а):
> >
> > On Wed, Apr 22, 2020 at 09:28:10PM +0300, Serge Petrenko wrote:
> >> Since the introduction of transaction boundaries in replication
> >> protocol, appliers follow replicaset.applier.vclock to the lsn of the
> >> first row in an arrived batch. This is enough and doesn't lead to errors
> >> when replicating from other instances, respecting transaction boundaries
> >> (instances with version 2.1.2 and up). However, if there's a 1.10
> >> instance in 2.1.2+ cluster, it sends every single tx row as a separate
> >> transaction, breaking the comparison with replicaset.applier.vclock and
> >> making the applier apply part of the changes, it has already applied
> >> when processing a full transaction coming from another 2.x instance.
> >> Such behaviour leads to ER_TUPLE_FOUND errors in the scenario described
> >> above.
> >> In order to guard from such cases, follow replicaset.applier.vclock to
> >> the lsn of the last row in tx.
> >>
> >> Closes #4924
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Thanks!
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH] applier: follow vclock to the last tx row
2020-04-23 9:54 ` Cyrill Gorcunov
@ 2020-04-23 11:19 ` Serge Petrenko
0 siblings, 0 replies; 8+ messages in thread
From: Serge Petrenko @ 2020-04-23 11:19 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tarantool-patches, Vladislav Shpilevoy
> 23 апр. 2020 г., в 12:54, Cyrill Gorcunov <gorcunov@gmail.com> написал(а):
>
> On Thu, Apr 23, 2020 at 12:53:30PM +0300, Serge Petrenko wrote:
>>
>>> 23 апр. 2020 г., в 12:41, Cyrill Gorcunov <gorcunov@gmail.com> написал(а):
>>>
>>> On Wed, Apr 22, 2020 at 09:28:10PM +0300, Serge Petrenko wrote:
>>>> Since the introduction of transaction boundaries in replication
>>>> protocol, appliers follow replicaset.applier.vclock to the lsn of the
>>>> first row in an arrived batch. This is enough and doesn't lead to errors
>>>> when replicating from other instances, respecting transaction boundaries
>>>> (instances with version 2.1.2 and up). However, if there's a 1.10
>>>> instance in 2.1.2+ cluster, it sends every single tx row as a separate
>>>> transaction, breaking the comparison with replicaset.applier.vclock and
>>>> making the applier apply part of the changes, it has already applied
>>>> when processing a full transaction coming from another 2.x instance.
>>>> Such behaviour leads to ER_TUPLE_FOUND errors in the scenario described
>>>> above.
>>>> In order to guard from such cases, follow replicaset.applier.vclock to
>>>> the lsn of the last row in tx.
>>>>
>>>> Closes #4924
> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
>
> Thanks!
@ChangeLog
- fix possible ER_TUPLE_FOUND error when bootstrapping
2.2+ replicas in an 1.10/2.1.1 cluster (gh-4924)
--
Serge Petrenko
sergepetrenko@tarantool.org
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH] applier: follow vclock to the last tx row
2020-04-22 18:28 [Tarantool-patches] [PATCH] applier: follow vclock to the last tx row Serge Petrenko
2020-04-23 9:41 ` Cyrill Gorcunov
@ 2020-04-26 18:55 ` Vladislav Shpilevoy
2020-04-27 10:22 ` Kirill Yukhin
2 siblings, 0 replies; 8+ messages in thread
From: Vladislav Shpilevoy @ 2020-04-26 18:55 UTC (permalink / raw)
To: Serge Petrenko, gorcunov; +Cc: tarantool-patches
Hi! Thanks for the patch!
LGTM.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH] applier: follow vclock to the last tx row
2020-04-22 18:28 [Tarantool-patches] [PATCH] applier: follow vclock to the last tx row Serge Petrenko
2020-04-23 9:41 ` Cyrill Gorcunov
2020-04-26 18:55 ` Vladislav Shpilevoy
@ 2020-04-27 10:22 ` Kirill Yukhin
2020-04-27 10:39 ` Kirill Yukhin
2 siblings, 1 reply; 8+ messages in thread
From: Kirill Yukhin @ 2020-04-27 10:22 UTC (permalink / raw)
To: Serge Petrenko; +Cc: tarantool-patches, v.shpilevoy
Hello,
On 22 апр 21:28, Serge Petrenko wrote:
> Since the introduction of transaction boundaries in replication
> protocol, appliers follow replicaset.applier.vclock to the lsn of the
> first row in an arrived batch. This is enough and doesn't lead to errors
> when replicating from other instances, respecting transaction boundaries
> (instances with version 2.1.2 and up). However, if there's a 1.10
> instance in 2.1.2+ cluster, it sends every single tx row as a separate
> transaction, breaking the comparison with replicaset.applier.vclock and
> making the applier apply part of the changes, it has already applied
> when processing a full transaction coming from another 2.x instance.
> Such behaviour leads to ER_TUPLE_FOUND errors in the scenario described
> above.
> In order to guard from such cases, follow replicaset.applier.vclock to
> the lsn of the last row in tx.
>
> Closes #4924
> ---
> https://github.com/tarantool/tarantool/issues/4924
> https://github.com/tarantool/tarantool/tree/sp/gh-4924-applier-duplicate-key
I've checked your patch into 2.3 and master.
--
Regards, Kirill Yukhin
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Tarantool-patches] [PATCH] applier: follow vclock to the last tx row
2020-04-27 10:22 ` Kirill Yukhin
@ 2020-04-27 10:39 ` Kirill Yukhin
0 siblings, 0 replies; 8+ messages in thread
From: Kirill Yukhin @ 2020-04-27 10:39 UTC (permalink / raw)
To: Serge Petrenko; +Cc: tarantool-patches, v.shpilevoy
On 27 апр 13:22, Kirill Yukhin wrote:
> Hello,
>
> On 22 апр 21:28, Serge Petrenko wrote:
> > Since the introduction of transaction boundaries in replication
> > protocol, appliers follow replicaset.applier.vclock to the lsn of the
> > first row in an arrived batch. This is enough and doesn't lead to errors
> > when replicating from other instances, respecting transaction boundaries
> > (instances with version 2.1.2 and up). However, if there's a 1.10
> > instance in 2.1.2+ cluster, it sends every single tx row as a separate
> > transaction, breaking the comparison with replicaset.applier.vclock and
> > making the applier apply part of the changes, it has already applied
> > when processing a full transaction coming from another 2.x instance.
> > Such behaviour leads to ER_TUPLE_FOUND errors in the scenario described
> > above.
> > In order to guard from such cases, follow replicaset.applier.vclock to
> > the lsn of the last row in tx.
> >
> > Closes #4924
> > ---
> > https://github.com/tarantool/tarantool/issues/4924
> > https://github.com/tarantool/tarantool/tree/sp/gh-4924-applier-duplicate-key
>
> I've checked your patch into 2.3 and master.
And 2.4
--
Regards, Kirill Yukhin
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-04-27 10:39 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-22 18:28 [Tarantool-patches] [PATCH] applier: follow vclock to the last tx row Serge Petrenko
2020-04-23 9:41 ` Cyrill Gorcunov
2020-04-23 9:53 ` Serge Petrenko
2020-04-23 9:54 ` Cyrill Gorcunov
2020-04-23 11:19 ` Serge Petrenko
2020-04-26 18:55 ` Vladislav Shpilevoy
2020-04-27 10:22 ` Kirill Yukhin
2020-04-27 10:39 ` Kirill Yukhin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox