[Tarantool-patches] [PATCH 1/5] [tosquash] replication: fix multiple rollbacks
Serge Petrenko
sergepetrenko at tarantool.org
Sun Jul 5 12:34:17 MSK 2020
03.07.2020 02:40, Vladislav Shpilevoy wrote:
> The problem was that if several transactions time out in one
> event loop iteration, they all will write rollback. Moreover, they
> will do that in a weird order, starting from the oldest, not in
> reverse order.
>
> This patch makes limbo write only one rollback at once.
> ---
> src/box/txn_limbo.c | 25 +++++++++++++++++++++++++
> test/replication/qsync_basic.result | 2 +-
> 2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index 0402664cb..2cb687f4d 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -44,6 +44,13 @@ txn_limbo_create(struct txn_limbo *limbo)
> limbo->got_rollback = false;
> }
>
> +static inline struct txn_limbo_entry *
> +txn_limbo_first_entry(struct txn_limbo *limbo)
> +{
> + return rlist_first_entry(&limbo->queue, struct txn_limbo_entry,
> + in_queue);
> +}
> +
> struct txn_limbo_entry *
> txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn)
> {
> @@ -150,6 +157,24 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
> bool timed_out = fiber_yield_timeout(txn_limbo_confirm_timeout(limbo));
> fiber_set_cancellable(cancellable);
> if (timed_out) {
> + assert(!txn_limbo_is_empty(limbo));
> + if (txn_limbo_first_entry(limbo) != entry) {
> + /*
> + * If this is not a first entry in the
> + * limbo, it is definitely not a first
> + * timed out entry. And since it managed
> + * to time out too, it means there is
> + * currently another fiber writing
> + * rollback. Wait when it will finish and
> + * wake us up.
> + */
Why can't it be the first timed-out entry? Is it because once the
previous entry is confirmed, it is removed from the queue immediately?
Looks fragile.
> + bool cancellable = fiber_set_cancellable(false);
> + fiber_yield();
> + fiber_set_cancellable(cancellable);
> + assert(txn_limbo_entry_is_complete(entry));
> + goto complete;
> + }
> +
> txn_limbo_write_rollback(limbo, entry);
> struct txn_limbo_entry *e, *tmp;
> rlist_foreach_entry_safe_reverse(e, &limbo->queue,
> diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
> index cdecf00e8..32deb2ac3 100644
> --- a/test/replication/qsync_basic.result
> +++ b/test/replication/qsync_basic.result
> @@ -272,7 +272,7 @@ box.cfg{replication_synchro_timeout = 0.001, replication_synchro_quorum = 3}
> | ...
> f = fiber.create(box.space.sync.replace, box.space.sync, {6}) s:replace{6}
> | ---
> - | - error: Quorum collection for a synchronous transaction is timed out
> + | - error: A rollback for a synchronous transaction is received
> | ...
> f:status()
> | ---
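
To make the invariant in question easier to discuss, here is a minimal
standalone sketch of the "only the first entry writes the rollback" rule.
It is not the limbo code: struct entry, struct limbo, limbo_append() and
limbo_on_timeout() are hypothetical stand-ins, a fixed-size array replaces
the rlist queue, and everything runs synchronously, so the fiber yield and
wakeup from the patch are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { LIMBO_MAX = 8 };

/* Hypothetical stand-in for struct txn_limbo_entry. */
struct entry {
	int id;
	bool is_rollback;
};

/* Hypothetical stand-in for struct txn_limbo: a FIFO of pending entries. */
struct limbo {
	struct entry *queue[LIMBO_MAX];
	size_t count;
	/* How many times a rollback record was "written". */
	int rollbacks_written;
	/* Ids in the order they were rolled back. */
	int rollback_order[LIMBO_MAX];
	size_t rollback_count;
};

static void
limbo_append(struct limbo *l, struct entry *e)
{
	assert(l->count < LIMBO_MAX);
	l->queue[l->count++] = e;
}

/*
 * Called when entry 'e' times out. Returns true if this call wrote
 * the rollback, false if 'e' is not the first entry and has to wait
 * for the owner of the first entry to roll it back.
 */
static bool
limbo_on_timeout(struct limbo *l, struct entry *e)
{
	assert(l->count > 0);
	if (l->queue[0] != e)
		return false;
	/* One rollback record covers the whole queue. */
	l->rollbacks_written++;
	/* Roll back starting from the newest entry. */
	while (l->count > 0) {
		struct entry *last = l->queue[--l->count];
		last->is_rollback = true;
		assert(l->rollback_count < LIMBO_MAX);
		l->rollback_order[l->rollback_count++] = last->id;
	}
	return true;
}
```

In this model, a timeout on any entry other than the head changes nothing,
and a single timeout on the head rolls back every queued entry, newest
first. Whether the real limbo always keeps the oldest timed-out entry at
the head of the queue is exactly what the question above is about.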
--
Serge Petrenko