[Tarantool-patches] [PATCH v23 2/3] qsync: order access to the limbo terms

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Oct 22 01:06:05 MSK 2021


Hi! Thanks for the patch!

See 4 comments below.

> diff --git a/src/box/box.cc b/src/box/box.cc
> index e082e1a3d..6a9be745a 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1686,8 +1684,11 @@ box_issue_promote(uint32_t prev_leader_id, int64_t promote_lsn)
>  		.lsn = promote_lsn,
>  		.term = raft->term,
>  	};
> -	txn_limbo_process(&txn_limbo, &req);
> +	txn_limbo_process_begin(&txn_limbo);
> +	txn_limbo_write_promote(&txn_limbo, req.lsn, req.term);
> +	txn_limbo_process_core(&txn_limbo, &req);
>  	assert(txn_limbo_is_empty(&txn_limbo));
> +	txn_limbo_process_commit(&txn_limbo);

1. What was wrong with txn_limbo_begin/commit/rollback?
I mean `txn_limbo` prefix, without `_process_` suffix. From
this hunk we can see that you call `process_begin` but then
you call txn_limbo_write_promote - it is not 'process'.

Using 'process' because of that looks inconsistent. Also if
you would drop it from begin/commit/rollback names, then you
could leave txn_limbo_process as is, without this _core suffix.

> @@ -1708,8 +1707,12 @@ box_issue_demote(uint32_t prev_leader_id, int64_t promote_lsn)
>  		.lsn = promote_lsn,
>  		.term = box_raft()->term,
>  	};
> -	txn_limbo_process(&txn_limbo, &req);
> +	txn_limbo_process_begin(&txn_limbo);
> +	txn_limbo_write_demote(&txn_limbo, promote_lsn,
> +				box_raft()->term);

2. This expression fits into one line (< 80 symbols) just fine.
Maybe lets write it as one line?

> +	txn_limbo_process_core(&txn_limbo, &req);
>  	assert(txn_limbo_is_empty(&txn_limbo));
> +	txn_limbo_process_commit(&txn_limbo);
>  }
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index 70447caaf..9b643072a 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -542,10 +543,30 @@ txn_limbo_read_demote(struct txn_limbo *limbo, int64_t lsn)
>  }
>  
>  void
> -txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
> +txn_limbo_ack(struct txn_limbo *limbo, uint32_t owner_id,
> +	      uint32_t replica_id, int64_t lsn)
>  {
> +	/*
> +	 * ACKs are collected only by the transactions originator
> +	 * (which is the single master in 100% so far). Other instances
> +	 * wait for master's CONFIRM message instead.
> +	 *
> +	 * Due to cooperative multitasking there might be limbo owner
> +	 * migration in-fly (while writing data to journal), so for
> +	 * simplicity sake the test for owner is done here instead
> +	 * of putting this check to the callers.
> +	 */

3. This does not improve simplicity anyhow TBH, rather vice versa.
This code looks very suspicious, counter-intuitive. But lets wait for
more tests. See comments to the last commit.

4. Why doesn't txn_limbo_ack() keeps the lock for the time of
txn_limbo_write_confirm() and txn_limbo_read_confirm()?

I think that probably all the limbo functions, literally all of them,
must get the latch. From the beginning to the end. And only after
careful testing we could try to drop some of the locks, or at least
reduce their critical sections. box.ctl functions related to the
limbo/raft too. But do not insist on that. This is just how I would do
it maybe.


More information about the Tarantool-patches mailing list