[Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms

Cyrill Gorcunov gorcunov at gmail.com
Tue Jan 11 23:39:41 MSK 2022


On Mon, Jan 10, 2022 at 05:28:43PM +0300, Serge Petrenko wrote:
>    Hi! Thanks for the patch!
>     
>    box_issue_promote() and box_issue_demote() need fine-grained locking
>    anyway.
>    Otherwise it’s possible that promote() is already issued, but not yet
>    written to WAL, and some
>    outdated request is applied by applier at that exact moment.

True. And in previous series Vlad has asked to not move in code which is
not covered by tests. So I think this is a task for the next part. Currently
we cover only the race between appliers.

>     
>    You should take the lock before the WAL write, and release it only after
>    txn_limbo_apply.
>     
>    No need to guard every limbo function there is, but we have to guard
>    everything that
>    writes PROMOTE/DEMOTE.
...
>      @@ -216,7 +225,7 @@ txn_limbo_last_entry(struct txn_limbo *limbo)
>        * @a replica_id.
>        */
>       static inline uint64_t
>      -txn_limbo_replica_term(const struct txn_limbo *limbo, uint32_t
>      replica_id)
>      +txn_limbo_replica_term(struct txn_limbo *limbo, uint32_t replica_id)
>       {
> 
>     
>    You’ve forgot to lock the latch here, I guess.

I did it on a purpose. As you remember we've faced many problems when tried
to implement fine-grained locking inside limbo code. So I dropped this idea
eventually and I think we could start with explicit locks to cover the applier
race and then walk via small steps trying to cover the rest.

>      +/**
>      + * Initiate execution of a synchronous replication request.
>      + */
>      +static inline void
>      +txn_limbo_begin(struct txn_limbo *limbo)
>      +{
>      + limbo->promote_latch_cnt++;
>      + latch_lock(&limbo->promote_latch);
> 
>     
>    I suppose you should decrease the latch_cnt right after acquiring the
>    lock.
>     
>    Otherwise you count the sole «limbo user» together with «limbo waiters».

Yes, this will represent accumulated value. To be honest I never saw such
approach in any other code (ie increment/lock/decrement) but I think this
is fine for fibres, will do.

	Cyrill


More information about the Tarantool-patches mailing list