[Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms

Serge Petrenko sergepetrenko at tarantool.org
Thu Jan 13 13:13:07 MSK 2022



13.01.2022 00:30, Vladislav Shpilevoy пишет:
> Hi!
>
> On 12.01.2022 15:01, Serge Petrenko wrote:
>>
>> 11.01.2022 23:39, Cyrill Gorcunov пишет:
>>> On Mon, Jan 10, 2022 at 05:28:43PM +0300, Serge Petrenko wrote:
>>>>      Hi! Thanks for the patch!
>>>>           box_issue_promote() and box_issue_demote() need fine-grained locking
>>>>      anyway.
>>>>      Otherwise it’s possible that promote() is already issued, but not yet
>>>>      written to WAL, and some
>>>>      outdated request is applied by applier at that exact moment.
>>> True. And in previous series Vlad has asked to not move in code which is
>>> not covered by tests. So I think this is a task for the next part. Currently
>>> we cover only the race between appliers.
>> Let's ask Vlad, then.
>>
>> I feel like we should fix this now, not waiting for a full fine-grained locking
>> patch.
>>
>> First of all, this is a known bug (and fine-grained locking was meant to
>> cover everything we don't know of, just in case).
> I am not sure I understand what you both are talking about here. Sergey, do
> you mean 'fine-grained locking' as big critical sections covering a lot of
> code at once or as many small critical sections?

I mean "locking every limbo function", like Cyrill tried to do that in the
previous patch version.
>
> I am confused because of this sentence. "Cover everything we don't know" is
> rather opposite to fine-grained locking. I voted for big locks because
> apparently it was too hard to implement smaller more precise locks.
>
>> Besides, simply locking issue_promote/issue_demote should be
>> much easier than implementing the fine-grained locking patch.
> Yes. I remember the proposal was to lock entire promote/demote and other
> qsync/raft functions from beginning to end. Because it should be relatively
> easy. I didn't look at the code in this patch though, can't comment it.

This particular patch only locks applier_apply_synchro_request(), 
txn_limbo_process()
and txn_limbo_is_replica_outdated(), so that applier cannot apply a 
request from an
already stale term.

My proposal is to lock box_issue_promote() and box_issue_demote()
(not whole promote/demote) to get rid of another race: when promote is 
written
to WAL, but not yed processed.

What you're talking about is what I call "fine grained locking", and it 
turned
out rather hard to implement, so Cyrill abandoned this idea for now.

-- 
Serge Petrenko



More information about the Tarantool-patches mailing list