[Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms
sergepetrenko at tarantool.org
Thu Jan 13 13:13:07 MSK 2022
13.01.2022 00:30, Vladislav Shpilevoy пишет:
> On 12.01.2022 15:01, Serge Petrenko wrote:
>> 11.01.2022 23:39, Cyrill Gorcunov пишет:
>>> On Mon, Jan 10, 2022 at 05:28:43PM +0300, Serge Petrenko wrote:
>>>> Hi! Thanks for the patch!
>>>> box_issue_promote() and box_issue_demote() need fine-grained locking
>>>> Otherwise it’s possible that promote() is already issued, but not yet
>>>> written to WAL, and some
>>>> outdated request is applied by applier at that exact moment.
>>> True. And in previous series Vlad has asked to not move in code which is
>>> not covered by tests. So I think this is a task for the next part. Currently
>>> we cover only the race between appliers.
>> Let's ask Vlad, then.
>> I feel like we should fix this now, not waiting for a full fine-grained locking
>> First of all, this is a known bug (and fine-grained locking was meant to
>> cover everything we don't know of, just in case).
> I am not sure I understand what you both are talking about here. Sergey, do
> you mean 'fine-grained locking' as big critical sections covering a lot of
> code at once or as many small critical sections?
I mean "locking every limbo function", like Cyrill tried to do that in the
previous patch version.
> I am confused because of this sentence. "Cover everything we don't know" is
> rather opposite to fine-grained locking. I voted for big locks because
> apparently it was too hard to implement smaller more precise locks.
>> Besides, simply locking issue_promote/issue_demote should be
>> much easier than implementing the fine-grained locking patch.
> Yes. I remember the proposal was to lock entire promote/demote and other
> qsync/raft functions from beginning to end. Because it should be relatively
> easy. I didn't look at the code in this patch though, can't comment it.
This particular patch only locks applier_apply_synchro_request(),
and txn_limbo_is_replica_outdated(), so that applier cannot apply a
request from an
already stale term.
My proposal is to lock box_issue_promote() and box_issue_demote()
(not whole promote/demote) to get rid of another race: when promote is
to WAL, but not yed processed.
What you're talking about is what I call "fine grained locking", and it
out rather hard to implement, so Cyrill abandoned this idea for now.
More information about the Tarantool-patches