[Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms

Fri Jan 14 02:32:04 MSK 2022

On 13.01.2022 11:13, Serge Petrenko wrote:
> 
> 
> 13.01.2022 00:30, Vladislav Shpilevoy пишет:
>> Hi!
>>
>> On 12.01.2022 15:01, Serge Petrenko wrote:
>>>
>>> 11.01.2022 23:39, Cyrill Gorcunov пишет:
>>>> On Mon, Jan 10, 2022 at 05:28:43PM +0300, Serge Petrenko wrote:
>>>>>      Hi! Thanks for the patch!
>>>>>           box_issue_promote() and box_issue_demote() need fine-grained locking
>>>>>      anyway.
>>>>>      Otherwise it’s possible that promote() is already issued, but not yet
>>>>>      written to WAL, and some
>>>>>      outdated request is applied by applier at that exact moment.
>>>> True. And in previous series Vlad has asked to not move in code which is
>>>> not covered by tests. So I think this is a task for the next part. Currently
>>>> we cover only the race between appliers.
>>> Let's ask Vlad, then.
>>>
>>> I feel like we should fix this now, not waiting for a full fine-grained locking
>>> patch.
>>>
>>> First of all, this is a known bug (and fine-grained locking was meant to
>>> cover everything we don't know of, just in case).
>> I am not sure I understand what you both are talking about here. Sergey, do
>> you mean 'fine-grained locking' as big critical sections covering a lot of
>> code at once or as many small critical sections?
> 
> I mean "locking every limbo function", like Cyrill tried to do that in the
> previous patch version.

Давай по-русски, тут какое-то недопонимание.

В старых версиях Кирилл пытался лочить слишком мелко. Протестировать такое было
тяжеловато. Потому та версия не зашла - тестов было 0.

>> I am confused because of this sentence. "Cover everything we don't know" is
>> rather opposite to fine-grained locking. I voted for big locks because
>> apparently it was too hard to implement smaller more precise locks.
>>
>>> Besides, simply locking issue_promote/issue_demote should be
>>> much easier than implementing the fine-grained locking patch.
>> Yes. I remember the proposal was to lock entire promote/demote and other
>> qsync/raft functions from beginning to end. Because it should be relatively
>> easy. I didn't look at the code in this patch though, can't comment it.
> 
> This particular patch only locks applier_apply_synchro_request(), txn_limbo_process()
> and txn_limbo_is_replica_outdated(), so that applier cannot apply a request from an
> already stale term.
> 
> My proposal is to lock box_issue_promote() and box_issue_demote()
> (not whole promote/demote) to get rid of another race: when promote is written
> to WAL, but not yed processed.

Почему не лочить целиком promote/demote? Может если локи были бы шире, то не
нужно было бы и на триггеры портировать все как в новом тикете?

> What you're talking about is what I call "fine grained locking", and it turned
> out rather hard to implement, so Cyrill abandoned this idea for now.

fine grained значит "мелко-зернистый". То есть локи были бы на мелкие куски кода,
как сначала Кирилл пытался сделать. Я как раз за наоборот топлю - блокировать
сразу большие куски, а не "мелко".