[Tarantool-patches] [PATCH] limbo: introduce request processing hooks

Mon Jul 12 11:04:56 MSK 2021

On 12.07.2021 10:01, Serge Petrenko via Tarantool-patches wrote:
> 
> 
> 11.07.2021 17:00, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>>
>> On 11.07.2021 00:28, Cyrill Gorcunov wrote:
>>> Guys, this is an early RFC since I would like to discuss the
>>> design first before going further. Currently we don't interrupt
>>> incoming synchro requests, which doesn't allow us to detect a cluster
>>> split-brain situation. As we discussed verbally, there are
>>> a number of signs to detect it, and we need to stop receiving data
>>> from obsolete nodes.
>>>
>>> The main problem though is that such filtering of incoming packets
>>> should happen at the moment when we can still step back and
>>> inform the peer that the data has been declined, but currently our
>>> applier code processes synchro requests inside a WAL trigger, i.e. when
>>> the data is already applied or being rolled back.
>>>
>>> Thus we need to separate the "filter" and "apply" stages of processing.
>>> What is more interesting is that we filter incoming requests via an
>>> in-memory vclock and update it immediately. Thus the following situation
>>> is possible: a promote request comes in, we remember it inside
>>> promote_term_map, but then the write to WAL fails and we never revert
>>> the promote_term_map variable, so the other peer won't be able to
>>> resend us this promote request because now we think that we've
>>> already applied the promote.
>> Well, I still don't understand what the issue is. We discussed it
>> privately already. You simply should not apply anything until WAL
>> write is done. And it is not happening now on the master. The
>> terms vclock is updated only **after** WAL write.
>>
>> Why do you need all these new vclocks if you should not apply
>> anything before WAL write in the first place?
> 
> If I understand correctly, the issue is that if we filter (and check for
> the split brain) after the WAL write, we will end up with a conflicting
> PROMOTE in our WAL. Cyrill is trying to avoid this, that's why
> he's separating the filter stage. This way the error will reach
> the remote peer before any WAL write, and the WAL write won't happen.
> 
> And if we filter before the WAL write, we need the second vclock, which
> Cyrill has introduced.

Why do you need a second vclock? Why can't you just filter by the
existing vclock and update it after WAL write like now?
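
To make the question concrete, here is a minimal sketch of what I mean by
"filter by the existing vclock, update after the WAL write". It is not the
actual limbo code: struct term_map, term_map_filter(), term_map_commit(),
process_promote() and the wal_ok flag are all hypothetical names used only
for illustration, and it assumes a single request in flight at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { REPLICA_MAX = 32 };

/* Hypothetical stand-in for the limbo's term bookkeeping. */
struct term_map {
	/* Greatest PROMOTE term confirmed by a WAL write, per replica. */
	uint64_t terms[REPLICA_MAX];
};

/*
 * Filter stage: accept a PROMOTE only if its term is newer than
 * the confirmed one. Read-only, so there is nothing to revert.
 */
static bool
term_map_filter(const struct term_map *map, uint32_t replica_id,
		uint64_t term)
{
	assert(replica_id < REPLICA_MAX);
	return term > map->terms[replica_id];
}

/* Apply stage: must be called only after a successful WAL write. */
static void
term_map_commit(struct term_map *map, uint32_t replica_id, uint64_t term)
{
	assert(replica_id < REPLICA_MAX);
	if (term > map->terms[replica_id])
		map->terms[replica_id] = term;
}

/*
 * Process one PROMOTE. wal_ok simulates the WAL write result.
 * Returns 0 on success, -1 if the request is filtered out or
 * the WAL write fails.
 */
static int
process_promote(struct term_map *map, uint32_t replica_id, uint64_t term,
		bool wal_ok)
{
	if (!term_map_filter(map, replica_id, term))
		return -1;
	if (!wal_ok)
		return -1; /* Map is untouched, the peer can resend. */
	term_map_commit(map, replica_id, term);
	return 0;
}
```

With this ordering a failed WAL write leaves the map as it was, so a resent
PROMOTE with the same term still passes the filter, and only a repeated
PROMOTE after a successful write is declined.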