[Tarantool-patches] [PATCH 10/12] raft: move box_update_ro_summary to update trigger

Serge Petrenko sergepetrenko at tarantool.org
Thu Nov 19 13:08:33 MSK 2020

19.11.2020 02:21, Vladislav Shpilevoy пишет:
> Hi! Thanks for the review!
>> Raft uses synchronous WAL write, corect?
>> So there's a yield in raft_worker_handle_io(). Now there's a period of time when
>> an instance is a follower, but it isn't read-only.
>> When you reconfigure a leader to become voter, everything's fine, since no
>> writing to disk is involved.
>> However, if an existing leader receives a message with term greater, than its own,
>> it'll first persist this term, and thus yield, and proceed to broadcast and switching
>> to ro later.
>> So now it's possible that a follower is writeable for some period of time.
> You are a savior. Thanks for the deep review and for noticing this.
> Indeed. The issue exists. And it is much deeper, it seems.
> I also glanced at your patch about RO-update vs limbo-clear order.
> Unfortunately, it does not work. As well as my idea about running on_update
> triggers from raft_schedule_broadcast().
> The reason is that currently our on_update trigger yields. In both our
> solutions. Because it calls box_clear_synchro_queue(). And this is exactly
> what I was trying to avoid by using a fiber in the first place. The state
> machine must not yield.
> Also there is another issue, not related to the patch, but which I spotted
> while worked on it today. The issue is - we can't cancel limbo clearance.
> The node could be demoted to a follower during waiting for confirms, but it
> still will wait for confirms and we can't stop it.
> I started thinking that we could resolve both these issues if box/raft.c could
> run update triggers without yields right away, but schedule async work, like
> limbo clear, into a separate fiber, cancellable.
> And I realized that we already have this fiber - raft.worker.
> The idea is that we can move the worker fiber to box/raft.c. And in libraft
> instead of a fiber we will have 2 methods:
> 	- raft_vtab.async_f - virtual method called by Raft, when it wants
> 	  to schedule some async heavy work (network, disk). We will call it
> 	  instead of raft_worker_wakeup().
> 	- raft_process_async - normal method, which the Raft owner should call
> 	  to handle all async events. In a separate fiber. This is the same
> 	  as raft_worker_f(), but not depending on fiber, and finite.
> In box/raft.c we have a worker fiber. To libraft we give async_f, which
> creates the fiber on demand, and wakes it up. No yields. Like now, but in
> box/raft.c.
> The fiber in its body will call raft_process_async() and fiber_yield() until
> it is cancelled.
> Now how does it fix the update triggers? - we fire on_update triggers in
> raft_schedule_broadcast(), but box/raft.c in the trigger will only update
> RO summary. It won't yield. For the limbo clearance it will wakeup the worker
> fiber, which now belongs to box/raft.c, so it is totally fine. The worker
> will call raft_process_async() and will clear the limbo when it is time.
> Also the worker fiber can be cancelled/interrupted somehow if we want to
> stop limbo clear when the node is not a leader anymore.
> I started working on this already, and it seems to be good. Raft is simplified
> even more, and we delete the ugly hack in box_raft_free() about changing struct
> raft with box_raft_global.worker = NULL. We still nullify the fiber, but we
> don't change struct raft.

Thanks for the answer & investigation!

Looks good at first glance.

Serge Petrenko

More information about the Tarantool-patches mailing list