From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp35.i.mail.ru (smtp35.i.mail.ru [94.100.177.95]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 6DBA5469710 for ; Thu, 19 Nov 2020 13:08:34 +0300 (MSK) References: <637105222f13010a9ed0488ee698b2ba2684b46e.1605570907.git.v.shpilevoy@tarantool.org> <85c52e01-c876-8d8e-1543-bf272efb1d79@tarantool.org> <169cf614-cd6f-a62c-07e7-adde82334a6e@tarantool.org> From: Serge Petrenko Message-ID: Date: Thu, 19 Nov 2020 13:08:33 +0300 MIME-Version: 1.0 In-Reply-To: <169cf614-cd6f-a62c-07e7-adde82334a6e@tarantool.org> Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Transfer-Encoding: 8bit Content-Language: en-GB Subject: Re: [Tarantool-patches] [PATCH 10/12] raft: move box_update_ro_summary to update trigger List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy , tarantool-patches@dev.tarantool.org, gorcunov@gmail.com 19.11.2020 02:21, Vladislav Shpilevoy пишет: > Hi! Thanks for the review! > >> Raft uses synchronous WAL write, corect? >> >> So there's a yield in raft_worker_handle_io(). Now there's a period of time when >> an instance is a follower, but it isn't read-only. >> >> When you reconfigure a leader to become voter, everything's fine, since no >> writing to disk is involved. >> >> However, if an existing leader receives a message with term greater, than its own, >> it'll first persist this term, and thus yield, and proceed to broadcast and switching >> to ro later. >> >> So now it's possible that a follower is writeable for some period of time. > You are a savior. Thanks for the deep review and for noticing this. > > Indeed. The issue exists. And it is much deeper, it seems. > > I also glanced at your patch about RO-update vs limbo-clear order. > Unfortunately, it does not work. As well as my idea about running on_update > triggers from raft_schedule_broadcast(). > > The reason is that currently our on_update trigger yields. In both our > solutions. Because it calls box_clear_synchro_queue(). And this is exactly > what I was trying to avoid by using a fiber in the first place. The state > machine must not yield. > > Also there is another issue, not related to the patch, but which I spotted > while worked on it today. The issue is - we can't cancel limbo clearance. > The node could be demoted to a follower during waiting for confirms, but it > still will wait for confirms and we can't stop it. > > I started thinking that we could resolve both these issues if box/raft.c could > run update triggers without yields right away, but schedule async work, like > limbo clear, into a separate fiber, cancellable. > > And I realized that we already have this fiber - raft.worker. > > The idea is that we can move the worker fiber to box/raft.c. And in libraft > instead of a fiber we will have 2 methods: > > - raft_vtab.async_f - virtual method called by Raft, when it wants > to schedule some async heavy work (network, disk). We will call it > instead of raft_worker_wakeup(). > > - raft_process_async - normal method, which the Raft owner should call > to handle all async events. In a separate fiber. This is the same > as raft_worker_f(), but not depending on fiber, and finite. > > In box/raft.c we have a worker fiber. To libraft we give async_f, which > creates the fiber on demand, and wakes it up. No yields. Like now, but in > box/raft.c. > > The fiber in its body will call raft_process_async() and fiber_yield() until > it is cancelled. > > Now how does it fix the update triggers? - we fire on_update triggers in > raft_schedule_broadcast(), but box/raft.c in the trigger will only update > RO summary. It won't yield. For the limbo clearance it will wakeup the worker > fiber, which now belongs to box/raft.c, so it is totally fine. The worker > will call raft_process_async() and will clear the limbo when it is time. > > Also the worker fiber can be cancelled/interrupted somehow if we want to > stop limbo clear when the node is not a leader anymore. > > I started working on this already, and it seems to be good. Raft is simplified > even more, and we delete the ugly hack in box_raft_free() about changing struct > raft with box_raft_global.worker = NULL. We still nullify the fiber, but we > don't change struct raft. Thanks for the answer & investigation! Looks good at first glance. -- Serge Petrenko