From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng1.m.smailru.net (smtpng1.m.smailru.net [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id A837A469710 for ; Thu, 19 Nov 2020 02:21:14 +0300 (MSK) References: <637105222f13010a9ed0488ee698b2ba2684b46e.1605570907.git.v.shpilevoy@tarantool.org> <85c52e01-c876-8d8e-1543-bf272efb1d79@tarantool.org> From: Vladislav Shpilevoy Message-ID: <169cf614-cd6f-a62c-07e7-adde82334a6e@tarantool.org> Date: Thu, 19 Nov 2020 00:21:12 +0100 MIME-Version: 1.0 In-Reply-To: <85c52e01-c876-8d8e-1543-bf272efb1d79@tarantool.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Tarantool-patches] [PATCH 10/12] raft: move box_update_ro_summary to update trigger List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Serge Petrenko , tarantool-patches@dev.tarantool.org, gorcunov@gmail.com Hi! Thanks for the review! > Raft uses synchronous WAL write, corect? > > So there's a yield in raft_worker_handle_io(). Now there's a period of time when > an instance is a follower, but it isn't read-only. > > When you reconfigure a leader to become voter, everything's fine, since no > writing to disk is involved. > > However, if an existing leader receives a message with term greater, than its own, > it'll first persist this term, and thus yield, and proceed to broadcast and switching > to ro later. > > So now it's possible that a follower is writeable for some period of time. You are a savior. Thanks for the deep review and for noticing this. Indeed. The issue exists. And it is much deeper, it seems. I also glanced at your patch about RO-update vs limbo-clear order. Unfortunately, it does not work. As well as my idea about running on_update triggers from raft_schedule_broadcast(). The reason is that currently our on_update trigger yields. In both our solutions. Because it calls box_clear_synchro_queue(). And this is exactly what I was trying to avoid by using a fiber in the first place. The state machine must not yield. Also there is another issue, not related to the patch, but which I spotted while worked on it today. The issue is - we can't cancel limbo clearance. The node could be demoted to a follower during waiting for confirms, but it still will wait for confirms and we can't stop it. I started thinking that we could resolve both these issues if box/raft.c could run update triggers without yields right away, but schedule async work, like limbo clear, into a separate fiber, cancellable. And I realized that we already have this fiber - raft.worker. The idea is that we can move the worker fiber to box/raft.c. And in libraft instead of a fiber we will have 2 methods: - raft_vtab.async_f - virtual method called by Raft, when it wants to schedule some async heavy work (network, disk). We will call it instead of raft_worker_wakeup(). - raft_process_async - normal method, which the Raft owner should call to handle all async events. In a separate fiber. This is the same as raft_worker_f(), but not depending on fiber, and finite. In box/raft.c we have a worker fiber. To libraft we give async_f, which creates the fiber on demand, and wakes it up. No yields. Like now, but in box/raft.c. The fiber in its body will call raft_process_async() and fiber_yield() until it is cancelled. Now how does it fix the update triggers? - we fire on_update triggers in raft_schedule_broadcast(), but box/raft.c in the trigger will only update RO summary. It won't yield. For the limbo clearance it will wakeup the worker fiber, which now belongs to box/raft.c, so it is totally fine. The worker will call raft_process_async() and will clear the limbo when it is time. Also the worker fiber can be cancelled/interrupted somehow if we want to stop limbo clear when the node is not a leader anymore. I started working on this already, and it seems to be good. Raft is simplified even more, and we delete the ugly hack in box_raft_free() about changing struct raft with box_raft_global.worker = NULL. We still nullify the fiber, but we don't change struct raft.