Tarantool development patches archive
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Serge Petrenko <sergepetrenko@tarantool.org>,
	tarantool-patches@dev.tarantool.org, gorcunov@gmail.com
Subject: Re: [Tarantool-patches] [PATCH 10/12] raft: move box_update_ro_summary to update trigger
Date: Thu, 19 Nov 2020 00:21:12 +0100
Message-ID: <169cf614-cd6f-a62c-07e7-adde82334a6e@tarantool.org>
In-Reply-To: <85c52e01-c876-8d8e-1543-bf272efb1d79@tarantool.org>

Hi! Thanks for the review!

> Raft uses synchronous WAL write, correct?
> 
> So there's a yield in raft_worker_handle_io(). Now there's a period of time when
> an instance is a follower, but it isn't read-only.
> 
> When you reconfigure a leader to become voter, everything's fine, since no
> writing to disk is involved.
> 
> However, if an existing leader receives a message with a term greater than its own,
> it'll first persist this term, and thus yield, and only later proceed to broadcasting
> and switching to ro.
> 
> So now it's possible that a follower is writeable for some period of time.

You are a savior. Thanks for the deep review and for noticing this.

Indeed, the issue exists. And it seems to go much deeper.

I also glanced at your patch about RO-update vs limbo-clear order.
Unfortunately, it does not work. Neither does my idea about running on_update
triggers from raft_schedule_broadcast().

The reason is that currently our on_update trigger yields in both our
solutions, because it calls box_clear_synchro_queue(). And this is exactly
what I was trying to avoid by using a fiber in the first place. The state
machine must not yield.

Also there is another issue, not related to the patch, which I spotted
while working on it today. The issue is that we can't cancel limbo clearance.
The node could be demoted to a follower while waiting for confirms, but it
will still wait for them and we can't stop it.

I started thinking that we could resolve both these issues if box/raft.c could
run update triggers right away, without yields, but schedule async work, like
limbo clear, into a separate cancellable fiber.

And I realized that we already have this fiber - raft.worker.

The idea is that we can move the worker fiber to box/raft.c. And in libraft
instead of a fiber we will have 2 methods:

	- raft_vtab.async_f - virtual method called by Raft, when it wants
	  to schedule some async heavy work (network, disk). We will call it
	  instead of raft_worker_wakeup().

	- raft_process_async - normal method, which the Raft owner should call
	  to handle all async events. In a separate fiber. This is the same
	  as raft_worker_f(), but it does not depend on fibers, and it is finite.

In box/raft.c we have a worker fiber. To libraft we give async_f, which
creates the fiber on demand, and wakes it up. No yields. Like now, but in
box/raft.c.

The fiber in its body will call raft_process_async() and fiber_yield() until
it is cancelled.
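
To make the split more concrete, here is a rough sketch of how it could look.
It is not the actual patch. Only the names raft_vtab.async_f,
raft_process_async and raft_schedule_broadcast come from the text above; the
struct layouts, the flags, raft_schedule_write(), and the box_raft_* helpers
are all assumptions made up for illustration. A real fiber is replaced with a
plain struct and a "step" function, so the sketch compiles without Tarantool.

```c
#include <assert.h>
#include <stdbool.h>

struct raft;

/* Assumed shape of the vtab: the owner supplies async_f. */
struct raft_vtab {
	/* Raft calls it to request async work. It must not yield. */
	void (*async_f)(struct raft *raft);
};

/* Hypothetical minimal Raft state, just enough for the sketch. */
struct raft {
	const struct raft_vtab *vtab;
	bool is_write_scheduled;
	bool is_broadcast_scheduled;
	/* Counters exist only to observe what the sketch did. */
	int writes_done;
	int broadcasts_done;
};

/* Library side: remember the work and notify the owner. */
static void
raft_schedule_write(struct raft *raft)
{
	raft->is_write_scheduled = true;
	raft->vtab->async_f(raft);
}

static void
raft_schedule_broadcast(struct raft *raft)
{
	raft->is_broadcast_scheduled = true;
	raft->vtab->async_f(raft);
}

/* Finite: handles whatever is pending and returns, no loop inside. */
static void
raft_process_async(struct raft *raft)
{
	if (raft->is_write_scheduled) {
		raft->is_write_scheduled = false;
		raft->writes_done++;		/* Stands in for a WAL write. */
	}
	if (raft->is_broadcast_scheduled) {
		raft->is_broadcast_scheduled = false;
		raft->broadcasts_done++;	/* Stands in for network. */
	}
}

/* Owner side (box/raft.c in the text). A struct stands in for a fiber. */
struct box_raft_worker {
	bool is_created;
	bool is_woken;
	bool is_cancelled;
};

static struct box_raft_worker box_raft_worker;

/* Creates the worker on demand and wakes it up. No yields. */
static void
box_raft_async_f(struct raft *raft)
{
	(void)raft;
	box_raft_worker.is_created = true;
	box_raft_worker.is_woken = true;
}

static const struct raft_vtab box_raft_vtab = {
	.async_f = box_raft_async_f,
};

/*
 * One iteration of the worker body. A real fiber would loop over this
 * and yield between iterations until it is cancelled.
 */
static bool
box_raft_worker_step(struct raft *raft)
{
	if (box_raft_worker.is_cancelled)
		return false;
	if (box_raft_worker.is_woken) {
		box_raft_worker.is_woken = false;
		raft_process_async(raft);
	}
	return true;
}

/* Drives the sketch: schedule two events, then let the worker run once. */
int
raft_async_sketch_run(void)
{
	struct raft raft = {.vtab = &box_raft_vtab};
	box_raft_worker = (struct box_raft_worker){false, false, false};
	raft_schedule_write(&raft);
	raft_schedule_broadcast(&raft);
	/* Scheduling only wakes the worker, no work is done inline. */
	assert(box_raft_worker.is_created && box_raft_worker.is_woken);
	assert(raft.writes_done == 0 && raft.broadcasts_done == 0);
	bool is_alive = box_raft_worker_step(&raft);
	assert(is_alive);
	return raft.writes_done + raft.broadcasts_done;
}
```

The point of the sketch is that the library only sets flags and calls
async_f, and all the heavy work happens when the owner decides to call
raft_process_async().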

Now how does it fix the update triggers? We fire on_update triggers in
raft_schedule_broadcast(), but in the trigger box/raft.c will only update the
RO summary. It won't yield. For the limbo clearance it will wake up the worker
fiber, which now belongs to box/raft.c, so it is totally fine. The worker
will call raft_process_async() and will clear the limbo when it is time.

Also the worker fiber can be cancelled/interrupted somehow if we want to
stop limbo clear when the node is not a leader anymore.
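
A rough sketch of the trigger/worker split described above, again only under
assumptions. All demo_* names are hypothetical stand-ins and not real
Tarantool symbols: demo_on_update() plays the role of the on_update trigger
that updates the RO summary, and demo_worker_step() plays the role of the
worker that does the limbo clear. The RO rule is simplified to "RO unless
leader", and cancellation is modeled as dropping the pending flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states, not the real Raft enum. */
enum demo_state {
	DEMO_FOLLOWER,
	DEMO_LEADER,
};

struct demo_box {
	enum demo_state state;
	bool is_ro;
	/* Set by the trigger, consumed by the worker. */
	bool is_limbo_clear_pending;
	int limbo_clears_done;
};

/*
 * Stand-in for the on_update trigger. It only flips flags, so there is
 * no yield between the state change and the RO update.
 */
static void
demo_on_update(struct demo_box *box, enum demo_state new_state)
{
	box->state = new_state;
	box->is_ro = new_state != DEMO_LEADER;
	/* A demotion drops the pending clear, which acts as a cancel. */
	box->is_limbo_clear_pending = new_state == DEMO_LEADER;
}

/* Stand-in for one worker iteration. The yielding work would live here. */
static void
demo_worker_step(struct demo_box *box)
{
	if (!box->is_limbo_clear_pending)
		return;
	box->is_limbo_clear_pending = false;
	/* Re-check the role: the node might have been demoted meanwhile. */
	if (box->state != DEMO_LEADER)
		return;
	box->limbo_clears_done++;
}

/* Elected, demoted before the worker runs, then elected again. */
int
demo_limbo_sketch_run(void)
{
	struct demo_box box = {DEMO_FOLLOWER, true, false, 0};

	demo_on_update(&box, DEMO_LEADER);
	assert(!box.is_ro);
	demo_on_update(&box, DEMO_FOLLOWER);
	/* Read-only right away, with no window of a writable follower. */
	assert(box.is_ro);
	demo_worker_step(&box);
	assert(box.limbo_clears_done == 0);

	demo_on_update(&box, DEMO_LEADER);
	demo_worker_step(&box);
	return box.limbo_clears_done;
}
```

In this model the first election never reaches the limbo clear, because the
demotion arrives before the worker gets to run. Interrupting a clear which is
already in progress would need real fiber cancellation and is not shown here.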

I started working on this already, and it seems to be good. Raft is simplified
even more, and we delete the ugly hack in box_raft_free() about changing struct
raft with box_raft_global.worker = NULL. We still nullify the fiber, but we
don't change struct raft.

Thread overview: 37+ messages
2020-11-17  0:02 [Tarantool-patches] [PATCH 00/12] Raft module, part 2 - relocation to src/lib/raft Vladislav Shpilevoy
2020-11-17  0:02 ` [Tarantool-patches] [PATCH 01/12] raft: move sources to raftlib.h/.c Vladislav Shpilevoy
2020-11-17  8:14   ` Serge Petrenko
2020-11-17  0:02 ` [Tarantool-patches] [PATCH 10/12] raft: move box_update_ro_summary to update trigger Vladislav Shpilevoy
2020-11-17 12:42   ` Serge Petrenko
2020-11-17 15:17     ` Serge Petrenko
2020-11-18 23:21     ` Vladislav Shpilevoy [this message]
2020-11-19 10:08       ` Serge Petrenko
2020-11-17  0:02 ` [Tarantool-patches] [PATCH 11/12] raft: introduce RaftError Vladislav Shpilevoy
2020-11-17 15:13   ` Serge Petrenko
2020-11-17  0:02 ` [Tarantool-patches] [PATCH 12/12] raft: move algorithm code to src/lib/raft Vladislav Shpilevoy
2020-11-17 15:13   ` Serge Petrenko
2020-11-17  0:02 ` [Tarantool-patches] [PATCH 02/12] raft: move box_raft_* to src/box/raft.h and .c Vladislav Shpilevoy
2020-11-17  8:14   ` Serge Petrenko
2020-11-17  0:02 ` [Tarantool-patches] [PATCH 03/12] raft: stop using replication_disconnect_timeout() Vladislav Shpilevoy
2020-11-17  8:15   ` Serge Petrenko
2020-11-17  0:02 ` [Tarantool-patches] [PATCH 04/12] raft: stop using replication_synchro_quorum Vladislav Shpilevoy
2020-11-17  8:17   ` Serge Petrenko
2020-11-19 23:42     ` Vladislav Shpilevoy
2020-11-17  0:02 ` [Tarantool-patches] [PATCH 05/12] raft: stop using instance_id Vladislav Shpilevoy
2020-11-17  8:59   ` Serge Petrenko
2020-11-17  0:02 ` [Tarantool-patches] [PATCH 06/12] raft: make raft_request.vclock constant Vladislav Shpilevoy
2020-11-17  9:17   ` Serge Petrenko
2020-11-17  0:02 ` [Tarantool-patches] [PATCH 07/12] raft: stop using replicaset.vclock Vladislav Shpilevoy
2020-11-17  9:23   ` Serge Petrenko
2020-11-17  0:02 ` [Tarantool-patches] [PATCH 08/12] raft: introduce vtab for disk and network Vladislav Shpilevoy
2020-11-17  9:35   ` Serge Petrenko
2020-11-19 23:43     ` Vladislav Shpilevoy
2020-11-17 10:00   ` Serge Petrenko
2020-11-19 23:43     ` Vladislav Shpilevoy
2020-11-20  7:56       ` Serge Petrenko
2020-11-20 19:40         ` Vladislav Shpilevoy
2020-11-23  8:09           ` Serge Petrenko
2020-11-17  0:02 ` [Tarantool-patches] [PATCH 09/12] raft: introduce raft_msg, drop xrow dependency Vladislav Shpilevoy
2020-11-17 10:22   ` Serge Petrenko
2020-11-19 23:43     ` Vladislav Shpilevoy
2020-11-20  8:03       ` Serge Petrenko
