From: Serge Petrenko <sergepetrenko@tarantool.org> To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, tarantool-patches@dev.tarantool.org, gorcunov@gmail.com Subject: Re: [Tarantool-patches] [PATCH v2 09/10] raft: introduce state machine Date: Wed, 9 Sep 2020 11:04:54 +0300 [thread overview] Message-ID: <7e5fee9d-d032-de31-e1eb-970ffeb5ce37@tarantool.org> (raw) In-Reply-To: <90367ede-7d37-4ed4-82b7-86a4bcd507de@tarantool.org> 08.09.2020 01:57, Vladislav Shpilevoy пишет: > Thanks for the review! Hi! Thanks for your reply! > >>> src/box/applier.cc | 18 +- >>> src/box/box.cc | 7 +- >>> src/box/raft.c | 711 +++++++++++++++++++++++++++++++++++++++++++-- >>> src/box/raft.h | 118 ++++++++ >>> src/box/relay.cc | 19 ++ >>> 5 files changed, 847 insertions(+), 26 deletions(-) >>> >>> diff --git a/src/box/applier.cc b/src/box/applier.cc >>> index 5459a1dc1..c7c486ee4 100644 >>> --- a/src/box/applier.cc >>> +++ b/src/box/applier.cc >>> @@ -916,8 +916,21 @@ applier_handle_raft(struct applier *applier, struct xrow_header *row) >>> * Return 0 for success or -1 in case of an error. >>> */ >>> static int >>> -applier_apply_tx(struct stailq *rows) >>> +applier_apply_tx(struct applier *applier, struct stailq *rows) >>> { >>> + /* >>> + * Rows received not directly from a leader are ignored. That is a >>> + * protection against the case when an old leader keeps sending data >>> + * around not knowing yet that it is not a leader anymore. >>> + * >>> + * XXX: it may be that this can be fine to apply leader transactions by >>> + * looking at their replica_id field if it is equal to leader id. That >>> + * can be investigated as an 'optimization'. Even though may not give >>> + * anything, because won't change total number of rows sent in the >>> + * network anyway. >>> + */ >>> + if (!raft_is_source_allowed(applier->instance_id)) >>> + return 0; >> Have we abandoned Rule 66? > Yes, it was officially deprecated. > >>> diff --git a/src/box/raft.h b/src/box/raft.h >>> index db64cf933..111a9c16e 100644 >>> --- a/src/box/raft.h >>> +++ b/src/box/raft.h >>> @@ -31,34 +31,141 @@ >>> struct raft { >>> + /** Instance ID of leader of the current term. */ >>> uint32_t leader; >>> + /** State of the instance. */ >>> enum raft_state state; >>> + /** >>> + * Volatile part of the Raft state, whose WAL write may be still >>> + * in-progress, and yet the state may be already used. Volatile state is >>> + * never sent to anywhere, but the state machine makes decisions based >>> + * on it. That is vital. >>> + * As an example, volatile vote needs to be used to reject votes inside >>> + * a term, where the instance already voted (even if the vote WAL write >>> + * is not finished yet). Otherwise the instance would try to write >>> + * several votes inside one term. >>> + */ >>> + uint64_t volatile_term; >>> + uint32_t volatile_vote; >>> + /** >>> + * Flag whether Raft is enabled. When disabled, it still persists terms >>> + * so as to quickly enroll into the cluster when (if) it is enabled. In >>> + * everything else disabled Raft does not affect instance work. >>> + */ >>> bool is_enabled; >>> + /** >>> + * Flag whether the node can become a leader. It is an accumulated value >>> + * of configuration options Raft enabled and Raft candidate. If at least >>> + * one is false - the instance is not a candidate. >>> + */ >>> bool is_candidate; >>> + /** Flag whether the instance is allowed to be a leader. */ >>> + bool is_cfg_candidate; >>> + /** >>> + * Flag whether Raft currently tries to write something into WAL. It >>> + * happens asynchronously, not right after Raft state is updated. >>> + */ >>> + bool is_write_in_progress; >>> + /** >>> + * Persisted Raft state. These values are used when need to tell current >>> + * Raft state to other nodes. >>> + */ >>> uint64_t term; >>> uint32_t vote; >>> + /** >>> + * Bit 1 on position N means that a vote from instance with ID = N was >>> + * obtained. >>> + */ >>> + uint32_t vote_mask; >>> + /** Number of votes for this instance. Valid only in candidate state. */ >>> + int vote_count; >>> + /** State machine timed event trigger. */ >>> + struct ev_timer timer; >>> + /** >>> + * Dump of Raft state in the end of event loop, when it is changed. >>> + */ >>> + struct ev_check io; >> You probably need the ev_prepare watcher. >> >> Here's what the libev doc says: >> >> All |ev_prepare| watchers are invoked just /before/ |ev_run| starts >> to gather new events, and all |ev_check| watchers are queued (not >> invoked) just after |ev_run| has gathered them, but before it >> queues any callbacks for any received events. That means >> |ev_prepare| watchers are the last watchers invoked before the >> event loop sleeps or polls for new events, and |ev_check| watchers >> will be invoked before any other watchers of the same or lower >> priority within an event loop iteration. > I am not so sure. The documentation is hard to read. From what I understand, > this is how it works: ev_prepare is invoked, then events are collected, then > ev_check is invoked, then the callbacks are called. Yes, it works this way. Aren't we speaking of the same things? By 'ev_loop iteration' I mean the time when our fibers perform some useful work and various ev callbacks are called. This means EV_CHECK is called in the beginning of 'ev_loop iteration' and EV_PREPARE is called in the end. > In our case it seems to > be not important whether we use prepare or check. But on the other hand, it > is probably not how it works, because in fiber.top check is called in the > beginning of an iteration, and prepare is called afterwards, and it fiber.top > returns sane results. This is so because all the fibers perform their work between ev_check and ev_prepare. > > Here I see that check is called after prepare. So check is at least 'closer' to > event loop iteration end, isn't it?: > > http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#code_ev_prepare_code_and_code_ev_che > Prepare and check watchers are often (but not always) used in pairs: prepare watchers get invoked before the process blocks and check watchers afterwards. As I understand it, EV_CHECK is called in the 'beginning' of an ev_loop iteration. Anyway, looks like may use any of the watchers. I don't think it makes any difference. > > I also couldn't get any info from libev source code, because it is unreadable > mostly. > > Is there a case, when ev_check won't work, and ev_prepare will? Is > ev_prepare actually closer to an event loop iteration end, and I missed something? > >>> + /** Configured election timeout in seconds. */ >>> double election_timeout; >>> }; >>> @@ -786,6 +803,8 @@ relay_raft_msg_push(struct cmsg *base) >>> struct xrow_header row; >>> xrow_encode_raft(&row, &fiber()->gc, &msg->req); >>> try { >>> + if (msg->req.state == RAFT_STATE_LEADER) >>> + relay_restart_recovery(msg->relay); >>> relay_send(msg->relay, &row); >> Looks like you should first send the message and then restart the recovery. >> Otherwise the resent data will be ignored once again. > Yes, true. > > ==================== > diff --git a/src/box/relay.cc b/src/box/relay.cc > index 4f9bbc0de..d63711600 100644 > --- a/src/box/relay.cc > +++ b/src/box/relay.cc > @@ -803,9 +803,14 @@ relay_raft_msg_push(struct cmsg *base) > struct xrow_header row; > xrow_encode_raft(&row, &fiber()->gc, &msg->req); > try { > + /* > + * Send the message before restarting the recovery. Otherwise > + * all the rows would be sent from under a non-leader role and > + * would be ignored again. > + */ > + relay_send(msg->relay, &row); > if (msg->req.state == RAFT_STATE_LEADER) > relay_restart_recovery(msg->relay); > - relay_send(msg->relay, &row); > } catch (Exception *e) { > relay_set_error(msg->relay, e); > fiber_cancel(fiber()); -- Serge Petrenko
prev parent reply other threads:[~2020-09-09 8:04 UTC|newest] Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-09-03 22:51 [Tarantool-patches] [PATCH v2 00/10] dRaft Vladislav Shpilevoy 2020-09-03 22:51 ` [Tarantool-patches] [PATCH v2 01/10] applier: store instance_id in struct applier Vladislav Shpilevoy 2020-09-04 8:13 ` Serge Petrenko 2020-09-07 22:54 ` Vladislav Shpilevoy 2020-09-03 22:51 ` [Tarantool-patches] [PATCH v2 10/10] raft: introduce box.info.raft Vladislav Shpilevoy 2020-09-04 11:38 ` Serge Petrenko 2020-09-03 22:51 ` [Tarantool-patches] [PATCH v2 02/10] box: introduce summary RO flag Vladislav Shpilevoy 2020-09-04 8:17 ` Serge Petrenko 2020-09-03 22:51 ` [Tarantool-patches] [PATCH v2 03/10] wal: don't touch box.cfg.wal_dir more than once Vladislav Shpilevoy 2020-09-04 8:20 ` Serge Petrenko 2020-09-03 22:51 ` [Tarantool-patches] [PATCH v2 04/10] replication: track registered replica count Vladislav Shpilevoy 2020-09-04 8:24 ` Serge Petrenko 2020-09-07 22:54 ` Vladislav Shpilevoy 2020-09-07 15:45 ` Sergey Ostanevich 2020-09-07 22:54 ` Vladislav Shpilevoy 2020-09-03 22:51 ` [Tarantool-patches] [PATCH v2 05/10] raft: introduce persistent raft state Vladislav Shpilevoy 2020-09-04 8:59 ` Serge Petrenko 2020-09-03 22:51 ` [Tarantool-patches] [PATCH v2 06/10] raft: introduce box.cfg.raft_* options Vladislav Shpilevoy 2020-09-04 9:07 ` Serge Petrenko 2020-09-07 22:55 ` Vladislav Shpilevoy 2020-09-03 22:51 ` [Tarantool-patches] [PATCH v2 07/10] raft: relay status updates to followers Vladislav Shpilevoy 2020-09-03 22:51 ` [Tarantool-patches] [PATCH v2 08/10] [tosquash] raft: pass source instance_id to raft_process_msg() Vladislav Shpilevoy 2020-09-04 9:22 ` Serge Petrenko 2020-09-03 22:51 ` [Tarantool-patches] [PATCH v2 09/10] raft: introduce state machine Vladislav Shpilevoy 2020-09-04 11:36 ` Serge Petrenko 2020-09-07 22:57 ` Vladislav Shpilevoy 2020-09-09 8:04 ` Serge Petrenko [this message]
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=7e5fee9d-d032-de31-e1eb-970ffeb5ce37@tarantool.org \ --to=sergepetrenko@tarantool.org \ --cc=gorcunov@gmail.com \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH v2 09/10] raft: introduce state machine' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox