From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp53.i.mail.ru (smtp53.i.mail.ru [94.100.177.113]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id E12BB469719 for ; Mon, 21 Sep 2020 11:34:22 +0300 (MSK) From: Serge Petrenko References: <3c916b9b7e70c4fbfa2b95c4aeb4146e83c82c58.1599693319.git.v.shpilevoy@tarantool.org> <9c8db026-60e1-5079-e884-7770cdfae0d7@tarantool.org> Message-ID: <3fd85d58-8346-5080-4c53-5a9900e54ded@tarantool.org> Date: Mon, 21 Sep 2020 11:34:20 +0300 MIME-Version: 1.0 In-Reply-To: <9c8db026-60e1-5079-e884-7770cdfae0d7@tarantool.org> Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Transfer-Encoding: 8bit Content-Language: en-GB Subject: Re: [Tarantool-patches] [PATCH v2 09/11] raft: introduce state machine List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy , tarantool-patches@dev.tarantool.org, gorcunov@gmail.com 21.09.2020 11:22, Serge Petrenko пишет: > > 19.09.2020 18:49, Vladislav Shpilevoy пишет: >> Here is a new version of the patch after some squashes. >> >> ==================== >>      raft: introduce state machine >>           The commit is a core part of Raft implementation. It >> introduces >>      the Raft state machine implementation and its integration into the >>      instance's life cycle. >>           The implementation follows the protocol to the letter >> except a few >>      important details. >>           Firstly, the original Raft assumes, that all nodes share >> the same >>      log record numbers. In Tarantool they are called LSNs. But in case >>      of Tarantool each node has its own LSN in its own component of >>      vclock. That makes the election messages a bit heavier, because >>      the nodes need to send and compare complete vclocks of each other >>      instead of a single number like in the original Raft. But logic >>      becomes simpler. Because in the original Raft there is a problem >>      of uncertainty about what to do with records of an old leader >>      right after a new leader is elected. They could be rolled back or >>      confirmed depending on circumstances. The issue disappears when >>      vclock is used. >>           Secondly, leader election works differently during cluster >>      bootstrap, until number of bootstrapped replicas becomes >= >>      election quorum. That arises from specifics of replicas bootstrap >>      and order of systems initialization. In short: during bootstrap a >>      leader election may use a smaller election quorum than the >>      configured one. See more details in the code. >>           Part of #1146 > > Consider these fixes for applier vclock segfault pushed on top of the > branch. I've split the patch in two and placed the commits on top of `raft: relay status updates to followers` and `raft: introduce state machine` > >  src/box/box.cc |  6 +----- >  src/box/raft.c | 12 +++++++++--- >  2 files changed, 10 insertions(+), 8 deletions(-) > > diff --git a/src/box/box.cc b/src/box/box.cc > index c5dcbd959..1169f2cd5 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -2160,11 +2160,7 @@ box_process_subscribe(struct ev_io *io, struct > xrow_header *header) >           * should be 0. >           */ >          struct raft_request req; > -        /* > -         * Omit the candidate vclock, since we've just sent it in > -         * subscribe response. > -         */ > -        raft_serialize_for_network(&req, NULL); > +        raft_serialize_for_network(&req, &vclock); >          xrow_encode_raft(&row, &fiber()->gc, &req); >          coio_write_xrow(io, &row); >      } > diff --git a/src/box/raft.c b/src/box/raft.c > index 7b7ef9c1c..45783f0e3 100644 > --- a/src/box/raft.c > +++ b/src/box/raft.c > @@ -130,8 +130,6 @@ raft_new_random_election_shift(void) >  static inline bool >  raft_can_vote_for(const struct vclock *v) >  { > -    if (v == NULL) > -        return false; >      int cmp = vclock_compare_ignore0(v, &replicaset.vclock); >      return cmp == 0 || cmp == 1; >  } > @@ -369,6 +367,12 @@ raft_process_msg(const struct raft_request *req, > uint32_t source) >                       "candidate"); >                  break; >              } > +            /* Can't vote when vclock is unknown. */ > +            if (req->vclock == NULL) { > +                say_info("RAFT: vote request is skipped - " > +                     "missing candidate vclock."); > +                break; > +            } >              /* Can't vote for too old or incomparable nodes. */ >              if (!raft_can_vote_for(req->vclock)) { >                  say_info("RAFT: vote request is skipped - " > @@ -858,8 +862,10 @@ raft_serialize_for_network(struct raft_request > *req, struct vclock *vclock) >      req->state = raft.state; >      /* >       * Raft does not own vclock, so it always expects it passed > externally. > +     * Vclock is sent out only by candidate instances. >       */ > -    req->vclock = vclock; > +    if (req->state == RAFT_STATE_CANDIDATE) > +        req->vclock = vclock; >  } > >  void -- Serge Petrenko